broken tinygrad responses #500
Comments
Why is this a "bad" output? tinygrad and MLX are using slightly different models. It's one of the magical things about exo: different models are interoperable.
In the tinygrad screenshot it hasn't answered my second prompt at all. Try having a conversation with 1B using MLX and then tinygrad: I'm getting nonsense and/or irrelevant responses after the first turn, as if it's not receiving the prompts correctly. I didn't realize the models were different — if this is just down to the models being different then it's obviously not an issue, but it does feel like more than a slight difference.
The example you gave has context from the previous part of the conversation, and the response seems totally reasonable.
@AlexCheema Yeah, this looks like a context bug to me, and it makes an argument for spending some time reconciling the different caching methods between these implementations — fully utilizing the cache for inference over a chat session rather than stacking prompts within the API. This would also fix some bugs that happen during a long session when too large a context is passed into the inference model. This kind of change will dovetail well with some other inference-engine compatibility generalizations, so it seems like a good thing to do.
Agree, that seems like a good strategy to fix this and probably a bunch of other unknown bugs. |
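The caching strategy discussed above can be sketched roughly as follows. This is a minimal illustration, not exo's actual API — the `ChatSession` and `KVCache` names are hypothetical. The idea is that each turn only feeds the model the token suffix the session's cache hasn't already seen, instead of re-stacking the full prompt history through the API:

```python
# Hypothetical sketch of per-session KV-cache reuse (names are illustrative,
# not exo's real classes). Each turn forwards only the tokens the cache
# hasn't processed yet, rather than the whole stacked conversation.

class KVCache:
    """Toy stand-in for an inference engine's key/value cache."""
    def __init__(self):
        self.tokens = []  # tokens already processed this session

    def extend(self, tokens):
        self.tokens.extend(tokens)


class ChatSession:
    def __init__(self):
        self.cache = KVCache()

    def step(self, prompt_tokens):
        # Only the suffix the cache hasn't seen needs a forward pass.
        cached = len(self.cache.tokens)
        new_tokens = prompt_tokens[cached:]
        self.cache.extend(new_tokens)
        return new_tokens  # what would actually be fed to the model


session = ChatSession()
first = session.step([1, 2, 3])         # full prompt on the first turn
second = session.step([1, 2, 3, 4, 5])  # only [4, 5] hits the model
```

Keeping the cache keyed to the session (rather than rebuilding it per request) is also what bounds the context passed to the model, which is the long-session failure mode mentioned above.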
I'm getting bad responses from tinygrad (except for the first). Running on an M2 Mac Mini, with the inference_engine_name value hardcoded to use tinygrad instead of MLX. I'm seeing the same thing happen for 3B too; I haven't tried other models.
Will update if I find more info.
MLX output: (screenshot)

Tinygrad output: (screenshot)