broken tinygrad responses #500

Open
roryclear opened this issue Nov 25, 2024 · 6 comments

Comments

@roryclear
Contributor

roryclear commented Nov 25, 2024

I'm getting bad responses from tinygrad (every response after the first one). Running on an M2 Mac Mini, I've hardcoded the inference_engine_name value to use tinygrad instead of MLX. I'm seeing the same thing happen with 3B too; I haven't tried other models.
I'll update if I find more info.
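
For reference, this is roughly the change I made (illustrative only: the inference_engine_name value is the one mentioned above, but the surrounding code and where it lives in exo are paraphrased):

```python
# Illustrative only: force tinygrad instead of the auto-detected MLX engine
# on Apple Silicon. The real detection/selection code in exo may look different.
inference_engine_name = "tinygrad"  # was resolving to "mlx" on the M2 Mac Mini
```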

MLX output:
[screenshot: MLX response]

Tinygrad output:
[screenshot: tinygrad response]

@AlexCheema
Contributor

Why is this a "bad" output?

tinygrad and MLX are using slightly different models. It's one of the magical things about exo: different models are interoperable.

@roryclear
Contributor Author

In the tinygrad screenshot it hasn't answered my second prompt at all. Try having a conversation with 1B using MLX and then with tinygrad: I'm getting nonsense and/or irrelevant responses after the first one, as if it's not receiving the prompts correctly.

I didn't realize the models were different. If this is just down to the models being different then it's obviously not an issue, but it does feel like more than a slight difference.

@AlexCheema
Contributor

AlexCheema commented Nov 25, 2024

In the tinygrad screenshot it hasn't answered my second prompt at all. Try having a conversation with 1B using MLX and then with tinygrad: I'm getting nonsense and/or irrelevant responses after the first one, as if it's not receiving the prompts correctly.

I didn't realize the models were different. If this is just down to the models being different then it's obviously not an issue, but it does feel like more than a slight difference.

The example you gave does have context from the previous part of the conversation.
Can you give an example where it doesn't?

The output in that example seems totally reasonable.

@roryclear
Contributor Author

Ah, this might explain more:
MLX:
[screenshot: MLX response]
Tinygrad:
[screenshot: tinygrad response]

@blindcrone
Contributor

blindcrone commented Nov 25, 2024

@AlexCheema Yeah, this looks like a context bug to me, and it makes a case for spending some time reconciling the different caching methods between these implementations and fully utilizing the caching for inference over a chat session, rather than stacking prompts within the API. That would also fix some bugs that happen during long sessions when too large a context gets passed into the inference model.

This kind of change would dovetail well with some other inference-engine compatibility generalizations, so it seems like a good thing to do.
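
A rough sketch of the shape of that fix (purely illustrative; ChatSession, EngineStub, and handle_user_turn are made-up names, not exo's actual interfaces). The idea is that each turn only the tokens the engine hasn't seen yet get fed through it, instead of re-sending the full stacked chat history:

```python
# Purely illustrative sketch, not exo's API: keep per-session state in the
# engine and feed it only the unseen suffix of the conversation each turn.
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    session_id: str
    cached_token_count: int = 0          # tokens the engine has already processed
    history: list = field(default_factory=list)


class EngineStub:
    """Stand-in for an inference engine that keeps KV state between calls."""

    def __init__(self):
        self.kv_cache = {}               # session_id -> token ids already processed

    def step(self, session, new_tokens):
        # Only the *new* tokens are run through the model; earlier tokens are
        # already represented in this session's cache.
        self.kv_cache.setdefault(session.session_id, []).extend(new_tokens)
        session.cached_token_count += len(new_tokens)


def handle_user_turn(engine, session, prompt, tokenize):
    session.history.append(prompt)
    all_tokens = tokenize("\n".join(session.history))
    # Instead of stacking the whole history into one prompt, pass only the
    # suffix the engine has not seen yet.
    engine.step(session, all_tokens[session.cached_token_count:])


if __name__ == "__main__":
    def toy_tokenize(text):
        return list(text.encode())       # stand-in tokenizer

    engine, session = EngineStub(), ChatSession("demo")
    handle_user_turn(engine, session, "first prompt", toy_tokenize)
    handle_user_turn(engine, session, "second prompt", toy_tokenize)
    print(session.cached_token_count)    # old turns are never re-processed
```

In a real engine the cache would hold KV tensors rather than token ids, but the shape of the fix is the same: the chat endpoint tracks how much of the session the engine has already consumed instead of resending the whole conversation every turn.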

@AlexCheema
Contributor

@AlexCheema Yeah, this looks like a context bug to me, and it makes a case for spending some time reconciling the different caching methods between these implementations and fully utilizing the caching for inference over a chat session, rather than stacking prompts within the API. That would also fix some bugs that happen during long sessions when too large a context gets passed into the inference model.

This kind of change would dovetail well with some other inference-engine compatibility generalizations, so it seems like a good thing to do.

Agreed, that seems like a good strategy to fix this and probably a bunch of other unknown bugs.
