larger model, worse performance? #30
The goal here is to evaluate an LLM in real time. We give models the ability to queue 3-5 moves ahead of time. Large LLMs can generate more moves, but yes, they take longer. The point is to include that inference latency, but we could add an option to remove it with a parameter for some games. Please feel free to open a PR to put this in place, but optionally and not by default ;)
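A minimal sketch of what such an opt-in parameter could look like. This is not the project's actual API: `run_round`, `get_moves`, `realtime`, and `frame_interval` are all hypothetical names chosen for illustration. The idea is that in real-time mode the game clock keeps running while the model thinks (the current behavior), while with `realtime=False` the clock is frozen during inference, removing the latency bias between small and large models.

```python
import time

def run_round(get_moves, realtime=True, frame_interval=0.1):
    """Sketch of a game loop with an optional 'pause during inference' mode.

    get_moves: callable standing in for the LLM call; it returns the next
    batch of queued moves (3-5 at a time, as in the current design).
    realtime: True  -> frames elapse while waiting for inference (current behavior)
              False -> the game clock pauses until the moves arrive
    """
    state = {"frame": 0, "moves_played": 0}
    for _ in range(3):  # a few inference cycles for illustration
        if realtime:
            start = time.monotonic()
            moves = get_moves(state)
            # frames that elapsed during inference are lost game time,
            # so slower models effectively play fewer actions per frame
            state["frame"] += int((time.monotonic() - start) / frame_interval)
        else:
            # game clock frozen while the model thinks
            moves = get_moves(state)
        for _move in moves:
            state["frame"] += 1
            state["moves_played"] += 1
    return state
```

With `realtime=False`, a slow 70B model and a fast 0.5B model would advance the game by the same number of frames per move, so only move quality is compared.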
in my experience, yes. A small model has a high tokens/second rate and is always generating actions, while a big model is still waiting for tokens to know how to react. @_@
The records show a small model can generate more actions thanks to its high tokens/second rate. The 0.5b wins 3 rounds!
Player 1 using: ollama:qwen:14b-chat-v1.5-fp16
Round 1 🏟️ (0647) (0) Starting game
Round 2 🏟️ (2b8a) (0) Starting game
Round 3 🏟️ (b34c) (0) Starting game
win rate 44% after 50 rounds
Very interesting results!
hi, the leaderboard shows larger model, worse performance. Is that because of the inference time? Smaller models have a higher action frequency. If so, the benchmark is not very useful.
I think maybe the game could be changed so it can pause; then we could compare models without bias from inference latency.