larger model, worse performance? #30
The goal here is to evaluate an LLM in real time. We give models the ability to queue 3-5 moves ahead of time. Large LLMs can generate more moves, but yes, they take longer. The point is to include that inference latency, but we could add an option to remove it with a parameter for some games. Please feel free to open a PR to put this in place, but optionally and not by default ;)
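A minimal sketch of what such an opt-in parameter could look like. This is not the project's actual API: `run_round`, `get_moves`, `realtime`, and `frame_interval` are all hypothetical names chosen for illustration. The idea is that in real-time mode the game clock keeps running while the model thinks (the current behavior), while with `realtime=False` the clock is frozen during inference, removing the latency bias between small and large models.

```python
import time

def run_round(get_moves, realtime=True, frame_interval=0.1):
    """Sketch of a game loop with an optional 'pause during inference' mode.

    get_moves: callable standing in for the LLM call; it returns the next
    batch of queued moves (3-5 at a time, as in the current design).
    realtime: True  -> frames elapse while waiting for inference (current behavior)
              False -> the game clock pauses until the moves arrive
    """
    state = {"frame": 0, "moves_played": 0}
    for _ in range(3):  # a few inference cycles for illustration
        if realtime:
            start = time.monotonic()
            moves = get_moves(state)
            # frames that elapsed during inference are lost game time,
            # so slower models effectively play fewer actions per frame
            state["frame"] += int((time.monotonic() - start) / frame_interval)
        else:
            # game clock frozen while the model thinks
            moves = get_moves(state)
        for _move in moves:
            state["frame"] += 1
            state["moves_played"] += 1
    return state
```

With `realtime=False`, a slow 70B model and a fast 0.5B model would advance the game by the same number of frames per move, so only move quality is compared.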
in my experience, yes. A small model has a high tokens/second rate and is always generating actions, while a big model is still waiting for tokens to know how to react. @_@
The records show a small model can generate more actions thanks to its high tokens/second rate. The 0.5b wins 3 rounds!
Player 1 using: ollama:qwen:14b-chat-v1.5-fp16
Round 1 🏟️ (0647) (0) Starting game
Round 2 🏟️ (2b8a) (0) Starting game
Round 3 🏟️ (b34c) (0) Starting game
win rate 44% after 50 rounds
Very interesting results!
hi, the leaderboard shows larger model, worse performance. Is that because of the inference time? Smaller models have a higher action frequency. If so, the benchmark is not very useful.
I think maybe the game could be changed so it can pause; then we could compare models without bias from inference latency.