Sync eval changes in OLMo/ladder-1xC to here (#122)
This adds the scaling law eval sets as in-loop evals.

Testing of metric: https://legacy.beaker.org/ex/01JF4NNA49YJGC55P3Q5FPEAPA/tasks/01JF4NNA4HM9Q90BQNQ99XSJ9Y/job/01JF4P6XRZVTDXWC3J2559R0K5

```
2024-12-15T08:21:11.301073649Z 2024-12-15 08:21:11.300 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:68 INFO Running downstream evals...
2024-12-15T08:21:14.829675802Z 2024-12-15 08:21:14.829 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=5/75]
2024-12-15T08:21:14.940428448Z 2024-12-15 08:21:14.940 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=10/75]
2024-12-15T08:21:15.049435484Z 2024-12-15 08:21:15.049 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=15/75]
2024-12-15T08:21:15.157967512Z 2024-12-15 08:21:15.157 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=20/75]
2024-12-15T08:21:15.267427337Z 2024-12-15 08:21:15.267 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=25/75]
2024-12-15T08:21:15.375047960Z 2024-12-15 08:21:15.374 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=30/75]
2024-12-15T08:21:15.483513780Z 2024-12-15 08:21:15.483 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=35/75]
2024-12-15T08:21:15.594538312Z 2024-12-15 08:21:15.594 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=40/75]
2024-12-15T08:21:15.702422918Z 2024-12-15 08:21:15.702 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=45/75]
2024-12-15T08:21:15.811504739Z 2024-12-15 08:21:15.811 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=50/75]
2024-12-15T08:21:15.919817749Z 2024-12-15 08:21:15.919 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=55/75]
2024-12-15T08:21:16.026753004Z 2024-12-15 08:21:16.026 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=60/75]
2024-12-15T08:21:16.133501599Z 2024-12-15 08:21:16.133 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=65/75]
2024-12-15T08:21:16.240990822Z 2024-12-15 08:21:16.240 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=70/75]
2024-12-15T08:21:16.348730485Z 2024-12-15 08:21:16.348 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:111 INFO [eval=downstream,step=75/75]
2024-12-15T08:21:17.056109188Z 2024-12-15 08:21:17.055 d22e6d646321:0 olmo_core.train.callbacks.evaluator_callback:104 INFO Eval metrics:
2024-12-15T08:21:17.056129669Z arc_challenge_val_rc_5shot (len_norm)=0.2441
2024-12-15T08:21:17.056131828Z arc_challenge_val_rc_5shot (ce_loss)=2.472
2024-12-15T08:21:17.056133529Z arc_challenge_val_rc_5shot (bpb)=3.565
2024-12-15T08:21:17.056134965Z arc_challenge_val_rc_5shot (soft)=0.2539
2024-12-15T08:21:17.056136416Z arc_challenge_val_rc_5shot (soft_log)=-1.46E+00
```

To see things in Comet: https://www.comet.com/ai2/olmo-core-1b/7a3614872861484dbc7ad651ad5c9e35
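
For checking a test run like this one by script, the `Eval metrics:` entries in the log above can be pulled into a dictionary. This is only a sketch, assuming each entry keeps the `task (metric)=value` shape seen in this log; `parse_eval_metrics` and `METRIC_RE` are hypothetical names for illustration and are not part of olmo_core.

```python
import re

# Matches entries such as "arc_challenge_val_rc_5shot (len_norm)=0.2441".
# The format is assumed from the log in this commit message, not from any
# documented olmo_core output contract.
METRIC_RE = re.compile(
    r"(?P<task>\w+) \((?P<metric>\w+)\)="
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_eval_metrics(log_text):
    """Return {(task, metric): value} for every metric entry found in log_text."""
    metrics = {}
    for match in METRIC_RE.finditer(log_text):
        key = (match["task"], match["metric"])
        metrics[key] = float(match["value"])
    return metrics


if __name__ == "__main__":
    sample = (
        "2024-12-15T08:21:17.056129669Z arc_challenge_val_rc_5shot (len_norm)=0.2441 "
        "2024-12-15T08:21:17.056136416Z arc_challenge_val_rc_5shot (soft_log)=-1.46E+00"
    )
    for (task, metric), value in parse_eval_metrics(sample).items():
        print(task, metric, value)
```

Because the pattern uses `finditer`, it works whether the log has one entry per line or has been flattened onto a single line.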