
Releases: DCGM/lm-evaluation-harness

v0.4 Preview Release

07 Oct 12:21
Pre-release
  • Fixes a bug in binary F1 computation.
  • Fixes a bug with double includes in YAML inheritance.
  • Added a clarifying exception message for language modeling tasks used with smart truncation.
  • Added unit tests.

Fixed issue with subjectivity task

23 Sep 11:02
Pre-release
  • Unfortunately, the subjectivity task was not configured properly: the two labels were swapped. This was fixed in commit a85cf.
  • Re-running experiments is not necessary; it is enough to flip the log-likelihoods (llhs) in the logfiles and recompute your metrics.
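
The flip can be applied offline. A minimal sketch, assuming each logfile record can be parsed into a dict holding a `[llh_label_0, llh_label_1]` pair under an `llhs` key and a gold label under `gold` (these key names and the record layout are hypothetical; adapt them to the actual logfile schema):

```python
def flip_llhs(records):
    """Return copies of the records with the two per-label
    log-likelihood scores swapped (undoing the label mix-up).

    Assumes each record is a dict with an 'llhs' key holding a
    [llh_label_0, llh_label_1] pair -- a hypothetical schema.
    """
    flipped = []
    for rec in records:
        rec = dict(rec)  # shallow copy; leave the original intact
        a, b = rec["llhs"]
        rec["llhs"] = [b, a]
        flipped.append(rec)
    return flipped


def accuracy(records):
    """Recompute accuracy: predict the label with the higher llh."""
    correct = sum(
        1
        for rec in records
        if max(range(2), key=lambda i: rec["llhs"][i]) == rec["gold"]
    )
    return correct / len(records)
```

For a metric such as F1, substitute the corresponding computation in place of `accuracy`; the flipping step stays the same.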

v0.2

19 Sep 11:06
Pre-release
  • Fixes Belebele prompts.
  • Fixes bad metric assignment for certain tasks (AUROC vs. accuracy).

v0.1 Preview Release

04 Sep 12:33
Pre-release

This is the code we used for the first experiments.