
Releases: DCGM/lm-evaluation-harness

v0.4 Preview Release

07 Oct 12:21
Pre-release
  • Fixes a bug in binary F1 computation.
  • Fixes a bug with double includes in YAML inheritance.
  • Added a clarifying exception message for language modeling tasks used with smart truncation.
  • Added unit tests.

Fixed issue with subjectivity task

23 Sep 11:02
Pre-release
  • Unfortunately, the subjectivity task was not configured properly: the two labels were swapped. This was fixed in commit a85cf.
  • Re-running experiments is not necessary; it is enough to flip the log-likelihoods (llhs) in the logfiles and recompute your metrics.
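
The flip can be applied offline. A minimal sketch, assuming each logfile record can be parsed into a dict holding a `[llh_label_0, llh_label_1]` pair under an `llhs` key and a gold label under `gold` (these key names and the record layout are hypothetical; adapt them to the actual logfile schema):

```python
def flip_llhs(records):
    """Return copies of the records with the two per-label
    log-likelihood scores swapped (undoing the label mix-up).

    Assumes each record is a dict with an 'llhs' key holding a
    [llh_label_0, llh_label_1] pair -- a hypothetical schema.
    """
    flipped = []
    for rec in records:
        rec = dict(rec)  # shallow copy; leave the original intact
        a, b = rec["llhs"]
        rec["llhs"] = [b, a]
        flipped.append(rec)
    return flipped


def accuracy(records):
    """Recompute accuracy: predict the label with the higher llh."""
    correct = sum(
        1
        for rec in records
        if max(range(2), key=lambda i: rec["llhs"][i]) == rec["gold"]
    )
    return correct / len(records)
```

For a metric such as F1, substitute the corresponding computation in place of `accuracy`; the flipping step stays the same.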

v0.2

19 Sep 11:06
Pre-release
  • Fixes Belebele prompts.
  • Fixes bad metric assignment for certain tasks (AUROC vs. accuracy).

v0.1 Preview Release

04 Sep 12:33
Pre-release

This is the code we used for the first experiments.