use dharma to craft small or large benchmarking datasets that can be used during training or for fast evals. these serve as good indicators of performance on the benchmarks you care about, so make sure to craft a benchmark dataset appropriate for your use cases. more benchmarks and features are in the works to give you even more control over your bench datasets.

dharma's core value is the idea of 'eval through time' during a training run: it sheds light on your model's performance as it processes and is optimized on your training data. this can help you train more powerful models that do exactly what you intend them to. of course, MCQ-based benches do not tell us much about performance beyond this format, so dharma will expand to include non-MCQ benches as well. stay tuned.
!pip install git+https://github.com/pharaouk/dharma
#SETUP config.yml file
#IN YOUR SCRIPT
import dharma
dharma.run_dharma('config.yml')
or
Clone and Setup:
git clone https://github.com/pharaouk/dharma.git
cd dharma
pip install -r requirements.txt
Configs:
output: #(string) dataset name, leave blank to use default
hf_namespace: #(string) hf username/namespace
hf_upload: false #(bool) upload the dataset to the HF Hub? T/F
hf_private: false #(bool) make the HF dataset private? T/F
prompt_format: "Question: {questions}. {options} Answer:" #(string) prompt format to use for the eval datasets, not yet customizable
dataset_size: 2000 #(int) total target dataset size
data_seed: 42 #(int) dataset seed
force_dist: true #(bool) force even distribution for answers (i.e. A-25 B-25 C-25 D-25)
benchmarks: #determines which benchmarks are included and their counts/distribution in the target dataset. set count to 0 to exclude a benchmark.
  mmlu:
    count: 1
  arc_c:
    count: 1
  arc_e:
    count: 1
  agieval:
    count: 1
  boolq:
    count: 1
  obqa:
    count: 1
  truthfulqa:
    count: 1
  winogrande:
    count: 1
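
to make the config concrete, here is a rough sketch of how a single MCQ row could be rendered with the prompt_format above; the helper names, the option lettering, and the example question are illustrative assumptions, not dharma's internal API.

```python
# rough sketch of rendering one MCQ row with the prompt_format above;
# format_options / render_prompt are illustrative helpers, not dharma's API
def format_options(options):
    letters = "ABCDEFGH"
    return " ".join(f"({letters[i]}) {opt}" for i, opt in enumerate(options))

def render_prompt(question, options,
                  template="Question: {questions}. {options} Answer:"):
    return template.format(questions=question, options=format_options(options))

print(render_prompt("What is 2 + 2?", ["3", "4", "5", "6"]))
# -> Question: What is 2 + 2?. (A) 3 (B) 4 (C) 5 (D) 6 Answer:
```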
Run:
python dharma/dharma.py
or
python dharma/dharma.py --config <CONFIG_PATH>
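
once the run finishes, it can be worth sanity-checking the generated dataset before wiring it into training, e.g. to confirm the row count and that force_dist gave you an even answer distribution. a minimal sketch, assuming a JSON output file with an "output" answer field; the filename and field name are assumptions, so adjust to what your run actually produces.

```python
# quick sanity check on the generated dataset (filename and field name are
# assumptions about the output schema -- adjust to what your run produces)
import json
from collections import Counter

with open("dharma_eval.json") as f:   # hypothetical output file
    rows = json.load(f)

answer_counts = Counter(row["output"].strip() for row in rows)
print(len(rows), "rows")
print(answer_counts.most_common())    # e.g. [('A', 500), ('B', 500), ...]
```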
How is Dharma used?
Example dharma-1 dataset: https://huggingface.co/datasets/pharaouk/dharma-1
Example axolotl implementation: https://github.com/OpenAccess-AI-Collective/axolotl/blob/638c2dafb54f1c7c61a5f7ad40f8cf6965bec896/src/axolotl/core/trainer_builder.py#L152
#On Axolotl (in config.yml for your training run)
do_bench_eval: true
bench_dataset: <LINK_TO_JSON> (default="pharaouk/dharma-1/dharma_1_mini.json")
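
for a sense of what 'eval through time' looks like outside axolotl, below is a bare-bones sketch of a HF Trainer callback that scores a dharma-style MCQ file at each evaluation step. this is not axolotl's implementation (see the trainer_builder.py link above for that), and the file path, the "input"/"output" field names, and the A-D answer space are assumptions.

```python
# bare-bones 'eval through time' sketch with a plain HF Trainer callback.
# NOT axolotl's implementation; path, field names and A-D answer space are assumptions.
import json
import torch
from transformers import TrainerCallback

class BenchEvalCallback(TrainerCallback):
    def __init__(self, tokenizer, bench_path="dharma_eval.json"):  # hypothetical path
        self.tokenizer = tokenizer
        with open(bench_path) as f:
            self.rows = json.load(f)  # assumed fields: "input" prompt, "output" letter
        self.letters = ["A", "B", "C", "D"]

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        correct = 0
        for row in self.rows:
            ids = self.tokenizer(row["input"], return_tensors="pt").input_ids.to(model.device)
            with torch.no_grad():
                next_logits = model(ids).logits[0, -1]  # logits for the next token
            # score each answer letter and pick the most likely one
            letter_ids = [self.tokenizer.encode(" " + l, add_special_tokens=False)[-1]
                          for l in self.letters]
            pred = self.letters[int(torch.argmax(next_logits[letter_ids]))]
            correct += pred == row["output"].strip()
        print(f"step {state.global_step}: bench accuracy = {correct / len(self.rows):.3f}")
```

you would pass this to `Trainer(..., callbacks=[BenchEvalCallback(tokenizer)])`; with axolotl, the two config lines above are all you need.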
TODOS
- bigbench compatibility [in progress] (currently not optimal)
- Custom prompt formats (to replace the standard one we've set)
- standardize dataset cleaning funcs (add sim search and subject-based segmentation)
- Add a testing/eval script with a local LLM and a local leaderboard
- Upload cleaned and corrected copies of all benchmark datasets to HF
- Fix uneven distributions
- CLI/UX updates (tqdm + cleanup)
- pip package
- New benchmarks, non MCQ
- HF Compatible Custom Callback library with customization options
- better selection algo for the benchmarks
- Randomize answer options (could be useful to evaluate/minimize bias in the model)
- More languages