This directory contains predefined experiment settings for various models. The purpose of these config files is to facilitate inference experiments, particularly on a SLURM cluster.
Model-specific arguments such as GPU memory requirements are specified in the appropriate JSON file, which can be passed to `slurm_scripts.submit_inference` as the first positional argument.
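For example, a SLURM submission might look like the following (a sketch only; it assumes `slurm_scripts.submit_inference` can be invoked as a module, in the same way as the `run` example below):

```bash
# Sketch: the model/GPU config JSON is the first positional argument
python -m slurm_scripts.submit_inference exp_configs/rtx/bloom-560m.json
```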
We then specify experiment-specific arguments as usual, e.g.:

```bash
python -m run exp_configs/rtx/bloom-560m.json \
    --prompt_json "prompts/p0.json" \
    --examples "data/asset/dataset/asset.valid.jsonl" \
    --input_file "data/asset/dataset/asset.test.orig" \
    --n_refs 1 --few_shot_n 3 --seed 489
```
Here, `rtx` is a directory containing experiment configs for a server with 8x RTX 3090 GPUs (24GB); these GPUs are assumed to support parallelisation.
The directory `cluster` contains config settings for inference experiments on a SLURM cluster with 8x A100 (80GB) and T4 (16GB) GPUs. Here, we always use A100s for parallelisation.
To adjust for your own GPU configuration, simply create a new folder and edit the appropriate config file.
Below are some observations from running inference with LLMs:
- Beam search uses significantly more GPU memory than sampling-based decoding with `num_beams=1`.
- The model footprint (loaded as 8-bit int) is roughly 1 GB per 1B parameters. To run batched inference, account for sufficient headroom on top of this (see the loading sketch after this list).
- Inference with GPT-NeoX is very slow (~1.5 hours on the ASSET test set, 359 sentences).
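As a rough sanity check of the footprint numbers, a model can be loaded in 8-bit and its weight memory inspected. The sketch below uses the Hugging Face `transformers` 8-bit path (it assumes `bitsandbytes` and `accelerate` are installed) and is not part of the project code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b1"  # any causal LM from the table below

tokenizer = AutoTokenizer.from_pretrained(model_name)
# load_in_8bit quantises the weights with bitsandbytes;
# device_map="auto" shards them across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Expect roughly 1 GB per 1B parameters for the weights alone;
# batched generation needs extra headroom for activations and the KV cache.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```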
The following table contains statistics observed during test inference runs with the following parameters (unless otherwise specified): `batch_size=8`, `max_new_tokens=100`, `num_beams=1`, `num_return_sequences=1`, `do_sample=True`, `top_p=0.9`. A minimal generation sketch with these settings follows the table.
model | Footprint (8-bit int) | Inference time | Inference GPU mem | # GPUs | rtx | cluster |
---|---|---|---|---|---|---|
bigscience/bloom-560m | 0.78 GB | ~10 secs (bs=4) | ~6GB | 1 T4-16GB | ✅ | |
bigscience/bloomz-560m | 0.78 GB | ~10 secs (bs=4) | ~6GB | 1 T4-16GB | ✅ | |
bigscience/bloom-1b1 | 1.35 GB | ~10 secs (bs=4) | ~8GB | 1 T4-16GB | ✅ | |
bigscience/bloomz-1b1 | 1.35 GB | ~10 secs (bs=4) | ~8GB | 1 T4-16GB | ✅ | |
bigscience/bloom-3b | 3.39 GB | | | 1 T4-16GB | ✅ | |
bigscience/bloomz-3b | 3.39 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
bigscience/bloom-7b1 | 7.54 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
bigscience/bloom | 167.5 GB | ~45 secs (bs=8) | | 4 A100-80GB | | |
bigscience/bloomz | 167.5 GB | ~45 secs (bs=8) | | 4 A100-80GB | | |
facebook/opt-1.3b | 1.32 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
facebook/opt-iml-max-1.3b | 1.32 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
facebook/opt-6.7b | 6.40 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
facebook/opt-13b | 12.22 GB | | | 1 RTX 3090-24GB | ✅ | |
facebook/opt-30b | 28.26 GB | ~27 secs (bs=8) | ~60GB | 4 RTX 3090-24GB / 1 A100-80GB | ✅ | |
facebook/opt-iml-max-30b | 28.26 GB | ~27 secs (bs=8) | ~60GB | 4 RTX 3090-24GB / 1 A100-80GB | ✅ | |
facebook/opt-66b | 61.65 GB | ~40 secs (bs=8) | ~150GB | 2 A100-80GB | ✅ | |
facebook/llama-7B | 6.58 GB | ~6 secs (bs=8) | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
facebook/llama-13b | 12.5 GB | | | 1 RTX 3090-24GB / 1 A100-80GB | ✅ | |
facebook/llama-30b | 30.81 GB | ~116 secs (bs=8) | | 4 RTX 3090-24GB / 1 A100-80GB | ✅ | |
facebook/llama-65b | 61.45 GB | | | 7 RTX 3090-24GB / 1 A100-80GB | ✅ | |
EleutherAI/gpt-j-6b | 6.13 GB | ~16 secs (bs=8) | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
EleutherAI/gpt-neox-20b | 61.45 GB | | | 3 RTX 3090-24GB / 1 A100-80GB | ✅ | |
Enc-Dec | | | | | | |
ul2 (20b) | 30.24 GB | ~30 secs (bs=8) | | 2 RTX 3090-24GB / 1 A100-80GB | ✅ | |
flan-ul2 (20b) | 30.24 GB | | | 2 RTX 3090-24GB / 1 A100-80GB | ✅ | |
t0_3b | 4.18 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
t0 (11b) | 16.24 GB | | | 1 RTX 3090-24GB / 1 A100-80GB | ✅ | |
t0pp (11b) | 16.24 GB | | | 1 RTX 3090-24GB / 1 A100-80GB | ✅ | |
t5-small (77m) | 0.12 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
t5-base (250m) | 0.38 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
t5-large (780m) | 1.17 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
t5-xl (3b) | 4.18 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
t5-xxl (11b) | 16.24 GB | | | 1 RTX 3090-24GB / 1 A100-80GB | ✅ | |
flan-t5-small (77m) | 0.12 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
flan-t5-base (250m) | 0.38 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
flan-t5-large (780m) | 1.17 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
flan-t5-xl (3b) | 4.18 GB | | | 1 RTX 3090-24GB / 1 T4-16GB | ✅ | |
flan-t5-xxl (11b) | 16.24 GB | | | 1 RTX 3090-24GB / 1 A100-80GB | ✅ | |
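For reference, the default decoding parameters above correspond roughly to a `generate()` call like this (a sketch continuing from the loading example further up; it is not the project's actual inference code):

```python
# model and tokenizer as loaded in the 8-bit sketch above; single prompt for brevity
prompt = "Simplify the sentence: The committee reached a unanimous decision after lengthy deliberations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    num_beams=1,             # no beam search; see the memory note above
    do_sample=True,          # nucleus sampling
    top_p=0.9,
    num_return_sequences=1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```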
- T0* models fail to run with `protobuf==4.22.1`; downgrading to `protobuf==3.20.0` fixes this (see the command below).
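The downgrade is just a pinned install, e.g.:

```bash
pip install protobuf==3.20.0
```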