Validator that computes the validation loss for a huggingface-compatible LLM
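Conceptually, the reported score is the average cross-entropy loss of a submitted model on an evaluation set. The snippet below is an illustrative sketch only, not the repo's actual scoring code; `validate.py` differs in details such as prompt formatting and the `--max_params` check.

```python
# Illustrative sketch of "validation loss": average per-sample cross-entropy
# of a causal LM on held-out text. NOT the repo's actual scoring code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-1.8B-Chat"  # example model used in the commands below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Stand-in evaluation samples; the real validator reads a JSONL eval file.
texts = ["Hello, how are you?", "The capital of France is Paris."]

losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        # Passing labels=input_ids makes the model return the mean token-level loss.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())

print(f"validation loss: {sum(losses) / len(losses):.4f}")
```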
We recommend using conda to manage the Python environment for this repo.
```bash
conda create -n llm-loss-validator python==3.10.12
conda activate llm-loss-validator
pip install -r requirements.txt
```
If you wish to continuously receive task assignments, you should use the following command:
```bash
cd /src
CUDA_VISIBLE_DEVICES=0 \
bash start.sh \
--hf_token your_hf_token \
--flock_api_key your_flock_api_key \
--task_id your_task_id \
--validation_args_file validation_config.json.example \
--auto_clean_cache False \
--lora_only True
```
- `CUDA_VISIBLE_DEVICES=0`: Specifies which GPU to use. `0` indicates the first GPU. Adjust this based on your available GPUs.
- `--hf_token`: Your Hugging Face token, required for accessing certain models. This token should have write access.
- `--flock_api_key`: Your FLock API key.
- `--task_id`: The ID of the task you want to validate. If you are validating multiple tasks, you can pass a comma-separated list, e.g., to validate tasks 8 and 9, pass `--task_id 8,9`.
- `--validation_args_file`: The path to the validation arguments file.
- `--auto_clean_cache`: A flag to determine whether to automatically clean the model cache.
- `--lora_only`: A flag to indicate whether to validate only repositories with LoRA (Low-Rank Adaptation) weights. `True` means only LoRA weights will be validated. This is useful for validators with limited network bandwidth, as LoRA weights are significantly smaller (10-500 MiB) than full model files (>10 GiB).
To run a local test on CPU:

```bash
cd /src
FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--local_test \
--validation_args_file validation_config_cpu.json.example
```
To run a local test on GPU:

```bash
cd /src
CUDA_VISIBLE_DEVICES=0 FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--local_test \
--validation_args_file validation_config.json.example
```
The `--local_test` flag lets both validators and training nodes check whether they can successfully run validation for a given model submission and dataset. It does not interact with the Fed Ledger service.
To actually calculate and submit the score for a given task assignment, use the following command:
```bash
CUDA_VISIBLE_DEVICES=0 FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--assignment_id <assignment-id> \
--validation_args_file validation_config.json.example
```
FlashAttention is a fast and memory-efficient attention mechanism that can be beneficial for large models. However, it can be tricky to compile depending on your GPU setup.
- Memory Efficiency: FlashAttention reduces memory usage significantly, allowing for longer sequence lengths.
- Speed: It provides a speedup over standard attention mechanisms, especially on GPUs with high memory bandwidth.
- Ensure CUDA Toolkit is Installed: FlashAttention requires CUDA 11.7 or above.
- Install FlashAttention:

  ```bash
  pip install flash-attn --no-build-isolation
  ```

  Alternatively, compile from source:

  ```bash
  git clone https://github.com/Dao-AILab/flash-attention
  cd flash-attention
  python setup.py install
  ```
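To confirm the build is usable from the validator's environment, a quick import check (assuming the installed package exposes `__version__`, which recent releases do):

```python
# Quick sanity check that the flash-attn build imports in this environment.
import flash_attn
print(flash_attn.__version__)
```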
For models like Qwen that automatically support FlashAttention, installing the binary is sufficient. For other models that support FlashAttention 2, you can enable it by passing `attn_implementation="flash_attention_2"` to the `AutoModelForCausalLM.from_pretrained()` call in `validate.py`, as sketched below.
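A minimal sketch of such a call, assuming the flash-attn package is installed and a half-precision dtype is used (FlashAttention 2 requires fp16 or bf16); the exact call site in `validate.py` may look different:

```python
# Sketch only: enabling FlashAttention 2 when loading a model with transformers.
# Adapt the corresponding from_pretrained() call in validate.py as needed.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-1.8B-Chat",
    torch_dtype=torch.bfloat16,               # FlashAttention 2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # needs the flash-attn package and a CUDA GPU
).to("cuda")
```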
For more detailed instructions and advanced usage, please refer to the FlashAttention GitHub repository.