GitHub - cognitivecomputations/grpo_code: A fast, local, and secure approach to training LLMs for code with WebAssembly and interpreter-based rewards.

Note

Check out our blog post for more details and benchmarks!

Installation

git clone https://github.com/axolotl-ai-cloud/grpo_code.git
cd grpo_code
pip install -e .
pip install "axolotl[vllm,flash-attn]==0.8.0"

Training

The following environment variables can be used to modify the behaviour of the reward functions:

WASM_FUEL - Controls the amount of fuel (computation resources) allocated to the WASM environment (default: 10000000000)
WASM_PATH - Path to the Python WASM runtime file (default: "./wasm/python-3.12.0.wasm")
TIMEOUT - Maximum execution time in seconds for code evaluation (default: 1)
MAX_WORKERS - Number of parallel workers for multiprocessing reward functions (default: 4)
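
The variables and defaults listed above could be read in Python roughly as follows. This is a minimal sketch, not the repository's actual code: `RewardSettings` and `load_reward_settings` are hypothetical names, and parsing TIMEOUT as a float is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardSettings:
    """Hypothetical container for the four documented environment variables."""

    wasm_fuel: int
    wasm_path: str
    timeout: float
    max_workers: int


def load_reward_settings(env=None):
    """Read the variables from `env` (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return RewardSettings(
        wasm_fuel=int(env.get("WASM_FUEL", "10000000000")),
        wasm_path=env.get("WASM_PATH", "./wasm/python-3.12.0.wasm"),
        # Assumption: TIMEOUT may be a fractional number of seconds.
        timeout=float(env.get("TIMEOUT", "1")),
        max_workers=int(env.get("MAX_WORKERS", "4")),
    )
```

Calling `load_reward_settings({"MAX_WORKERS": "64"})` overrides only the worker count and leaves the other three values at their defaults.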

First, spin up a vLLM instance:

CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve r1_acecode.yaml

Then, in another terminal, kick off the training process:

CUDA_VISIBLE_DEVICES=0,1 MAX_WORKERS=64 axolotl train r1_acecode.yaml --num-processes 2

This example uses four A100 GPUs. Adjust CUDA_VISIBLE_DEVICES, MAX_WORKERS, and cfg.batch_size as necessary to match your hardware.

Python WASM Runtime

This project uses Python 3.12.0 compiled to WebAssembly, as distributed by VMware Labs.

Verify an Existing Download

If you already have the WASM file and want to verify its integrity:

Ensure you have both python-3.12.0.wasm and python-3.12.0.wasm.sha256sum in the wasm directory.
Run the verification command:

Linux/macOS:

sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum
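
Where sha256sum is not available, the same check can be approximated in Python with the standard library. This is a minimal sketch, assuming the checksum file uses the usual sha256sum layout (hex digest first, then the file name); `file_sha256` and `verify_sha256` are hypothetical helpers, not part of this repository.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(target, checksum_file):
    """Compare `target` against the first digest listed in `checksum_file`."""
    # Assumption: the first whitespace-separated token is the hex digest.
    expected = Path(checksum_file).read_text().split()[0].lower()
    return file_sha256(target) == expected
```

For example, `verify_sha256("./wasm/python-3.12.0.wasm", "./wasm/python-3.12.0.wasm.sha256sum")` returns True when the digests match.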

Manual Download

To download the runtime files yourself:

Download the Python WASM runtime:

curl -L https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm -o ./wasm/python-3.12.0.wasm

Download the SHA256 checksum file:

curl -L https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm.sha256sum -o ./wasm/python-3.12.0.wasm.sha256sum

Verify the download:

sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum

Place both files in the wasm directory, or set WASM_PATH to the location of the runtime file.
