rasdani/smolR1

Repository files navigation

smolR1

reproducing DeepSeek R1 Zero with Qwen2.5-0.5B on two 4090 GPUs

Training Plots

Setup

Create a virtual environment and install dependencies with

pip install -r requirements.txt

For a two-GPU setup, start the vLLM server first, pinned to the second GPU.

CUDA_VISIBLE_DEVICES=1 trl vllm-serve --model Qwen/Qwen2.5-0.5B
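The server takes a while to load the model, so training started too early may fail to connect. A minimal sketch of a readiness check follows, using only the Python standard library. The helper name `wait_for_vllm`, the port `8000`, and the `/health/` path are assumptions about the defaults of `trl vllm-serve`; verify them against your installed TRL version before relying on this.

```python
import time
import urllib.error
import urllib.request


def wait_for_vllm(url="http://localhost:8000/health/", timeout=300.0, interval=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    The default URL is an assumption (hypothetical default host, port, and
    health path); pass the address your server actually listens on.
    Returns True if the server answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server is not accepting connections yet, or returned an error.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_vllm()` from a small script before launching training, and aborting if it returns `False`, avoids starting a run against a server that is still loading.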

Then, in a second terminal, run training with

accelerate launch --config_file configs/deepspeed/zero3.yaml --num_processes 1 train.py

Evaluation

Follow the instructions in eval/README.md.

Acknowledgments

simpleRL-Zoo

Qwen2.5 Math Evaluation
