Manually set up OneTrainer on RunPod
Important
Native, built-in cloud training was released in January 2025. It is significantly easier to use, better integrated, and officially supported, and it adds valuable options. We recommend using the native option.
With RunPod you set up a virtual instance in the cloud. One advantage is access to a large GPU at a reasonable cost; another is that, since training runs in the cloud, your local machine stays free for other work during a training run.
- Deploy a pod.
- Install OneTrainer
- Copy your dataset, model and config. Edit the config.
- Open a `byobu` session (or whichever terminal multiplexer you prefer) and start your training.
First, create an account on RunPod and top it up with credit. This is necessary to train, as RunPod is a paid rental service.
Then deploy a pod.
Select a GPU. GPUs are billed hourly while the pod is running; the cheapest prices are usually for previous-generation NVIDIA cards.
Choose a template. Here I'm using "RunPod VS Code Server", but others can work. Note that there is a template with OneTrainer already installed (search for "dxqbyd/onetrainer-cli:0.7"); just remember to update OneTrainer when using it, as this template is only refreshed for major OneTrainer releases.
Review and edit the template. Check the volume size: OneTrainer itself takes about 10 GB, and you also need room for your dataset(s), cache and workspace. If you plan to use models from Hugging Face that require a token (SD3, Flux), you can set your HF_TOKEN as an environment variable.
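If you prefer to set the token from the pod's terminal instead of the template, a minimal sketch looks like this (the token value is a placeholder, use your own from your Hugging Face account settings):

```bash
# Placeholder token: replace with your own Hugging Face access token.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"

# Optionally make it persistent for future terminal sessions in this pod.
echo 'export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"' >> ~/.bashrc
```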

Select a pricing plan and deploy the pod.
Start the pod with the blue arrow top right.
Before connecting to it, open its parameters again (Edit Pod); there you'll find the password for Jupyter Lab.
Connect to the pod and choose "Connect to HTTP Service (Port 8888)".
You'll be asked for the Jupyter password, and Jupyter Lab will open.
Open the terminal and install OneTrainer and byobu:
```bash
git clone https://github.com/Nerogar/OneTrainer.git
apt update
apt install ffmpeg byobu tmux aria2
cd OneTrainer/
./install.sh
```
Later you can update OneTrainer by running `./update.sh` in the OneTrainer directory.
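For example, assuming OneTrainer was cloned into /workspace (adjust the path if you cloned it elsewhere):

```bash
cd /workspace/OneTrainer
./update.sh
```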
Now open OneTrainer on your local computer and export your training configuration from the UI with the export button at the bottom right. Save it locally.
Back in Jupyter, move your config, base model and dataset(s) under the root folder. If you're loading the base model from Hugging Face, you don't need to upload it.
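If your dataset or base model is hosted somewhere downloadable, pulling it straight into the pod is usually faster than uploading through the browser. A sketch using the aria2 installed earlier, with placeholder URLs and file names:

```bash
cd /workspace

# Placeholder URL: replace with wherever your dataset archive is hosted.
aria2c -x 16 -s 16 "https://example.com/my_dataset.zip" -o my_dataset.zip
# unzip may need to be installed first: apt install unzip
unzip my_dataset.zip -d /workspace/dataset

# Placeholder URL: replace with your base model download link.
aria2c -x 16 -s 16 "https://example.com/base_model.safetensors" -o base_model.safetensors
```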
Edit your config to reflect your dataset (and model) locations, then save it.
Make sure every path starts with the root folder /workspace/ or OneTrainer won't find it.
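A quick way to double-check those paths from the terminal, assuming your config was saved as /workspace/config.json, is to list every quoted path in the file and confirm they all begin with /workspace/:

```bash
# Print all path-like values found in the exported config.
grep -oE '"/[^"]+"' /workspace/config.json | sort -u
```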
Finally, start the training from the OneTrainer directory:
```bash
byobu
source venv/bin/activate
python scripts/train.py --config-path "<path_to_config>"
```

For example: `python scripts/train.py --config-path "/workspace/config.json"`
You can stop the training with Ctrl+C from inside byobu; it has the same effect as stopping a training from the UI: it creates a backup and saves the model.
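Because the training runs inside byobu, you can detach and even close the browser tab and the job keeps running. A minimal sketch of the usual workflow (F6 is byobu's default detach key):

```bash
# Start (or re-attach to) a byobu session.
byobu

# ... training runs here; press F6 to detach without stopping it ...

# Later, from a new terminal, re-attach to check progress:
byobu
```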
Et voilà !