Gazelle Trainer

This is an open source trainer for Gazelle: A Joint Speech Language Model which was made by @hingeloss.

We walk through our own trainer and some data collation steps here.

You can find the trainer in the training folder along with a notebook walking you through the process.

This is a preliminary release for our implementation of the training codebase for Gazelle - does not include data-preprocessing
We are super open to collaborators and if you think you could use more code/data from us or would like to contribute to the repo - please reach out at [email protected]
We are training our model and it will soon be open-source!

Data formatting

The Data Collator takes in a HuggingFace Dataset which has to have columns "input"(text input),"audio" (list of numbers) and "text" (labels)

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
bin		bin
examples		examples
gazelle		gazelle
trainer		trainer
.gitignore		.gitignore
LICENSE		LICENSE
logo.webp		logo.webp
pyproject.toml		pyproject.toml
readme.md		readme.md