A Gradio interface for Exploring, Creating, Editing, Mixing, Importing and Exporting Speakers for the Coqui XTTS model.
- Clone the repo
git clone https://github.com/ichabodcole/xtts-speaker-forge.git
- CD into the newly created directory
cd xtts-speaker-forge
- Create a virtual Python environment via venv, conda, etc. (optional, but highly recommended)
- Install the packages
pip install -q -r ./requirements.txt
- Run the app
bash run.sh
- Open the Gradio app at localhost:5003
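For the optional virtual environment step above, here is a minimal sketch using Python's built-in `venv` module. It assumes a POSIX shell and a `python3` executable on your PATH; the `.venv` directory name is an arbitrary choice, not something the project requires. Run it from inside the `xtts-speaker-forge` directory before installing the packages.

```shell
# Create an isolated environment in ./.venv (the directory name is an assumption)
python3 -m venv .venv

# Activate it for the current shell session
. .venv/bin/activate

# Confirm that python now resolves to the interpreter inside the environment
command -v python
```

Once the environment is active, the `pip install` step installs into `.venv` rather than the system Python. Run `deactivate` to leave the environment.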
Windows is not yet supported or tested, though it may work under WSL :D
If you encounter package install errors during step #4 of "Run locally", try the alternative installation below (you only need this if the app will not run).
pip install --use-deprecated=legacy-resolver -q -r ./requirements.txt
pip install -q typing_extensions==4.8 numpy==1.26.2
- Add language selection support for speaker audio generation
- Add "Edit" mode to allow renaming and deleting speakers, and adding additional speaker metadata.
- Add audio file caching to quickly play back recently generated audio files.
- Allow editing model file paths in Gradio interface (if feasible)
- Allow direct upload / import of speaker files
- Explore adding different voice mixing methods.
- Add the ability to filter speakers in the Explore view based on metadata created in the Edit view.