LIVE-CHAT is a voice-based conversational assistant that runs in your terminal, using Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) to simulate a live conversation with short, conversational responses. It now includes TTS speaker selection and LLM selection.
- Text-to-Speech (TTS) support with multiple providers: Microsoft Edge TTS, Deepgram, Coqui XTTSv2 (offline). You can select your preferred TTS speaker, and you can put audio files in `/voices` for custom voice cloning with Coqui.
- Language model processing for conversational responses, with multiple providers: Groq, OpenAI API, Ollama (offline). You can select your preferred Language Model (LLM).
- Speech-to-Text (STT) support with multiple providers: Deepgram, Whisper (offline).
- Enhanced user customization options for a more personalized experience.
- Clone the repository and `cd live-chat`
- Create a Python environment:
  - Conda: `conda create -n live-chat python=3.11`, then activate with `conda activate live-chat`
  - Python virtual environment: `python -m venv venv`, then activate with `source venv/bin/activate` (Linux) or `venv\Scripts\activate` (Windows)
- Install torch as appropriate for your hardware
- Install the required Python packages with `pip install -r requirements.txt`
- Set up your environment variables in a `.env` file. You'll need to provide your API keys for the TTS, LLM, and STT services you choose. You can also use the offline modes without any API keys, but you will have to install and configure Ollama.
- Install ffmpeg and ensure you can run it from your command line
- Run the application with `python app.py`
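As a rough sketch, the `.env` file could look like the following. The variable names here are assumptions (they are not documented above), so check the project's configuration code for the exact names it expects:

```
# Hypothetical .env sketch — variable names are assumptions; verify against the app's config
DEEPGRAM_API_KEY=your_deepgram_key   # Deepgram STT/TTS
GROQ_API_KEY=your_groq_key           # Groq LLM provider
OPENAI_API_KEY=your_openai_key       # OpenAI API provider
```

Keys for providers you don't use can be left out; the fully offline combination (Whisper, Ollama, Coqui XTTSv2) should need none of them.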
When you run the application, you'll be prompted to enter your preferred TTS and STT providers. You can now also select your preferred TTS speaker and Language Model (LLM). After that, the application will start a conversation. You can stop the conversation by saying "goodbye".
The fastest combination of tools that I have found is STT using Deepgram, LLM with Groq, and TTS with Deepgram.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the terms of the MIT license.