GitHub - ma14ch/ma14ch-voice-to-voice-generative-AI-chatbot-persian-language: A fully functional voice-to-voice chatbot designed to facilitate natural conversations in Persian. This project combines Language Modeling (LM Studio), Text-to-Speech (TTS), and Automatic Speech Recognition (ASR) technologies, enabling seamless interaction between users and the AI.

Voice-to-Voice Chatbot for Persian Language

This project is a voice-to-voice chatbot inspired by the capabilities of tools like ChatGPT. It utilizes open-source technologies to deliver a conversational AI experience entirely in the Persian language. By combining Text-to-Speech (TTS), Language Modeling (LM Studio), and Automatic Speech Recognition (ASR), it creates a seamless environment for natural voice communication.

Key Features:

Voice-to-Voice Interaction: Users can speak to the chatbot and receive spoken responses.
Persian Language Support: Focused on enhancing the experience for Persian-speaking users.
Open-Source Language Models: Integrates widely available and adaptable TTS, LM, and ASR tools for efficient performance.
Customizable Components: Built with modularity in mind, making it easy to replace or upgrade specific components like TTS or ASR systems.

This project showcases the power of open-source technology in building accessible, localized AI applications. Perfect for developers interested in natural language processing, voice technology, and Persian language support.

Step 1: Install LMStudio and Set Up a Large Language Model

To power the voice-to-voice chatbot, we rely on LMStudio and its ability to host large language models (LLMs). Follow the steps below to install and set up LMStudio:

Installation of LMStudio

Download and install LMStudio from its official repository or website. LMStudio GitHub
Set up LMStudio following the installation instructions provided in its documentation.

Choosing a Language Model

For this project, we used the CohereForAI/aya-23b language model. You can choose a larger model if your hardware supports it.

Recommended Hardware:
- At least ** NVIDIA RTX 3070 GPU ** (or better for faster inference times).
- The inference time for Aya-23b on our setup (2x3070 GPUs) is less than a second per generation.

Setting Up the Model

Download the model weights from Hugging Face: Aya-23b Model.
Load the model in LMStudio.

Running the LMStudio API

Once the model is set up in LMStudio, start the API server on port 1234 .
Verify that the API server is running correctly by visiting http://localhost:1234.

You now have a functional API endpoint that the chatbot will use to generate responses.

Step 2: Install and Set Up Text-to-Speech (TTS)

After successfully running the Language Model (LLM), the next step is to convert the generated text into speech. For this, we use the Persian TTS Coqui project, which provides a robust and efficient text-to-speech engine for Persian.

Installation Steps

Clone the Persian TTS Coqui repository:

git clone https://github.com/karim23657/Persian-tts-coqui
cd Persian-tts-coqui

Create a Python environment with Python 3.9 (required for compatibility):

python3.9 -m venv tts_env
source tts_env/bin/activate  # For Linux/macOS
tts_env\Scripts\activate     # For Windows

Install the required dependencies:
```
pip install -r requirements.txt
```

Running the TTS System

To test the TTS system, run the main.py file in the repository:
```
python main.py
```
Ensure the TTS system can successfully process input text and generate speech.

Notes

Make sure the Python 3.9 environment is active before running the TTS system.
Check the repository's documentation for advanced configuration options if needed.

Once the TTS is running, it can be used to convert the LLM's text outputs into high-quality Persian speech.

Step 3: Set Up the Application Environment and Run the App

The final step is to set up the main application, which integrates all components (LLM, TTS, and ASR) to create a fully functional voice-to-voice chatbot. This step uses the Hezar Persian Whisper system for converting human speech to text (ASR - Automatic Speech Recognition).

Setting Up the Environment

Create a Python environment with Python 3.11:

python3.11 -m venv app_env
source app_env/bin/activate  # For Linux/macOS
app_env\Scripts\activate     # For Windows

Install the required dependencies:
```
pip install -r requirements.txt
```
Ensure all dependencies required by Hezar and the app are installed successfully.

Running the Application

Navigate to the directory where the main.py file is located.
Run the application:
```
python main.py
```
By default, your app will be accessible at http://localhost:8000. You should see a UI similar to the screenshot provided earlier.

About Hezar Persian Whisper

This project utilizes Hezar Persian Whisper, an open-source Persian ASR (Automatic Speech Recognition) library. It converts spoken human input into text, which is then processed by the LLM and TTS systems.

You can find more about Hezar here: Hezar Persian Whisper.
Don’t forget to ⭐ star their repository if you find it helpful!

Notes:

Ensure all components (LLM API, TTS, and ASR) are running correctly before starting the app.
For additional customization, refer to the main.py file and Hezar's documentation.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
__pycache__		__pycache__
templates		templates
text-to-speech		text-to-speech
LICENSE		LICENSE
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice-to-Voice Chatbot for Persian Language

Key Features:

Step 1: Install LMStudio and Set Up a Large Language Model

Installation of LMStudio

Choosing a Language Model

Setting Up the Model

Running the LMStudio API

Step 2: Install and Set Up Text-to-Speech (TTS)

Installation Steps

Running the TTS System

Notes

Step 3: Set Up the Application Environment and Run the App

Setting Up the Environment

Running the Application

About Hezar Persian Whisper

Notes:

About

Releases

Packages

Languages

License

ma14ch/ma14ch-voice-to-voice-generative-AI-chatbot-persian-language

Folders and files

Latest commit

History

Repository files navigation

Voice-to-Voice Chatbot for Persian Language

Key Features:

Step 1: Install LMStudio and Set Up a Large Language Model

Installation of LMStudio

Choosing a Language Model

Setting Up the Model

Running the LMStudio API

Step 2: Install and Set Up Text-to-Speech (TTS)

Installation Steps

Running the TTS System

Notes

Step 3: Set Up the Application Environment and Run the App

Setting Up the Environment

Running the Application

About Hezar Persian Whisper

Notes:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages