Veda-Guru

Veda-Guru is a Sanskrit language model designed for the four Vedas: the Rigveda, Samaveda, Yajurveda, and Atharvaveda. The project aims to provide a comprehensive tool for understanding and analyzing Vedic texts, with a focus on audio mode training and on handling special symbols in Vedic sentences.

Project Overview

The Veda-Guru project involves the following key steps:

Text Collection and Preprocessing: Downloading and preprocessing texts from the Vedas.
Model Training: Fine-tuning a pre-trained BERT model on the Vedic texts.
Evaluation and Fine-tuning: Assessing the model's performance and making necessary adjustments.
API Development: Creating an API for interacting with the trained model.
Audio Mode Training: Incorporating techniques for decoding audio from different parts of the throat and handling special symbols in Vedic sentences.

Installation

To set up the project, follow these steps:

Clone the repository:

```
git clone https://github.com/kasinadhsarma/Veda-Guru.git
cd Veda-Guru
```

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

API Endpoints

The Veda-Guru API provides the following endpoints:

POST /predict: Receives an audio file, processes it, and returns a prediction.

Example Usage

To use the API for audio predictions, send a POST request with a .wav audio file to the /predict endpoint. The API will return the predicted label.

Example:

```
curl -X POST -F "audio_file=@./audio/vedic_chanting_Vedic Chanting ｜ Rudri Path by 21 Brahmins.wav" http://localhost:5000/predict
```
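
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not the project's own client: it assumes the form field is named `audio_file` and the server listens on `localhost:5000`, as in the curl example, and the helper names `build_predict_request` and `predict` are hypothetical. The response format is not documented here, so the body is returned as raw text.

```python
import urllib.request
import uuid
from pathlib import Path

# Assumption: same URL and form field name as the curl example above.
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(wav_path, url=PREDICT_URL):
    """Build (but do not send) a multipart/form-data POST carrying one .wav file."""
    path = Path(wav_path)
    boundary = uuid.uuid4().hex
    # Drop double quotes so the quoted filename parameter stays well-formed.
    safe_name = path.name.replace('"', "")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{safe_name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + path.read_bytes() + tail
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(wav_path, url=PREDICT_URL):
    """Send the file to the API and return the raw response body as text."""
    with urllib.request.urlopen(build_predict_request(wav_path, url)) as response:
        return response.read().decode("utf-8")
```

Calling `predict("./audio/sample.wav")` (a placeholder path) sends the file; error responses such as 400 or 500 surface as `urllib.error.HTTPError`.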

Error Handling

The API includes error handling for common issues that users might encounter. Here are some examples:

Invalid File Format: If the uploaded file is not a .wav file, the API will return a 400 Bad Request error with a message indicating the invalid file format.
Missing File: If no file is uploaded, the API will return a 400 Bad Request error with a message indicating that the file is missing.
Internal Server Error: If there is an issue processing the file or making a prediction, the API will return a 500 Internal Server Error with a message indicating the problem.
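
The three cases above can be sketched as a small, framework-independent handler. This is an illustration of the described behavior, not the code in `sanskrit_api.py`: the function names, the message wording, and the `error`/`prediction` keys are all assumptions.

```python
from pathlib import Path


def check_upload(filename):
    """Return (status_code, message) for an uploaded file name."""
    if not filename:
        return 400, "Missing file: no audio file was uploaded."
    if Path(filename).suffix.lower() != ".wav":
        return 400, "Invalid file format: only .wav files are accepted."
    return 200, "ok"


def handle_predict(filename, run_model):
    """Validate the upload, then call run_model (a hypothetical callable)."""
    status, message = check_upload(filename)
    if status != 200:
        return status, {"error": message}
    try:
        return 200, {"prediction": run_model(filename)}
    except Exception as exc:
        # Any failure while processing or predicting maps to a 500 response.
        return 500, {"error": f"Internal server error: {exc}"}
```

Keeping validation separate from prediction makes the 400 cases testable without loading the model.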

Model Capabilities and Limitations

The Veda-Guru model is designed to handle audio files of Vedic chanting and make predictions based on the trained data. However, there are some limitations to be aware of:

Special Symbols: The model is trained to handle special symbols in Vedic sentences, but its performance may vary depending on the quality and clarity of the audio.
Bias: Efforts have been made to reduce bias, but the model's predictions still depend on how diverse the training data is.
Audio Quality: The model performs best with high-quality audio recordings. Poor audio quality may affect the accuracy of the predictions.
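
Because audio quality affects accuracy, it can help to inspect a recording before sending it. The following is a minimal sketch using Python's standard `wave` module; the helper name `describe_wav` is hypothetical, and the README does not state which sample rate or channel count the model expects, so the function only reports the values.

```python
import wave


def describe_wav(path):
    """Report basic properties of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }
```

`wave.open` raises `wave.Error` for files that are not valid PCM WAV data, which also catches files that merely carry a `.wav` extension.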

Training the Model

To train the model, run the following script:

```
python train_sanskrit_model.py
```

Preprocessing

To preprocess the Rigveda texts, run the following script:

```
python preprocess_rigveda.py
```

Downloading Data

To download the Rigveda hymns, run the following script:

```
python download_rigveda.py
```

Audio Mode Training

To train the audio model, run the following script:

```
python audio_mode_training.py
```
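
The scripts are listed above in the reverse of the order suggested by the Project Overview. A minimal sketch of running them end to end follows; the order (download, preprocess, train, then audio training) is an assumption based on that overview, and the helper `run_pipeline` is hypothetical. Whether audio training actually depends on the text steps is not stated in this README.

```python
import subprocess
import sys

# Assumed order, inferred from the Project Overview steps.
PIPELINE = [
    "download_rigveda.py",
    "preprocess_rigveda.py",
    "train_sanskrit_model.py",
    "audio_mode_training.py",
]


def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Run each script with the current interpreter, stopping on the first failure."""
    commands = [[sys.executable, script] for script in scripts]
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Call `run_pipeline(dry_run=False)` from the repository root to execute the scripts for real.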

Handling Special Symbols

The Veda-Guru model is designed to handle special symbols in Vedic sentences. During preprocessing, special symbols are retained and appropriately tokenized to ensure the model can accurately interpret and process them.
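
As an illustration of retaining special symbols during tokenization, here is a minimal sketch. It assumes the symbols in question are the Vedic accent marks (U+0951, U+0952, and the Vedic Extensions block U+1CD0 to U+1CFF) and the danda punctuation marks; the README does not list them, and the function names are hypothetical rather than taken from the project's preprocessing scripts.

```python
import re
import unicodedata

# A danda (U+0964) or double danda (U+0965) is its own token; every other
# run of non-space characters, accent marks included, is kept as one token.
_TOKEN = re.compile(r"[\u0964\u0965]|[^\s\u0964\u0965]+")


def is_vedic_accent(ch):
    """True for the udatta/anudatta signs and the Vedic Extensions block."""
    return ch in ("\u0951", "\u0952") or "\u1cd0" <= ch <= "\u1cff"


def tokenize_vedic(text):
    """Split on whitespace and dandas without dropping accent marks."""
    return _TOKEN.findall(unicodedata.normalize("NFC", text))


def strip_accents(token):
    """Remove accent marks, e.g. to look a token up in an unaccented lexicon."""
    return "".join(ch for ch in token if not is_vedic_accent(ch))
```

Keeping the accented form as the token and stripping accents only on demand lets downstream code choose whether the marks matter for its task.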

Mitigating Model Bias

Efforts have been made to ensure the Veda-Guru model is not biased. The training data includes a diverse range of examples to minimize bias. Additionally, the model's predictions are regularly evaluated to identify and address any potential biases.

Interacting with the Trained Model

You can interact with the trained model through the provided API: the POST /predict endpoint accepts audio files for prediction. The model is saved in the Keras format (.keras extension) and can be loaded directly with the following code snippet:

```
from tensorflow.keras.models import load_model

# Load the fine-tuned model saved in the Keras format
model = load_model('fine_tuned_model/model.keras')
```

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License.
