
Speech-to-text demo solution for transcribing audio files and analyzing sentiment using Yandex SpeechKit, RemBERT trained on KazSAnDRA dataset by ISSAI, and Streamlit.


tvran/Forte-stt


🎙 STTSSTTSSTTSSTTS: Speech-to-Text & Sentiment Analysis

🚀 STTSSTTSSTTSSTTSS is a tool for transcribing audio files and analyzing their sentiment using Yandex SpeechKit, a RemBERT model trained on the KazSAnDRA dataset created by ISSAI, and Streamlit.

📌 Features

  • ✅ Transcribe audio files (WAV, MP3, FLAC, etc.)
  • ✅ Sentiment analysis (positive, neutral, negative)
  • ✅ Supports Kazakh and Russian languages
  • ✅ User-friendly UI with Streamlit
  • ✅ Leverages Yandex Cloud API as a submodule

📥 Installation & Setup

🔧 Prerequisites

  • Python 3.12
  • FFmpeg (Required for audio processing)
    sudo apt install ffmpeg  # Linux
    brew install ffmpeg      # macOS
  • Git (For cloning the repository and initializing submodules)
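The prerequisites above can be checked from Python before going further. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only verifies that the interpreter is Python 3.12 and that `ffmpeg` and `git` are on `PATH`.

```python
import shutil
import sys


def check_prerequisites(required_python=(3, 12), tools=("ffmpeg", "git")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only major.minor, since the README pins Python 3.12.
    if sys.version_info[:2] != tuple(required_python):
        problems.append(
            f"Python {required_python[0]}.{required_python[1]} expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when the executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the script prints one warning per missing prerequisite and nothing when everything is in place.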

🚀 Clone the Repository

Since this project uses Yandex Cloud API as a submodule, use:

git clone --recurse-submodules https://github.com/tvran/Forte-stt.git
cd Forte-stt

If you have already cloned the repo without submodules, initialize it manually:

git submodule update --init --recursive

🛠 Generate gRPC Client Interface

To use Yandex SpeechKit, you need to generate the gRPC client interface.

1️⃣ Install grpcio-tools

pip install grpcio-tools

2️⃣ Run the following command inside the Forte-stt directory:

python3 -m grpc_tools.protoc -I cloudapi -I cloudapi/third_party/googleapis \
  --python_out=output \
  --grpc_python_out=output \
  cloudapi/google/api/http.proto \
  cloudapi/google/api/annotations.proto \
  cloudapi/yandex/cloud/api/operation.proto \
  cloudapi/google/rpc/status.proto \
  cloudapi/yandex/cloud/operation/operation.proto \
  cloudapi/yandex/cloud/validation.proto \
  cloudapi/yandex/cloud/ai/stt/v3/stt_service.proto \
  cloudapi/yandex/cloud/ai/stt/v3/stt.proto

This generates the necessary Python files in output/:

  • stt_pb2.py
  • stt_pb2_grpc.py
  • stt_service_pb2.py
  • stt_service_pb2_grpc.py
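To confirm that generation succeeded, a short check like the one below may help. It is a sketch that assumes the four files land directly in `output/` as listed above; the helper name `missing_generated_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "stt_pb2.py",
    "stt_pb2_grpc.py",
    "stt_service_pb2.py",
    "stt_service_pb2_grpc.py",
)


def missing_generated_files(output_dir="output"):
    """Return the expected generated files that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_generated_files()
    if missing:
        print("Missing generated files:", ", ".join(missing))
    else:
        print("All gRPC client files are present.")
```

If any file is reported missing, re-run the `grpc_tools.protoc` command and check that the `cloudapi` submodule was initialized.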

📦 Install Dependencies

Create and activate a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Then, install the required dependencies:

pip install -r requirements.txt

🚀 Running the Project

Start the Streamlit UI

streamlit run main.py

If deploying on a server, use:

streamlit run main.py --server.port 8501

🚀 Setting up API Keys

To use Yandex SpeechKit and Hugging Face Transformers, you need to store API keys securely.

1️⃣ Create a .env file in the root of the project

2️⃣ Add your API keys inside .env:

# Yandex SpeechKit API Key
YANDEX_API_KEY=your_yandex_api_key_here

# Yandex Object Storage Keys
ACCESS_KEY=your_access_key_here
SECRET_KEY=your_secret_key_here

# Hugging Face Token (for sentiment analysis)
HF_TOKEN=your_huggingface_token_here
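Before starting the app, it can be useful to verify that all four keys are set. The sketch below uses only the standard library and is an illustration, not the project's own loading code (which may well use a package such as python-dotenv; that is an assumption). The helper names `parse_env_file` and `missing_keys` are hypothetical, and the parser handles only plain `KEY=value` lines without quoting.

```python
import os

# Key names taken from the .env template above.
REQUIRED_KEYS = ("YANDEX_API_KEY", "ACCESS_KEY", "SECRET_KEY", "HF_TOKEN")


def parse_env_file(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in values."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Values from .env override those already in the process environment.
    values = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            values.update(parse_env_file(handle.read()))
    print("Missing keys:", missing_keys(values) or "none")
```

Keep `.env` out of version control (for example by listing it in `.gitignore`) so the keys are not committed.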

📂 Project Structure

Forte-stt/
│── output/                   # Audio processing & recognition logic
│   ├── adjust_audio.py       # Converts audio to 16 kHz PCM
│   ├── load_file.py          # Uploads to Yandex Object Storage
│   ├── recognize.py          # Handles Yandex SpeechKit transcription
│   ├── stt_pb2.py            # gRPC-generated file
│   ├── stt_service_pb2.py    # gRPC-generated file
│── cloudapi/                 # Yandex Cloud API (submodule)
│── main.py                   # Streamlit UI
│── requirements.txt          # Python dependencies
│── README.md                 # Documentation
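The conversion step noted for adjust_audio.py (audio to 16 kHz PCM) could look roughly like the sketch below. This is not the repository's actual code: the function names are hypothetical, and the choice of mono, 16-bit little-endian output is an assumption. Only standard FFmpeg options (`-y`, `-i`, `-ar`, `-ac`, `-c:a`) are used.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that writes 16-bit PCM audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file in any format ffmpeg can decode
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to mono (assumption, not from the README)
        "-c:a", "pcm_s16le",      # signed 16-bit little-endian PCM (assumption)
        str(dst),
    ]


def convert_audio(src, dst):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

For example, `convert_audio("call.mp3", "call.wav")` would write a 16 kHz mono WAV file next to the input, provided FFmpeg is installed.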

🛠 Technologies Used

  • Python 3.12
  • Streamlit – UI for audio processing
  • Yandex SpeechKit – Speech-to-Text processing
  • Hugging Face Transformers – Sentiment analysis
  • FFmpeg – Audio conversion
  • gRPC – Communication with Yandex API

📞 Contact

👀 Turan Nurgozhin
📧 Email: turannurgozhin@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/turan-nurgozhin-81931428b/
🚀 GitHub: github.com/tvran
