Forte-stt is a tool for transcribing audio files and analyzing their sentiment. It uses Yandex SpeechKit for speech recognition, a RemBERT model trained on the KazSAnDRA dataset (created by ISSAI) for sentiment analysis, and Streamlit for the user interface.
- Transcribes audio files (WAV, MP3, FLAC, etc.)
- Sentiment analysis (positive, neutral, negative)
- Supports Kazakh and Russian
- User-friendly UI built with Streamlit
- Uses the Yandex Cloud API repository as a Git submodule
- Python 3.12 (Ensure Python 3.12 is installed)
- FFmpeg (Required for audio processing)
sudo apt install ffmpeg   # Linux
brew install ffmpeg       # macOS
- Git (For cloning the repository and initializing submodules)
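The prerequisites above can be checked before installing anything else. The following is a minimal sketch using only the standard library; the helper names `missing_tools` and `python_version_ok` are hypothetical and not part of this repository, and treating Python 3.12 as an exact requirement is an assumption based on the list above.

```python
import shutil
import sys


def missing_tools(tools=("ffmpeg", "git")):
    """Return the names of command-line tools that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_version_ok(required=(3, 12), current=None):
    """Return True if the major.minor version matches the required one.

    Assumption: the project needs exactly Python 3.12, as listed above.
    """
    current = sys.version_info[:2] if current is None else tuple(current)[:2]
    return current == tuple(required)


if __name__ == "__main__":
    print("Python 3.12:", "ok" if python_version_ok() else "not the active interpreter")
    for tool in missing_tools():
        print(f"Missing from PATH: {tool}")
```

If the script prints nothing after the Python line, both FFmpeg and Git were found on `PATH`.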
Since this project uses Yandex Cloud API as a submodule, use:
git clone --recurse-submodules https://github.com/tvran/Forte-stt.git
cd Forte-stt
If you have already cloned the repo without submodules, initialize them manually:
git submodule update --init --recursive
To use Yandex SpeechKit, you need to generate the gRPC client interface.
pip install grpcio-tools
python3 -m grpc_tools.protoc -I cloudapi -I cloudapi/third_party/googleapis \
--python_out=output \
--grpc_python_out=output \
cloudapi/google/api/http.proto \
cloudapi/google/api/annotations.proto \
cloudapi/yandex/cloud/api/operation.proto \
cloudapi/google/rpc/status.proto \
cloudapi/yandex/cloud/operation/operation.proto \
cloudapi/yandex/cloud/validation.proto \
cloudapi/yandex/cloud/ai/stt/v3/stt_service.proto \
cloudapi/yandex/cloud/ai/stt/v3/stt.proto
This generates the necessary Python files in output/:
stt_pb2.py
stt_pb2_grpc.py
stt_service_pb2.py
stt_service_pb2_grpc.py
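To confirm that the generation step produced the four files listed above, a small check like the one below may help. It is a sketch, not part of the repository: the function name `missing_generated_files` is hypothetical, and it assumes the command was run from the repository root so that `output/` is a relative path.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "stt_pb2.py",
    "stt_pb2_grpc.py",
    "stt_service_pb2.py",
    "stt_service_pb2_grpc.py",
)


def missing_generated_files(output_dir="output", expected=EXPECTED_FILES):
    """Return the expected gRPC files that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    missing = missing_generated_files()
    if missing:
        print("Missing generated files:", ", ".join(missing))
    else:
        print("All generated gRPC files are present.")
```

If any file is reported missing, rerunning the `grpc_tools.protoc` command from the repository root is the first thing to try.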
Create and activate a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Then, install the required dependencies:
pip install -r requirements.txt
Run the app locally:
streamlit run main.py
If deploying on a server, use:
streamlit run main.py --server.port 8501
1. Create a .env file in the root of the project.
2. Add your API keys to .env:
# Yandex SpeechKit API Key
YANDEX_API_KEY=your_yandex_api_key_here
# Yandex Object Storage Keys
ACCESS_KEY=your_access_key_here
SECRET_KEY=your_secret_key_here
# Hugging Face Token (for sentiment analysis)
HF_TOKEN=your_huggingface_token_here
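Before starting the app, it can be useful to verify that all four variables are set. The sketch below parses a `.env` file with the standard library only and reports missing keys. The helpers `parse_env` and `missing_keys` are hypothetical, and the project itself may load the file differently (for example with a library such as python-dotenv); this is only a standalone check.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("YANDEX_API_KEY", "ACCESS_KEY", "SECRET_KEY", "HF_TOKEN")


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.is_file() else ""
    for key in missing_keys(parse_env(text)):
        print(f"Missing or empty in .env: {key}")
```

The check only confirms that the keys are present; it does not validate them against Yandex Cloud or Hugging Face.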
Forte-stt/
├── output/                 # Audio processing & recognition logic
│   ├── adjust_audio.py     # Converts audio to 16 kHz PCM
│   ├── load_file.py        # Uploads to Yandex Cloud Storage
│   ├── recognize.py        # Handles Yandex SpeechKit transcription
│   ├── stt_pb2.py          # gRPC-generated file
│   └── stt_service_pb2.py  # gRPC-generated file
├── cloudapi/               # Yandex Cloud API (submodule)
├── main.py                 # Streamlit UI
├── requirements.txt        # Python dependencies
└── README.md               # Documentation
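As an illustration of the conversion step that adjust_audio.py is described as performing, the sketch below builds an FFmpeg command that resamples any input to 16 kHz signed 16-bit PCM. It is not the repository's actual implementation: the function names are hypothetical, and the mono channel count and WAV container are assumptions.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build an FFmpeg command that converts src to 16-bit PCM audio.

    Assumptions: mono output and a .wav destination; adjust as needed.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (WAV, MP3, FLAC, ...)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels
        "-c:a", "pcm_s16le",      # signed 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_pcm(src, dst):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

For example, `convert_to_pcm("call.mp3", "call.wav")` would write a 16 kHz mono WAV file next to the input, provided FFmpeg is installed and on `PATH`.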
- Python 3.12
- Streamlit: UI for audio processing
- Yandex SpeechKit: Speech-to-text processing
- Hugging Face Transformers: Sentiment analysis
- FFmpeg: Audio conversion
- gRPC: Communication with the Yandex API
Turan Nurgozhin
Email: turannurgozhin@gmail.com
LinkedIn: https://www.linkedin.com/in/turan-nurgozhin-81931428b/
GitHub: github.com/tvran