This repository contains a Python script that uses Azure Cognitive Services for real-time speech transcription and controls the system volume while it runs. The script listens for a hotkey (F2) to start or stop transcription, muting the system volume when transcription starts and unmuting it when transcription stops. The transcribed text is typed into the active window. I wrote this out of necessity, to minimise my own typing load over hundreds of MS Teams messages every day, but I hope it can be helpful to others as well. I understand there are built-in options, but I wanted to try this regardless.
- Real-time speech transcription using Azure Cognitive Services.
- Mute/unmute system volume during transcription.
- Hotkey listener to start/stop transcription.
- Python 3.6+
- Azure Cognitive Services Speech SDK
- python-dotenv
- keyboard
- pyautogui
- pycaw
- comtypes
- Clone the repository:
  git clone https://github.com/dan-hampton/azure-speech-transcribe.git
  cd azure-speech-transcribe
- Install the required packages:
  pip install -r requirements.txt
- Set up Azure Speech Service and obtain your API keys:
  - Go to the Azure Portal.
  - Create a new Speech service resource.
  - Navigate to the resource and copy the API key and region.
- Create a .env file in the root directory and add your Azure Cognitive Services API key and region:
  AZURE_SPEECH_KEY=your_speech_key
  AZURE_REGION=your_service_region
- Run the script:
  python speech.py
- Choose the transcription mode:
  - Press 1 for continuous transcription.
  - Press 2 for one-shot transcription.
- Follow the on-screen instructions:
  - For continuous transcription, press F2 to start or stop transcription.
  - For one-shot transcription, press F2 to start a one-shot transcription.
- To exit the script, press Ctrl+C.
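If the script fails at startup, the .env values from the setup steps above are the first thing to check. The following is a minimal sketch of how the two settings might be read and validated; it is not taken from speech.py, and the function names `read_speech_settings` and `load_settings` are hypothetical. Only the variable names `AZURE_SPEECH_KEY` and `AZURE_REGION` come from this README.

```python
import os


def read_speech_settings(env=None):
    """Return (key, region) from a mapping, raising if either is missing.

    Hypothetical helper for illustration; speech.py may do this differently.
    """
    env = os.environ if env is None else env
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_REGION", "").strip()
    # Collect the names of any settings that are absent or blank.
    missing = [
        name
        for name, value in (("AZURE_SPEECH_KEY", key), ("AZURE_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return key, region


def load_settings():
    """Load .env from the current directory, then validate the settings."""
    # Imported here so the validation helper above works without python-dotenv.
    from dotenv import load_dotenv

    load_dotenv()
    return read_speech_settings()
```

The returned pair could then be passed to the Speech SDK, for example as `speechsdk.SpeechConfig(subscription=key, region=region)`.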
This script is useful for anyone who needs to turn speech into text in real time, for example during meetings, lectures, or interviews. Muting the system volume during transcription helps keep sound played by the computer from interfering with the transcription. The hotkey makes it easy to control exactly when transcription starts and stops.
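The relationship between the hotkey, muting, and transcription described above can be sketched as a small toggle. This is an illustration under assumptions, not the code in speech.py: the class `TranscriptionToggle` and its callback parameters are hypothetical, and the order shown (mute before starting, unmute after stopping) is one reasonable reading of the behaviour.

```python
import threading


class TranscriptionToggle:
    """Flip between idle and transcribing each time the hotkey fires.

    Hypothetical sketch: start, stop, and set_mute are callbacks supplied
    by the caller, so the toggle itself has no Azure or Windows dependency.
    """

    def __init__(self, start, stop, set_mute):
        self._start = start
        self._stop = stop
        self._set_mute = set_mute
        self._lock = threading.Lock()  # hotkey callbacks may run on another thread
        self.active = False

    def on_hotkey(self):
        """Handle one hotkey press and return the new state."""
        with self._lock:
            if self.active:
                self._stop()
                self._set_mute(False)  # restore system audio after transcription
            else:
                self._set_mute(True)  # silence system audio before listening
                self._start()
            self.active = not self.active
            return self.active
```

In a real script, the toggle might be registered with `keyboard.add_hotkey("f2", toggle.on_hotkey)`, with `start` and `stop` bound to the recognizer's `start_continuous_recognition` and `stop_continuous_recognition` methods, and `set_mute` wrapping a pycaw endpoint volume call. Keeping those as injected callbacks means the state logic can be exercised without a microphone or an Azure key.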
This project is licensed under the MIT License. See the LICENSE file for details.