Speech Transcription and Control

This repository contains a Python script that uses Azure Cognitive Services for real-time speech transcription and system volume control. The script listens for a hotkey (F2) to start or stop transcription, muting the system volume while transcription runs and unmuting it afterwards. The transcribed text is typed into the active window. I wrote this out of necessity for my own use, to minimise my typing load over hundreds of MS Teams messages every day, but I hope it can be helpful to others as well. I understand there are built-in options, but I wanted to try this regardless.

Features

Real-time speech transcription using Azure Cognitive Services.
Mute/unmute system volume during transcription.
Hotkey listener to start/stop transcription.
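
As a rough illustration of the mute/unmute feature, here is a minimal sketch of one way to make sure the volume is always restored when transcription ends. The `muted` helper and the `set_mute` callback are hypothetical names for this example, not taken from speech.py; on Windows the callback could wrap pycaw's `IAudioEndpointVolume.SetMute`, but that wiring is an assumption.

```python
from contextlib import contextmanager


@contextmanager
def muted(set_mute):
    """Mute on entry and unmute on exit, even if the body raises."""
    set_mute(True)
    try:
        yield
    finally:
        set_mute(False)


# Stand-in for the real mute call so the sketch runs anywhere.
# Assumption: on Windows, pycaw's IAudioEndpointVolume.SetMute could go here.
events = []

with muted(lambda on: events.append("mute" if on else "unmute")):
    events.append("transcribe")

print(events)  # ['mute', 'transcribe', 'unmute']
```

The `try`/`finally` is the point of the sketch: if transcription fails part-way, the system is not left muted.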

Requirements

Python 3.6+
Azure Cognitive Services Speech SDK
python-dotenv
keyboard
pyautogui
pycaw
comtypes

Installation

Clone the repository:

git clone https://github.com/dan-hampton/azure-speech-transcribe.git
cd azure-speech-transcribe

Install the required packages:
```
pip install -r requirements.txt
```
Set up Azure Speech Service and obtain your API keys:
- Go to the Azure Portal.
- Create a new Speech service resource.
- Navigate to the resource and copy the API key and region.
Create a .env file in the root directory and add your Azure Cognitive Services API key and region:
```
AZURE_SPEECH_KEY=your_speech_key
AZURE_REGION=your_service_region
```
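
For reference, a minimal sketch of how these two values might be read and checked at startup. The `read_speech_settings` helper is a hypothetical name for this example and may not match what speech.py does; only the variable names `AZURE_SPEECH_KEY` and `AZURE_REGION` come from the .env file above.

```python
import os


def read_speech_settings(env=None):
    """Return (key, region) from env, or raise naming whatever is missing."""
    env = os.environ if env is None else env
    names = ("AZURE_SPEECH_KEY", "AZURE_REGION")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing in .env: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Assumption: in the script this would follow python-dotenv's load_dotenv(),
# which copies the .env entries into os.environ:
#     from dotenv import load_dotenv
#     load_dotenv()
#     key, region = read_speech_settings()
example = {"AZURE_SPEECH_KEY": "your_speech_key", "AZURE_REGION": "your_service_region"}
print(read_speech_settings(example))  # ('your_speech_key', 'your_service_region')
```

Failing early with the name of the missing variable is usually easier to debug than an authentication error from the Speech service later on.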

Usage

Run the script:
```
python speech.py
```
Choose the transcription mode:
- Press 1 for continuous transcription.
- Press 2 for one-shot transcription.
Follow the on-screen instructions:
- For continuous transcription, press F2 to start or stop transcription.
- For one-shot transcription, press F2 to start a one-shot transcription.
To exit the script, press Ctrl+C.
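
The two modes differ only in what F2 does, which can be sketched as a small state machine. This is an illustration under assumptions, not the code in speech.py: `HotkeyController`, `MODES`, and the three callbacks are hypothetical names, and the callbacks stand in for whatever starts, stops, or runs a single recognition.

```python
# Menu choice -> mode, following the options listed above.
MODES = {"1": "continuous", "2": "one-shot"}


class HotkeyController:
    """Decide what a single F2 press should do in each mode."""

    def __init__(self, mode, start, stop, recognize_once):
        if mode not in MODES.values():
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.start = start
        self.stop = stop
        self.recognize_once = recognize_once
        self.running = False

    def on_hotkey(self):
        if self.mode == "one-shot":
            # Each press triggers exactly one recognition; nothing to stop.
            self.recognize_once()
        elif self.running:
            self.stop()
            self.running = False
        else:
            self.start()
            self.running = True


# Stand-in callbacks that just record what would have happened.
log = []
controller = HotkeyController(
    MODES["1"],
    start=lambda: log.append("start"),
    stop=lambda: log.append("stop"),
    recognize_once=lambda: log.append("once"),
)
for _ in range(3):
    controller.on_hotkey()

print(log)  # ['start', 'stop', 'start']
```

With the keyboard package, a controller like this could presumably be bound with `keyboard.add_hotkey("f2", controller.on_hotkey)` followed by `keyboard.wait()`.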

Use Case

This script is particularly useful for anyone who needs to turn speech into text in real time, such as during meetings, lectures, or interviews. Muting the system volume during transcription keeps sound played by the system from interfering with the transcription. The F2 hotkey gives direct control over when transcription starts and stops, which makes the script convenient in a range of scenarios.

License

This project is licensed under the MIT License. See the LICENSE file for details.
