A robust Python package for high-quality audio transcription using OpenAI's Whisper and post-processing with GPT models.
- High-Quality Transcription: Leverages OpenAI's Whisper model for accurate audio transcription
- Smart Post-Processing: Uses OpenAI's GPT models to clean and format transcripts
- Timestamp Support: Includes precise timing information for each segment of speech
- Subtitle Generation: Creates SRT and VTT subtitle files for videos
- Parallel Processing: Efficiently processes large audio files in parallel
- Context-Aware: Applies domain-specific context to improve transcription quality
- Multi-language Support: Automatic language detection and support for numerous languages
- Configurable: Extensive configuration options via command line, environment variables, or config files
- Caching: Intelligent caching to avoid redundant API calls and reduce costs
- Robust Error Handling: Graceful degradation with automatic retry mechanisms
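The chunking behind the parallel-processing feature can be sketched as follows. This is a minimal illustration, not the package's actual implementation; it assumes the documented default chunk length of 600,000 ms and a caller-supplied per-chunk transcription function:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_LENGTH_MS = 600_000  # documented default chunk length (10 minutes)

def chunk_ranges(duration_ms, chunk_length_ms=CHUNK_LENGTH_MS):
    """Split a total duration into (start, end) millisecond ranges, one per chunk."""
    return [(start, min(start + chunk_length_ms, duration_ms))
            for start in range(0, duration_ms, chunk_length_ms)]

def transcribe_chunks(duration_ms, transcribe_one):
    """Transcribe chunks in parallel and join the results in original order."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(transcribe_one, chunk_ranges(duration_ms))
    return " ".join(parts)
```

`ThreadPoolExecutor.map` preserves input order, so the joined transcript reads in sequence even when chunks finish out of order.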
The easiest way to install is using the provided installation script:
# Clone the repository
git clone https://github.com/yourusername/transcribe.git
cd transcribe
# Local installation (in the current directory)
./install.sh --local
# OR System-wide installation (requires sudo)
sudo ./install.sh --system
After installation, create a .env file containing your OpenAI API key:
echo "OPENAI_API_KEY=your_api_key_here" > .env
If you prefer to install manually, follow these steps:
# Clone the repository
git clone https://github.com/yourusername/transcribe.git
cd transcribe
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Make scripts executable
chmod +x transcribe clean-transcript create-sentences
For a manual system-wide installation, install the package to a shared location and symlink the commands into /usr/local/bin:
# Clone the repository
sudo git clone https://github.com/yourusername/transcribe.git /usr/share/transcribe
cd /usr/share/transcribe
# Create a virtual environment
sudo python3 -m venv .venv
# Install dependencies using the virtual environment's pip
# (plain "sudo pip" would target the system Python, not the venv)
sudo .venv/bin/pip install -r requirements.txt
# Make scripts executable
sudo chmod +x transcribe clean-transcript create-sentences
# Create symlinks in /usr/local/bin
sudo ln -sf /usr/share/transcribe/transcribe /usr/local/bin/transcribe
sudo ln -sf /usr/share/transcribe/clean-transcript /usr/local/bin/clean-transcript
sudo ln -sf /usr/share/transcribe/create-sentences /usr/local/bin/create-sentences
# Basic transcription
transcribe audio_file.mp3 -o output.txt -c "context about the audio" -m gpt-4o -v
# Transcription with timestamps
transcribe audio_file.mp3 -o output.txt -T -v
# Generate SRT subtitles
transcribe audio_file.mp3 -o output.txt --srt -v
# Generate VTT subtitles
transcribe audio_file.mp3 -o output.txt --vtt -v
clean-transcript input.txt -o output.txt -c "context information" -m gpt-4o
create-sentences input.txt -o output.txt
Usage: transcribe [options] audio_file
Output Options:
-o, --output FILE Output file (default: input filename with .txt extension)
-O, --output-to-stdout Output the transcription to stdout
Processing Options:
-l, --chunk-length MS Length of audio chunks in milliseconds (default: 600000)
-L, --input-language LANG  Language of the input audio (auto-detected if not specified)
-c, --context TEXT Context information to improve transcription
-m, --model MODEL GPT model for post-processing (default: gpt-4o)
-s, --max-chunk-size N Maximum chunk size for post-processing (default: 3000)
-t, --temperature N Temperature for generation (default: 0.1)
-p, --prompt TEXT Provide a prompt to guide the initial transcription
-P, --no-post-processing Skip post-processing step
Timestamp and Subtitle Options:
-T, --timestamps Include timestamp information in the output
--srt Generate SRT subtitle file (enables timestamps automatically)
--vtt Generate VTT subtitle file (enables timestamps automatically)
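The two subtitle formats differ mainly in their timestamp separator: SRT uses a comma before the milliseconds, VTT a period. A small hypothetical formatter (not part of the package's API) illustrating the convention:

```python
def format_timestamp(seconds, srt=True):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # hours
    m, rem = divmod(rem, 60_000)     # minutes
    s, ms = divmod(rem, 1000)        # seconds and milliseconds
    sep = "," if srt else "."
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"
```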
Logging Options:
-v, --verbose Verbose output
-d, --debug Debug output
Usage: clean-transcript [options] input_file
Options:
-o, --output FILE Output file (default: stdout)
-c, --context TEXT Context information to improve cleaning
-m, --model MODEL GPT model for processing (default: gpt-4o)
-t, --temperature N Temperature for generation (default: 0.1)
-v, --verbose Verbose output
--debug Debug output
--config FILE Configuration file
Usage: create-sentences [options] input_file
Options:
-o, --output FILE Output file (default: stdout)
--min-sentences N Minimum sentences per paragraph (default: 2)
--max-sentences N Maximum sentences per paragraph (default: 8)
-v, --verbose Verbose output
--debug Debug output
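The paragraph-grouping behaviour of create-sentences can be sketched roughly as follows. This assumes a simple punctuation-based sentence split; the package's actual splitter may be more sophisticated:

```python
import re

def group_sentences(text, min_sentences=2, max_sentences=8):
    """Split text into sentences, then group them into paragraphs of at most
    max_sentences; a trailing group smaller than min_sentences is merged
    into the previous paragraph."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    paragraphs = [sentences[i:i + max_sentences]
                  for i in range(0, len(sentences), max_sentences)]
    if len(paragraphs) > 1 and len(paragraphs[-1]) < min_sentences:
        paragraphs[-2].extend(paragraphs.pop())
    return [" ".join(p) for p in paragraphs]
```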
- OPENAI_API_KEY: Your OpenAI API key
- OPENAI_COMPLETION_MODEL: Default model for completions
- OPENAI_SUMMARY_MODEL: Default model for summaries
- TRANSCRIBE_CHUNK_LENGTH_MS: Chunk length in milliseconds (default: 600000)
- TRANSCRIBE_MAX_CHUNK_SIZE: Maximum chunk size for processing
- TRANSCRIBE_TEMP_DIR: Directory for temporary files
- TRANSCRIBE_LOG_LEVEL: Logging level (INFO, DEBUG, etc.)
- TRANSCRIBE_NO_CACHE: Disable caching (set to "1")
- TRANSCRIBE_CACHE_DIR: Custom cache directory
- TRANSCRIBE_CACHE_SIZE_MB: Maximum cache size in MB
- TRANSCRIBE_DEFAULT_SUBTITLE_FORMAT: Default subtitle format (srt or vtt)
- TRANSCRIBE_MAX_LINE_LENGTH: Maximum line length for subtitles (default: 42)
- TRANSCRIBE_MAX_SUBTITLE_DURATION: Maximum duration in seconds for a single subtitle (default: 5.0)
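Environment variables like these are typically read with fallbacks to the documented defaults. A minimal sketch (the variable names come from the list above; the parsing logic is assumed, not the package's own):

```python
import os

def load_env_settings(env=os.environ):
    """Read a few transcription settings from environment variables,
    falling back to the documented defaults."""
    return {
        "chunk_length_ms": int(env.get("TRANSCRIBE_CHUNK_LENGTH_MS", "600000")),
        "max_line_length": int(env.get("TRANSCRIBE_MAX_LINE_LENGTH", "42")),
        "max_subtitle_duration": float(env.get("TRANSCRIBE_MAX_SUBTITLE_DURATION", "5.0")),
        "no_cache": env.get("TRANSCRIBE_NO_CACHE") == "1",
    }
```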
You can specify configuration options in a JSON file:
{
  "openai": {
    "models": {
      "completion": "gpt-4o",
      "summary": "gpt-4o-mini"
    }
  },
  "transcription": {
    "chunk_length_ms": 600000
  },
  "processing": {
    "max_chunk_size": 3000,
    "temperature": 0.1
  },
  "subtitles": {
    "default_format": "srt",
    "max_line_length": 42,
    "max_duration": 5.0
  }
}
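A config file like this is typically loaded and merged over built-in defaults, section by section. A hypothetical sketch (the package's own loader may differ; the defaults shown are the documented ones):

```python
import json

DEFAULTS = {
    "transcription": {"chunk_length_ms": 600000},
    "processing": {"max_chunk_size": 3000, "temperature": 0.1},
}

def load_config(path):
    """Load a JSON config file and merge it over the defaults, so any
    section or key the user omits keeps its default value."""
    with open(path) as f:
        user = json.load(f)
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in user.items():
        merged.setdefault(section, {}).update(values)
    return merged
```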
from transcribe_pkg.core.transcriber import transcribe_audio_file
from transcribe_pkg.core.processor import process_transcript
from transcribe_pkg.utils.subtitle import generate_srt, generate_vtt
# Transcribe an audio file
transcript = transcribe_audio_file(
    audio_path="lecture.mp3",
    output_file="transcript.txt",
    context="University lecture on quantum physics",
    model="gpt-4o"
)

# Generate a transcript with timestamps
transcript_with_timestamps = transcribe_audio_file(
    audio_path="lecture.mp3",
    with_timestamps=True
)

# Create SRT subtitles
with open("lecture.srt", "w") as f:
    f.write(generate_srt(transcript_with_timestamps["segments"]))

# Create VTT subtitles
with open("lecture.vtt", "w") as f:
    f.write(generate_vtt(transcript_with_timestamps["segments"]))

# Process a transcript
processed_text = process_transcript(
    input_text=transcript,
    context="Technical content with scientific terminology",
    model="gpt-4o",
    language="en"
)
python run_tests.py
python run_tests.py --coverage # Generate coverage report
python run_tests.py --pattern "test_config" # Run specific tests
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
- Gary Dean
- Claude Code 0.2.29
This README was last updated on March 2nd, 2025