Getting Started

This repository provides a full-stack web application for interview transcription. It features an integrated pipeline that utilizes OpenAI Whisper for accurate transcription and NVIDIA NeMo for reliable speaker diarization. The backend infrastructure is built using FastAPI, while the frontend is based on Material UI for a modern and accessible user experience. Deployment is made seamless and flexible through the included Docker setup. Additionally, Caddy provides automatic HTTPS and certificate management.

Getting Started

Prerequisites

Install Docker to run the application.
Configure the application before starting it. Refer to the Configuration section for details.
The project uses Make to streamline interactions with Docker Compose. Alternatively, you can refer to the Makefile and execute the commands manually. The following commands are executed from the docker folder, but you can run them from anywhere with make -C path/to/docker/folder [target].

Setup

Choose between two environments for running the application: dev and prod. Optionally, specify the device argument as cuda to utilize an NVIDIA GPU. The default settings are env=dev and device=cpu. Follow these steps for prod and cpu setups:

Start the compose setup (this may take some time):

make up env=prod

Initialize the database schema:

make db-create-all

Create the initial admin user:

make admin

Download the used ML-Models (NeMo download uses wget):

make download-nemo-models
make download-whisper-models

The application should now be running on the domain(s) configured with CADDY_DOMAIN_NAME or on localhost:3000 for the development environment (unless another DEV_APP_PORT is specified). Access the Swagger API documentation at /api/docs.

The Makefile includes more targets for managing and monitoring the setup:

make restart
make start
make stop
make down
make logs               # shows logs for all containers
make logs container=api # shows logs for api container
make remove-build       # removes the frontend production build volume 
make mariadb            # mariadb -u root -p
make db-drop-all        # destroys the database schema

All application data is stored in the docker/data/ directory. The following command creates a timestamped zip-archive of docker/data/ and places it in docker/backup/:

make backup

Configuration

Configure the application by creating a .env file in the docker folder with these environment variables:

CELERY_CONCURRENCY
Maximum number of concurrent transcriptions.
JWT_SECRET_KEY
Secret for signing JWT tokens. Generate one using openssl rand -hex 32.
JWT_TOKEN_EXPIRY_HOURS
Token expiry time in hours, determining user login duration.
PASSWORD_RESET_EXPIRY_MINUTES
Expiry time for password reset tokens in minutes.
FILES_MAX_SIZE_BYTE
Upload file size limit in bytes e.g. 300 MB = $300 \times 2^{20}$ bytes.
CONTACT_EMAIL
Contact email displayed on the help page and in emails.
MARIADB_ROOT_PASSWORD
Password for the MariaDB database.
MARIADB_USER
Username for MariaDB database access.
MARIADB_PASSWORD
Password for MariaDB database access.
MARIADB_DATABASE
Database name used by the application.
REDIS_PASSWORD
Password for the Redis database.

Use pwgen -Bnc 40 3 to generate strong passwords for MariaDB and Redis.

Production

CADDY_DOMAIN_NAME
Domain(s) for your application e.g. "example.com www.example.com". For multiple domains, enclose them in quotes and separate them with spaces. Certificates for your domains are automatically managed by caddy.
BASE_URL
Base URL for generating email links e.g. https://www.example.com.
EMAIL_HOST
Email server host. Use smtp.mail.me.com for iCloud and smtp.gmail.com for Gmail.
EMAIL_PORT
Email server port. Use 587 for iCloud and Gmail.
EMAIL_USER
Username for the email server.
EMAIL_PASSWORD
Password for the email server.

Development

DEV_APP_PORT
Application's running port, defaults to 3000. A phpMyAdmin instance is running under phpmyadmin.localhost:{DEV_APP_PORT}. There also is a mailcatcher reachable under mailpit.localhost:{DEV_APP_PORT}.

Features

This is only a brief overview of the application's features. For a more detailed look, just spin up the docker setup and check out the application yourself.

Recordings

Files can be uploaded in various formats, including MP3, WAV, and OGG.

Alternatively, the integrated recorder can be used to record audio directly in the browser.

Transcriptions

The progress of the transcriptions can be monitored in real-time. Optionally, users can be notified via email when a transcription is finished.

Transcriptions allow for several configurations. The Whisper model can be selected, speaker diarization can be enabled, and the number of speakers can be specified. Users can also opt-in to receive email notifications.

When speaker diarization is enabled, the transcription viewer allows users to playback speaker snippets. Transcripts can be exported as plain text, JSON, CSV, and Excel.

Monitoring

Administrators can monitor the system parameters in real-time, including CPU/GPU usage and RAM/VRAM consumption. Additionally, running transcriptions can be cancelled.

Users

Users can be managed by administrators. New users receive an email with a link to set their password. Users can also reset their password via email.

Emails

Thanks to MJML, emails are designed in the application's theme and optimized for all devices.

Mobile optimized

The application is fully responsive and optimized for mobile devices.

Light Mode

In addition to the default dark mode, the application also supports a light mode.

License

This project is licensed under the AGPL-3.0. The key point: if you modify and use this code, especially in networked applications, those changes should be shared under the same license. It's about maintaining openness in software development. For exact terms refer to the LICENSE file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting Started

Prerequisites

Setup

Configuration

Production

Development

Features

Recordings

Transcriptions

Monitoring

Users

Emails

Mobile optimized

Light Mode

License

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
api		api
app		app
docker		docker
LICENSE		LICENSE
README.md		README.md

License

Joost385/transcription-ui

Folders and files

Latest commit

History

Repository files navigation

Getting Started

Prerequisites

Setup

Configuration

Production

Development

Features

Recordings

Transcriptions

Monitoring

Users

Emails

Mobile optimized

Light Mode

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages