This repository provides a full-stack web application for interview transcription. Its pipeline combines OpenAI Whisper for transcription with NVIDIA NeMo for speaker diarization. The backend is built with FastAPI, and the frontend is based on Material UI. The included Docker setup handles deployment, and Caddy provides automatic HTTPS and certificate management.
- Install Docker to run the application.
- Configure the application before starting it. Refer to the Configuration section for details.
- The project uses Make to streamline interactions with Docker Compose. Alternatively, you can refer to the Makefile and execute the commands manually. The following commands are executed from the docker folder, but you can run them from anywhere with make -C path/to/docker/folder [target].
Choose between two environments for running the application: dev and prod. Optionally, specify the device argument as cuda to utilize an NVIDIA GPU. The default settings are env=dev and device=cpu. Follow these steps for prod and cpu setups:
Start the compose setup (this may take some time):
make up env=prod
Initialize the database schema:
make db-create-all
Create the initial admin user:
make admin
Download the required ML models (the NeMo download uses wget):
make download-nemo-models
make download-whisper-models
The application should now be running on the domain(s) configured with CADDY_DOMAIN_NAME or on localhost:3000 for the development environment (unless another DEV_APP_PORT is specified). Access the Swagger API documentation at /api/docs.
The Makefile includes more targets for managing and monitoring the setup:
make restart
make start
make stop
make down
make logs # shows logs for all containers
make logs container=api # shows logs for api container
make remove-build # removes the frontend production build volume
make mariadb # mariadb -u root -p
make db-drop-all # destroys the database schema
All application data is stored in the docker/data/ directory. The following command creates a timestamped zip archive of docker/data/ and places it in docker/backup/:
make backup
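For readers who prefer not to use Make, the backup step can be approximated by hand. The following is only a sketch run from the docker folder: the archive name format is an assumption chosen for illustration, and the Makefile remains the authority on what make backup actually does.

```shell
#!/bin/sh
# Hypothetical manual backup, run from the docker folder.
# The archive name format is an assumption, not taken from the Makefile.
timestamp=$(date +%Y%m%d-%H%M%S)
archive="backup/data-${timestamp}.zip"

if [ -d data ]; then
    # Create the target folder if needed, then zip the data directory.
    mkdir -p backup
    zip -r "$archive" data
else
    echo "data/ not found; run this from the docker folder" >&2
fi
```

The timestamp keeps successive archives from overwriting each other, so older backups stay available until they are deleted manually.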
Configure the application by creating a .env file in the docker folder with these environment variables:
- CELERY_CONCURRENCY: Maximum number of concurrent transcriptions.
- JWT_SECRET_KEY: Secret for signing JWT tokens. Generate one using openssl rand -hex 32.
- JWT_TOKEN_EXPIRY_HOURS: Token expiry time in hours; determines how long a user stays logged in.
- PASSWORD_RESET_EXPIRY_MINUTES: Expiry time for password reset tokens in minutes.
- FILES_MAX_SIZE_BYTE: Upload file size limit in bytes, e.g. 300 MB = $300 \times 2^{20}$ bytes.
- CONTACT_EMAIL: Contact email displayed on the help page and in emails.
- MARIADB_ROOT_PASSWORD: Password for the MariaDB root user.
- MARIADB_USER: Username for MariaDB database access.
- MARIADB_PASSWORD: Password for MariaDB database access.
- MARIADB_DATABASE: Database name used by the application.
- REDIS_PASSWORD: Password for the Redis database.
Use pwgen -Bnc 40 3 to generate strong passwords for MariaDB and Redis.
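Two of the values above are easier to derive than to type. The sketch below computes the 300 MB upload limit in bytes and generates a JWT secret with the openssl command mentioned in the list; the 300 MB figure is just the example from the list, not a recommended setting.

```shell
#!/bin/sh
# 300 MB expressed in bytes: 300 * 2^20
FILES_MAX_SIZE_BYTE=$((300 * 1024 * 1024))

# 32 random bytes, hex-encoded, which yields a 64-character secret
JWT_SECRET_KEY=$(openssl rand -hex 32)

# Print both in KEY=value form, ready to paste into the .env file
printf 'FILES_MAX_SIZE_BYTE=%s\n' "$FILES_MAX_SIZE_BYTE"
printf 'JWT_SECRET_KEY=%s\n' "$JWT_SECRET_KEY"
```

The first line of output is FILES_MAX_SIZE_BYTE=314572800; the secret differs on every run.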
- CADDY_DOMAIN_NAME: Domain(s) for your application, e.g. "example.com www.example.com". For multiple domains, enclose them in quotes and separate them with spaces. Certificates for your domains are automatically managed by Caddy.
- BASE_URL: Base URL for generating email links, e.g. https://www.example.com.
- EMAIL_HOST: Email server host. Use smtp.mail.me.com for iCloud and smtp.gmail.com for Gmail.
- EMAIL_PORT: Email server port. Use 587 for iCloud and Gmail.
- EMAIL_USER: Username for the email server.
- EMAIL_PASSWORD: Password for the email server.
- DEV_APP_PORT: Port the application runs on in the development environment; defaults to 3000. A phpMyAdmin instance is available at phpmyadmin.localhost:{DEV_APP_PORT}, and a mail catcher is reachable at mailpit.localhost:{DEV_APP_PORT}.
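Put together, a .env file covering the variables above might look like the following sketch. Every value is a placeholder assumption for illustration (concurrency, expiry times, user and database names, and the replace-me secrets in particular); only the Gmail host, port 587, the quoted domain format, and the default port 3000 come from the descriptions above.

```shell
# Sketch of docker/.env; all values are illustrative placeholders.
CELERY_CONCURRENCY=2
JWT_SECRET_KEY=replace-with-output-of-openssl-rand-hex-32
JWT_TOKEN_EXPIRY_HOURS=24
PASSWORD_RESET_EXPIRY_MINUTES=30
# 300 MB = 300 * 2^20 bytes
FILES_MAX_SIZE_BYTE=314572800
CONTACT_EMAIL=admin@example.com

# Database and Redis credentials (generate real ones with pwgen)
MARIADB_ROOT_PASSWORD=replace-me-root
MARIADB_USER=transcription
MARIADB_PASSWORD=replace-me-user
MARIADB_DATABASE=transcription
REDIS_PASSWORD=replace-me-redis

# Domains, email links, and outgoing mail
CADDY_DOMAIN_NAME="example.com www.example.com"
BASE_URL=https://www.example.com
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USER=user@example.com
EMAIL_PASSWORD=replace-me-email

# Development port (3000 is the documented default)
DEV_APP_PORT=3000
```

Comments are kept on their own lines and the multi-domain value is quoted, which keeps the file readable both by Docker Compose and by a plain shell.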
This is only a brief overview of the application's features. For a more detailed look, just spin up the docker setup and check out the application yourself.
Files can be uploaded in various formats, including MP3, WAV, and OGG.
Alternatively, the integrated recorder can be used to record audio directly in the browser.
The progress of the transcriptions can be monitored in real-time. Optionally, users can be notified via email when a transcription is finished.
Transcriptions allow for several configurations. The Whisper model can be selected, speaker diarization can be enabled, and the number of speakers can be specified. Users can also opt in to receive email notifications.
When speaker diarization is enabled, the transcription viewer allows users to play back speaker snippets. Transcripts can be exported as plain text, JSON, CSV, and Excel.
Administrators can monitor the system parameters in real-time, including CPU/GPU usage and RAM/VRAM consumption. Additionally, running transcriptions can be cancelled.
Users can be managed by administrators. New users receive an email with a link to set their password. Users can also reset their password via email.
Thanks to MJML, emails are designed in the application's theme and optimized for all devices.
The application is fully responsive and optimized for mobile devices.
In addition to the default dark mode, the application also supports a light mode.
This project is licensed under the AGPL-3.0. The key point: if you modify and use this code, especially in networked applications, those changes should be shared under the same license. It's about maintaining openness in software development. For exact terms refer to the LICENSE file.