This release includes many improvements and, starting with v0.4.0 of wordcab-transcribe, a new license inspired by the HFOIL (Hugging Face Optimized Inference License).

The new license: WTL v0.1

From v0.4.0 onward (inclusive), the new license prohibits anyone from selling a self-hosted version of this software without an agreement with Wordcab.

You can still use the project for research, for personal use, or as a backend tool for your own projects.

API

Fixed CortexResponse to stay within the Svix size limit #101
Made alignment non-critical, so processing continues if alignment fails #105
Added multi-GPU support for transcription, alignment, and diarization #114
Added the audio_duration (in seconds) to the API response #127
Added handling for invalid or empty audio files #128
Added a log at launch showing the number of detected and used GPUs #138
Updated pydantic to v2 #157
Added a global download queue for audio files #168
Added the new WTL v0.1 License #177 #183 #184
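Multi-GPU support (#114) means concurrent requests have to share a fixed set of GPUs. A common pattern for this is a pool of free device indices: a request takes an index, runs on that GPU, and returns the index when it finishes. The sketch is a hypothetical illustration and assumes an asyncio-based server. The `GPUPool` class and its `run` method are our own names, not wordcab-transcribe's API, and the sketch does not claim to describe how the project actually schedules work.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class GPUPool:
    """Hypothetical pool that hands out free GPU indices to concurrent jobs.

    Illustrative sketch only; not wordcab-transcribe's implementation.
    Create it inside a running event loop (needed on Python < 3.10, where
    asyncio.Queue binds to the loop at construction time).
    """

    def __init__(self, device_indices: List[int]) -> None:
        # Every free device index sits in the queue. A job removes one and
        # puts it back when done, so at most one job runs per GPU at a time.
        self._free: "asyncio.Queue[int]" = asyncio.Queue()
        for index in device_indices:
            self._free.put_nowait(index)

    async def run(self, job: Callable[[int], Awaitable[T]]) -> T:
        # Wait until a GPU is free, then run the job on that device index.
        device_index = await self._free.get()
        try:
            return await job(device_index)
        finally:
            # Always release the GPU, even if the job raised an exception.
            self._free.put_nowait(device_index)


async def fake_transcribe(device_index: int) -> str:
    # Hypothetical stand-in for a transcription, alignment, or diarization
    # step that would run on cuda:<device_index>.
    await asyncio.sleep(0.01)
    return f"done on cuda:{device_index}"


async def main() -> None:
    pool = GPUPool([0, 1])  # built inside the running loop
    # Four requests share two GPUs; the extra two wait for a free index.
    results = await asyncio.gather(*(pool.run(fake_transcribe) for _ in range(4)))
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

With a queue, a request that finds every GPU busy waits instead of failing. The `finally` clause returns the device even when a job fails, so one bad audio file cannot permanently take a GPU out of the pool.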

Transcription

Added the vocab feature #124
Added an internal_vad parameter that helps with empty utterances #142 #173
Added a new fallback for empty segments during transcription #149
Added the float32 compute type for the transcription model #157

Diarization

Decomposed the diarization process into sub-modules and optimized diarization inference #180

Alignment

Added new alignment models for cs, in, sl, and th #164

Post-processing

Improved the post-processing strategy #136 #157
Fixed the word_timestamps parameter for dual_channel #152

Instructions

Improved the contribution instructions #131

Deploy

Updated the error payload for Svix in the cortex endpoint #118
Updated the Docker image to cuda:11.7.1 #133
Updated the Svix payload in the cortex endpoint #144
Added an Nginx configuration file for custom deployments #146

Needs improvement / not fully working

Added the possibility to use extra transcription models for specific languages #110

Contributors:
@chainyo @aleksandr-smechov @jissagn
