
v0.4.0

@chainyo chainyo released this 02 Aug 13:10
· 171 commits to main since this release
ad689f4

This release includes many improvements and introduces a new license, starting with v0.4.0 of wordcab-transcribe (inspired by the HFOIL).

The new license: WTL v0.1

The new license prevents anyone from using this project, from v0.4.0 onward, to sell a self-hosted version of this software without an agreement with Wordcab.

You can still use the project for research, for personal use, or as a backend tool for your own projects.

API

  • Fixed CortexResponse for Svix size limit #101
  • Made alignment non-critical if the process fails #105
  • Added multi-GPU support for transcription, alignment, and diarization #114
  • Added the audio_duration (in seconds) in the API response #127
  • Added a catch for invalid or empty audio file #128
  • Added a log about the number of detected and used GPUs at launch #138
  • Updated pydantic to v2 #157
  • Added an audio file global download queue #168
  • Added the new WTL v0.1 License #177 #183 #184
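As a sketch of the new `audio_duration` response field (#127): only `audio_duration`, in seconds, is confirmed by this changelog; the surrounding response shape below is a hypothetical example, not the actual API schema.

```python
import json

# Hypothetical transcription response -- only `audio_duration` (in seconds)
# is confirmed by this release; the other fields are placeholders.
sample_response = json.loads("""
{
    "utterances": [
        {"start": 0.0, "end": 2.5, "text": "Hello world."}
    ],
    "audio_duration": 2.5
}
""")

# The new field lets clients report or bill on the processed audio length.
print(f"Processed {sample_response['audio_duration']:.1f}s of audio")
```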

Transcription

  • Added the vocab feature #124
  • Added an internal_vad parameter that helps with empty utterances #142 #173
  • Added a new fallback for empty segments during transcription #149
  • Added the float32 compute type for the transcription model #157
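A request body using the new options might be built as in the sketch below. The parameter names `vocab` and `internal_vad` come from this changelog; the other fields and values are illustrative assumptions, not the documented request schema.

```python
import json

# Hypothetical request payload -- `vocab` and `internal_vad` are the options
# named in this release; the surrounding fields are illustrative only.
payload = {
    "vocab": ["Wordcab", "Svix", "diarization"],  # bias decoding toward domain terms
    "internal_vad": True,  # helps filter out empty utterances
    "word_timestamps": True,
}

# Serialize to JSON as you would for an HTTP POST body.
body = json.dumps(payload)
```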

Diarization

  • Decomposed the diarization process into sub-modules and optimized diarization inference #180

Alignment

  • Added new cs, in, sl and th alignment models #164

Post-processing

  • Improved the post-processing strategy #136 #157
  • Fixed the word_timestamps parameter for dual_channel #152

Instructions

  • Improved the contribution instructions #131

Deploy

  • Updated the error payload for Svix in the cortex endpoint #118
  • Updated the Docker image to cuda:11.7.1 #133
  • Updated the Svix payload in the cortex endpoint #144
  • Added an Nginx configuration file for custom deployments #146
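The Nginx file from #146 is not reproduced here, but a minimal reverse-proxy sketch for a self-hosted instance could look like the following; the server name, port, and timeout values are assumptions, not the shipped configuration.

```nginx
server {
    listen 80;
    server_name transcribe.example.com;  # placeholder domain

    location / {
        # Forward requests to the wordcab-transcribe API container
        proxy_pass http://127.0.0.1:5001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Transcription requests can be long-running
        proxy_read_timeout 300s;

        # Allow large audio uploads
        client_max_body_size 200m;
    }
}
```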

Needs improvement / Not fully working

  • Added the possibility to use extra transcription models for specific languages #110

Contributors:
@chainyo @aleksandr-smechov @jissagn