v0.4.0
This release includes many improvements and introduces a new license, inspired by the HFOIL, that applies to wordcab-transcribe from v0.4.0 onward.
The new License WTL v0.1
Starting with v0.4.0 (inclusive), the new license prohibits selling a self-hosted version of this software without an agreement from Wordcab.
You can still use the project for research, for personal use, or as a backend tool for your own projects.
API
- Fixed CortexResponse for Svix size limit #101
- Made alignment non-critical if the process fails #105
- Added multi-GPU support for transcription, alignment, and diarization #114
- Added audio_duration (in seconds) to the API response #127
- Added a catch for invalid or empty audio files #128
- Added a log about the number of detected and used GPUs at launch #138
- Updated pydantic to v2 #157
- Added an audio file global download queue #168
- Added the new WTL v0.1 License #177 #183 #184
Transcription
- Added the vocab feature #124
- Added an internal_vad parameter that helps with empty utterances #142 #173
- Added a new fallback for empty segments during transcription #149
- Added the float32 compute type for the transcription model #157
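As a rough illustration of how the vocab and internal_vad options above might be passed along with a request, here is a minimal sketch. Only the two field names come from these notes; the helper `build_transcription_options` is hypothetical, and the assumed types (a list of strings for vocab, a boolean for internal_vad) should be checked against the API schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_transcription_options(
    vocab: Optional[List[str]] = None,
    internal_vad: bool = False,
) -> Dict[str, Any]:
    """Build a hypothetical options payload for a transcription request.

    The field names `vocab` and `internal_vad` come from the release notes;
    their types (list of strings, boolean) are assumptions.
    """
    options: Dict[str, Any] = {"internal_vad": internal_vad}
    # Drop blank entries and only include `vocab` when something remains.
    cleaned = [term.strip() for term in (vocab or []) if term.strip()]
    if cleaned:
        options["vocab"] = cleaned
    return options


if __name__ == "__main__":
    payload = build_transcription_options(["Wordcab", "Svix"], internal_vad=True)
    print(json.dumps(payload, indent=2))
```

The helper leaves vocab out entirely when no usable terms are given, so a request without custom vocabulary stays unchanged.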
Diarization
- Decomposed the diarization process into sub-modules and optimized diarization inference #180
Alignment
- Added new cs, in, sl, and th alignment models #164
Post-processing
Instructions
- Improved the contribution instructions #131
Deploy
- Updated the error payload for Svix in the cortex endpoint #118
- Updated the Docker image to cuda:11.7.1 #133
- Updated the Svix payload in the cortex endpoint #144
- Added an Nginx configuration file for custom deployments #146
Need improvements / Not fully working
- Added the possibility to use extra transcription models for specific languages #110
Contributors:
@chainyo @aleksandr-smechov @jissagn