This is the roadmap for wespeaker version 2.0.
- SSL support
    - Algorithms
        - DINO
        - MoCo
        - SimCLR
        - Iterative pseudo-label prediction and supervised finetuning
    - Recipes
        - VoxCeleb
        - WenetSpeech
        - GigaSpeech
- Algorithms
- Recipes
    - 3D-speaker
    - NIST SRE
        - SRE16
        - SRE18
- Documents
    - Speaker embedding learning basics
    - Core code explanation
    - Step-by-step tutorials
        - VoxCeleb Supervised
        - VoxCeleb Self-supervised
        - VoxSRC Diarization
This is the roadmap for wespeaker version 1.0.
- Standard dataset support
    - VoxCeleb
    - CnCeleb
- SOTA models support
    - x-vector (TDNN-based, a milestone deep speaker embedding)
    - r-vector (ResNet-based, winner of VoxSRC 2019)
    - ECAPA-TDNN (a TDNN variant, winner of VoxSRC 2020)
- Back-end Support
    - Cosine
    - EER/minDCF
    - AS-norm
    - PLDA
- UIO for effective industrial-scale dataset processing
- Online data augmentation
    - Noise && RIR
    - Speed Perturb
    - SpecAug
- ONNX support
- Triton Server support (GPU)
- ~~Training or finetuning big models such as WavLM~~
    - Dropped for now: training or finetuning big models such as WavLM might be too costly at the current stage.
- Basic Speaker Diarization Recipe
    - Embedding-based (more closely related to our speaker embedding learning toolkit)
- Interactive Demo
- Support using features from released pretrained models (Hugging Face)
- Model (SOTA Models)
- Pooling Functions
    - TAP(mean) / TSDP(std) / TSTP(mean+std)
        - Comparison of mean/std pooling can be found in shuai_iscslp, anna_arxiv
    - Attentive Statistics Pooling (ASTP)
        - Mainly for ECAPA_TDNN
    - Multi-Query and Multi-Head Attentive Statistics Pooling (MQMHASTP)
        - Details can be found in MQMHASTP
- Criteria
- Scoring
    - Cosine
    - PLDA
    - Score Normalization (AS-Norm)
    - Quality-aware Score Calibration
- Metric
    - EER
    - minDCF
    - DER
- Online Augmentation
    - Noise && RIR
    - Speed Perturb
    - SpecAug
- Training Strategy
    - Well-designed Learning Rate and Margin Schedulers
    - Large Margin Fine-tuning
    - Automatic Mixed Precision (AMP) Training
- Runtime
    - Python Binding
    - Triton Inference Server for verification && diarization (GPU deployment)
    - C++ ONNX Runtime
    - MNN
- Self-Supervised Learning (SSL)
- Literature
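The back-end items listed above (cosine scoring, AS-Norm score normalization, and EER/minDCF evaluation) can be illustrated with a minimal plain-Python sketch. It is not WeSpeaker's implementation: the function names, the top-k cohort size `k=300`, and the DCF operating point (`p_target=0.01`, `c_miss = c_fa = 1`) are assumptions chosen for illustration. AS-Norm here follows the common formulation: z-normalize the raw cosine score with the mean and std of the top-k cohort scores on the enrollment side and on the test side, then average the two.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Minimal sketch only: function names and defaults (k=300, p_target=0.01)
# are hypothetical illustrations, not WeSpeaker's actual API.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_stats(emb: Sequence[float], cohort: List[Sequence[float]],
                k: int) -> Tuple[float, float]:
    """Mean and population std of the k highest cosine scores against the cohort."""
    scores = sorted((cosine(emb, c) for c in cohort), reverse=True)[:k]
    return statistics.mean(scores), statistics.pstdev(scores)


def as_norm(enroll, test, cohort, k: int = 300, eps: float = 1e-8) -> float:
    """Adaptive symmetric normalization: average of enroll-side and test-side z-scores."""
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_stats(enroll, cohort, k)
    mu_t, sd_t = top_k_stats(test, cohort, k)
    return 0.5 * ((raw - mu_e) / (sd_e + eps) + (raw - mu_t) / (sd_t + eps))


def eer_and_min_dcf(target, nontarget, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Sweep every observed score as a threshold; return (EER, normalized minDCF).

    A trial is accepted when score >= threshold. This O(N^2) sweep is for
    illustration only; long trial lists call for a single sort-based pass.
    """
    thresholds = sorted(set(target) | set(nontarget)) + [math.inf]  # inf: reject all
    dcf_norm = min(c_miss * p_target, c_fa * (1 - p_target))
    best_gap, eer, min_dcf = math.inf, 1.0, math.inf
    for thr in thresholds:
        p_miss = sum(s < thr for s in target) / len(target)
        p_fa = sum(s >= thr for s in nontarget) / len(nontarget)
        if abs(p_miss - p_fa) < best_gap:  # closest point to P_miss == P_fa
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        min_dcf = min(min_dcf, dcf / dcf_norm)
    return eer, min_dcf


if __name__ == "__main__":
    cohort = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(as_norm([1.0, 0.0], [1.0, 0.0], cohort, k=2))  # ≈ 1.8284
    print(eer_and_min_dcf([0.9, 0.8], [0.1, 0.2]))  # (0.0, 0.0)
```

In a real back-end the cohort statistics depend only on each utterance, so they are usually computed once per enrollment or test embedding and cached rather than recomputed per trial as in this sketch.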