Lists (26)
algro
am
Annotation
BIG
books
cv
dataset
diffusion_models
expressive_tts
frontend
fun
Go
mos-predict
multilingual
nlp
others
separate
sing
star
TODO
toy
tts_data_process
tts_framework
ttsing
vae
vocoder
Starred repositories
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Reverse Engineering of the Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Automatically Update LLM Papers Daily using GitHub Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
Megatts2 uses HierSpeechpp's vocoder
Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
A curated list of reinforcement learning with human feedback resources (continually updated)
First base model for full-duplex conversational audio
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
Robust recipes to align language models with human and AI preferences
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Real-time Speech-Text Foundation Model Toolkit (WIP)
[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
An Open-Sourced LLM-empowered Foundation TTS System
A word list containing 25 000 of the most popular English words, divided into syllables.
The open source code for SimpleSpeech series
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature