FlashMLA: Efficient MLA decoding kernels

C++ 11,194 782 Updated Mar 1, 2025

Fully open reproduction of DeepSeek-R1

Python 22,337 2,002 Updated Mar 7, 2025

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 204 13 Updated Sep 16, 2024

A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!

Python 985 110 Updated Jan 30, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 24,240 2,101 Updated Mar 7, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 177 11 Updated Jan 9, 2025

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 888 107 Updated Aug 7, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,195 277 Updated Nov 5, 2024

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,124 64 Updated Jul 20, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

391 16 Updated Jan 18, 2025

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 53 2 Updated Dec 23, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 547 54 Updated Jul 26, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 330 40 Updated Aug 15, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 23,505 2,328 Updated Mar 7, 2025

The open source code for LLM-Codec

Python 130 7 Updated Aug 18, 2024

Community interface for generative AI

TypeScript 8,946 908 Updated Apr 30, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 90 5 Updated Jun 2, 2024

Implementation of Google's USM speech model in PyTorch

Python 30 4 Updated Jan 27, 2025

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 232 16 Updated Feb 28, 2025

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 389 66 Updated Aug 16, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,314 126 Updated Jul 11, 2024

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,615 320 Updated Jan 4, 2024

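MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.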

3,759 263 Updated Mar 8, 2025

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 175 17 Updated May 29, 2024

A purer tokenizer with a higher compression rate

Python 470 23 Updated Nov 27, 2024
36 1 Updated Jan 28, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,451 831 Updated Jul 18, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 37,146 4,385 Updated Aug 19, 2024