- University of Maryland
- Washington DC
- https://somepago.github.io/
- @gowthami_s
Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A Data Streaming Library for Efficient Neural Network Training
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
Using FlexAttention to compute attention with different masking patterns
Efficient Triton Kernels for LLM Training
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
NumPy tutorials & educational content in notebook format
Deep Learning from Scratch 3 (ゼロから作る Deep Learning ❸; O'Reilly Japan, 2020)
LLM related research papers curated by LLMs themselves
Neural Networks: Zero to Hero
PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024).
Lumina-T2X is a unified framework for Text to Any Modality Generation
An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Implementation of Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics.
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
When do we not need larger vision models?
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024