youngkyunJang

Audio Large Language Models

Python 407 25 Updated Feb 27, 2025

Grounded Language-Image Pre-training

Python 2,335 200 Updated Jan 24, 2024

OpenMMLab Detection Toolbox and Benchmark

Python 30,389 9,565 Updated Aug 21, 2024

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 761 39 Updated Aug 13, 2024

Matryoshka Multimodal Models

Python 97 5 Updated Jan 22, 2025

Multimodal Models in Real World

Jupyter Notebook 440 20 Updated Feb 24, 2025

SEED-Story: Multimodal Long Story Generation with Large Language Model

Python 795 60 Updated Oct 11, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 722 41 Updated Aug 5, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 101,863 16,516 Updated Feb 28, 2025

[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

Python 164 13 Updated Oct 28, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,587 70 Updated Aug 15, 2024

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 12,415 1,580 Updated Feb 29, 2024

Python 84 5 Updated Jan 4, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,936 113 Updated Jul 29, 2024

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,743 441 Updated Jan 12, 2025

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images

Python 361 18 Updated Dec 15, 2023

TextAugment: Text Augmentation Library

Python 415 60 Updated Feb 20, 2024

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Python 4,918 482 Updated Aug 6, 2024

Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks

OpenEdge ABL 189 22 Updated Feb 12, 2025

Densely Captioned Images (DCI) dataset repository.

Python 169 5 Updated Jul 1, 2024

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 274 27 Updated Jul 19, 2024

Anserini is a Lucene toolkit for reproducible information retrieval research

Java 1,046 477 Updated Feb 25, 2025

Open source implementation of "Vision Transformers Need Registers"

Python 165 15 Updated Jan 27, 2025

JavaScript 2,954 1,099 Updated Jun 21, 2024

Flickr30K Entities Dataset

MATLAB 168 26 Updated Dec 23, 2018

Referring Expression Datasets API

Jupyter Notebook 492 79 Updated Aug 27, 2024

COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!

Python 24 Updated Nov 23, 2024

Python 502 34 Updated Jul 29, 2024

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 128 14 Updated Oct 1, 2024

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,932 268 Updated Jun 4, 2024