Multimodal Large Models Are Effective Action Anticipators (IEEE TMM) 🌳

2tianyao1/ActionLLM

Multimodal Large Models Are Effective Action Anticipators

This repository is the official implementation of ActionLLM, which leverages Large Language Models (LLMs) to anticipate long-term actions by treating video sequences as successive tokens. By simplifying the model architecture and incorporating a Cross-Modality Interaction Block, ActionLLM enhances multimodal semantic understanding and achieves superior performance on benchmark datasets. The paper is available at https://arxiv.org/abs/2501.00795.

Figure: Architecture of the proposed ActionLLM.

Environment setup

  • Create the Conda environment from actionllm.yaml and activate it:
conda env create -f actionllm.yaml
conda activate actionllm

Data

Create a directory './data' to hold the two datasets, the text features, and the LLaMA-7B weights. Ensure the directory structure is as follows:

    ├── data/
        ├── 50_salads/
        │   ├── groundTruth/
        │   ├── features/
        │   ├── mapping.txt
        │   └── splits/
        ├── breakfast/
        │   ├── groundTruth/
        │   ├── features/
        │   ├── mapping.txt
        │   └── splits/
        ├── text_feature/
        │   ├── breakfast/
        │   └── 50_salads/
        └── weights/
            └── 7B/
                ├── checklist.chk
                ├── consolidated.00.pth
                ├── params.json
                └── ...
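
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` list are hypothetical and were written only from the tree shown above. Files hidden behind the `...` entry under weights/7B are not checked.

```python
from pathlib import Path

# Paths relative to the data root, copied from the directory tree above.
# Hypothetical helper list; the "..." entry under weights/7B is not checked.
EXPECTED = [
    "50_salads/groundTruth",
    "50_salads/features",
    "50_salads/mapping.txt",
    "50_salads/splits",
    "breakfast/groundTruth",
    "breakfast/features",
    "breakfast/mapping.txt",
    "breakfast/splits",
    "text_feature/breakfast",
    "text_feature/50_salads",
    "weights/7B/checklist.chk",
    "weights/7B/consolidated.00.pth",
    "weights/7B/params.json",
]


def missing_entries(data_root="./data"):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing entries under ./data:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

Running the script from the repository root prints any missing files or folders; an empty result only means the listed entries exist, not that their contents are valid.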

Training

  • Before running, update the paths in the .sh scripts and in opts.py to match your file locations.

1. Breakfast

./scripts/bf/train_bf.sh   

2. 50 Salads

./scripts/50s/train_50s.sh  

Testing

1. Breakfast

./scripts/bf/eval_bf.sh  

2. 50 Salads

./scripts/50s/eval_50s.sh  

Examples

Citation

If you find our code or paper useful, please consider citing our paper:

@article{wang2025actionllm,
  title={Multimodal Large Models Are Effective Action Anticipators},
  author={Wang, Binglu and Tian, Yao and Wang, Shunzhou and Yang, Le},
  journal={IEEE Transactions on Multimedia},
  year={2025},
  publisher={IEEE}
}

Acknowledgement

This repo borrows some data and code from LLaMA, FUTR, and LaVIN. We thank the authors for their great work.
