This is the official implementation of our ICCV 2023 paper *Learning Concise and Descriptive Attributes for Visual Recognition*.
Dependencies:

- python 3.9.13
- torch == 2.0.1
- torchvision == 0.15.2
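Assuming a standard pip environment, one way to install the pinned versions is the minimal sketch below (the repository may also provide its own requirements file):

```
pip install torch==2.0.1 torchvision==0.15.2
```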
Datasets:

- CUB: Download the dataset from here. Organize the downloaded files as shown below.
- Stanford_Cars: Download the dataset from here. Organize the downloaded files as shown below.
- CIFAR10: run

  ```
  python main.py --config configs/cifar10.yaml
  ```

  and the dataset will be downloaded automatically into `./data/cifar-10-batches-py`.
- CIFAR100: run

  ```
  python main.py --config configs/cifar100_bn.yaml
  ```

  and the dataset will be downloaded automatically into `./data/cifar-100-python`.
- Flowers102: run

  ```
  python main.py --config configs/flower.yaml
  ```

  and the dataset will be downloaded automatically into `./data/flowers-102`.
- Food101: run

  ```
  python main.py --config configs/food_bn.yaml
  ```

  and the dataset will be downloaded automatically into `./data/food-101`.
- Oxford-Pets: run

  ```
  python main.py --config configs/oxford_pets_bn.yaml
  ```

  and the dataset will be downloaded automatically into `./data/oxford-iiit-pet`.
- Imagenet-Animals: Download the dataset from here. Organize the downloaded files as shown below.
```
- data
  - CUB_200_2011
    - cub_attributes_gpt3.txt         # generated by us
    - image_class_labels.txt          # generated by us
    - train_test_split.txt
    - images.txt
    - attributes
    - images
    - parts
    - README.md
    - ...
  - stanford_cars
    - cars_attributes.txt             # generated by us
    - image_class_labels.txt          # generated by us
    - cars_train
      - *.jpg
    - cars_test
      - *.jpg
    - devkit
    - cars_train.tgz
    - cars_test.tgz
    - cars_test_annos_withlabels.mat
      # The download URL used by torchvision is invalid, so you need
      # to download these files yourself and put the tgz files in this
      # folder; the dataset class will then treat the dataset as
      # already downloaded.
  - cifar-10-batches-py
    - cifar10_attributes.txt          # generated by us
    - image_class_labels.txt          # generated by us
  - cifar-100-python
    - cifar100_attributes.txt         # generated by us
    - image_class_labels.txt          # generated by us
  - flowers-102
    - flower_attributes.txt           # generated by us
    - image_class_labels.txt          # generated by us
  - food-101
    - food_attributes.txt             # generated by us
    - image_class_labels.txt          # generated by us
  - oxford-iiit-pet
    - oxford_pets_attributes.txt      # generated by us
    - image_class_labels.txt          # generated by us
  - imagenet
    - imagenet_animal_attributes.txt  # generated by us
    - imagenet_attributes.txt         # generated by us
    - image_class_labels.txt          # generated by us
```
We put the attributes queried for each class with GPT-3 in the folder `cls2attributes`.
The following key parameters are available for customization:

- `cluster_feature_method`: choose one of [`kmeans`, `random`, `linear`]; `linear` refers to our method.
- `model_size`: the size of the CLIP model.
- `mahalanobis`: enable or disable the Mahalanobis-distance regularization (the Mahalanobis distance d(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ)) measures how far x lies from a distribution with mean μ and covariance Σ).
- `division_power`: controls the strength of the Mahalanobis constraint.
- `reinit`: whether to initialize the model with weights from the image training features.
- `num_attributes`: the number of attributes selected for classification.

Adjust these parameters according to your requirements; a config sketch follows below.
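For illustration, a hypothetical YAML config combining these keys might look like the sketch below. The key names follow the list above, but the specific values are assumptions, so consult the shipped files under `configs/` for the authoritative settings.

```yaml
# Hypothetical sketch, not a shipped config; see configs/ for the real files.
cluster_feature_method: linear  # ours; alternatives: kmeans, random
model_size: ViT-B/32            # assumed CLIP model identifier
mahalanobis: true               # enable Mahalanobis-distance regularization
division_power: 3               # assumed value; strength of the constraint
reinit: true                    # initialize from image training features
num_attributes: 32              # assumed value; attributes kept for classification
```

Such a file would then be passed via `python main.py --config <your_config>.yaml`, mirroring the dataset commands above.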
If you find our codebase useful for your research, please consider citing our paper:
```
@article{DBLP:journals/corr/abs-2308-03685,
  author  = {An Yan and
             Yu Wang and
             Yiwu Zhong and
             Chengyu Dong and
             Zexue He and
             Yujie Lu and
             William Wang and
             Jingbo Shang and
             Julian J. McAuley},
  title   = {Learning Concise and Descriptive Attributes for Visual Recognition},
  journal = {CoRR},
  volume  = {abs/2308.03685},
  year    = {2023}
}
```