Our project tackles gender bias in image captioning models so that generated captions are free of gender stereotypes. Specifically, this project 1) identifies the types of gender bias present in a captioning model, 2) experiments with different methods for reducing that bias, and 3) interprets the success of the final model. We also aim to build an inclusive captioning model that can produce not only binary gender terms (i.e., man or woman) but also a third, neutral category (i.e., person) based on visual appearance.
Above are some examples of gender bias in an untuned captioning model. Certain activities, objects, and contexts become associated with one gender due to the overrepresentation of men in the training samples and bias in the annotations. Women can also be professional chefs cooking outside a home kitchen, and men can also enjoy taking selfies.
- Install the project dependencies in your virtual environment. Refer to the pipenv documentation for commands specific to your OS. For macOS/Linux, run
$ pipenv install --dev
Main dependencies:
- Python 3
- torch
- torchvision
- nltk
- sklearn (scikit-learn)
- PIL (Pillow)
- matplotlib
- json (standard library)
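As a quick sanity check (a minimal sketch, assuming the pipenv environment is active; the filename `check_deps.py` is just illustrative), the snippet below verifies that each main dependency can be imported:

```python
# Sanity check: confirm the main dependencies are importable in the
# active environment (e.g. save as check_deps.py and run with
# `pipenv run python check_deps.py`).
import importlib

for name in ["torch", "torchvision", "nltk", "sklearn", "PIL", "matplotlib", "json"]:
    try:
        module = importlib.import_module(name)
        print(f"{name}: OK ({getattr(module, '__version__', 'stdlib')})")
    except ImportError as err:
        print(f"{name}: MISSING ({err})")
```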
- Clone this repository locally. Create a subdirectory `models`; all trained models will be saved in this subdirectory.
$ mkdir models
- Download the following data from the COCO website into a subdirectory `data` located inside this project's directory. Move the downloaded folders into the repo's `data` subdirectory. This step is necessary only if you intend to train a model or evaluate model results using the COCO dataset.
- Under Annotations, download:
  - 2014 Train/Val annotations [241MB]
  By completion, the subdirectory `data/annotations/` should contain the files `captions_train2014.json`, `captions_val2014.json`, `instances_train2014.json`, and `instances_val2014.json`.
- Under Images, download:
  - 2014 Train images [13GB]
  - 2014 Val images [6GB]
  By completion, the subdirectory `data/images/` should contain two folders, `train2014` and `val2014`.
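To verify the download, a minimal sketch like the one below loads the training captions with the standard `json` module and prints one annotation (COCO caption files store entries under the `annotations` key, each with an `image_id` and a `caption`):

```python
import json

# Inspect the COCO 2014 training captions to confirm the files are in place.
with open("data/annotations/captions_train2014.json") as f:
    captions = json.load(f)

print(len(captions["annotations"]), "captions loaded")
first = captions["annotations"][0]
print("image_id:", first["image_id"])
print("caption:", first["caption"])
```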
- Download the folder containing the pretrained model. This model is trained on 4,625 COCO images with human figures as the center of interest, using a balanced, cleaned dataset and cross-entropy loss. Move the folder into the repo's main directory. This step is only necessary if you intend to use the pre-trained model.
- Download and unzip:
  - Gender_Neutral_Captioning_model
  By completion, the subdirectory `Gender_Neutral_Captioning_model/` should contain 3 files: `training_image_ids.pkl`, `vocab.pkl`, and `best-model.pkl`.
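As a rough sketch of how these artifacts might be loaded outside the notebook (assuming the first two files are ordinary pickles and `best-model.pkl` was saved with `torch.save`; any custom vocabulary class must be importable for unpickling to succeed):

```python
import pickle
import torch

MODEL_DIR = "Gender_Neutral_Captioning_model"

# Assumed to be plain pickles; a custom Vocabulary class (if any) must be
# importable in the current session for pickle to reconstruct it.
with open(f"{MODEL_DIR}/training_image_ids.pkl", "rb") as f:
    training_image_ids = pickle.load(f)
with open(f"{MODEL_DIR}/vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

# Assumed to have been written with torch.save; map to CPU so no GPU is needed.
checkpoint = torch.load(f"{MODEL_DIR}/best-model.pkl", map_location="cpu")
print(type(training_image_ids), type(vocab), type(checkpoint))
```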
The end-to-end process of our project can be reproduced via our Gender_Neutral_Captioning notebook. From the repo's root directory, run
$ jupyter notebook Gender_Neutral_Captioning.ipynb
The notebook consists of 4 major parts:
- Part I. Preparing Dataset for Training
- Part II. Model Training
- a. Select method to generate training set
- b. Train the CNN + LSTM model
- Part III. Predict on test images
- a. Predict on human images in the COCO dataset
- b. Predict on any images
- Part IV. Evaluate Model Performance
Note:
- Each section can be run independently.
- To run an individual section, click each cell in the section and press Cmd + Enter (or Ctrl + Enter on Windows/Linux).
- All sections other than Part IIIb require a full download of the COCO 2014 training and validation datasets, in the structure specified above.
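For Part IIIb, preprocessing an arbitrary image typically looks like the sketch below. The transform uses the standard ImageNet statistics, and `encoder`/`decoder` in the trailing comments are hypothetical stand-ins for the notebook's CNN + LSTM model; check the notebook for the exact values and names.

```python
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing; the notebook may use a different
# input size or normalization -- see Part IIIb for the exact transform.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("my_photo.jpg").convert("RGB")  # any image file
batch = transform(image).unsqueeze(0)              # shape: (1, 3, 224, 224)

# Hypothetical encoder/decoder standing in for the notebook's CNN + LSTM:
# features = encoder(batch)
# word_ids = decoder.sample(features)
# caption = " ".join(vocab.idx2word[i] for i in word_ids)
```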