Colour analysis of images from Instagram posts with hashtag #selfcare


This project explores high-level colour patterns present in Instagram posts with hashtag #selfcare. To this end, it compares pixel colour values of #selfcare-tagged images and generic images.

The code is written in Python and distributed under the GPL-3.0 License.

Content

1. Data

For this experiment, two datasets were created: one containing Instagram images with the hashtag #selfcare and the other containing generic Instagram images.

See section 4.2 to prepare your own dataset.

1.1 #selfcare dataset

A total of 3526 images were retrieved, mostly from the following days:

  • 2021-01-07
  • 2021-01-08
  • 2021-01-10

However, other dates are also present. Details on the date occurrences can be found in this file.

1.2 Generic dataset

A total of 3526 images were retrieved. They come from different hashtags: #tbt, #followme, #repost, #photooftheday, #picoftheday, #follow, #like4like, #nature, #instagood, #instadaily, #instagram, #happy. Data was retrieved on different dates; specific date occurrences can be found in this file.

We deemed that the images tagged with these 12 hashtags present a wide variety of imagery that may be representative of Instagram as a whole. The hashtags have been obtained from this list of the most used Instagram hashtags.

2. Method

For both datasets (selfcare and generic):

  • Download images: Images are downloaded from Instagram posts with specific hashtags using the instaloader package.
  • Process images: Near-square images are resized into (100, 100) pixel images.
  • Build collage: A collage is built from all (100, 100) processed images. Example here.
  • Extract palette: Finally, the colour palette is extracted from the previously generated collage using the colorgram.py package.
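The palette-extraction step in the list above could be sketched roughly as follows. This is a minimal sketch, not the project's get_palette.py: the function names `extract_palette` and `palette_rows` are hypothetical, and the only library call used is `colorgram.extract(path, number_of_colors)`, whose results expose `.rgb` and `.proportion`.

```python
def palette_rows(colours):
    """Convert colour objects exposing .rgb and .proportion into plain rows.

    Rows are sorted so that the most prominent colour comes first.
    """
    rows = [(tuple(c.rgb), c.proportion) for c in colours]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def extract_palette(image_path, n_colours=10):
    """Extract up to n_colours from an image, e.g. the generated collage."""
    # Imported lazily so that palette_rows can be used without the package.
    import colorgram  # third-party: pip install colorgram.py

    return palette_rows(colorgram.extract(image_path, n_colours))
```

Calling `extract_palette("results/collage.jpg")` would then return a list of `((r, g, b), proportion)` pairs, assuming the collage exists at that path.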

Finally, once results for both datasets are obtained:

  • Comparison: Palettes obtained from both datasets are compared.
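This README does not spell out how the two palettes are compared. One possible approach, shown here purely as an assumption, is to pair each colour of one palette with its nearest colour (Euclidean distance in RGB space) in the other and look at the difference in relative importance. The function names are hypothetical; the sample values are the top three entries of each table in section 3.

```python
import math

# Top three palette entries per dataset, copied from the tables in section 3.
SELFCARE = [((240, 232, 223), 0.299), ((186, 159, 134), 0.170), ((121, 93, 72), 0.111)]
GENERIC = [((181, 157, 134), 0.169), ((119, 92, 72), 0.163), ((237, 230, 220), 0.162)]


def nearest(colour, palette):
    """Return the (rgb, importance) entry of palette closest to colour in RGB space."""
    return min(palette, key=lambda entry: math.dist(colour, entry[0]))


def compare(palette_a, palette_b):
    """Pair each colour in palette_a with its nearest colour in palette_b.

    Each row is (rgb_a, rgb_b, importance_a - importance_b).
    """
    rows = []
    for rgb, importance in palette_a:
        match_rgb, match_importance = nearest(rgb, palette_b)
        rows.append((rgb, match_rgb, round(importance - match_importance, 3)))
    return rows


for rgb, match, diff in compare(SELFCARE, GENERIC):
    print(f"{rgb} ~ {match}: importance difference {diff:+.3f}")
```

Plain RGB distance is a simple choice rather than a perceptually accurate one; a perceptual colour space could be swapped in without changing the structure.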

3. Results

In the following, results obtained from both datasets are presented.

3.1 #selfcare

Find below a graph with the most descriptive 10-colour palette of the selfcare dataset. The horizontal axis shows the RGB colour codes and the vertical axis quantifies the relative share of each palette component (i.e. the higher the bar, the more presence a colour has in the dataset). We refer to the latter as relative importance.

Note: The relative importance measures the proportion of all images with a given colour. It is normalized such that the relative importance values of the palette colours add up to 1.
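The normalization mentioned in the note can be illustrated with a short sketch. The function name `relative_importance` and the raw counts are made up for illustration; the project may compute the weights differently.

```python
def relative_importance(raw_weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {colour: weight / total for colour, weight in raw_weights.items()}


# Made-up raw counts for three colours, for illustration only.
raw = {(240, 232, 223): 600, (186, 159, 134): 300, (37, 26, 19): 100}
shares = relative_importance(raw)
print(shares[(240, 232, 223)])  # 0.6
```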

The table below shows the relative importance values:

| RGB colour | Relative importance |
| --- | --- |
| (240, 232, 223) | 0.299 |
| (186, 159, 134) | 0.170 |
| (121, 93, 72) | 0.111 |
| (37, 26, 19) | 0.097 |
| (216, 226, 236) | 0.072 |
| (240, 224, 231) | 0.064 |
| (230, 241, 236) | 0.051 |
| (21, 28, 44) | 0.050 |
| (135, 165, 189) | 0.047 |
| (72, 97, 124) | 0.038 |

3.2 Generic

Likewise, the following graph shows the same results for the generic dataset.

The table below shows the relative importance values:

| RGB colour | Relative importance |
| --- | --- |
| (181, 157, 134) | 0.169 |
| (119, 92, 72) | 0.163 |
| (237, 230, 220) | 0.162 |
| (36, 25, 18) | 0.157 |
| (21, 27, 42) | 0.084 |
| (212, 223, 234) | 0.063 |
| (139, 163, 184) | 0.060 |
| (75, 96, 119) | 0.056 |

4. Use the code

The core code of the project lives in the scripts folder.

4.1 Installation

Make sure Python is installed, then install the dependencies:

$ pip install -r requirements.txt

This project was developed using Python 3.8.

4.2 Prepare the dataset

4.2.1 Download images

Use the script download_images.py. By default, images are stored under data/original (make sure this directory exists).

$ python scripts/download_images.py
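For orientation, downloading posts for one hashtag with instaloader might look like the sketch below. It is not the contents of download_images.py: the function names, the `limit` parameter and the choice of options are assumptions, and Instagram may require a logged-in session or apply rate limits.

```python
from itertools import islice


def clean_tag(tag):
    """Strip surrounding whitespace and a leading '#' from a hashtag."""
    return tag.strip().lstrip("#")


def download_hashtag(tag, limit=100, target_dir="data/original"):
    """Download up to `limit` posts tagged with `tag` into `target_dir`."""
    # Imported lazily so that clean_tag can be used without the package.
    import instaloader  # third-party: pip install instaloader

    loader = instaloader.Instaloader(
        dirname_pattern=target_dir,  # store everything in one directory
        download_videos=False,
        download_comments=False,
        save_metadata=False,
    )
    name = clean_tag(tag)
    hashtag = instaloader.Hashtag.from_name(loader.context, name)
    # get_posts() is a lazy iterator, so islice stops after `limit` posts.
    for post in islice(hashtag.get_posts(), limit):
        loader.download_post(post, target=name)
```

With this sketch, `download_hashtag("#selfcare", limit=50)` would fetch at most 50 posts into data/original.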

4.2.2 Process images

Use the script process_images.py. By default, images are stored under data/processed (make sure this directory exists).

$ python scripts/process_images.py

This script resizes the images to 224x224 pixels. To minimize the impact of resizing (which can lead to noticeable distortions), only near-square images are used.
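A minimal sketch of this filtering and resizing step, using Pillow, is given below. The README does not state what counts as "near-square", so the 10% `tolerance` is an assumption, as are the function names. The target size is a parameter because the Method section mentions (100, 100) while this section mentions 224x224.

```python
def is_near_square(width, height, tolerance=0.1):
    """True when the sides differ by at most `tolerance` of the longer side."""
    longer, shorter = max(width, height), min(width, height)
    return longer > 0 and (longer - shorter) / longer <= tolerance


def process_image(src_path, dst_path, size=(224, 224)):
    """Resize a near-square image to `size`; return False if it was skipped."""
    # Imported lazily so that is_near_square can be used without Pillow.
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(src_path) as img:
        if not is_near_square(*img.size):
            return False
        # Convert to RGB so that the result can always be saved as JPEG.
        img.convert("RGB").resize(size).save(dst_path)
    return True
```

Under this tolerance a 1000x950 image is kept, while a 1080x1350 portrait image is skipped.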

4.2.3 Build data collage

Use the script build_collage.py.

$ python scripts/build_collage.py

By default, the generated collage is stored as results/collage.jpg.
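One way to lay out such a collage is a row-major grid that is as close to square as possible. The sketch below illustrates that idea with Pillow; the layout rule, function names and default tile size are assumptions and may differ from build_collage.py.

```python
import math


def grid_positions(n_tiles, tile_size=100, columns=None):
    """Return (columns, rows, [(x, y), ...]) for a row-major grid of tiles."""
    if n_tiles < 1:
        raise ValueError("at least one tile is required")
    if columns is None:
        columns = math.ceil(math.sqrt(n_tiles))
    rows = math.ceil(n_tiles / columns)
    positions = [
        ((i % columns) * tile_size, (i // columns) * tile_size)
        for i in range(n_tiles)
    ]
    return columns, rows, positions


def build_collage(tile_paths, out_path="results/collage.jpg", tile_size=100):
    """Paste all tiles into one image and save it to out_path."""
    # Imported lazily so that grid_positions can be used without Pillow.
    from PIL import Image  # third-party: pip install Pillow

    columns, rows, positions = grid_positions(len(tile_paths), tile_size)
    collage = Image.new("RGB", (columns * tile_size, rows * tile_size))
    for path, position in zip(tile_paths, positions):
        with Image.open(path) as tile:
            resized = tile.convert("RGB").resize((tile_size, tile_size))
            collage.paste(resized, position)
    collage.save(out_path)
```

For 3526 tiles this layout gives a 60 x 59 grid. Cells left empty in the last row stay black, which would slightly bias a palette extracted from the collage, so cropping or filling them may be worth considering.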

4.2.4 Obtain palette

Use the script get_palette.py.

$ python scripts/get_palette.py

This will do the following (by default):

4.3 Others

4.3.1 Some stats (post date occurrences)

Use the script get_stats.py.

$ python scripts/get_stats.py

By default, it saves the results as results/stats_dates.csv.
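Counting how often each post date occurs can be done with the standard library alone. The following is a sketch under assumptions: the input is taken to be a list of ISO date strings, and the CSV column names `date` and `count` are hypothetical rather than those written by get_stats.py.

```python
import csv
from collections import Counter


def date_occurrences(dates):
    """Count posts per date; returns [(date, count), ...], most frequent first."""
    return Counter(dates).most_common()


def save_stats(rows, out_path="results/stats_dates.csv"):
    """Write (date, count) rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "count"])
        writer.writerows(rows)


# Made-up post dates, for illustration only.
sample = ["2021-01-07", "2021-01-08", "2021-01-07", "2021-01-10", "2021-01-07"]
print(date_occurrences(sample))
```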