# Activation Analysis

This wiki page contains information about the analysis of activations of our models.

## Baseline ResNet

### Layer Overview

| # | Name | Shape | Size | Description |
|---|------|-------|------|-------------|
| 0 | act0_raw_input | 64x64x3 | 12,288 | |
| 1 | act1_processed_imgs | 64x64x3 | 12,288 | |
| 2 | act2_first_conv | 64x64x64 | 262,144 | |
| 3 | act3_block1 | 64x64x64 | 262,144 | |
| 4 | act4_block2 | 32x32x128 | 131,072 | |
| 5 | act5_block3 | 16x16x256 | 65,536 | |
| 6 | act6_block4 | 8x8x512 | 32,768 | |
| 7 | act7_block4_postact | 8x8x512 | 32,768 | |
| 8 | act8_global_avg | 512 | 512 | |
| 9 | act9_logits | 200 | 200 | |
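
The Size column gives the number of scalar activations per sample, i.e. the product of the shape dimensions (e.g. 64 · 64 · 3 = 12,288 for the input).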

## ALP ResNet

### Layer Overview

| # | Name | Shape | Size | Description |
|---|------|-------|------|-------------|
| 1 | act1_input | 64x64x3 | 12,288 | Input images |
| 2 | act2_first_conv | 16x16x64 | 16,384 | |
| 3 | act3_block1 | 8x8x256 | 16,384 | |
| 4 | act4_block2 | 4x4x512 | 8,192 | |
| 5 | act5_block3 | 2x2x1024 | 4,096 | |
| 6 | act6_block4 | 2x2x2048 | 8,192 | |
| 7 | act7_norm | 2x2x2048 | 8,192 | |
| 8 | act8_pool | 200 | 200 | Logits |

## Compression Effects

Compression of the activations after the layer named in the first table column, in three modes: (1) disabled, (2) with a random matrix, (3) with a matrix computed using PCA. The PCA was computed on the training data.
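
The implementation is not shown on this page; below is a minimal numpy sketch of what modes (2) and (3) could look like, assuming the activations are flattened to one row per sample (all function and variable names are illustrative). For the Channels axis, the same 64×k matrix would instead act on the channel dimension at every spatial position.

```python
import numpy as np

def compression_matrix_pca(train_acts, k):
    """Mode (3): projection onto the top-k principal components of the
    training activations. train_acts has shape (n_samples, d).
    (PCA centering is omitted here for brevity.)"""
    centered = train_acts - train_acts.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k)

def compression_matrix_random(d, k, seed=0):
    """Mode (2): random projection with orthonormal columns."""
    rng = np.random.RandomState(seed)
    q, _ = np.linalg.qr(rng.randn(d, k))
    return q  # shape (d, k)

def compress_decompress(acts, w):
    """Project activations down to k dimensions and back up to d, so the
    following layers still receive inputs of the original shape."""
    return (acts @ w) @ w.T
```

For act5_block3 of the ALP ResNet, d = 2 · 2 · 1024 = 4096, which matches the matrix shapes in the table below.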

| Preceding layer | Axis | Matrix shape | Mode | Validation accuracy [%] |
|-----------------|------|--------------|------|-------------------------|
| – | – | – | (1) | 72.04 |
| act2_first_conv | Channels | 64x64 | (2) | *72.04* |
| act2_first_conv | Channels | 64x63 | (2) | 72.60 |
| act2_first_conv | Channels | 64x62 | (2) | 61.68 |
| act2_first_conv | Channels | 64x61 | (2) | 61.72 |
| act2_first_conv | Channels | 64x64 | (3) | *72.04* |
| act2_first_conv | Channels | 64x63 | (3) | 60.88 |
| act5_block3 | All | 4096x4096 | (3) | *72.04* |
| act5_block3 | All | 4096x3968 | (3) | 71.54, 71.65 |
| act5_block3 | All | 4096x3840 | (3) | 70.71, 71.47 |
| act5_block3 | All | 4096x3584 | (3) | 68.72, 70.00 |
| act5_block3 | All | 4096x3328 | (3) | 67.64 |
| act5_block3 | All | 4096x3072 | (3) | 65.87 |
| act5_block3 | All | 4096x2048 | (3) | 47.40, 46.60 |
| act5_block3 | All | 4096x1024 | (3) | 7.12 |
| act4_block2 | All | 8192x7680 | (3) | 70.08 |
| act4_block2 | All | 8192x7168 | (3) | 67.23 |
| act4_block2 | All | 8192x4096 | (3) | 21.51 |
| act4_block2 | All | 8192x2048 | (3) | 2.31 |
| act4_block2 | All | 8192x1024 | (3) | 0.75 |

Explanation: italic accuracies are expected to match the baseline (1), since a full-rank square matrix discards no information.

[Figure]
Fig.: Plot of the table above. As can be seen, the earlier layer (closer to the input) is more sensitive to PCA compression.

## Cosine Similarity

Tiny ImageNet contains samples labeled with 200 different classes. We store the activations induced by 74,246 correctly classified training samples and compute the cosine similarity between those vectors and a single activation chosen from this matrix.
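
The computation itself is not shown on this page; the following is a minimal numpy sketch of it, assuming the stored activations form a matrix with one flattened activation vector per row (`acts`, `labels`, and `idx` are illustrative names):

```python
import numpy as np

def cosine_similarities(acts, idx):
    """Cosine similarity between activation `idx` and every row of `acts`,
    which has shape (n_samples, d). Returns an (n_samples,) vector of
    values in [-1, 1]."""
    ref = acts[idx]
    dots = acts @ ref
    norms = np.linalg.norm(acts, axis=1) * np.linalg.norm(ref)
    return dots / norms

# Splitting the similarities by label gives the same-class (red)
# histogram shown below:
# sims = cosine_similarities(acts, idx)
# same_class = sims[labels == labels[idx]]
```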

[Figure: act5_block3_hist]
Fig.: Activations taken from the layer act5_block3 in this commit. The unimodal distribution shows where the cosine similarities lie in general. The red histogram shows the similarities for samples of the same class as the sample that is being compared to the activations. It can be seen that samples of the same class tend to induce activation vectors with higher cosine similarities among each other.

[Figure: act5_block3_hist_zoomed]
Fig.: Zoomed version of the histogram above.

We did the same type of analysis for all layers (with 20,000 training samples, excluding misclassified ones). The resulting histograms can be seen in the following figure:

[Figure: activation_cosine_histograms]
Fig.: The histograms clearly show the gain of "classification information" with increasing depth: the closer a layer is to the output, the more similar the activation vectors of same-class samples are to each other.

### Cosine Distance for Compressed Activations

We compress the activations of the layer act5_block3 with a PCA and compute the same histograms as above, for different compression levels.
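
A sketch of this computation, reusing `compression_matrix_pca` and `cosine_similarities` from the sketches above (whether the similarities were measured in the reduced or in the reconstructed space is not stated here; the sketch uses the reduced space):

```python
# Assumes train_acts (for fitting the PCA), acts, and idx as above.
for k in [4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8]:
    w = compression_matrix_pca(train_acts, k)  # shape (4096, k)
    compressed = acts @ w                      # shape (n_samples, k)
    sims = cosine_similarities(compressed, idx)
    # ...histogram `sims`, split by label as before...
```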

[Figure: pca_histograms]
Fig.: The layer output vector size is 4096. Here we compare the cosine similarity of a randomly chosen sample from the dataset with all other vectors after applying a PCA that reduces to the named dimensionality (4096, 2048, 1024, ..., 8). It can be seen that the higher similarity among samples of the same class still holds for compression levels down to 32.