# Activation Analysis

This wiki page contains information about the analysis of activations of our models.

## Baseline ResNet

### Layer Overview

| # | Name | Shape | Size | Description |
|---|------|-------|------|-------------|
| 0 | act0_raw_input | 64x64x3 | 12,288 | |
| 1 | act1_processed_imgs | 64x64x3 | 12,288 | |
| 2 | act2_first_conv | 64x64x64 | 262,144 | |
| 3 | act3_block1 | 64x64x64 | 262,144 | |
| 4 | act4_block2 | 32x32x128 | 131,072 | |
| 5 | act5_block3 | 16x16x256 | 65,536 | |
| 6 | act6_block4 | 8x8x512 | 32,768 | |
| 7 | act7_block4_postact | 8x8x512 | 32,768 | |
| 8 | act8_global_avg | 512 | 512 | |
| 9 | act9_logits | 200 | 200 | |
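
The Size column gives the number of scalar activations per sample, i.e. the product of the shape dimensions (e.g. 64 · 64 · 3 = 12,288 for the input).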

## ALP ResNet

### Layer Overview

| # | Name | Shape | Size | Description |
|---|------|-------|------|-------------|
| 1 | act1_input | 64x64x3 | 12,288 | Input images |
| 2 | act2_first_conv | 16x16x64 | 16,384 | |
| 3 | act3_block1 | 8x8x256 | 16,384 | |
| 4 | act4_block2 | 4x4x512 | 8,192 | |
| 5 | act5_block3 | 2x2x1024 | 4,096 | |
| 6 | act6_block4 | 2x2x2048 | 8,192 | |
| 7 | act7_norm | 2x2x2048 | 8,192 | |
| 8 | act8_pool | 200 | 200 | Logits |

## Compression Effects

Compression of the activations after the layer named in the first table column, in three modes: (1) disabled, (2) with a random matrix, (3) with a matrix computed using PCA. The PCA was computed on the training data.
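
The implementation is not shown on this page; below is a minimal numpy sketch of what modes (2) and (3) could look like, assuming the activations are flattened to one row per sample (all function and variable names are illustrative). For the Channels axis, the same 64×k matrix would instead act on the channel dimension at every spatial position.

```python
import numpy as np

def compression_matrix_pca(train_acts, k):
    """Mode (3): projection onto the top-k principal components of the
    training activations. train_acts has shape (n_samples, d).
    (PCA centering is omitted here for brevity.)"""
    centered = train_acts - train_acts.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k)

def compression_matrix_random(d, k, seed=0):
    """Mode (2): random projection with orthonormal columns."""
    rng = np.random.RandomState(seed)
    q, _ = np.linalg.qr(rng.randn(d, k))
    return q  # shape (d, k)

def compress_decompress(acts, w):
    """Project activations down to k dimensions and back up to d, so the
    following layers still receive inputs of the original shape."""
    return (acts @ w) @ w.T
```

For act5_block3 of the ALP ResNet, d = 2 · 2 · 1024 = 4096, which matches the matrix shapes in the table below.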

| Preceding layer | Axis | Matrix shape | Mode | Validation accuracy [%] |
|-----------------|------|--------------|------|-------------------------|
| – | – | – | (1) | 72.04 |
| act2_first_conv | Channels | 64x64 | (2) | *72.04* |
| act2_first_conv | Channels | 64x63 | (2) | 72.60 |
| act2_first_conv | Channels | 64x62 | (2) | 61.68 |
| act2_first_conv | Channels | 64x61 | (2) | 61.72 |
| act2_first_conv | Channels | 64x64 | (3) | *72.04* |
| act2_first_conv | Channels | 64x63 | (3) | 60.88 |
| act5_block3 | All | 4096x4096 | (3) | *72.04* |
| act5_block3 | All | 4096x3968 | (3) | 71.54, 71.65 |
| act5_block3 | All | 4096x3840 | (3) | 70.71, 71.47 |
| act5_block3 | All | 4096x3584 | (3) | 68.72, 70.00 |
| act5_block3 | All | 4096x3328 | (3) | 67.64 |
| act5_block3 | All | 4096x3072 | (3) | 65.87 |
| act5_block3 | All | 4096x2048 | (3) | 47.40, 46.60 |
| act5_block3 | All | 4096x1024 | (3) | 7.12 |
| act4_block2 | All | 8192x7680 | (3) | 70.08 |
| act4_block2 | All | 8192x7168 | (3) | 67.23 |
| act4_block2 | All | 8192x4096 | (3) | 21.51 |
| act4_block2 | All | 8192x2048 | (3) | 2.31 |
| act4_block2 | All | 8192x1024 | (3) | 0.75 |

Explanation: italic accuracies are expected to match the baseline (1), since a full-rank square matrix discards no information.

[Figure]
Fig.: Plot of the table above. As can be seen, the earlier layer (closer to the input) is more sensitive to PCA compression.

## Cosine Similarity

Tiny ImageNet contains samples labeled with 200 different classes. We store the activations induced by 74,246 correctly classified training samples and compute the cosine similarity between those vectors and a single activation chosen from this matrix.
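
The computation itself is not shown on this page; the following is a minimal numpy sketch of it, assuming the stored activations form a matrix with one flattened activation vector per row (`acts`, `labels`, and `idx` are illustrative names):

```python
import numpy as np

def cosine_similarities(acts, idx):
    """Cosine similarity between activation `idx` and every row of `acts`,
    which has shape (n_samples, d). Returns an (n_samples,) vector of
    values in [-1, 1]."""
    ref = acts[idx]
    dots = acts @ ref
    norms = np.linalg.norm(acts, axis=1) * np.linalg.norm(ref)
    return dots / norms

# Splitting the similarities by label gives the same-class (red)
# histogram shown below:
# sims = cosine_similarities(acts, idx)
# same_class = sims[labels == labels[idx]]
```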

[Figure: act5_block3_hist]
Fig.: Activations taken from the layer act5_block3 in this commit. The unimodal distribution shows where the cosine similarities lie in general. The red histogram shows the similarities for samples of the same class as the sample that is being compared to the activations. It can be seen that samples of the same class tend to induce activation vectors with higher cosine similarities among each other.

[Figure: act5_block3_hist_zoomed]
Fig.: Zoomed version of the histogram above.

We did the same type of analysis for all layers (with 20,000 training samples, excluding misclassified ones). The resulting histograms can be seen in the following figure:

[Figure: activation_cosine_histograms]
Fig.: The histograms clearly show the gain of "classification information" with increasing depth: the closer a layer is to the output, the more similar the activation vectors of same-class samples are to each other.

### Cosine Distance for Compressed Activations

We compress the activations of the layer act5_block3 with a PCA and compute the same histograms as above, for different compression levels.
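
A sketch of this computation, reusing `compression_matrix_pca` and `cosine_similarities` from the sketches above (whether the similarities were measured in the reduced or in the reconstructed space is not stated here; the sketch uses the reduced space):

```python
# Assumes train_acts (for fitting the PCA), acts, and idx as above.
for k in [4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8]:
    w = compression_matrix_pca(train_acts, k)  # shape (4096, k)
    compressed = acts @ w                      # shape (n_samples, k)
    sims = cosine_similarities(compressed, idx)
    # ...histogram `sims`, split by label as before...
```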

[Figure: pca_histograms]
Fig.: The layer output vector size is 4096. Here we compare the cosine similarity of a randomly chosen sample from the dataset with all other vectors after applying a PCA that reduces to the named dimensionality (4096, 2048, 1024, ..., 8). It can be seen that the higher similarity among samples of the same class still holds for compression levels down to 32.