Activation Analysis
This wiki page contains information about the analysis of activations of our models.
| # | Name | Shape | Size | Description |
|---|---|---|---|---|
| 0 | act0_raw_input | 64x64x3 | 12,288 | |
| 1 | act1_processed_imgs | 64x64x3 | 12,288 | |
| 2 | act2_first_conv | 64x64x64 | 262,144 | |
| 3 | act3_block1 | 64x64x64 | 262,144 | |
| 4 | act4_block2 | 32x32x128 | 131,072 | |
| 5 | act5_block3 | 16x16x256 | 65,536 | |
| 6 | act6_block4 | 8x8x512 | 32,768 | |
| 7 | act7_block4_postact | 8x8x512 | 32,768 | |
| 8 | act8_global_avg | 512 | 512 | |
| 9 | act9_logits | 200 | 200 | |
| # | Name | Shape | Size | Description |
|---|---|---|---|---|
| 1 | act1_input | 64x64x3 | 12,288 | Input images |
| 2 | act2_first_conv | 16x16x64 | 16,384 | |
| 3 | act3_block1 | 8x8x256 | 16,384 | |
| 4 | act4_block2 | 4x4x512 | 8,192 | |
| 5 | act5_block3 | 2x2x1024 | 4,096 | |
| 6 | act6_block4 | 2x2x2048 | 8,192 | |
| 7 | act7_norm | 2x2x2048 | 8,192 | |
| 8 | act8_pool | 200 | | Logits |
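The Size column in the tables above is simply the flattened length of each activation, i.e. the product of its shape dimensions. A minimal check, with shape values taken from the first table:

```python
import math

# Shapes as listed in the first activation table above.
shapes = {
    "act0_raw_input": (64, 64, 3),
    "act2_first_conv": (64, 64, 64),
    "act6_block4": (8, 8, 512),
    "act8_global_avg": (512,),
}

# Size = number of scalars per sample = product of the shape dimensions,
# e.g. 64 * 64 * 3 = 12,288 for the raw input images.
sizes = {name: math.prod(shape) for name, shape in shapes.items()}
```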
Compression of the activations after layer xxx was evaluated in three modes: (1) disabled (baseline), (2) with a random matrix, and (3) with a matrix computed using PCA. The PCA was computed on the training data.
| Preceding layer | Axis | Matrix shape | Mode | Validation accuracy [%] |
|---|---|---|---|---|
| - | - | - | (1) | 72.04 |
| act2_first_conv | Channels | 64x64 | (2) | *72.04* |
| act2_first_conv | Channels | 64x63 | (2) | 72.60 |
| act2_first_conv | Channels | 64x62 | (2) | 61.68 |
| act2_first_conv | Channels | 64x61 | (2) | 61.72 |
| act2_first_conv | Channels | 64x64 | (3) | *72.04* |
| act2_first_conv | Channels | 64x63 | (3) | 60.88 |
| act5_block3 | All | 4096x4096 | (3) | *72.04* |
| act5_block3 | All | 4096x3968 | (3) | |
| act5_block3 | All | 4096x3840 | (3) | |
| act5_block3 | All | 4096x3584 | (3) | |
| act5_block3 | All | 4096x3328 | (3) | 67.64 |
| act5_block3 | All | 4096x3072 | (3) | 65.87 |
| act5_block3 | All | 4096x2048 | (3) | |
| act5_block3 | All | 4096x1024 | (3) | 7.12 |
| act4_block2 | All | 8192x7680 | (3) | 70.08 |
| act4_block2 | All | 8192x7168 | (3) | 67.23 |
| act4_block2 | All | 8192x4096 | (3) | 21.51 |
| act4_block2 | All | 8192x2048 | (3) | 2.31 |
| act4_block2 | All | 8192x1024 | (3) | 0.75 |
Explanations: Italic accuracies are expected to match the baseline (1).
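The two matrix-based compression modes can be sketched in NumPy. This is a hedged illustration on synthetic low-rank data, not the actual experiment code; the 64 → 63 matrix shape mirrors the act2_first_conv channel-axis rows of the table, and the project-then-reconstruct scheme is an assumption about how the compression matrix is applied:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-channel activations of act2_first_conv:
# rows are (sample, spatial position) pairs, columns are the 64 channels.
# The data is confined to a 32-dim subspace so the PCA advantage is visible.
X = rng.normal(size=(1000, 32)) @ rng.normal(size=(32, 64))

k = 63  # target dimensionality, as in the 64x63 rows of the table

# Mode (2): random matrix. Orthonormalize a Gaussian 64xk matrix and
# reconstruct with its transpose; an arbitrary k-dim subspace is kept.
Q, _ = np.linalg.qr(rng.normal(size=(64, k)))
X_rand = (X @ Q) @ Q.T

# Mode (3): PCA computed on the (training) data. The top-k principal
# directions minimize reconstruction error among all rank-k projections.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:k].T                           # 64 x k compression matrix
X_pca = mu + ((X - mu) @ W) @ W.T

err_rand = np.linalg.norm(X - X_rand)  # a random subspace loses signal
err_pca = np.linalg.norm(X - X_pca)    # ~0 here: k exceeds the data rank
```

This also explains the italic rows of the table: a full-rank 64x64 matrix is lossless in either mode, so those accuracies match the baseline.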
Fig.: Plot of the table above. As can be seen, the earlier layer (closer to the input) is more sensitive to PCA compression.
Tiny ImageNet contains samples labeled with 200 different classes. We store the activations induced by 74,246 correctly classified training samples and compute the cosine similarity between those vectors and a single activation chosen from this matrix.
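The similarity computation can be sketched as follows. This is a toy sketch with made-up class structure and small dimensions, not the stored 74,246-row activation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the stored activations: a few classes, each with a
# distinct mean, so same-class vectors point in similar directions.
n_classes, per_class, dim = 5, 50, 64
means = 3.0 * rng.normal(size=(n_classes, dim))
acts = np.concatenate([means[c] + rng.normal(size=(per_class, dim))
                       for c in range(n_classes)])
labels = np.repeat(np.arange(n_classes), per_class)

# A single activation chosen from the matrix, compared against all rows.
q = acts[0]
sims = acts @ q / (np.linalg.norm(acts, axis=1) * np.linalg.norm(q))

same = sims[labels == labels[0]]   # red histogram: same-class similarities
other = sims[labels != labels[0]]  # bulk of the unimodal distribution
```

Overlaying histograms of `same` and `other` produces figures of the kind shown below.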
Fig.: Activations taken from the layer act5_block3
in this commit. The overall unimodal distribution shows where the cosine similarities lie in general. The red histogram shows the similarities for samples of the same class as the sample being compared against the activations. It can be seen that samples of the same class tend to induce activation vectors with higher cosine similarities among each other.
Fig.: Zoomed version of the histogram above.
We did the same type of analysis for all layers (with 20,000 training samples, minus misclassified ones). The resulting histograms can be seen in the following figure:
Fig.: The histograms clearly show the gain of "classification information" with increasing depth: the closer a layer is to the output, the closer together the activations of same-class samples become.
We compress the activations of the layer act5_block3
with PCA and compute the same histograms as above for different compression levels.
Fig.: The layer output vector size is 4096. Here we compare the cosine similarity of a randomly chosen sample from the dataset with all other vectors after applying a PCA reduction to the named dimensionality (4096, 2048, 1024, ..., 8). It can be seen that the similarity of samples of the same class still holds for compression levels down to 32.
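The compressed variant of the same analysis might look like this. Again a toy sketch: the real act5_block3 vectors are 4096-dimensional, while the dimensions here are scaled down for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy class-structured activations (stand-in for act5_block3 outputs).
n_classes, per_class, dim, k = 5, 40, 256, 32
means = 3.0 * rng.normal(size=(n_classes, dim))
acts = np.concatenate([means[c] + rng.normal(size=(per_class, dim))
                       for c in range(n_classes)])
labels = np.repeat(np.arange(n_classes), per_class)

# PCA down to k dimensions before computing cosine similarities.
mu = acts.mean(axis=0)
_, _, Vt = np.linalg.svd(acts - mu, full_matrices=False)
reduced = (acts - mu) @ Vt[:k].T

q = reduced[0]
sims = reduced @ q / (np.linalg.norm(reduced, axis=1) * np.linalg.norm(q))
same = sims[labels == labels[0]][1:]   # exclude the query itself
other = sims[labels != labels[0]]
# With moderate k the class structure survives the compression.
```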