The model architecture is shown below:
The attention heatmap is shown below:
Explanation of the heatmap: attn1 is the output of the first branch of the network, and it is used to zoom in on the lesion for the second branch. input1 is the original image; input2 is the image zoomed in according to the attn1 heatmap. We can clearly see that the lesion region is magnified. mid1, mid2, and mid3 are the outputs of the non-local blocks of the first branch. We use attn1 to zoom them in as well, obtaining mid1_zoom, mid2_zoom, and mid3_zoom, which are concatenated into the corresponding stages of the second branch. This gives the second branch a better view of the lesion and further improves the accuracy.
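Since the zoom operation itself is not spelled out here, the following is a minimal sketch of one simple way attention-guided zooming can be realized: crop a window centered on the attention map's center of mass and resize it back to the original resolution. The helper name zoom_by_attention, the center-of-mass heuristic, and the fixed zoom factor are assumptions for illustration, not the repository's actual method (which may instead warp the image non-uniformly by the attention map, as the "distortion" example below suggests). The same operation applies equally to the mid feature maps.

```python
import tensorflow as tf

def zoom_by_attention(images, attn, zoom=2.0):
    """Crop a (H/zoom, W/zoom) window centered on the attention map's
    center of mass, then resize it back to (H, W), i.e. "zoom in".

    images: [B, H, W, C] float tensor (image or feature map).
    attn:   [B, h, w] non-negative attention maps.
    """
    images = tf.convert_to_tensor(images, tf.float32)
    attn = tf.convert_to_tensor(attn, tf.float32)
    b = tf.shape(images)[0]
    h = tf.cast(tf.shape(attn)[1], tf.float32)
    w = tf.cast(tf.shape(attn)[2], tf.float32)

    # Normalized center of mass of the attention map, in [0, 1].
    total = tf.reduce_sum(attn, axis=[1, 2]) + 1e-8          # [B]
    ys = (tf.range(h) + 0.5) / h                             # [h] row centers
    xs = (tf.range(w) + 0.5) / w                             # [w] col centers
    cy = tf.reduce_sum(attn * ys[None, :, None], axis=[1, 2]) / total
    cx = tf.reduce_sum(attn * xs[None, None, :], axis=[1, 2]) / total

    # Crop box of relative size 1/zoom around the center, kept inside the image.
    half = 0.5 / zoom
    y1 = tf.clip_by_value(cy - half, 0.0, 1.0 - 2.0 * half)
    x1 = tf.clip_by_value(cx - half, 0.0, 1.0 - 2.0 * half)
    boxes = tf.stack([y1, x1, y1 + 2.0 * half, x1 + 2.0 * half], axis=1)

    # Resample each crop back to the input's spatial size.
    out_size = tf.shape(images)[1:3]
    return tf.image.crop_and_resize(images, boxes, tf.range(b), out_size)
```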
The handsome Kris Wu before distortion:
The handsome Kris Wu after distortion by the attention map:
The tensorflow_sample.py file contains this demo! The attention map is simply set to 1 in the middle and 0 everywhere else, so it zooms in on the center.
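The snippet below sketches the setup that tensorflow_sample.py describes, reusing the hypothetical zoom_by_attention helper from above: an attention map that is 1 in a central square and 0 around it puts the center of mass in the middle of the image, so the zoom magnifies the center.

```python
import numpy as np
import tensorflow as tf

h = w = 14
attn = np.zeros((1, h, w), np.float32)
attn[0, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 1.0  # 1 in the middle, 0 around it

image = tf.random.uniform([1, 224, 224, 3])   # stand-in for the input photo
zoomed = zoom_by_attention(image, attn, zoom=2.0)
print(zoomed.shape)  # (1, 224, 224, 3): the central region, magnified 2x
```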