March 2020
tl;dr: answers the question: which source tasks transfer best to which target tasks?
The paper proposes a large dataset of 4 million images, each labeled with ground truth for 26 tasks. This work directly inspired task grouping, which answers a different question: how to perform multitask learning more efficiently.
The main purpose of transfer learning is to reduce the number of labeled images needed to train a task; the focus is supervision efficiency. Given enough images, training from scratch is also viable, per Rethinking ImageNet Pre-training.
- A fully computational approach to reveal the relationships between tasks. Previously, relationships were based on human intuition or analytical knowledge.
- Depth -> normals is easy for humans (analytically), but the opposite holds for neural networks.
- Four main steps
- Task-specific modeling
- Transfer modeling (frozen encoder, trained shallow transfer function + decoder)
- Normalization with AHP (analytic hierarchy process): only assumes monotonicity and keeps the ordinal order; a toy sketch follows below
- Taxonomy extraction with BIP (Boolean integer programming) with constraints
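A minimal sketch of the AHP normalization step, with made-up numbers: for each target task, build a pairwise "tournament" matrix whose (i, j) entry is the fraction of held-out images on which source i beats source j, and take the principal eigenvector of the resulting ratio matrix as the normalized affinities. Since only ordinal comparisons enter the matrix, monotonicity is the sole assumption. This is an illustrative sketch, not the paper's exact computation.

```python
import numpy as np

def ahp_affinities(wins: np.ndarray) -> np.ndarray:
    """AHP-style normalization for one target task.

    wins[i, j] = fraction of held-out images on which transfer from
    source i beats transfer from source j (so off-diagonal entries of
    wins + wins.T equal 1). Returns the principal eigenvector of the
    pairwise ratio matrix, normalized to sum to 1.
    """
    n = wins.shape[0]
    # Pairwise ratio matrix: "how many times better" source i is than j.
    ratios = np.ones((n, n))
    mask = ~np.eye(n, dtype=bool)
    ratios[mask] = wins[mask] / np.clip(1.0 - wins[mask], 1e-9, None)
    # Principal eigenvector via power iteration.
    v = np.ones(n) / n
    for _ in range(100):
        v = ratios @ v
        v /= v.sum()
    return v

# Toy example: 3 candidate sources for one target task.
wins = np.array([[0.5, 0.8, 0.6],
                 [0.2, 0.5, 0.3],
                 [0.4, 0.7, 0.5]])
print(ahp_affinities(wins))  # source 0 should rank highest
```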
- With only 2% of the data, fine-tuning from reshading to surface normals already yields good results.
- Architecture
- The backbone is ResNet-50
- Neck: 2 conv layers (channels are concatenated for higher-order transfers from multiple sources)
- Decoder: a 15-layer fully convolutional network for pixel-prediction tasks, or 2-3 FC layers for low-dimensional tasks (a rough sketch follows below)
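A rough PyTorch sketch of this transfer setup. The layer widths and the decoder head are placeholders, not the paper's exact configuration; the point is the structure: a frozen source-task encoder, a shallow 2-conv transfer function, and a trainable decoder.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TransferNet(nn.Module):
    def __init__(self, out_channels: int = 3):
        super().__init__()
        # Frozen source-task encoder (in practice, load the frozen
        # task-specific Taskonomy weights instead of a fresh ResNet-50).
        backbone = resnet50(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Shallow transfer function: 2 conv layers. For higher-order
        # transfers, features from several frozen encoders would be
        # concatenated along the channel dimension first.
        self.transfer = nn.Sequential(
            nn.Conv2d(2048, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        # Placeholder decoder standing in for the paper's 15-layer
        # fully convolutional decoder.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4), nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4), nn.Conv2d(64, out_channels, 3, padding=1),
        )

    def forward(self, x):
        with torch.no_grad():          # encoder stays frozen
            feats = self.encoder(x)
        return self.decoder(self.transfer(feats))

net = TransferNet()
pred = net(torch.randn(1, 3, 224, 224))  # -> [1, 3, 112, 112]
```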
- Transitive transfer means multi-hop transfer (e.g., source -> intermediate -> target).
- From the video of the CVPR presentation:
- What if I have a new task and want to quantify which other tasks are related to it?
- Collect ~2k labeled images to find the best sources for it
- Fine-tune each pretrained source network on those 2k images and rank the sources (see the sketch below)
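A hedged sketch of that recipe, where everything is a placeholder (the encoder stand-ins, feature sizes, and targets are all hypothetical): freeze each candidate source encoder, fit a cheap readout on the ~2k labeled images, and rank sources by held-out score.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for frozen pretrained encoders of candidate source tasks:
# each maps a batch of "images" to fixed feature vectors. In practice,
# these would be the frozen source-task networks.
encoders = {
    "depth":     lambda x: x @ rng.standard_normal((512, 64)),
    "reshading": lambda x: x @ rng.standard_normal((512, 64)),
    "autoenc":   lambda x: x @ rng.standard_normal((512, 64)),
}

# ~2k labeled examples for the NEW task (random placeholders here).
images = rng.standard_normal((2000, 512))
targets = rng.standard_normal(2000)

scores = {}
for name, encode in encoders.items():
    feats = encode(images)
    X_tr, X_te, y_tr, y_te = train_test_split(feats, targets, random_state=0)
    readout = Ridge().fit(X_tr, y_tr)         # cheap trainable readout
    scores[name] = readout.score(X_te, y_te)  # held-out R^2

# Higher held-out score -> more related source task.
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```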
Beam search
- Beam search is an efficient way to explore a graph; it is not optimal, i.e., there is no guarantee it finds the best solution.
- Beam search with beam width B=∞ is essentially breadth-first search.
- Beam search with beam width B=1 is essentially greedy search.
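A generic beam-search sketch over a successor function (nothing Taskonomy-specific): at each step, expand every path in the beam and keep only the B highest-scoring partial paths. The toy graph below shows why B=1 behaves greedily and is suboptimal.

```python
import heapq

def beam_search(start, successors, is_goal, beam_width, max_depth=20):
    """Return the best-scoring (score, path) to a goal; higher is better."""
    beam = [(0.0, [start])]  # (cumulative score, path) pairs
    best = None
    for _ in range(max_depth):
        candidates = []
        for score, path in beam:
            if is_goal(path[-1]):
                if best is None or score > best[0]:
                    best = (score, path)
                continue  # do not expand past a goal
            for node, step_score in successors(path[-1]):
                candidates.append((score + step_score, path + [node]))
        if not candidates:
            break
        # Prune: keep only the beam_width highest-scoring partial paths.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return best

# Toy DAG with edge scores; the goal is to reach "D".
graph = {"A": [("B", 1.0), ("C", 2.0)], "B": [("D", 5.0)], "C": [("D", 1.0)]}
succ = lambda n: graph.get(n, [])
print(beam_search("A", succ, lambda n: n == "D", beam_width=1))
# -> (3.0, ['A', 'C', 'D']): greedy (B=1) misses the better A->B->D (score 6.0)
print(beam_search("A", succ, lambda n: n == "D", beam_width=2))
# -> (6.0, ['A', 'B', 'D']): a wider beam recovers it
```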
- The powerset of a set S is the set of all subsets of S, including the empty set and S itself, variously denoted $\wp(S)$ or $2^S$; it has $2^{|S|}$ elements.
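The standard itertools recipe for the powerset; relevant here because a target's candidate sources are subsets of the task set, so the search space grows as $2^{|S|}$.

```python
from itertools import chain, combinations

def powerset(s):
    """All 2**len(s) subsets of s, from the empty set up to s itself."""
    items = list(s)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

print(list(powerset({"depth", "normals"})))
# [(), ('depth',), ('normals',), ('depth', 'normals')]  (element order may vary)
```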