Problem
Domain adaptation - the task of inferring class labels for an unlabeled target domain using the statistical properties of a related, labeled source domain.
Key points
- The authors introduce a new similarity loss that enforces associations between same-class samples of the source and target domains, as opposed to Maximum Mean Discrepancy (MMD), which is not class specific.
- Formalize the transition probability from an embedding in one domain to an embedding in the other (source to target) as
  $$P^{ab}_{ij} = P(B_j \mid A_i) = \frac{\exp(M_{ij})}{\sum_{j'} \exp(M_{ij'})},$$
  where $M_{ij}$ is the dot product of the source embedding $A_i$ and the target embedding $B_j$. Associative similarity is then formalized as a two-step round-trip probability $P^{aba}_{ij} = (P^{ab} P^{ba})_{ij}$.
- Net association loss is a weighted sum of a walker loss and a visit loss (a code sketch of these losses follows this list):
  $$L_{assoc} = \beta_1 L_{walker} + \beta_2 L_{visit}$$
  - Walker loss is a cross-entropy loss that pushes the two-step round-trip probabilities towards a uniform distribution over source samples of the same class:
    $$L_{walker} = H(T, P^{aba}), \qquad T_{ij} = \begin{cases} 1/|C(A_i)| & \text{if } class(A_i) = class(A_j) \\ 0 & \text{otherwise} \end{cases}$$
    where $|C(A_i)|$ is the number of source samples with the same class as $A_i$.
  - Visit loss is used as a regularizer to ensure that each target sample is visited with equal probability. It is the cross-entropy between the uniform distribution over target samples and the probability of visiting a given target sample from any source sample:
    $$L_{visit} = H(V, P^{visit}), \qquad P^{visit}_j = \langle P^{ab}_{ij} \rangle_{i}, \quad V_j = 1/|B|$$
- This association loss can be combined with any network architecture for domain adaptation.
- During training, the authors ensure that each mini-batch contains examples from all classes.
- The network is initially optimized on the classification loss only; the weight of the association loss is then increased gradually (a possible schedule is sketched below).
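The walker and visit losses above can be computed in a few lines. The following is a minimal sketch in PyTorch (the paper's original implementation is in TensorFlow); the function and argument names (`association_loss`, `beta_walker`, `beta_visit`) are placeholders, not the authors' API.

```python
# Minimal sketch of the association loss (walker + visit), assuming PyTorch.
import torch
import torch.nn.functional as F

def association_loss(source_emb, target_emb, source_labels,
                     beta_walker=1.0, beta_visit=1.0):
    # source_emb:    (n_s, d) embeddings A_i of labeled source samples
    # target_emb:    (n_t, d) embeddings B_j of unlabeled target samples
    # source_labels: (n_s,)   integer class labels of the source samples
    M = source_emb @ target_emb.t()            # M_ij = A_i . B_j

    p_ab = F.softmax(M, dim=1)                 # P(B_j | A_i), source -> target
    p_ba = F.softmax(M.t(), dim=1)             # P(A_i | B_j), target -> source
    p_aba = p_ab @ p_ba                        # two-step round-trip P^aba

    # Target distribution T: uniform over source samples of the same class
    same_class = (source_labels[:, None] == source_labels[None, :]).float()
    T = same_class / same_class.sum(dim=1, keepdim=True)

    # Walker loss: cross-entropy H(T, P^aba)
    walker = -(T * torch.log(p_aba + 1e-8)).sum(dim=1).mean()

    # Visit loss: cross-entropy between the uniform distribution over targets
    # and P^visit_j = mean_i P^ab_ij
    p_visit = p_ab.mean(dim=0)
    visit = -torch.log(p_visit + 1e-8).mean()

    return beta_walker * walker + beta_visit * visit
```

The total training loss would then be the usual classification loss on the source batch plus this term, weighted as described in the schedule below.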
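One simple way to realize the warm-up of the association-loss weight is a linear ramp; the step counts below are illustrative, not the values used in the paper.

```python
# Hypothetical linear warm-up for the association-loss weight.
def assoc_weight(step, ramp_start=1000, ramp_steps=2000, max_weight=1.0):
    if step < ramp_start:
        return 0.0                              # classification loss only at first
    progress = (step - ramp_start) / ramp_steps
    return max_weight * min(1.0, progress)      # gradually ramp up to max_weight

# Usage inside the training loop (class-balanced mini-batches assumed):
# loss = classification_loss + assoc_weight(step) * association_loss(A, B, y_s)
```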
Results
- Show state-of-the-art results on various small datasets.
- Show that reducing the association loss also reduces MMD, but a lower MMD does not imply a lower test error; a lower association loss, in contrast, does correspond to a lower test error.
- t-SNE visualizations of the embeddings show better class clustering with this technique.
Notes
- A plug-and-play non-trivial technique that can be used effectively on various classification tasks.
- Can the approach be generalized to non-classification tasks?
- Should report results on more complex datasets.