Neural architecture search (NAS) has begun to outperform human-designed models. This paper applies NAS to find a better architecture; the proposed architecture shows consistent improvements over the Transformer on four well-established language translation tasks.
With NAS, better feed-forward architectures for seq2seq tasks can be designed. Tournament-selection architecture search with warm starting is applied to the Transformer to search for a better Transformer architecture. The search proceeds as follows:
- Define a gene encoding that describes a neural network architecture.
- Create an initial population by randomly sampling from the space of gene encodings (a minimal sketch of the encoding and this sampling step follows the list).
- The individuals are trained and assigned a fitness (in this case, the negative log perplexity on the WMT'14 En-De validation set).
- The population is then sampled to produce subpopulations, and the highest-fitness candidate in each subpopulation is selected as a parent.
- Each selected parent's gene encoding is mutated to produce child models.
- The child models are then assigned fitness by training and evaluating them on the target task, just like the initial population.
- After fitness evaluation, the population is sampled once again, and the lowest-fitness individual in each subpopulation is removed from the population.
- The newly evaluated child models are added to the population, taking the place of the removed individuals.
- This process is repeated, and eventually it results in a population of high-fitness individuals (a sketch of the full selection loop appears below).
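The encoding and initialization steps might look like the following minimal Python sketch. The search-space fields, option vocabularies, and `NUM_BLOCKS` below are illustrative assumptions, not the paper's actual encoding; warm starting would seed this population with the Transformer's own encoding rather than purely random samples.

```python
import random

# Illustrative toy search space (an assumption, not the paper's actual
# encoding): each block in a gene picks one option per architectural field.
SEARCH_SPACE = {
    "layer_type": ["self_attention", "conv_1x1", "conv_3x1", "gated_linear"],
    "activation": ["relu", "swish", "none"],
    "norm": ["layer_norm", "none"],
}
NUM_BLOCKS = 6  # blocks per individual; arbitrary for this sketch

def random_gene():
    """Sample one architecture description uniformly from the search space."""
    return [
        {field: random.choice(options) for field, options in SEARCH_SPACE.items()}
        for _ in range(NUM_BLOCKS)
    ]

def initial_population(size):
    """Create the initial population by random sampling; warm starting would
    instead seed this list with the Transformer's encoding."""
    return [random_gene() for _ in range(size)]
```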
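Building on the sketch above, the tournament-selection loop itself could be written as follows. `evaluate_fitness` is a placeholder for training the decoded model and measuring its negative log perplexity on the validation set; the subpopulation size and mutation rate are arbitrary choices for illustration.

```python
def evaluate_fitness(gene):
    """Placeholder: in the paper, this trains the decoded model and returns
    its negative log perplexity on the WMT'14 En-De validation set."""
    raise NotImplementedError

def mutate(gene, rate=0.1):
    """Re-sample a random field of each block with probability `rate`."""
    child = [dict(block) for block in gene]  # copy the parent's encoding
    for block in child:
        if random.random() < rate:
            field = random.choice(list(SEARCH_SPACE))
            block[field] = random.choice(SEARCH_SPACE[field])
    return child

def evolve(pop_size=100, sub_size=10, steps=1000):
    # Initial population, each individual paired with its fitness.
    population = [(g, evaluate_fitness(g)) for g in initial_population(pop_size)]
    for _ in range(steps):
        # Sample a subpopulation; its fittest member becomes the parent.
        subpop = random.sample(population, sub_size)
        parent_gene, _ = max(subpop, key=lambda ind: ind[1])
        # Mutate the parent's encoding and evaluate the child.
        child_gene = mutate(parent_gene)
        child = (child_gene, evaluate_fitness(child_gene))
        # Sample again; remove the least-fit member of that subpopulation.
        victim = min(random.sample(population, sub_size), key=lambda ind: ind[1])
        population.remove(victim)
        # The child takes the removed individual's place.
        population.append(child)
    # Repeated long enough, the population drifts toward high fitness.
    return max(population, key=lambda ind: ind[1])
```

With a real `evaluate_fitness`, `evolve()` returns the highest-fitness (gene, fitness) pair found; in practice, nearly all of the cost sits in the training runs inside the fitness evaluation.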