Explanation of the normalization and orthogonal unit vector generation in the paper "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts"

Proof
$$std(\nu)=\sqrt{\frac{1}{n}\sum_{i}\left(\nu_{i}-mean(\nu)\right)^{2}}$$ Replacing $std$ in eq.1 with eq.3 gives $$\mu = \sqrt{n}\times\frac{\nu-mean(\nu)}{\sqrt{\sum_{i}\left(\nu_{i}-mean(\nu)\right)^{2}}}$$ The fraction $\frac{\nu-mean(\nu)}{\sqrt{\sum_{i}\left(\nu_{i}-mean(\nu)\right)^{2}}}$ is a vector with unit L2 norm. The process above is z-score normalization, so $\mu$ in eq.1 is a unit vector scaled by the coefficient $\sqrt{n}$. Therefore, the final $\mu$ in eq.2 is obtained by dividing by $\sqrt{n}$.
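The identity above can be checked numerically: z-score normalization (with the population std of eq.3) equals $\sqrt{n}$ times the L2-normalized centered vector, so dividing by $\sqrt{n}$ yields a unit vector. A minimal sketch, using an arbitrary example vector:

```python
import numpy as np

# Arbitrary example vector (any non-constant vector works).
v = np.array([1.0, 2.0, 4.0, 7.0])
n = len(v)

# z-score normalization; np.std defaults to the population std
# (divide by n), matching eq.3.
z = (v - v.mean()) / v.std()

# Equivalent form: sqrt(n) times the L2-normalized centered vector.
centered = v - v.mean()
unit = centered / np.linalg.norm(centered)
assert np.allclose(z, np.sqrt(n) * unit)

# Hence dividing the z-scored vector by sqrt(n) gives L2 norm 1.
assert np.isclose(np.linalg.norm(z / np.sqrt(n)), 1.0)
```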
m1 is a random vector and m2 is a unit vector. How do we transform m1 into a vector orthogonal to m2?

Process
$$ m1 = m1 - dot(m1, m2) * m2 $$
$dot(m1, m2)$ is the projection length of m1 in the direction of m2. Because m2 is a unit vector, $dot(m1, m2) * m2$ is exactly the projection vector of m1 onto m2. Subtracting it, $m1 = m1 - dot(m1, m2) * m2$ yields a vector orthogonal to m2.
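This is one step of Gram-Schmidt orthogonalization, and it can be verified directly: after subtracting the projection, the dot product with m2 is (numerically) zero. A minimal sketch with randomly generated vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

m1 = rng.standard_normal(5)        # random vector
m2 = rng.standard_normal(5)
m2 = m2 / np.linalg.norm(m2)       # make m2 a unit vector

# Subtract m1's projection onto m2; dot(m1, m2) is the projection
# length, and dot(m1, m2) * m2 is the projection vector.
m1_orth = m1 - np.dot(m1, m2) * m2

# The result is orthogonal to m2 up to floating-point error.
assert abs(np.dot(m1_orth, m2)) < 1e-10
```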