contributors: @GitYCC
TL;DR
- design advantages:
  - it does not need any pre-training
  - it learns both high- and low-order feature interactions
  - it introduces a sharing strategy of feature embedding to avoid feature engineering
- performance:
  - DeepFM outperforms the state-of-the-art models in terms of AUC and Logloss on both datasets (Criteo Dataset and Company Dataset)
  - the efficiency of DeepFM is comparable to that of the most efficient state-of-the-art deep model
Introduction
- prediction target: click-through rate (CTR)
- feature interactions
  - people often download apps for food delivery at meal-time, suggesting that the (order-2) interaction between app category and time-stamp is a signal for CTR
  - male teenagers like shooting games and RPG games, which means that the (order-3) interaction of app category, user gender and age is another signal for CTR
Approach
(Weight-1 Connection: point-wise operation to keep embedding vector format and information)
- denote
  - size of the features: $M$
  - size of the feature fields: $F$
  - size of the feature embedding: $K$
- FM Component: first order
  - obtain $\{w_m x_m\}$ terms
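- FM Component: second order
  - obtain $\{x_ix_j(V_i\odot V_j)\}$ ($1\le i<j\le M$) terms, where $V_i\in\mathbb{R}^K$ is the embedding of feature $i$; summing the $K$ elements of $V_i\odot V_j$ gives the inner product $\langle V_i,V_j\rangle$ used in FM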
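- Deep Component
  - $a^{(l+1)}=\sigma(W^{(l)}a^{(l)}+b^{(l)})$, where the input $a^{(0)}=[e_1,\dots,e_F]$ concatenates the field embeddings, which reuse the same $V$ as the FM component (shared embedding)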
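- DeepFM: FM Component: first order + second order & Deep Component
  - output: $\hat{y}=sigmoid(\mathrm{sum}(concat[\{w_m x_m\}, \{x_ix_j(V_i\odot V_j)\}, a^{(L)}]))$, where $\mathrm{sum}$ adds every element of the concatenation with weight 1 (the weight-1 connections); this equals the paper's $\hat{y}=sigmoid(y_{FM}+y_{DNN})$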
- benefits:
  - it learns both low- and high-order feature interactions from raw features
  - there is no need for expert feature engineering of the input, which Wide & Deep requires
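The forward pass described above can be sketched in plain Python. This is a minimal, non-authoritative sketch, not the paper's implementation: the sizes `M`, `F`, `K`, the feature-to-field mapping `FIELD_OF`, the hidden width `HIDDEN`, the random initialization, and the choice of ReLU hidden layers with a linear last layer are all hypothetical. It shows the shared embeddings `V` feeding both components, the second-order Hadamard products over pairs with $i<j$ (each pair counted once, as in FM), and the weight-1 sum of every term before the sigmoid.

```python
import math
import random

# Hypothetical toy sizes, chosen only for illustration (not from the paper):
# M features grouped into F fields, each with a K-dimensional embedding.
M, F, K = 6, 3, 4
FIELD_OF = [0, 0, 1, 1, 2, 2]  # hypothetical mapping: feature m -> its field
HIDDEN = [8]                   # hypothetical hidden-layer width(s) of the deep component


def init_params(seed=0):
    """Random toy parameters: w_m, shared embeddings V_m, and (W, b) per deep layer."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(M)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    sizes = [F * K] + HIDDEN + [1]  # deep input = concatenated field embeddings
    layers = [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]
    return w, V, layers


def fm_terms(x, w, V):
    """First-order terms w_m x_m and second-order vectors x_i x_j (V_i ⊙ V_j) for i < j."""
    first = [w[m] * x[m] for m in range(M)]
    second = [
        [x[i] * x[j] * vi * vj for vi, vj in zip(V[i], V[j])]
        for i in range(M)
        for j in range(i + 1, M)
    ]
    return first, second


def deep_output(x, V, layers):
    """a^(l+1) = sigma(W^(l) a^(l) + b^(l)); ReLU hidden layers, linear last layer (assumed)."""
    e = [[0.0] * K for _ in range(F)]
    for m in range(M):
        for k in range(K):
            # Field embedding e_f: for a one-hot field this is V_m of the active feature.
            e[FIELD_OF[m]][k] += V[m][k] * x[m]
    a = [value for field in e for value in field]  # a^(0) = concat[e_1, ..., e_F]
    for idx, (W, b) in enumerate(layers):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bj for row, bj in zip(W, b)]
        a = z if idx == len(layers) - 1 else [max(0.0, zj) for zj in z]
    return a  # a^(L), a single value here


def deepfm_predict(x, params):
    """sigmoid of the weight-1 sum over concat[first-order, second-order, a^(L)]."""
    w, V, layers = params
    first, second = fm_terms(x, w, V)
    a_last = deep_output(x, V, layers)
    concat = first + [v for vec in second for v in vec] + a_last
    return 1.0 / (1.0 + math.exp(-sum(concat)))


if __name__ == "__main__":
    x = [1, 0, 0, 1, 1, 0]  # one active (one-hot) feature per field
    print(deepfm_predict(x, init_params()))
```

Because the final sum is taken element-wise over the whole concatenation, grouping the terms gives the same logit as $y_{FM}+y_{DNN}$; the sketch keeps the concatenated form only to mirror the notation in these notes.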
Relationship with the other Neural Networks
Experiments
- datasets:
  - Criteo Dataset
  - Company Dataset
- Model Comparison: we compare 9 models in our experiments: LR, FM, FNN, PNN (three variants), Wide & Deep (two variants: LR & DNN and FM & DNN), and DeepFM.