This repository contains the PyTorch implementation of the CIKM 2022 research-track oral paper:
- OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction
Note: this repository is under debugging. We do not guarantee that the current version of the code reproduces our results; we are actively working on the reproducibility issue. Please check back later.
You can preprocess the Criteo dataset with the following command. The Avazu and KDD12 datasets can be preprocessed by calling their respective Python files.
```
python datatransform/criteo2tf.py --store_stat --stats PATH_TO_STORE_STATS \
    --dataset RAW_DATASET_FILE --record PATH_TO_PROCESSED_DATASET \
    --threshold 2 --ratio 0.8 0.1 0.1
```
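To sanity-check the output, you can inspect the processed splits with TensorFlow directly. This is a minimal sketch; the file name below is a placeholder, so substitute one of the tfrecord files actually generated under PATH_TO_PROCESSED_DATASET:

```python
# Minimal sketch: dump the first record of a processed split to see the stored
# feature keys. The file name is a hypothetical placeholder; point it at a
# tfrecord file actually produced under PATH_TO_PROCESSED_DATASET.
import tensorflow as tf

dataset = tf.data.TFRecordDataset(["PATH_TO_PROCESSED_DATASET/train.tfrecord"])
for raw_record in dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)  # shows the feature names the training scripts will expect
```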
Then you will find a `stats` folder under the `PATH_TO_STORE_STATS` folder and your processed files in the tfrecord format under the `PATH_TO_PROCESSED_DATASET` folder. You should update lines 181-190 in `train.py` and lines 200-209 in `evo.py` accordingly.
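As a purely hypothetical illustration (these variable names are made up and are not the actual code at those lines), the block you are editing points the scripts at the artifacts produced above:

```python
# Hypothetical sketch only: adapt the real lines 181-190 in train.py
# (and 200-209 in evo.py) so that they reference your own paths.
train_path = "PATH_TO_PROCESSED_DATASET/train"  # tfrecord split from preprocessing
valid_path = "PATH_TO_PROCESSED_DATASET/valid"
test_path = "PATH_TO_PROCESSED_DATASET/test"
stats_path = "PATH_TO_STORE_STATS/stats"        # generated stats folder
```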
Running OptEmbed requires the following three phases. The first is supernet training:

```
python supernet.py --gpu 0 --dataset $YOUR_DATASET --model $YOUR_MODEL \
    --batch_size 2048 --epoch 30 --latent_dim 64 \
    --mlp_dims [1024, 512, 256] --mlp_dropout 0.0 \
    --optimizer adam --lr $LR --wd $WD \
    --t_lr $LR_T --alpha $ALPHA
```
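The two extra flags are the OptEmbed-specific knobs: `--t_lr` gives the learnable pruning threshold its own learning rate, and `--alpha` weights the penalty on the embedding mask. The toy step below is only a schematic of that idea, not the repository's training code; the soft sigmoid gate stands in for the paper's exact mask estimator and regularizer.

```python
import torch

# Toy supernet step (schematic, assumptions marked): embeddings are gated by a
# soft version of the indicator 1(||e||_1 > t). The threshold gets its own
# learning rate (--t_lr) and the mask is penalized with weight alpha (--alpha).
emb = torch.nn.Parameter(torch.randn(100, 8))   # tiny embedding table
thr = torch.nn.Parameter(torch.zeros(()))       # learnable threshold
lin = torch.nn.Linear(8, 1)                     # stand-in for the CTR model
opt = torch.optim.Adam([
    {"params": [emb, *lin.parameters()], "lr": 3e-4, "weight_decay": 1e-5},
    {"params": [thr], "lr": 1e-4},              # --t_lr
])
alpha = 1e-5                                    # --alpha

idx = torch.randint(0, 100, (32,))              # a fake batch of feature ids
y = torch.randint(0, 2, (32, 1)).float()
e = emb[idx]
mask = torch.sigmoid(e.abs().sum(1, keepdim=True) - thr)  # soft 1(||e||_1 > t)
loss = torch.nn.functional.binary_cross_entropy_with_logits(lin(e * mask), y) \
       + alpha * mask.sum()                     # sparsity penalty on the mask
opt.zero_grad(); loss.backward(); opt.step()
```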
The second is evolutionary search:

```
python evolution.py --gpu 0 --dataset $YOUR_DATASET --model $YOUR_MODEL \
    --batch_size 2048 --epoch 30 --latent_dim 64 \
    --mlp_dims [1024, 512, 256] --mlp_dropout 0.0 \
    --keep_num 0 --mutation_num 10 \
    --crossover_num 10 --m_prob 0.1
```
The third is retraining:

```
python retrain.py --gpu 0 --dataset $YOUR_DATASET --model $YOUR_MODEL \
    --batch_size 2048 --epoch 30 --latent_dim 64 \
    --mlp_dims [1024, 512, 256] --mlp_dropout 0.0 \
    --optimizer adam --lr $LR --wd $WD
```
Note: due to the sensitivity of OptEmbed, we do not guarantee that the following hyper-parameters will be 100% optimal on your own preprocessed dataset. Kindly tune them a little. If you encounter any problems with hyper-parameter tuning, you are welcome to contact the first author directly.
Here we list all the hyper-parameters we used in the supernet training stage for each model in the following table.
Model\Dataset | Criteo | Avazu | KDD12 |
---|---|---|---|
DeepFM | lr=3e-5, l2=1e-3, lrt=1e-4 | lr=3e-4, l2=1e-5, lrt=1e-4 | lr=3e-5, l2=1e-5, lrt=1e-4 |
DCN | lr=3e-4, l2=1e-5, lrt=1e-4 | lr=1e-4, l2=3e-5, lrt=1e-4 | lr=1e-5, l2=1e-6, lrt=1e-4 |
FNN | lr=3e-4, l2=1e-5, lrt=1e-4 | lr=1e-4, l2=3e-5, lrt=1e-4 | lr=1e-5, l2=1e-6, lrt=1e-4 |
IPNN | lr=3e-4, l2=1e-5, lrt=3e-5 | lr=1e-4, l2=3e-5, lrt=1e-4 | lr=1e-5, l2=1e-6, lrt=1e-4 |
The following procedure describes how we determine these hyper-parameters:
First, we determine the hyper-parameters of the basic models by grid search over the learning rate and the l2 regularization. We select the optimal learning rate lr from {1e-3, 3e-4, 1e-4, 3e-5, 1e-5} and the l2 regularization from {1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6}. The Adam optimizer and Xavier initialization are adopted. We empirically set the batch size to 2048, the embedding dimension to 64, and the MLP structure to [1024, 512, 256].
Second, we tune the hyper-parameters introduced by the OptEmbed method: the learning rate for the threshold (lrt, passed as `--t_lr`) and the threshold regularization strength (alpha, passed as `--alpha`), as sketched below.
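The sketch below spells out this two-stage search. `train_and_eval` is a hypothetical helper standing in for a full training run that returns validation AUC, and the stage-2 grids for lrt and alpha are illustrative assumptions, not values from the paper:

```python
# Two-stage grid search (sketch). train_and_eval is a hypothetical stand-in
# for training with the given hyper-parameters and returning validation AUC.
import random
from itertools import product

def train_and_eval(**hparams):
    return random.random()  # placeholder fitness; replace with a real run

LR_GRID = [1e-3, 3e-4, 1e-4, 3e-5, 1e-5]
L2_GRID = [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]

# Stage 1: tune lr and l2 for the basic model.
best_lr, best_l2 = max(product(LR_GRID, L2_GRID),
                       key=lambda p: train_and_eval(lr=p[0], l2=p[1]))

# Stage 2: with lr/l2 fixed, tune the OptEmbed-specific lrt and alpha.
# These grids are illustrative assumptions, not the paper's exact ranges.
LRT_GRID = [1e-3, 1e-4, 3e-5]
ALPHA_GRID = [1e-4, 1e-5, 1e-6]
best_lrt, best_alpha = max(
    product(LRT_GRID, ALPHA_GRID),
    key=lambda p: train_and_eval(lr=best_lr, l2=best_l2, t_lr=p[0], alpha=p[1]))
```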
For the evolutionary search stage, we adopt the same hyper-parameters as previous work \cite{One-shot}. For all experiments, the mutation number $n_m = 10$, crossover number $n_c = 10$, max iteration $T = 30$, mutation probability $prob = 0.1$ and $k = 15$.
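For concreteness, here is a schematic of that one-shot search loop under the listed hyper-parameters. `evaluate` is a hypothetical stand-in for scoring a candidate embedding-dimension assignment with the frozen supernet (e.g. validation AUC), and the field/dimension sizes are illustrative:

```python
import random

# One-shot evolutionary search (schematic). A candidate assigns each field an
# embedding dimension; evaluate() is a placeholder for supernet validation AUC.
NUM_FIELDS, MAX_DIM = 39, 64                  # illustrative sizes
N_M, N_C, T, PROB, K = 10, 10, 30, 0.1, 15    # hyper-parameters listed above

def evaluate(cand):
    return random.random()                    # replace with real supernet eval

def random_cand():
    return tuple(random.randrange(MAX_DIM) for _ in range(NUM_FIELDS))

def mutate(cand):
    return tuple(random.randrange(MAX_DIM) if random.random() < PROB else d
                 for d in cand)

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

population = [random_cand() for _ in range(N_M + N_C)]
for _ in range(T):
    topk = sorted(population, key=evaluate, reverse=True)[:K]  # keep best k
    population = ([mutate(random.choice(topk)) for _ in range(N_M)] +
                  [crossover(*random.sample(topk, 2)) for _ in range(N_C)])
best = max(population, key=evaluate)
```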
For the retraining stage, we adopt the same learning rate lr and l2 regularization as in the supernet training stage.
Kindly cite our paper using the following BibTeX entry:
```
@inproceedings{OptEmbed,
  author    = {Fuyuan Lyu and
               Xing Tang and
               Hong Zhu and
               Huifeng Guo and
               Yingxue Zhang and
               Ruiming Tang and
               Xue Liu},
  title     = {OptEmbed: Learning Optimal Embedding Table for Click-through Rate
               Prediction},
  booktitle = {Proceedings of the 31st {ACM} International Conference on Information
               {\&} Knowledge Management},
  pages     = {1399--1409},
  address   = {Atlanta, GA, USA},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3511808.3557411},
  doi       = {10.1145/3511808.3557411}
}
```