The source code for the DASFAA 2021 paper: Towards Entity Alignment in the Open World: An Unsupervised Approach.
- Python>=3.7 (tested on Python=3.8.10)
- Tensorflow-gpu=2.x (tested on Tensorflow-gpu=2.6.0)
- Scipy
- Numpy
- Scikit-learn
- python-Levenshtein
The original datasets are obtained from the DBP15K dataset, GCN-Align, and JAPE.
Taking the DBP15K (ZH-EN) dataset as an example, the folder "zh_en" contains:
- ent_ids_1: ids for entities in source KG (ZH);
- ent_ids_1_trans_goo: entities in source KG (ZH) with translated names;
- ent_ids_2: ids for entities in target KG (EN);
- ref_ent_ids: entity links for testing/validation;
- sup_ent_ids: entity links for training;
- triples_1: relation triples encoded by ids in source KG (ZH);
- triples_2: relation triples encoded by ids in target KG (EN);
- zh_vectorList.json: the input entity feature matrix initialized by word vectors;
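The id and triple files listed above can be read with a few lines of Python. The following is a minimal sketch, not the repository's own loader: it assumes that each line of an `ent_ids_*` file holds a tab-separated id and name, and that the triple and link files hold whitespace-separated integer ids. The function names `load_ent_ids` and `load_int_tuples` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_ent_ids(path: Path) -> Dict[int, str]:
    """Read an id file, assuming one tab-separated (id, name) pair per line."""
    ids: Dict[int, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            ent_id, name = line.split("\t", 1)
            ids[int(ent_id)] = name
    return ids


def load_int_tuples(path: Path) -> List[Tuple[int, ...]]:
    """Read rows of integer ids, e.g. triples or entity links (assumed format)."""
    rows: List[Tuple[int, ...]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                rows.append(tuple(int(p) for p in parts))
    return rows
```

Under these assumptions, `load_ent_ids(Path("zh_en") / "ent_ids_1")` would return a mapping from entity id to name, and `load_int_tuples(Path("zh_en") / "triples_1")` a list of integer triples.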
Regarding the semantic information, we obtain the entity name embeddings for DBP15K from RDGCN. You may also obtain them from here.
Note that before running, you need to place the `_vectorList.json` file under the corresponding directory.
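A quick pre-flight check can catch a missing feature file before a long run starts. The sketch below is an illustration only: the helper `missing_files` and the `data_root` argument are hypothetical, and the assumption that the feature file is named after the source-language prefix (for example `zh_vectorList.json` for `zh_en`) is extrapolated from the ZH-EN example.

```python
from pathlib import Path
from typing import List

# File names taken from the ZH-EN folder listing in this README.
REQUIRED = [
    "ent_ids_1",
    "ent_ids_1_trans_goo",
    "ent_ids_2",
    "ref_ent_ids",
    "sup_ent_ids",
    "triples_1",
    "triples_2",
]


def missing_files(data_root: str, lan: str) -> List[str]:
    """Return the expected files that are absent from <data_root>/<lan>."""
    folder = Path(data_root) / lan
    src = lan.split("_")[0]  # assumed prefix, e.g. "zh" for "zh_en"
    expected = REQUIRED + [f"{src}_vectorList.json"]
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list returned by `missing_files("data", "zh_en")` would indicate that everything expected is in place; here `"data"` is a placeholder for wherever the dataset folders actually live.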
- First, generate the string similarity by running `python stringsim.py --lan "fr_en"`. The dataset can be chosen from `zh_en`, `ja_en`, and `fr_en`.
- Then run `python main.py --lan "fr_en"`.
- You may also directly run `bash auto.sh`.
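To run both steps for every language pair from Python, something like the following could be used. This is a sketch under assumptions: it only chains the two documented commands and is not a reproduction of `auto.sh`, whose contents are not shown here. The names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Language pairs named in this README.
LANGUAGE_PAIRS = ["zh_en", "ja_en", "fr_en"]


def build_commands(lan: str) -> List[List[str]]:
    """Return the two documented commands for one language pair, in order."""
    return [
        [sys.executable, "stringsim.py", "--lan", lan],  # string similarity first
        [sys.executable, "main.py", "--lan", lan],       # then the main model
    ]


def run_all(pairs: Sequence[str] = LANGUAGE_PAIRS) -> None:
    """Run both steps for each pair, stopping at the first failing command."""
    for lan in pairs:
        for cmd in build_commands(lan):
            subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root would process the three pairs sequentially; `check=True` makes a failing step raise `subprocess.CalledProcessError` instead of silently continuing.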
Because embedding-based methods are unstable, the results may fluctuate slightly across repeated runs.
If you have any questions about reproduction, please feel free to email [email protected].
If you use this model or code, please cite it as follows:
@inproceedings{DBLP:conf/dasfaa/ZengZTLLZ21,
author = {Weixin Zeng and
Xiang Zhao and
Jiuyang Tang and
Xinyi Li and
Minnan Luo and
Qinghua Zheng},
title = {Towards Entity Alignment in the Open World: An Unsupervised Approach},
booktitle = {DASFAA},
pages = {272--289},
publisher = {Springer},
year = {2021},
}