NumericalPrediction

Implementations of "Introducing Semantic Information for Numerical Attribute Prediction over Knowledge Graphs", ISWC 2022.

Overview

This paper addresses numerical attribute prediction over knowledge graphs: predicting missing numerical attribute values given a set of relational facts and numerical attributive facts. Building on existing graph-based methods, we propose several novel semantic methods as well as effective combination strategies. Experiments on two benchmarks show a substantial performance improvement.

All the source code, datasets and results can be found in this repository.

Data

Two datasets from MMKG (FB15K and YAGO15K) are used, and the numerical triples are divided into an 80/10/10 train/valid/test split. The statistics are shown below.

Dataset	#Ent	#Rel	#Rel_fact	#Attr	#Attr_fact	#Train	#Valid	#Test
FB15K	14,951	1,345	592,213	116	29,395	23,516	2,939	2,940
YAGO15K	15,404	32	122,886	7	23,532	18,825	2,353	2,354

All the data and the preprocessing results are in ./data/.
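The split sizes in the table are consistent with a plain floor-based 80/10/10 partition of the attribute facts. The sketch below only illustrates that arithmetic; it is not the repository's preprocessing code, and the function name `split_triples`, the seed, and the shuffling step are assumptions.

```python
import random

def split_triples(triples, seed=0):
    """Shuffle a collection and split it into 80/10/10 train/valid/test parts."""
    items = list(triples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10    # floor of 80%
    n_valid = len(items) // 10        # floor of 10%
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # whatever remains
    return train, valid, test

# Integer placeholders stand in for (entity, attribute, value) triples.
train, valid, test = split_triples(range(29395))
print(len(train), len(valid), len(test))  # 23516 2939 2940
```

With 23,532 items the same function yields 18,825 / 2,353 / 2,354, matching the YAGO15K row.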

Methods

Traditional graph-based methods typically ignore the semantics behind numerical values and are usually incapable of handling unseen and isolated entities. We believe that different types of pre-trained language models, such as BERT, capture and store rich knowledge during large-scale pre-training and are promising for predicting missing numerical values.

We provide several novel strategies to capture the implicit knowledge behind pre-trained language models for numerical attribute prediction over KGs, which we call MLM and PLM-reg. Effective combination strategies are also proposed to make full use of both structural and semantic information, where base models are automatically selected for different prediction targets to achieve the best performance.
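One simple way to read "base models are automatically selected for different prediction targets" is to pick, for each attribute, the base model with the lowest validation error. The sketch below follows that reading only; it is not taken from ensemble.py, and the selection criterion (MAE on the validation set), the function names, the attribute names, and the toy numbers are all assumptions.

```python
def mae(preds, golds):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(golds)

def select_base_models(valid_preds, valid_golds):
    """Pick the base model with the lowest validation MAE for each attribute.

    valid_preds: {model_name: {attribute: [predictions]}}
    valid_golds: {attribute: [gold values]}
    Returns {attribute: model_name}.
    """
    selection = {}
    for attr, golds in valid_golds.items():
        candidates = [m for m in valid_preds if attr in valid_preds[m]]
        selection[attr] = min(
            candidates, key=lambda m: mae(valid_preds[m][attr], golds)
        )
    return selection

# Made-up validation data for two hypothetical attributes.
valid_golds = {"birth_year": [1950.0, 1980.0], "latitude": [40.0, 10.0]}
valid_preds = {
    "MrAP": {"birth_year": [1960.0, 1970.0], "latitude": [41.0, 11.0]},
    "MLM": {"birth_year": [1952.0, 1979.0], "latitude": [30.0, 25.0]},
}
selection = select_base_models(valid_preds, valid_golds)
print(selection)  # {'birth_year': 'MLM', 'latitude': 'MrAP'}
```

At test time, each attribute's predictions would then come from its selected model.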

Results

The main results of the different methods are listed below. For each dataset, the three blocks from top to bottom contain graph-based, semantic-based, and combination methods, respectively. The best result in each block is underlined, and the best result overall is in boldface. The text in parentheses after PLM-reg indicates the type of input given to the PLM.

These results indicate that semantic-based methods are promising for predicting numerical attributes over KGs, and that combination strategies using both structural and semantic knowledge can significantly improve performance.

All the detailed result files are in ./results/.

Code

The entry points for the proposed methods are:

GLOBAL/LOCAL/MRAP: graphStructure.py, which follows the implementation of MrAP
KGE-reg & PLM-reg: regression.py
MLM: mlm.py
Combination: ensemble.py

Some auxiliary files:

utils.py: data fetching and performance calculation functions for all methods.
modules.py: base modules for ensemble.py.
MrAP: graph models for graphStructure.py.
helpers: auxiliary files including SameAs links between FB15K and YAGO15K, paraphrase templates, and multilingual description texts. Preprocessing functions for the different methods are also included in this directory.
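For readers unfamiliar with the "-reg" methods listed above, the general idea of regressing a numerical attribute on fixed entity embeddings can be sketched as follows. This is a generic least-squares example on synthetic data, not the code in regression.py; the choice of a plain linear model, the function names, and the random vectors standing in for KGE or PLM embeddings are assumptions.

```python
import numpy as np

def add_bias(embeddings):
    """Append a constant column so the fit includes an intercept."""
    return np.hstack([embeddings, np.ones((len(embeddings), 1))])

def fit_linear_regressor(embeddings, values):
    """Least-squares fit of value = embedding @ w + b."""
    weights, *_ = np.linalg.lstsq(add_bias(embeddings), values, rcond=None)
    return weights

def predict(weights, embeddings):
    """Apply the fitted weights to new embeddings."""
    return add_bias(embeddings) @ weights

# Synthetic, noise-free data: random vectors stand in for entity embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(50, 8))
true_w = rng.normal(size=8)
train_val = train_emb @ true_w + 3.0

weights = fit_linear_regressor(train_emb, train_val)
test_emb = rng.normal(size=(5, 8))
recovered = np.allclose(predict(weights, test_emb), test_emb @ true_w + 3.0)
print(recovered)  # True
```

One regressor would typically be fitted per attribute, since each attribute has its own scale and meaning.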

We also made several other attempts, such as prompting (another paradigm for using pre-trained language models) and fine-grained combination strategies on embeddings. These are regarded as future work and are not reflected in the paper. The code and results are temporarily stored in ./attempts/.

Pre-trained Models

Pre-trained knowledge graph embedding models are from LibKGE. In our ablation study, we adopt four popular link prediction embedding techniques for FB15K, namely TransE, RESCAL, ComplEx, and RotatE. YAGO15K entities are mapped through the SameAs links.
Pre-trained language models are from Transformers, and we adopt bert-base/large-uncased, roberta-base/large, xlm-roberta-base/large, and numBert in the ablation study.
We publish our language models fine-tuned with known attributes here, and the corresponding code is in ./helpers/forMLM.py.

All the models can be downloaded from here to replace the ./pretrainedModels folder.
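Since the embedding models are pre-trained for FB15K and YAGO15K entities are reached through SameAs links, the lookup can be pictured as a dictionary join. The sketch below is illustrative only: the function name, the data layout, and the entity identifiers are hypothetical, and the handling of entities without a link is an assumption.

```python
def map_yago_embeddings(fb_embeddings, same_as):
    """Reuse FB15K embeddings for YAGO15K entities via SameAs links.

    fb_embeddings: {fb_entity: vector}
    same_as: {yago_entity: fb_entity}
    Returns (mapped, missing): embeddings keyed by YAGO15K entity, and the
    YAGO15K entities whose linked FB15K entity has no embedding.
    """
    mapped, missing = {}, []
    for yago_entity, fb_entity in same_as.items():
        if fb_entity in fb_embeddings:
            mapped[yago_entity] = fb_embeddings[fb_entity]
        else:
            missing.append(yago_entity)
    return mapped, missing

# Hypothetical identifiers and toy 2-dimensional vectors.
fb_embeddings = {"fb_entity_1": [0.1, 0.2], "fb_entity_2": [0.3, 0.4]}
same_as = {"yago_entity_a": "fb_entity_1", "yago_entity_b": "fb_entity_9"}
mapped, missing = map_yago_embeddings(fb_embeddings, same_as)
print(mapped)   # {'yago_entity_a': [0.1, 0.2]}
print(missing)  # ['yago_entity_b']
```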

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{xue2022introducing,
  title={Introducing Semantic Information for Numerical Attribute Prediction over Knowledge Graphs},
  author={Xue, Bingcong and Li, Yanzeng and Zou, Lei},
  booktitle={The 21st International Semantic Web Conference},
  pages={0--0},
  year={2022}
}
