In this research project, we built a claim news headlines dataset and proposed a methodology that spans claim detection through knowledge graph (KG) construction.
The dataset is saved in the dataset folder under the name "Claim News Headline Dataset". It was generated from news headlines from the ARY News and Express Tribune websites and contains 5200 claim headlines and 52 non-claim headlines.
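A minimal sketch of loading the dataset, assuming it is stored as a CSV with headline and label columns (the file extension and column names are assumptions, not confirmed by the repository):

```python
import pandas as pd

# Path, extension, and column names are assumed; adjust to the actual file.
df = pd.read_csv("dataset/Claim News Headline Dataset.csv")

# Expected columns: "headline" (text) and "label" (claim vs. non-claim).
print(df["label"].value_counts())
```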
The pipeline takes news headlines as input. Claim classification is performed as a binary task on the headlines, and the claim headlines are passed to an OpenIE system for triple generation. The triples are filtered and linked to DBpedia through entity linking, and the final triples are stored in the knowledge graph for downstream tasks. The methodology steps are:
- Claim Classification
- OpenIE triples extraction
- Triple Filtering
- Entity Linking
- KG Construction
We use five algorithms: SVM, Gaussian Naive Bayes, Logistic Regression, Decision Tree, and AdaBoost. We combine TF-IDF features with numerical features (headline length, number of nouns, number of verbs) and feed the combined features to the machine learning classifiers, as sketched below.
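A minimal sketch of the feature combination and classification step, assuming scikit-learn and spaCy for the POS-based counts (file path, column names, preprocessing, and hyperparameters are assumptions, not the exact settings used in the project):

```python
import pandas as pd
import spacy
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

nlp = spacy.load("en_core_web_sm")

# Path and column names are assumed (see the loading sketch above).
df = pd.read_csv("dataset/Claim News Headline Dataset.csv")
headlines, labels = df["headline"].tolist(), df["label"].tolist()

def numerical_features(headline):
    """Headline length plus noun and verb counts from spaCy POS tags."""
    doc = nlp(headline)
    nouns = sum(1 for t in doc if t.pos_ in ("NOUN", "PROPN"))
    verbs = sum(1 for t in doc if t.pos_ == "VERB")
    return [len(headline), nouns, verbs]

# Combine the sparse TF-IDF matrix with the three numerical features.
X_tfidf = TfidfVectorizer().fit_transform(headlines)
X_num = csr_matrix([numerical_features(h) for h in headlines])
X = hstack([X_tfidf, X_num]).tocsr()

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)

classifiers = {
    "SVM": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "AdaBoost": AdaBoostClassifier(),
}
for name, clf in classifiers.items():
    # GaussianNB needs a dense matrix; the other models accept sparse input.
    dense = name == "Gaussian Naive Bayes"
    clf.fit(X_train.toarray() if dense else X_train, y_train)
    preds = clf.predict(X_test.toarray() if dense else X_test)
    print(f"{name}: {accuracy_score(y_test, preds):.3f}")
```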
The baseline model is dependency parsing, and the three deep learning models used for triple extraction are:
- OpenIE6: https://github.com/dair-iitd/openie6
- IMOJIE: https://github.com/dair-iitd/imojie
- Gen2OIE: https://github.com/dair-iitd/moie
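These extractors are run from their own repositories; the rest of the pipeline only consumes the (subject, relation, object) triples they produce. A minimal sketch of the representation assumed in the sketches below (the example extraction is illustrative, not actual model output):

```python
# Each extraction is kept as a (subject, relation, object) triple,
# together with the claim headline it came from.
triples = [
    # (subject, relation, object, source headline) - illustrative example only
    ("Government", "announces", "new tax policy",
     "Government announces new tax policy"),
]
```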
First, we created a lexicon of the 100 most frequent nouns in the claim dataset. We then extracted noun phrases from the triple arguments and matched them against the lexicon: if a lexicon noun matches a noun phrase in an argument, we keep the triple; otherwise, we discard it. A sketch of this filter follows.
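A minimal sketch of the lexicon-based filter, assuming spaCy for noun-phrase extraction; lemmatization, lowercasing, and the example inputs are assumptions for illustration:

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

def build_lexicon(claim_headlines, top_n=100):
    """The top_n most frequent nouns across the claim headlines."""
    counts = Counter()
    for doc in nlp.pipe(claim_headlines):
        counts.update(t.lemma_.lower() for t in doc if t.pos_ in ("NOUN", "PROPN"))
    return {noun for noun, _ in counts.most_common(top_n)}

def argument_nouns(argument):
    """Nouns occurring inside the noun phrases of one triple argument."""
    doc = nlp(argument)
    nouns = set()
    for chunk in doc.noun_chunks:
        nouns.update(t.lemma_.lower() for t in chunk if t.pos_ in ("NOUN", "PROPN"))
    return nouns

def filter_triples(triples, lexicon):
    """Keep a triple only if one of its arguments shares a noun with the lexicon."""
    return [
        (subj, rel, obj, headline)
        for subj, rel, obj, headline in triples
        if (argument_nouns(subj) | argument_nouns(obj)) & lexicon
    ]

# Illustrative inputs; in the project these come from the dataset and the extractors.
claim_headlines = ["Government announces new tax policy"]
triples = [("Government", "announces", "new tax policy",
            "Government announces new tax policy")]
filtered = filter_triples(triples, build_lexicon(claim_headlines))
```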
We link the filtered triples to the DBpedia knowledge base for entity disambiguation, using the Falcon tool.
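A minimal sketch of this step, assuming Falcon is queried over HTTP; the endpoint URL and the response format shown are assumptions based on the public Falcon demo API, so consult the Falcon documentation for the exact interface:

```python
import requests

# ASSUMPTION: endpoint and response format follow the public Falcon demo API.
FALCON_URL = "https://labs.tib.eu/falcon/api?mode=long"

def link_to_dbpedia(text):
    """Ask Falcon for DBpedia entity URIs mentioned in the given text."""
    resp = requests.post(FALCON_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    # Falcon is expected to return candidate (URI, surface form) pairs under "entities".
    return [candidate[0] for candidate in resp.json().get("entities", [])]

# Illustrative filtered triple; in the project these come from the filtering step.
filtered = [("Government", "announces", "new tax policy",
             "Government announces new tax policy")]
linked = [(triple, link_to_dbpedia(f"{triple[0]} {triple[1]} {triple[2]}"))
          for triple in filtered]
```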
The linked triples are saved with their DBpedia URIs, and we use the Neo4j database to store the triples.
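A minimal sketch of storing the linked triples in Neo4j with the official Python driver, using a simple (subject)-[:RELATION]->(object) graph model; the connection details, node labels, and property names are assumptions, not the project's exact schema:

```python
from neo4j import GraphDatabase

# Connection details are placeholders for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_triple(tx, subject, relation, obj, subject_uri=None, object_uri=None):
    """MERGE subject/object nodes (with optional DBpedia URIs) and a relation edge."""
    tx.run(
        """
        MERGE (s:Entity {name: $subject})
          SET s.dbpedia_uri = coalesce($subject_uri, s.dbpedia_uri)
        MERGE (o:Entity {name: $obj})
          SET o.dbpedia_uri = coalesce($object_uri, o.dbpedia_uri)
        MERGE (s)-[:RELATION {type: $relation}]->(o)
        """,
        subject=subject, obj=obj, relation=relation,
        subject_uri=subject_uri, object_uri=object_uri,
    )

# Illustrative linked triple; URIs would normally come from the entity-linking step.
linked = [(("Government", "announces", "new tax policy", "headline text"),
           ["http://dbpedia.org/resource/Government"])]

with driver.session() as session:
    for (subject, relation, obj, _headline), uris in linked:
        # Simplification: attach the first returned URI to the subject node.
        session.execute_write(store_triple, subject, relation, obj,
                              uris[0] if uris else None, None)
driver.close()
```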
Claim classification results:
- SVM: 96%
- Logistic Regression: 95.2%
- Gaussian Naive Bayes: 93%
- AdaBoost: 95.3%
- Decision Tree: 95.1%
Triple extraction results (F1 score):
- Baseline (Dependency Parsing): 39.7%
- OpenIE6: 65.9%
- IMOJIE: 54.4%
- Gen2OIE: 62.2%