marekrei/ml_nlp_paper_data


ML and NLP paper data

This repository contains the data that was crawled and processed for the series of posts on ML and NLP publications.

The project was created by Marek Rei (@MarekRei). The country annotation was contributed by Jonas Pfeiffer (@PfeiffJo) and Andrew Caines (@cainesap).

Conference proceedings

The papers directory contains one JSON file for each crawled conference. Open any of these files to see the available metadata fields.
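Because the metadata fields are not documented here, a quick way to explore them is to load every file and print its top-level shape. The sketch below assumes the directory is checked out locally as papers/ and that each file holds a single JSON document. If the files turn out to be JSON Lines, parse them line by line instead. The function names load_proceedings and describe are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def load_proceedings(papers_dir="papers"):
    """Load every *.json file in papers_dir into a dict keyed by file name stem.

    Assumption: each file is one valid JSON document. No particular schema
    is assumed; inspect the returned objects to discover the metadata fields.
    """
    proceedings = {}
    for path in sorted(Path(papers_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            proceedings[path.stem] = json.load(f)
    return proceedings


def describe(obj):
    """Return a one-line summary of a loaded JSON object's top-level shape."""
    if isinstance(obj, list):
        first = obj[0] if obj else None
        keys = sorted(first.keys()) if isinstance(first, dict) else []
        return f"list of {len(obj)} items; first item keys: {keys}"
    if isinstance(obj, dict):
        return f"dict with keys: {sorted(obj.keys())}"
    return type(obj).__name__


if __name__ == "__main__":
    # Print one summary line per conference file found in ./papers
    for name, data in load_proceedings().items():
        print(name, "->", describe(data))
```

Summarising the shape first avoids dumping thousands of paper records to the terminal just to learn which keys exist.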

Country annotation

annotated_orgs.tsv contains the following columns in tab-separated format:

  • id
  • org_name - the name of the organization, as crawled
  • paper_count - the number of papers that matched that name, after initial processing
  • is_org - manually annotated field, indicating whether this is an actual organization or crawling noise
  • canonical_org_name - a canonical name for this organization, used to group different variants of the same organization's name
  • country - manually annotated country name for each organization
  • example1 - an example paper from which this organization was crawled
  • example2 - another example
  • example3 - another example
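The columns above support two common uses: merging name variants via canonical_org_name, and counting papers per country. The sketch below shows both with Python's csv module. It assumes the file starts with a header row using exactly the column names listed above, and it assumes a hypothetical encoding for is_org (values such as "1" or "yes" mean a real organization); check the actual values in annotated_orgs.tsv before relying on it. Note that summing paper_count counts organization-paper matches, so a paper with several affiliations in one country is counted more than once.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical encoding of the manually annotated is_org field.
TRUE_VALUES = {"1", "true", "yes", "y"}


def _real_orgs(tsv_path, true_values):
    """Yield rows annotated as actual organizations, skipping crawling noise."""
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps any quote characters inside org names literally
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["is_org"].strip().lower() in true_values:
                yield row


def papers_per_country(tsv_path="annotated_orgs.tsv", true_values=TRUE_VALUES):
    """Sum paper_count per country over real organizations."""
    totals = Counter()
    for row in _real_orgs(tsv_path, true_values):
        country = row["country"].strip()
        if country:
            totals[country] += int(row["paper_count"])
    return totals


def group_by_canonical(tsv_path="annotated_orgs.tsv", true_values=TRUE_VALUES):
    """Map each canonical_org_name to the sorted crawled name variants."""
    groups = defaultdict(set)
    for row in _real_orgs(tsv_path, true_values):
        canonical = row["canonical_org_name"].strip()
        if canonical:
            groups[canonical].add(row["org_name"])
    return {name: sorted(variants) for name, variants in groups.items()}
```

For example, papers_per_country().most_common(10) lists the ten countries with the most matches.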

License

This dataset is made available under the CC BY-NC 4.0 license.

About

Dataset of ML and NLP papers
