
armoko/WikipediaDataAnalyzer

Analyzing article histories and author interactions on Wikipedia using Apache Spark.

  1. Download a Wikipedia dump file, enwiki-*.xml.bz2, from https://dumps.wikimedia.org/enwiki/.
  2. Convert the downloaded bz2-compressed XML archive to JSON by running the xmlparse.py script.
  3. Use the resulting JSON as input to the scripts in ./spark/* to extract data such as the articles with the most comments, the most edits per month, edits per author, and so on.
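
The conversion in step 2 can be sketched roughly as follows. This is a minimal, hypothetical example, not the repository's xmlparse.py. It assumes the standard MediaWiki XML export layout (`<page>` elements holding `<title>`, `<id>` and one or more `<revision>` children) and streams the archive with `xml.etree.ElementTree.iterparse`, so the dump is read incrementally rather than loaded at once. The output schema (one JSON object per revision with `page_id`, `title`, `revision_id`, `timestamp`, `author` and `comment`) is an assumption, and `edits_per_month` is an illustrative plain-Python stand-in for the kind of per-month aggregation the Spark scripts perform.

```python
import bz2
import json
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Iterable, Iterator


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(xml_stream: IO[bytes]) -> Iterator[dict]:
    """Yield one flat dict per <revision> (assumed schema; not the repo's format)."""
    path = []  # local tag names of the currently open elements
    title = page_id = None
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        tag = _local(elem.tag)
        if event == "start":
            path.append(tag)
            continue
        path.pop()  # path now holds only the ancestors of elem
        parent = path[-1] if path else None
        if parent == "page" and tag == "title":
            title = elem.text
        elif parent == "page" and tag == "id":
            page_id = int(elem.text)  # page id, not revision or contributor id
        elif tag == "revision":
            fields = {_local(child.tag): child for child in elem}
            author = None  # stays None if the contributor is missing or hidden
            contributor = fields.get("contributor")
            if contributor is not None:
                for child in contributor:
                    if _local(child.tag) in ("username", "ip"):
                        author = child.text
                        break
            comment = fields.get("comment")
            yield {
                "page_id": page_id,
                "title": title,
                "revision_id": int(fields["id"].text),
                "timestamp": fields["timestamp"].text,
                "author": author,
                "comment": comment.text if comment is not None else None,
            }
            # Drop the revision's children (including article text) to limit memory;
            # a production parser would also prune processed pages from the root.
            elem.clear()
        elif tag == "page":
            elem.clear()


def convert(xml_stream: IO[bytes], out_stream: IO[str]) -> int:
    """Write one JSON object per line (JSON Lines) and return how many were written."""
    count = 0
    for record in iter_revisions(xml_stream):
        out_stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


def edits_per_month(records: Iterable[dict]) -> Counter:
    """Count revisions per 'YYYY-MM' using the ISO 8601 timestamp prefix."""
    return Counter(r["timestamp"][:7] for r in records)


# Only run the command-line conversion when both paths are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = sys.argv[1], sys.argv[2]
    with bz2.open(src, "rb") as xml_in, open(dst, "w", encoding="utf-8") as json_out:
        print(convert(xml_in, json_out), "revisions written")
```

A possible invocation is `python xml_to_jsonl.py <dump>.xml.bz2 revisions.jsonl` (the script name is hypothetical). Writing JSON Lines rather than a single JSON array keeps the output splittable, so Spark can read it in parallel with `spark.read.json("revisions.jsonl")`.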