Game Of Thrones full dataset from fandom wiki.

An entity graph created by scraping the Game of Thrones Wiki: https://gameofthrones.fandom.com/wiki

Nodes- ['Organization', 'Person', 'Event', 'Episode', 'Animal', 'Location', 'HistoriesNLore', 'Weapon', 'House', 'PersonType', 'Religion', 'Season']

Relationships- ['SeenOrMentioned', 'Membership', 'Religion', 'Center', 'Location', 'Clergy', 'Allegiance', 'Leader', 'Founder', 'Predecessor', 'Death', 'Culture', 'Conflict', 'Place', 'Outcome', 'AssociatedLocation', 'Father', 'Mother', 'Spouse', 'Siblings', 'Battles', 'Rulers', 'Narratedby', 'Lovers', 'Successor', 'Children', 'Maker', 'Owner', 'Lord', 'Capital', 'Cities', 'Towns', 'Castles', 'Species', 'Range', 'Ruler', 'Population', 'Heir', 'Ancestralweapon', 'PlacesofNote', 'Formerly', 'Placesofnote', 'Military', 'Institutions', 'Villages', 'Placeoforigin', 'Formedfrom', 'Cadetbranches', 'Militarystrength', 'Premiere', 'Finale']
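As a rough illustration of how the node labels and relationship types above fit together, here is a minimal Python sketch of a typed edge. The `Edge` class and the example values are hypothetical and are not taken from the project's code or data; only the label and relationship names come from the lists above.

```python
from dataclasses import dataclass

# A subset of the node labels and relationship types listed above.
NODE_LABELS = {"Person", "House", "Location", "Episode", "Season"}
RELATIONSHIP_TYPES = {"Allegiance", "Father", "Mother", "Spouse", "Siblings"}


@dataclass(frozen=True)
class Edge:
    """Hypothetical container for one relationship between two entities."""
    source: str
    source_label: str
    rel_type: str
    target: str
    target_label: str

    def __post_init__(self):
        # Reject labels and relationship types outside the known sets.
        for label in (self.source_label, self.target_label):
            if label not in NODE_LABELS:
                raise ValueError(f"unknown node label: {label}")
        if self.rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.rel_type}")


# Illustrative values only, not read from the scraped dataset.
edge = Edge("Eddard Stark", "Person", "Allegiance", "House Stark", "House")
```

Validating against fixed sets is one way to catch near-duplicate names such as 'PlacesofNote' and 'Placesofnote' early.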

I have written an introductory blog post about web scraping: https://codefringo.wordpress.com/2018/10/22/webcralwer-in-python/

Important files-

spiders/GotTGraphSpider.py is the main spider used to scrape the Fandom wiki.
DataProcessor/ScrapyOutputProcessing.ipynb is the Jupyter notebook that processes the scraped output, generates tabular data for the entities, and creates the graph in a Neo4j instance.
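The tabular step described for the notebook could look roughly like the sketch below. It assumes a hypothetical record shape (`name`, `label`, and a `relations` mapping); the actual fields emitted by the spider and the notebook's own logic may differ. It uses only the standard library and does not connect to Neo4j.

```python
import csv
import io


def to_tables(records):
    """Split scraped records into node rows and relationship rows.

    Assumes each record looks like
    {"name": ..., "label": ..., "relations": {rel_type: [target, ...]}},
    which is a hypothetical shape chosen for this example.
    """
    nodes, edges = [], []
    for record in records:
        nodes.append({"name": record["name"], "label": record["label"]})
        for rel_type, targets in record.get("relations", {}).items():
            for target in targets:
                edges.append(
                    {"source": record["name"], "type": rel_type, "target": target}
                )
    return nodes, edges


def to_csv(rows):
    """Serialize a list of uniform dicts as CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative input only, not read from the scraped dataset.
sample = [
    {
        "name": "Eddard Stark",
        "label": "Person",
        "relations": {
            "Spouse": ["Catelyn Stark"],
            "Children": ["Robb Stark", "Sansa Stark"],
        },
    }
]
nodes, edges = to_tables(sample)
```

Separate node and relationship tables like these are a common intermediate form before loading data into a graph database.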

Run the whole project-

Run the command "scrapy crawl GotGraphSpider -o Data/ScrapedData.json".
Execute the Jupyter notebook DataProcessor/ScrapyOutputProcessing.ipynb.
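Before opening the notebook, it can help to check that the crawl produced usable output. The sketch below assumes the file is a single JSON array of items, which is what Scrapy's JSON feed export writes for a `.json` output file; the helper names and the `label` field are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path="Data/ScrapedData.json"):
    """Load the JSON array written by the crawl command above."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def count_by_field(items, field):
    """Count items per value of `field`; the field name is an assumption."""
    return Counter(item.get(field, "<missing>") for item in items)
```

For example, `count_by_field(load_items(), "label")` would summarize the scraped items by type if the items carry such a field. If `load_items` fails with a JSON decoding error after a repeated crawl, a likely cause is that the second run appended to the existing file, so deleting Data/ScrapedData.json before re-running is a reasonable first thing to try.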
