Skip to content

Latest commit

 

History

History

Something About Riddles and Fruit

This volumes dataset is a collection of Tweets scraped from the #SSNTrade and #SSNSignings hastags from across the SSN signing period (i.e. September 6th - October 8th). I later realised that the SSN team accounts rarely or inconsistently used these hashtags, so there is an added database of tweets across the signing period from the official SSN team accounts.

There are two versions of each dataset, a .json and .csv version. The .json file is more extensive, as this comes directly from the Python snscrape module and contains a few more columns of data - and may be more easily readable if experienced with this type of file. Note though that the .json file has multiple lines of content for each tweet, which can cause some error with basic loading approaches. Here is an except of Python code that imports the tweets in the .json file:

#Option 1
tweets = []
for line in open('somethingAboutRiddlesAndFruit.json', 'r'):
    tweets.append(json.loads(line))
    
#Option 2 (using pandas)
df = pd.read_json('somethingAboutRiddlesAndFruit.json', lines = True)

The .csv is a more generally readable file but contains a smaller amount of columns (see Data Dictionary below).

As is the general goal with this program, the idea of working with this dataset is to produce some new informative knowledge through data and/or statistical analyses, or produce a neat data visualisation to communicate your findings. This is a slightly different dataset to existing volumes, and opens up opportunities for both quantitative (e.g. like/reply/retweet counts) and qualitative (e.g. common terms, phrases) analyses.

Get the Data

The dataset — 'somethingAboutRiddlesAndFruit.csv' — can be viewed online by clicking the link above. To download the data, you can clone this repository or directly download the file by:

  • Clicking the link above to view the dataset
  • Right click on the 'Raw' button and 'Save link as...'
  • Save the file wherever you like. It may be necessary to change the file extension from '.txt' to '.csv' for easier use

Data Dictionary

somethingAboutRiddlesAndFruit.csv

The somethingAboutRiddlesAndFruit.csv file contains the scraped tweets from the #SSNTrade and #SSNSignings hashtags over the signing period.

Variable Data Type Description
url string URL link to tweet
date dateTime Date and timestamp of tweet (likely in local Tweeters time)
content string Content of the tweet
renderedContent string Content of the tweet - but perhaps includes some additional rendered elements (e.g. icons)
id numeric Unique ID number of tweet
replyCount numeric Number of replies to tweet (as of October 25th)
retweetCount numeric Number of general retweets of tweet (as of October 25th)
likeCount numeric Number of likes for tweet (as of October 25th)
quoteCount numeric Number of quotes of tweet (as of October 25th)
conversationID numeric Unique ID number of conversation (seems to be the same as 'id')
lang string Listed language of tweet (e.g. 'en' = English)
outlinks list List of weblinks included in tweet
hashtags list List of hashtags used in tweet
username string Username of tweeter (note this does not include the @ symbol at beginning)

somethingAboutRiddlesAndFruit_teamTweets.csv

The somethingAboutRiddlesAndFruit_teamTweets.csv contains tweets from the official SSN team accounts (irrespective of hashtag used) over the signing period. The tabulated variables above are also relevant to this file.