Load the data using pandas
load_data
Loads in rocstories csv file and iterates over the stories
Returns a generator of stories
a generator is like a collection/list that isn’t populated until a item is requested
calling a next
function on the generator gives an item if there is one left
if there are no items left, returns that it is empty
for story in load_data ("~/path/to/rocstories.csv" ):
print (story )
Want to load up spacy (or keep it loaded)
Want to pass each sentence into spacy
Extract Dependency Pairs
Heuristic 1: first mentioned protagonist
Heuristic 2: Most frequently mentioned protagonist
Heuristic 3: break ties in 2 with 1
output looks like [(entity1, 2), (entity2, 1) ... ]
Want to store and count [(story-id, entity, VERB, relation, index)]
Want to find all instance where lose and find refer to the same entity?
start of by counting all stories that have lose and find
find all story-ids where at least one verb is lose
find all story-ids where at least one verb is find
count how many times they refer to the same entity within the story
P(e(w,d)) => Probability of an event-dep
Specifically, verb is w and the dependency is d
$P(E1) = \frac{Count(E1)}{Count(stories)}$
$P(E1,E2) = \frac{Count(E1 \& E2)}{Count(stories)}$
So we are calling e(w,d)=E1
Equation 1:
$pmi(E1,E2)=log\frac{P(E1,E2)}{P(E1)P(E2)}$
Equation 2:
$P(E1, E2)=\frac{C(E1,E2)}{DENOMINATOR}$
The denominator is supposed to be the number of all event pairs that share an entity.
This is a little more complicated than just the number of stories
Load data and probability table
# this example is in example.py
data , probability = process_corpus ("train .csv , sample = 100 )
print (probability .pmi ("move" , "dobj" , "move" , "dobj" ))
Load Pretrained model instead:
Download model from https://github.com/mrmechko/narrative_chains/releases/download/0.0.1/all.json and save it as all.json
use the following snippet
with open ("all.json" ) as fp :
table = chains .ProbabilityTable (json .load (fp ))