load_data
Loads the ROCStories CSV file and iterates over the stories
- Returns a generator of stories
- a generator is like a collection/list that isn’t populated until an item is requested
- calling the `next` function on the generator gives an item if there is one left; if there are no items left, it signals that it is empty (in Python, by raising `StopIteration`)
```python
for story in load_data("~/path/to/rocstories.csv"):
    print(story)
```
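A minimal sketch of what `load_data` could look like; the column names (`storyid`, `sentence1`..`sentence5`) follow the usual ROCStories layout and are an assumption about this particular file:

```python
import csv
import os

def load_data(path):
    """Yield stories from a ROCStories CSV, one at a time (a generator).

    Assumes the usual ROCStories columns (storyid, storytitle,
    sentence1..sentence5); adjust the names if your file differs.
    """
    with open(os.path.expanduser(path), newline="") as fp:
        for row in csv.DictReader(fp):
            # Collect the five sentences into one story record.
            sentences = [row[f"sentence{i}"] for i in range(1, 6)]
            yield {"story_id": row.get("storyid"), "sentences": sentences}
```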
- Want to load up spaCy (or keep it loaded)
- Want to pass each sentence into spaCy
- store the parse as well (see the sketch below)
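A sketch of that parsing step, assuming the `en_core_web_sm` spaCy model is installed (the model choice is an assumption) and the story records from the `load_data` sketch above:

```python
import spacy

# Load the model once and keep it around -- loading is the slow part.
nlp = spacy.load("en_core_web_sm")

def parse_story(story):
    """Run spaCy over each sentence of a story and keep every parsed Doc."""
    # nlp.pipe batches the sentences, which is faster than one nlp() call each.
    return list(nlp.pipe(story["sentences"]))
```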
- Heuristic 1: first mentioned protagonist
- Heuristic 2: most frequently mentioned protagonist
- Heuristic 3: break ties in 2 with 1
- output looks like `[(entity1, 2), (entity2, 1), ...]` (see the sketch below)
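One way the counting could be implemented over the spaCy parses; using PERSON named entities as the candidate pool is an assumption, and `protagonist_candidates` is a hypothetical helper name:

```python
from collections import Counter

def protagonist_candidates(docs):
    """Rank candidate protagonists for one story's parsed sentences.

    Heuristic 2 (frequency) orders the list; heuristic 1 (earliest mention)
    breaks ties. Treating PERSON named entities as the candidate pool is an
    assumption about how protagonists are identified.
    """
    mentions = []                       # every PERSON mention, in reading order
    for doc in docs:
        for ent in doc.ents:
            if ent.label_ == "PERSON":
                mentions.append(ent.text)

    counts = Counter(mentions)
    first_seen = {}
    for i, name in enumerate(mentions):
        first_seen.setdefault(name, i)  # index of the first mention

    # Most frequent first; earlier first mention wins ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], first_seen[kv[0]]))
```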
- Want to store and count `[(story-id, entity, VERB, relation, index)]` tuples (sketched below)
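A hedged sketch of the tuple extraction; reading the schema as "nominal token, head verb lemma, dependency label, sentence index" is an assumption, and `extract_events` is a hypothetical name:

```python
def extract_events(story_id, docs):
    """Yield (story_id, entity, verb_lemma, relation, sentence_index) tuples.

    An "event" here is any nominal token attached to a verb by a dependency
    arc (nsubj, dobj, ...); this reading of the schema is an assumption.
    """
    for index, doc in enumerate(docs):
        for tok in doc:
            if tok.pos_ in ("NOUN", "PROPN", "PRON") and tok.head.pos_ == "VERB":
                yield (story_id, tok.text, tok.head.lemma_, tok.dep_, index)
```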
- Want to find all instances where lose and find refer to the same entity
  - start off by counting all stories that have both lose and find:
    - find all story-ids where at least one verb is lose
    - find all story-ids where at least one verb is find
    - take the intersection of those two sets of story-ids
  - then count how many times lose and find refer to the same entity within the story (see the sketch below)
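A sketch of that counting pass over the extracted tuples; matching entities by exact string (no coreference) is a simplifying assumption:

```python
from collections import defaultdict

def count_lose_find(events):
    """Count stories containing both verbs, and how often they share an entity.

    `events` is an iterable of (story_id, entity, verb, relation, index)
    tuples as sketched above.
    """
    lose = defaultdict(set)   # story_id -> entities attached to "lose"
    find = defaultdict(set)   # story_id -> entities attached to "find"
    for story_id, entity, verb, _relation, _index in events:
        if verb == "lose":
            lose[story_id].add(entity)
        elif verb == "find":
            find[story_id].add(entity)

    both = set(lose) & set(find)                      # stories with both verbs
    shared = sum(1 for sid in both if lose[sid] & find[sid])
    return len(both), shared
```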
- P(e(w,d)) => the probability of an event-dependency pair
- Specifically, the verb is w and the dependency is d
$P(E1) = \frac{Count(E1)}{Count(stories)}$

$P(E1,E2) = \frac{Count(E1 \& E2)}{Count(stories)}$
- So we are writing e(w,d) as E1
- Equation 1:
  $pmi(E1,E2)=\log\frac{P(E1,E2)}{P(E1)P(E2)}$
- Equation 2:
  $P(E1, E2)=\frac{C(E1,E2)}{DENOMINATOR}$
- The denominator is supposed to be the number of all event pairs that share an entity.
- This is a little more complicated than just the number of stories (see the sketch below).
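A small sketch of how the counts could be combined, following Equations 1 and 2 above; the argument names are assumptions:

```python
import math

def pmi(count_e1, count_e2, count_pair, n_stories, n_pairs):
    """PMI of two events, following Equations 1 and 2 above.

    count_e1, count_e2 : stories containing E1 / E2
    count_pair         : event pairs (E1, E2) that share an entity
    n_stories          : total number of stories
    n_pairs            : total number of event pairs sharing an entity
    """
    p_e1 = count_e1 / n_stories
    p_e2 = count_e2 / n_stories
    p_joint = count_pair / n_pairs        # Equation 2's denominator
    return math.log(p_joint / (p_e1 * p_e2))
```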
```python
# this example is in example.py
data, probability = process_corpus("train.csv", sample=100)
print(probability.pmi("move", "dobj", "move", "dobj"))
```
- Download the model from https://github.com/mrmechko/narrative_chains/releases/download/0.0.1/all.json and save it as `all.json`
- use the following snippet:

```python
import json

import chains  # the module that provides ProbabilityTable here

with open("all.json") as fp:
    table = chains.ProbabilityTable(json.load(fp))
```
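Assuming the loaded table exposes the same `pmi` method used on `probability` above (an assumption; check the project's API), a query could then look like:

```python
# Hypothetical usage: assumes ProbabilityTable has the same pmi() signature as above.
print(table.pmi("move", "dobj", "move", "dobj"))
```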