from pelitk import conc
Produce a concordance of a selected word or (word, POS) tuple. Combines three sub-functions: get_node, flatten, and prettify (optional).
tok_text: tokenized text (list of strings) | Example: ['The', 'key', 'word', 'in', 'this', 'text', 'is', 'the', 'noun', 'platypus', '.', 'I', 'want', 'to', 'see', 'the', 'cotext', 'every', 'time', 'the', 'word', 'platypus', 'occurs', '.']
OR
tokenized text with POS tags (list of tuples) | Example: [('The', 'DT'), ('key', 'JJ'), ('word', 'NN'), ('in', 'IN'), ('this', 'DT'), ('text', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('noun', 'JJ'), ('platypus', 'NN'), ('.', '.'), ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('see', 'VB'), ('the', 'DT'), ('cotext', 'NN'), ('every', 'DT'), ('time', 'NN'), ('the', 'DT'), ('word', 'NN'), ('platypus', 'NN'), ('occurs', 'VBZ'), ('.', '.')]
node: node word or tuple | Example: 'platypus'
num: size of the collocation span, i.e. how many words on either side of the node | Example: 5
pos (optional, defaults to False): bool specifying whether tok_text is a list of tuples | Example: see above
pretty (optional, defaults to False): bool specifying whether the output should be formatted with all the node words aligned in the concordance and each concordance line joined into a single string | Example: see below
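Returns the concordance list for the specified node word: a list of (left context, node, right context) tuples, or a list of single strings when pretty=True.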
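To make the role of num and pos concrete, here is a minimal pure-Python sketch of a keyword-in-context lookup. It is not pelitk's implementation: the name kwic_sketch is hypothetical, it matches on the word form only (not on a (word, POS) tuple), and it does not reproduce the exact whitespace of the library's output.

```python
def kwic_sketch(tokens, node, num, pos=False):
    """Return (left, node, right) tuples for every occurrence of node.

    Hypothetical helper for illustration only; not part of pelitk.
    """
    # With pos=True the tokens are (word, tag) pairs; keep only the words.
    words = [tok[0] for tok in tokens] if pos else list(tokens)
    lines = []
    for i, word in enumerate(words):
        if word != node:
            continue
        left = words[max(0, i - num):i]    # up to num tokens before the node
        right = words[i + 1:i + 1 + num]   # up to num tokens after the node
        lines.append((" ".join(left), word, " ".join(right)))
    return lines


tok_text = ['The', 'key', 'word', 'in', 'this', 'text', 'is', 'the', 'noun',
            'platypus', '.', 'I', 'want', 'to', 'see', 'the', 'cotext',
            'every', 'time', 'the', 'word', 'platypus', 'occurs', '.']

for line in kwic_sketch(tok_text, 'platypus', 5):
    print(line)
```

The span is clipped at the start and end of the text, so a node near either edge simply gets a shorter context.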
tok_text = ['The', 'key', 'word', 'in', 'this', 'text', 'is', 'the', 'noun', 'platypus', '.', 'I', 'want', 'to', 'see', 'the', 'cotext', 'every', 'time', 'the', 'word', 'platypus', 'occurs', '.']
conc.concordance(tok_text,'platypus',5)
[('this text is the noun', 'platypus', '. I want to see'),
('cotext every time the word', 'platypus', 'occurs . ')]
conc.concordance(tok_text,'platypus',5,pretty=True)
[' this text is the noun platypus . I want to see ',
' cotext every time the word platypus occurs . ']
tokPOS_text = [('The', 'DT'), ('key', 'JJ'), ('word', 'NN'), ('in', 'IN'), ('this', 'DT'), ('text', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('noun', 'JJ'), ('platypus', 'NN'), ('.', '.'), ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('see', 'VB'), ('the', 'DT'), ('cotext', 'NN'), ('every', 'DT'), ('time', 'NN'), ('the', 'DT'), ('word', 'NN'), ('platypus', 'NN'), ('occurs', 'VBZ'), ('.', '.')]
conc.concordance(tokPOS_text,'platypus',5,pos=True,pretty=False)
[('this text is the noun', 'platypus', '. I want to see'),
('cotext every time the word', 'platypus', 'occurs . ')]
conc.concordance(tokPOS_text,'platypus',5,pos=True,pretty=True)
[' this text is the noun platypus . I want to see ',
' cotext every time the word platypus occurs . ']
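The alignment that pretty=True describes can be approximated by padding every left context to the same width before joining. The sketch below assumes this padding approach; align_sketch is a hypothetical name, and the library's own prettify step may format its strings differently.

```python
def align_sketch(lines):
    """Join (left, node, right) tuples into strings with the node column aligned.

    Hypothetical helper for illustration only; not part of pelitk.
    """
    # Right-justify each left context to the width of the longest one,
    # so every node word starts at the same character offset.
    left_width = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left.rjust(left_width)}  {node}  {right}"
            for left, node, right in lines]


# Tuples copied from the pretty=False output shown above.
lines = [
    ('this text is the noun', 'platypus', '. I want to see'),
    ('cotext every time the word', 'platypus', 'occurs . '),
]

aligned = align_sketch(lines)
for row in aligned:
    print(row)
```

The columns only line up visually when the result is displayed in a monospaced font.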
Print the concordance output (pretty=True) to a CSV file.
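import csv

to_print = conc.concordance(tok_text, 'platypus', 5, pretty=True)
# newline='' stops the csv module from writing blank rows on Windows
with open('concordance.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    # loop variable renamed so it does not shadow the conc module
    for line in to_print:
        wr.writerow([line])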
Note: To align the node words in each row when viewing the file, change the font to a monospaced one such as Consolas.
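If font-based alignment is not convenient, one alternative is to write the tuple output (pretty=False) as three separate columns, so a spreadsheet keeps the node word in its own column. This is a sketch under assumptions: the rows are copied from the example output above rather than produced by a live call, and the file name and header labels are made up for illustration.

```python
import csv

# Tuples copied from the pretty=False example output above.
rows = [
    ('this text is the noun', 'platypus', '. I want to see'),
    ('cotext every time the word', 'platypus', 'occurs . '),
]

# newline='' lets the csv module control line endings itself.
with open('concordance_columns.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    writer.writerow(['left', 'node', 'right'])  # hypothetical header labels
    writer.writerows(rows)                      # one concordance line per row
```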