text: Input string | Example: Hi how are you?
List of tokens from lowercased input, removing symbols and digits.
lex.re_tokenize('Hi how are you?')
['hi', 'how', 'are', 'you']
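The behavior described above can be approximated with a single regular expression. This is a minimal sketch, not the library's implementation: the name `re_tokenize_sketch` is hypothetical, and the pattern assumes that only runs of ASCII letters count as tokens, so digits, punctuation, and apostrophes all act as separators.

```python
import re

def re_tokenize_sketch(text):
    # Lowercase first, then keep only runs of the letters a-z.
    # Assumption: anything else (digits, symbols, whitespace) splits tokens.
    return re.findall(r"[a-z]+", text.lower())

print(re_tokenize_sketch('Hi how are you?'))  # ['hi', 'how', 'are', 'you']
```

Under this assumption a contraction such as "don't" yields two tokens, 'don' and 't'; the real function may treat such cases differently.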
Calculate the type-token ratio (TTR) of a list of tokens.
tokens: List of tokens | Example: ['words', 'should', 'go', 'here', 'more', 'words']
Type-Token Ratio
lex.ttr(['words', 'should', 'go', 'here', 'more', 'words'])
0.8333333333333334
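The ratio is the number of distinct tokens (types) divided by the total number of tokens. A minimal sketch of that arithmetic follows; `ttr_sketch` is a hypothetical name, and raising on empty input is an assumption rather than documented behavior.

```python
def ttr_sketch(tokens):
    # Types are the distinct tokens; TTR = types / tokens.
    if not tokens:
        raise ValueError("tokens must not be empty")  # assumption: reject empty input
    return len(set(tokens)) / len(tokens)

# 5 distinct types over 6 tokens ('words' appears twice).
print(ttr_sketch(['words', 'should', 'go', 'here', 'more', 'words']))  # 0.8333333333333334
```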
tokens: Input list of tokens | Example: ['Hi', 'how', 'are', 'you', 'doing', 'today']
freq_list (optional, defaults to 'NGSL'): Specify list of 2K common types to use for AG. Options include 'NGSL', 'PET', 'PELIC'. | Example: 'PET'
custom_list (optional, defaults to None): Specify a custom list of common types to use for AG as a list of lemmas. | Example: ['the', 'be', .....]
spellcheck (optional, defaults to True): Specify whether or not advanced types should be spell-checked using wordnet.synsets(). | Example: False
Calculated AG lexical diversity index: advanced types / sqrt(tokens)
tokens = lex.re_tokenize('Hi how are you doing today?')
lex.adv_guiraud(tokens)
0.4082482904638631
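The formula above, advanced types / sqrt(tokens), can be sketched as follows. This is an illustration only: `adv_guiraud_sketch` and `toy_common` are hypothetical, the toy list stands in for a real 2K frequency list, and the sketch skips both lemmatization and the spell-check step.

```python
import math

def adv_guiraud_sketch(tokens, common_types):
    # Advanced types: distinct (lowercased) tokens not found in the common list.
    if not tokens:
        raise ValueError("tokens must not be empty")  # assumption: reject empty input
    advanced = {tok.lower() for tok in tokens} - set(common_types)
    return len(advanced) / math.sqrt(len(tokens))

# Hypothetical stand-in for a frequency list such as NGSL.
toy_common = {'how', 'are', 'you', 'doing', 'today'}
tokens = ['hi', 'how', 'are', 'you', 'doing', 'today']

# One advanced type ('hi') over six tokens: 1 / sqrt(6).
print(adv_guiraud_sketch(tokens, toy_common))
```

Dividing by the square root of the token count, rather than the count itself, is what makes the index less sensitive to text length than a plain ratio.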
Implemented as described here
tokens: Input list of tokens | Example: lex.re_tokenize('Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “ and what is the use of a book,” thought Alice, “ without pictures or conversations ?” So she was considering in her own mind, (as well as she could, for the hot day made her feel very sleepy and stupid,) whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.')
spellcheck (optional, defaults to True): Specify whether or not advanced types should be spell-checked using wordnet.synsets(). | Example: False
length_range (optional, defaults to (35, 50)): A tuple with the lower and upper bounds of random sample size for vocd. | Example: (20, 60)
num_subsamples (optional, defaults to 100): A positive integer specifying how many times to randomly sample the text. | Example: 200
num_trials (optional, defaults to 3): Number of times to average D estimate over. | Example: 10
Estimated "D" parameter for the voc-D lexical diversity index
text = 'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “ and what is the use of a book,” thought Alice, “ without pictures or conversations ?” So she was considering in her own mind, (as well as she could, for the hot day made her feel very sleepy and stupid,) whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.'
tokens = lex.re_tokenize(text)
lex.vocd(tokens)
83.757961976435737
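To show how the three sampling parameters interact, here is a rough sketch of the usual vocd procedure: for each sample size in length_range, average the TTR of num_subsamples random samples, fit D to the curve TTR = (D/N)(sqrt(1 + 2N/D) - 1), and average D over num_trials. All function names are hypothetical. The inclusive upper bound, sampling without replacement, the grid search over D in [1, 500], and the `seed` argument are assumptions of this sketch, not documented behavior of lex.vocd, and spell-checking is omitted.

```python
import math
import random

def model_ttr(n, d):
    # TTR predicted by the vocd curve for a sample of n tokens and parameter d.
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def fit_d(points):
    # Grid search (assumption): try d = 1.0, 1.1, ..., 500.0 and keep the
    # value with the smallest squared error against the observed mean TTRs.
    def squared_error(d):
        return sum((model_ttr(n, d) - ttr) ** 2 for n, ttr in points)
    candidates = [step / 10 for step in range(10, 5001)]
    return min(candidates, key=squared_error)

def vocd_sketch(tokens, length_range=(35, 50), num_subsamples=100,
                num_trials=3, seed=None):
    low, high = length_range
    if len(tokens) < high:
        raise ValueError("need at least as many tokens as the largest sample size")
    rng = random.Random(seed)  # seed is a hypothetical convenience for repeatability
    estimates = []
    for _ in range(num_trials):
        points = []
        for n in range(low, high + 1):  # assumption: upper bound is inclusive
            # Mean TTR over random samples of n tokens drawn without replacement.
            ttrs = [len(set(rng.sample(tokens, n))) / n
                    for _ in range(num_subsamples)]
            points.append((n, sum(ttrs) / num_subsamples))
        estimates.append(fit_d(points))
    # Average the per-trial estimates of D.
    return sum(estimates) / num_trials
```

Because the samples are random, repeated calls without a fixed seed give slightly different estimates, which is why the estimate is averaged over several trials.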
Measure of Textual Lexical Diversity implemented as described here
tokens: List of tokens | Example: lex.re_tokenize('Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “ and what is the use of a book,” thought Alice, “ without pictures or conversations ?” So she was considering in her own mind, (as well as she could, for the hot day made her feel very sleepy and stupid,) whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.')
spellcheck (optional, defaults to False): Specify whether or not advanced types should be spell-checked using wordnet.synsets(). | Example: True
factor_size (optional, defaults to 0.72): Specify the TTR cutoff for adding to factor counts. | Example: 0.6
Calculated MTLD value
text = 'Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “ and what is the use of a book,” thought Alice, “ without pictures or conversations ?” So she was considering in her own mind, (as well as she could, for the hot day made her feel very sleepy and stupid,) whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.'
tokens = lex.re_tokenize(text)
lex.mtld(tokens)
78.27703170721364
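The role of factor_size is easiest to see in code. The sketch below follows the commonly described MTLD procedure: walk the tokens, count a full factor each time the running TTR reaches the cutoff, add a partial factor for the leftover segment, and average a forward and a backward pass. The function names are hypothetical, spell-checking is omitted, and two details are assumptions: the cutoff comparison uses `<=`, and a text that never produces any factor returns infinity.

```python
import math

def mtld_one_pass(tokens, factor_size=0.72):
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        # Assumption: a factor completes when the running TTR is at or below the cutoff.
        if len(types) / count <= factor_size:
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Partial factor: how far the leftover segment's TTR has fallen toward the cutoff.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - factor_size)
    if factors == 0:
        return math.inf  # assumption: no repetition at all, so no factor ever forms
    return len(tokens) / factors

def mtld_sketch(tokens, factor_size=0.72):
    # Average of a forward pass and a pass over the reversed token list.
    forward = mtld_one_pass(tokens, factor_size)
    backward = mtld_one_pass(tokens[::-1], factor_size)
    return (forward + backward) / 2

print(mtld_sketch(['a', 'b', 'a', 'b']))  # 4.0
```

A lower factor_size means the running TTR must fall further before a factor is counted, so fewer factors form and the resulting value is larger.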