Statistexts Project

This repository contains an initial prototype of a text analyzer that reports statistics about a given text. In the present version, the program:

computes the Type-Token Ratio (the number of distinct words divided by the total number of words).
computes the Lexical Diversity, denoted D.
creates a word cloud of the most used words in the text.
creates a word-frequency plot, showing the most common words and how many times each occurs in the text.
creates a plot of the number of words of each length in the text, e.g. 10 words with 3 characters, 24 words with 5 characters, and so on.
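
The counting behind three of these statistics (Type-Token Ratio, word frequencies, and word-length counts) can be sketched in a few lines. This is only an illustration, not the code in statistexts.py: the function names and the simplified tokenization rule (lowercase, then split on runs of word characters) are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumed rule for this sketch: a token is a run of word characters
    (letters, digits, underscore); the project may tokenize differently.
    """
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def most_common_words(tokens, n=10):
    """Return the n most frequent words with their counts."""
    return Counter(tokens).most_common(n)


def word_length_counts(tokens):
    """Map each word length to the number of tokens of that length."""
    return dict(sorted(Counter(len(t) for t in tokens).items()))


sample = "the cat sat on the mat and the dog sat too"
tokens = tokenize(sample)
print(type_token_ratio(tokens))      # 8 types / 11 tokens
print(most_common_words(tokens, 2))  # [('the', 3), ('sat', 2)]
print(word_length_counts(tokens))    # {2: 1, 3: 10}
```

The output of most_common_words is the kind of data a word-frequency plot would draw, and word_length_counts gives the bar heights for the word-length plot.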

Usage

First, create a directory called plots in the working directory. All plots generated by the script are saved there.
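
The directory can also be created from Python before running the script. A minimal sketch, where ensure_plots_dir is a hypothetical helper introduced for this example and not part of statistexts.py:

```python
from pathlib import Path


def ensure_plots_dir(base="."):
    """Create base/plots if it does not exist yet and return its path."""
    plots = Path(base) / "plots"
    # exist_ok=True makes the call safe to repeat on later runs.
    plots.mkdir(parents=True, exist_ok=True)
    return plots


if __name__ == "__main__":
    print(ensure_plots_dir())
```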

Next, make sure all required packages are installed (see requirements.txt), then run from the command line:

python path/to/statistexts.py

The program then asks for the name of the PDF file to analyze and the name of the .txt file to create. The text of the PDF is written to that .txt file, and all statistics are extracted from it.
