Extract statistical information and visualizations about texts.

hrmello/statistexts

Statistexts Project

This repository contains an initial prototype of a text analyzer that computes statistics about a given text. In its current version, the program

  • computes the Type-Token Ratio
  • computes the Lexical Diversity measure D.
  • creates a word cloud of the most frequently used words in the text.
  • creates a word-frequency plot, showing the most common words and how many times they occurred in the text.
  • creates a plot showing how many words of each length occur in the text, e.g. 10 words with 3 characters, 24 words with 5 characters, and so on.
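
Several of the statistics listed above can be computed with the Python standard library alone. The following is a minimal sketch, not the project's own code: it assumes a simple tokenizer that lowercases the text and keeps runs of letters (so "don't" becomes "don" and "t"), and the names tokenize, type_token_ratio, most_common_words and word_length_counts are hypothetical. It produces the numbers behind the word-frequency and word-length plots rather than drawing them. The D measure is not sketched, since the README does not say how it is estimated.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters only
    # ([^\W\d_] matches any Unicode letter but not digits or underscores).
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    # TTR = number of distinct words (types) / total number of words (tokens).
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def most_common_words(tokens, n=10):
    # (word, count) pairs for the n most frequent words; this is the data
    # a word-frequency plot or a word cloud would be drawn from.
    return Counter(tokens).most_common(n)


def word_length_counts(tokens):
    # Maps word length -> number of tokens of that length, sorted by length.
    # Counting distinct words instead would be an equally valid reading.
    return dict(sorted(Counter(len(t) for t in tokens).items()))


if __name__ == "__main__":
    words = tokenize("The cat sat on the mat.")
    print("TTR:", type_token_ratio(words))
    print("Most common:", most_common_words(words, 3))
    print("Lengths:", word_length_counts(words))
```

Keeping each statistic in its own small function makes it easy to plot the results with any plotting library afterwards.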

Usage

First, create a directory called plots in the working directory. All plots generated by the script are saved there.
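
If you prefer to create the directory from Python, a short sketch with pathlib follows; the helper name ensure_plots_dir is hypothetical and not part of the project. Using exist_ok=True means rerunning it does not raise an error when the directory already exists.

```python
from pathlib import Path


def ensure_plots_dir(base: str = ".") -> Path:
    # Create <base>/plots if it is missing; exist_ok avoids an error on reruns.
    plots = Path(base) / "plots"
    plots.mkdir(parents=True, exist_ok=True)
    return plots


if __name__ == "__main__":
    print("Plots will be saved in:", ensure_plots_dir().resolve())
```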

Next, make sure that all required packages are installed, then run the following from the command line:

python path/to/statistexts.py

The program then prompts for the name of the PDF file to analyze and the name of the .txt file to create. The text of the PDF is written to this .txt file, and all statistics are computed from it.
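
The interactive step described above might look roughly like the sketch below. It is an assumption-based illustration: the prompt wording and the helper names ask_file_names and read_extracted_text are hypothetical, and the PDF-to-text conversion itself is left out because the README does not say which library performs it. Passing the prompt function as a parameter keeps the helper testable without typing at a console.

```python
from pathlib import Path
from typing import Callable, Tuple


def ask_file_names(ask: Callable[[str], str] = input) -> Tuple[str, str]:
    """Prompt for the PDF to analyze and the .txt file to create (hypothetical helper)."""
    pdf_name = ask("Name of the PDF file to analyze: ").strip()
    txt_name = ask("Name of the .txt file to create: ").strip()
    return pdf_name, txt_name


def read_extracted_text(txt_name: str) -> str:
    """Load the extracted text that the statistics are computed from."""
    return Path(txt_name).read_text(encoding="utf-8")


if __name__ == "__main__":
    pdf_name, txt_name = ask_file_names()
    # The PDF-to-text step is omitted here; the project performs it with
    # a library the README does not name.
    if Path(txt_name).exists():
        text = read_extracted_text(txt_name)
        print(f"Read {len(text)} characters from {txt_name}")
    else:
        print(f"{txt_name} not found; convert {pdf_name} to text first.")
```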
