Letter Statistics

I was once curious about what words might be hiding: whether there are any patterns in them, or at least some statistics I could gather to practice my skills.

This is my ongoing project, which is all about words and letters.

I am currently working with a few txt files in both Polish and English so that the two languages can be compared.

Experiments

After a few experiments with counting letter occurrences, the most frequent letters, vowels, consonants, etc., I have come to the conclusion that there is a pattern behind it all. Just take a look at these charts comparing various files. The overall shapes are similar, and the percentage share of each letter is mind-blowingly close across all the texts.
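The letter counting described above can be sketched roughly as follows. This is a minimal illustration, not the code from slowa_statystyki.py: the function name `letter_frequencies` and the sample sentence are assumptions made for the example, and it treats every character for which `str.isalpha()` is true as a letter, which may differ from what the project does.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`, in percent.

    Hypothetical helper: case is ignored, and digits, spaces and
    punctuation are skipped. Polish letters such as "ą" or "ł" count
    as letters because str.isalpha() is true for them.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the result from the most to the least frequent letter
    return {letter: 100 * n / total for letter, n in counts.most_common()}


sample = "To be, or not to be, that is the question"
freqs = letter_frequencies(sample)
most_common_letter = max(freqs, key=freqs.get)
print(most_common_letter, round(freqs[most_common_letter], 2))
```

Running the same function over two different files and plotting the resulting dictionaries side by side is one way to compare their shapes.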

Statistics

These are a few statistics that I have gathered so far.

	English Words	Polish Words	Hamlet-EN	Hamlet-PL
Total Letters	3494707	36090123	122288	144927
Vowels	1438930 (~41.17%)	15835894 (~43.88%)	51284 (~41.94%)	59901 (~41.33%)
Consonants	2055777 (~58.83%)	20254229 (~56.12%)	71004 (~58.06%)	85026 (~58.67%)
Most frequent letter	"e" with 376456 occurrences (~10.77%)	"a" with 3388277 occurrences (~9.39%)	"e" with 16335 occurrences (~13.36%)	"a" with 12103 occurrences (~8.35%)
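The percentages in the table are each count divided by the total number of letters. The sketch below shows that arithmetic together with one possible vowel and consonant split. The `VOWELS` set is an assumption (the README does not say which letters the project counts as vowels, for example whether "y" is included), and the function names are hypothetical.

```python
# Assumption: English vowels plus "y" and the Polish vowels "ą", "ę", "ó".
# The set actually used by the project may be different.
VOWELS = set("aeiouyąęó")


def vowel_consonant_split(text: str) -> tuple[int, int]:
    """Count vowels and consonants among the letters of `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    return vowels, len(letters) - vowels


def share(part: int, total: int) -> float:
    """Percentage of `total` that `part` represents, rounded to 2 places."""
    return round(100 * part / total, 2)


# Reproduce the "English Words" column from the counts in the table.
total_letters = 3494707
print(share(1438930, total_letters))  # vowels: 41.17
print(share(2055777, total_letters))  # consonants: 58.83
```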

Sources of txt files:

Lalka by Bolesław Prus
- Volume 1 (Wolne Lektury)
- Volume 2 (Wolne Lektury)
Pan Tadeusz by Adam Mickiewicz (Wolne Lektury)
Hamlet by William Shakespeare, Polish version (Wolne Lektury)
Hamlet by William Shakespeare, English version (GitHub/cgovella, Project Gutenberg EBook)
Romeo and Juliet (GitHub/cgovella, Project Gutenberg EBook)
List of accepted Polish words for word games, based on the rules of Słownik Języka Polskiego (sjp.pl)
List of English words (GitHub/dwyl)
Best-rated Polish essay (Wiedza z Wami)

goralczm/letters_statistics
