I was once curious to see what words might be hiding: whether there are any patterns, or at least some statistics to practice my skills on.
This is my ongoing project, which is all about words and letters.
I am currently working with a few .txt files in both Polish and English to see how the two languages differ.
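The Polish files contain letters such as ą, ę, ł and ż, so letter extraction has to be Unicode-aware. Below is a minimal Python sketch. It assumes the files are UTF-8 encoded, which the README does not state, and the function names `extract_letters` and `read_letters` are hypothetical:

```python
import unicodedata


def extract_letters(text: str) -> list[str]:
    """Return every alphabetic character of `text`, lowercased.

    NFC normalization first merges a base letter with a combining mark
    (e.g. "a" + U+0328 COMBINING OGONEK -> "ą"), so Polish diacritics are
    counted as single letters instead of being dropped.
    """
    normalized = unicodedata.normalize("NFC", text)
    # str.isalpha() is Unicode-aware, so "ż" or "ó" pass while digits,
    # spaces and punctuation are filtered out.
    return [ch for ch in normalized.lower() if ch.isalpha()]


def read_letters(path: str) -> list[str]:
    """Read a .txt file (assumed to be UTF-8) and extract its letters."""
    with open(path, encoding="utf-8") as f:
        return extract_letters(f.read())
```

For example, `extract_letters("Żółć, 42!")` returns `['ż', 'ó', 'ł', 'ć']`.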
After a few experiments collecting letter occurrences, the most frequent letters, vowels, consonants, etc., I have come to the conclusion that there is a pattern behind it all. Just take a look at these charts comparing various files: the overall shapes are similar, and the percentage of each letter is remarkably close across all the texts.
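The README does not say how the charts are built or how "similar" is measured. One way to turn the visual comparison into a single number is sketched below: compute each letter's share of all letters in a text, then sum the absolute differences between two texts. This metric (twice the total variation distance, expressed in percentage points) is an illustrative choice, not necessarily what the charts use, and the function names are hypothetical:

```python
from collections import Counter


def letter_percentages(letters: list[str]) -> dict[str, float]:
    """Map each letter to its share of all letters, in percent."""
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: 100 * n / total for letter, n in counts.items()}


def distribution_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute percentage-point differences over all letters in either text.

    0 means identical letter distributions; 200 means the texts share no letters.
    A letter missing from one text (e.g. "ą" in English) counts as 0% there.
    """
    letters = set(p) | set(q)
    return sum(abs(p.get(ch, 0.0) - q.get(ch, 0.0)) for ch in letters)
```

A smaller distance between, say, Hamlet-EN and Hamlet-PL would mean their letter-frequency charts overlap more closely.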
Here are a few statistics I have gathered so far:
Statistic | English Words | Polish Words | Hamlet-EN | Hamlet-PL |
---|---|---|---|---|
Total Letters | 3494707 | 36090123 | 122288 | 144927 |
Vowels | 1438930 (~41.17%) | 15835894 (~43.88%) | 51284 (~41.94%) | 59901 (~41.33%) |
Consonants | 2055777 (~58.83%) | 20254229 (~56.12%) | 71004 (~58.06%) | 85026 (~58.67%) |
Most frequent letter | "e" with 376456 occurrences (~10.77%) | "a" with 3388277 occurrences (~9.39%) | "e" with 16335 occurrences (~13.36%) | "a" with 12103 occurrences (~8.35%) |
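A sketch of how one column of the table above could be computed from a list of lowercase letters. The README does not define which letters count as vowels, so the `VOWELS` sets below are assumptions (English "y", for instance, is treated as a consonant here). As in the table, every non-vowel letter counts as a consonant, so the two always add up to the total. `pct` and `summarize` are hypothetical names:

```python
from collections import Counter

# Assumed vowel sets: the README does not say which letters it counts as vowels.
VOWELS = {
    "en": set("aeiou"),
    "pl": set("aąeęioóuy"),
}


def pct(part: int, total: int) -> str:
    """Format a count with its share of the total, e.g. '3 (~50.00%)'."""
    return f"{part} (~{100 * part / total:.2f}%)"


def summarize(letters: list[str], lang: str) -> dict[str, str]:
    """Build one column of the statistics table from lowercase letters."""
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no letters to summarize")
    vowels = sum(n for ch, n in counts.items() if ch in VOWELS[lang])
    # Every letter that is not a vowel is counted as a consonant,
    # so Vowels + Consonants always equals Total Letters.
    consonants = total - vowels
    top, top_n = counts.most_common(1)[0]
    return {
        "Total Letters": str(total),
        "Vowels": pct(vowels, total),
        "Consonants": pct(consonants, total),
        "Most frequent letter": f'"{top}" with {top_n} occurrences (~{100 * top_n / total:.2f}%)',
    }
```

For example, `summarize(list("banana"), "en")["Vowels"]` returns `"3 (~50.00%)"`, and `pct(1438930, 3494707)` reproduces the English Words vowel cell, `"1438930 (~41.17%)"`.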
The text files come from the following sources:
- Lalka by Bolesław Prus
  - Volume 1 (Wolne Lektury)
  - Volume 2 (Wolne Lektury)
- Pan Tadeusz by Adam Mickiewicz (Wolne Lektury)
- Hamlet by William Shakespeare, Polish version (Wolne Lektury)
- Hamlet by William Shakespeare, English version (GitHub/cgovella, Project Gutenberg EBook)
- Romeo and Juliet (GitHub/cgovella, Project Gutenberg EBook)
- List of Polish words accepted in word games, based on the rules of Słownik Języka Polskiego (sjp.pl)
- List of English words (GitHub/dwyl)
- Best-rated Polish essay (Wiedza z Wami)