werika

This is a language recognition system for Wixarika (Huichol), an indigenous language of Mexico. It can extract sentences of at least three words from text of unknown origin. In principle, the script can also be used for other languages.

Usage

Identification

This script extracts complete Wixarika sentences from a text file. The file may be written in several languages, and a Wixarika sentence needs only three words to be classified.

python3 idtexto.py [filename.txt]
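The extraction step described above can be pictured with a small sketch. This is not the code in idtexto.py; it only illustrates the idea of keeping sentences that have at least three words and that mostly look like the target language. The names `split_sentences`, `extract_sentences`, `is_target_word` and the `threshold` parameter are hypothetical, and the word test is passed in as a plain function so that any model could be plugged in.

```python
import re

MIN_WORDS = 3  # the README states a three-word minimum


def split_sentences(text):
    """Split text on ., !, ? or newlines and drop empty pieces."""
    parts = re.split(r"[.!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_sentences(text, is_target_word, min_words=MIN_WORDS, threshold=0.5):
    """Keep sentences with enough words, of which at least `threshold`
    (an assumed cut-off) are accepted by `is_target_word`."""
    kept = []
    for sentence in split_sentences(text):
        words = sentence.split()
        if len(words) < min_words:
            continue  # too short to classify
        hits = sum(1 for w in words if is_target_word(w.lower()))
        if hits / len(words) >= threshold:
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    # Placeholder vocabulary; a real run would use a trained model instead.
    vocab = {"foo", "bar", "baz"}
    sample = "foo bar baz. hello world again. foo bar. foo baz bar qux"
    print(extract_sentences(sample, vocab.__contains__))
```

The vocabulary here is a set of placeholder tokens, not real Wixarika words; the actual script may score sentences in a different way.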

Training

To train the language model, store the corpus as nuevo.txt and run:

python3 wixanlp.py
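As a rough illustration of what training and storing a model can look like, here is a minimal sketch that counts character trigrams in a corpus and saves them with pickle. It assumes a character n-gram model, which this README does not confirm; the functions `char_ngrams`, `train`, `save_model` and `load_model` are hypothetical and are not taken from wixanlp.py.

```python
import pickle
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with ^ and $."""
    padded = "^" + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpus_text, n=3):
    """Count character n-grams over every whitespace-separated word."""
    counts = Counter()
    for word in corpus_text.lower().split():
        counts.update(char_ngrams(word, n))
    return counts


def save_model(counts, path):
    """Write the counts to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(counts, fh)


def load_model(path):
    """Read counts previously written by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

With such a sketch, the corpus would be read from nuevo.txt and the result saved under a file name of your choice. The format of the lm.pickle file shipped in the repository is not documented here and may differ.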

You can also create an index of words that can be confused with Wixarika.

python3 confgen.py [filename2.txt]
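One simple way to think about such an index is as the set of words that occur both in the target-language corpus and in a text in another language. The sketch below makes that assumption explicit; `confusable_words` is a hypothetical helper and does not describe how confgen.py actually builds its index.

```python
def confusable_words(target_text, other_text):
    """Return the sorted words that appear in both texts (case-insensitive).

    Assumption: a word is 'confusable' when it is shared by both vocabularies.
    """
    target_vocab = set(target_text.lower().split())
    other_vocab = set(other_text.lower().split())
    return sorted(target_vocab & other_vocab)


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    print(confusable_words("ne ti ka", "no ti la ka"))
```

Words found this way could then be given less weight, or ignored, when deciding whether a sentence belongs to the target language.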

Licence

GPL v3+

Thanks

Thanks to Prof. Ivan Vladimir Meza (http://turing.iimas.unam.mx/~ivanvladimir/) and Prof. Carlos Barron Romero (http://academicos.azc.uam.mx/cbr/) for their invaluable help with this work.

My homepage: http://www.eenube.com
