celikalp/postagger

postagger

A part-of-speech (POS) tagger based on a Hidden Markov Model (HMM).

The command for training (and tuning) the POS tagger is:

java build_tagger sents.train sents.devt model_file

The file model_file stores the statistics gathered during training and tuning: the POS tag transition probabilities, the word emission probabilities, and any other tuned parameters.
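As an illustration of what the transition and emission statistics mean, here is a minimal sketch of how they could be estimated by counting. It is not taken from build_tagger: the class name HmmCounts, the "<s>" start pseudo-tag, and the assumption that each training token looks like word/TAG are all hypothetical, and no smoothing or tuning is applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the repository's code: collects unsmoothed
// maximum-likelihood HMM statistics from sentences whose tokens are
// ASSUMED to be written as "word/TAG" and separated by whitespace.
final class HmmCounts {
    static final String START = "<s>"; // assumed sentence-start pseudo-tag

    // previous tag -> current tag -> count
    final Map<String, Map<String, Integer>> transitions = new HashMap<>();
    // tag -> word -> count
    final Map<String, Map<String, Integer>> emissions = new HashMap<>();

    void addSentence(String line) {
        String prev = START;
        for (String token : line.trim().split("\\s+")) {
            // Use the last slash so a word that itself contains '/' still splits correctly.
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                continue; // skip empty or malformed tokens
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            bump(transitions, prev, tag);
            bump(emissions, tag, word);
            prev = tag;
        }
    }

    private static void bump(Map<String, Map<String, Integer>> table, String row, String col) {
        table.computeIfAbsent(row, k -> new HashMap<>()).merge(col, 1, Integer::sum);
    }

    // Relative frequency count(row, col) / count(row, *); 0.0 for unseen rows.
    static double probability(Map<String, Map<String, Integer>> table, String row, String col) {
        Map<String, Integer> counts = table.get(row);
        if (counts == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : counts.getOrDefault(col, 0) / (double) total;
    }
}
```

With this sketch, probability(counts.transitions, "DT", "NN") estimates P(NN | DT) and probability(counts.emissions, "NN", "dog") estimates P(dog | NN). A real tagger would normally smooth these estimates so that unseen words and tag pairs do not receive zero probability, which is presumably where the development set comes in.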

The test file consists of a list of sentences (without POS tags), one sentence per line. A sample test file is provided (sents.test). The command to test on this test file and generate an output file is:

java run_tagger sents.test model_file sents.out

The output file has the same format as the POS-tagged training file. A sample output file is also provided (sents.out).
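The README does not say how run_tagger picks tags. The usual decoding method for an HMM is the Viterbi algorithm, so the sketch below shows that technique under stated assumptions: the class name ViterbiSketch, the "<s>" start pseudo-tag, the probability-map layout, and the FLOOR constant (a crude stand-in for smoothing) are all hypothetical and not taken from the repository.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Viterbi decoding for a bigram HMM; not the repository's code.
final class ViterbiSketch {
    static final String START = "<s>";  // assumed sentence-start pseudo-tag
    static final double FLOOR = 1e-10;  // assumed floor for unseen events, in place of real smoothing

    // Log-probability lookup; missing entries fall back to log(FLOOR).
    static double logP(Map<String, Map<String, Double>> table, String row, String col) {
        Map<String, Double> r = table.get(row);
        double p = (r == null) ? 0.0 : r.getOrDefault(col, 0.0);
        return Math.log(Math.max(p, FLOOR));
    }

    // trans: previous tag -> tag -> probability; emit: tag -> word -> probability.
    static String[] decode(String[] words, List<String> tags,
                           Map<String, Map<String, Double>> trans,
                           Map<String, Map<String, Double>> emit) {
        int n = words.length;
        int k = tags.size();
        if (n == 0 || k == 0) {
            return new String[0];
        }
        double[][] score = new double[n][k]; // best log-score of a path ending in tag j at word i
        int[][] back = new int[n][k];        // index of the previous tag on that best path

        for (int j = 0; j < k; j++) {
            score[0][j] = logP(trans, START, tags.get(j)) + logP(emit, tags.get(j), words[0]);
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + logP(trans, tags.get(p), tags.get(j));
                    if (s > best) {
                        best = s;
                        arg = p;
                    }
                }
                score[i][j] = best + logP(emit, tags.get(j), words[i]);
                back[i][j] = arg;
            }
        }

        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) {
                last = j;
            }
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = tags.get(last);
            last = back[i][last];
        }
        return out;
    }
}
```

Working in log space avoids numeric underflow on long sentences. The running time is proportional to the sentence length times the square of the number of tags.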

Dataset
The Penn Treebank tag set is used.
sents.train -> A training set of POS-tagged sentences
sents.devt -> A separate development set of sentences for tuning the POS tagger
