A Part of Speech tagger

For this exercise you will build a PoS tagger that predicts the tag of each word based exclusively on the tag of the previous word (not on the previous word itself).

The tagger

Given a word $w_{i-1}$ with tag $t_{i-1}$, predict the tag $t_i$ of the next word $w_i$ based exclusively on the tag of the previous word; in other words, $\hat{t}_i = \arg\max_{t} P(t \mid t_{i-1})$. You can estimate this probability by counting tag bigrams in a large corpus, as in today's slides: $P(t_i \mid t_{i-1}) \approx \mathrm{count}(t_{i-1}, t_i) / \mathrm{count}(t_{i-1})$. What is your accuracy?
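The counting estimate can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code in tagger.py: the names `train_transitions` and `predict_next`, the `<s>` sentence-start marker, and the `NOUN` fallback for unseen tags are all hypothetical choices, and the toy tag sequences are made up.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker (an assumption, not from tagger.py)


def train_transitions(tagged_sentences):
    """Estimate P(t_i | t_{i-1}) by counting tag bigrams.

    tagged_sentences is an iterable of tag lists, one list per sentence.
    """
    bigram_counts = defaultdict(Counter)
    for tags in tagged_sentences:
        prev = START
        for tag in tags:
            bigram_counts[prev][tag] += 1
            prev = tag
    # Normalise each row of counts into relative frequencies.
    probs = {}
    for prev, counter in bigram_counts.items():
        total = sum(counter.values())
        probs[prev] = {tag: n / total for tag, n in counter.items()}
    return probs


def predict_next(probs, prev_tag, fallback="NOUN"):
    """Return the most probable tag after prev_tag (fallback if prev_tag is unseen)."""
    candidates = probs.get(prev_tag)
    if not candidates:
        return fallback  # assumed default; any frequent tag would do
    return max(candidates, key=candidates.get)


# Made-up toy data, only to exercise the functions.
toy = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN"],
]
probs = train_transitions(toy)
print(probs["DET"])               # relative frequencies of tags following DET
print(predict_next(probs, "DET"))
```

Because the prediction ignores the word itself, every occurrence of a given previous tag yields the same predicted tag, which limits how accurate this tagger can be.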

You can download a PoS-tagged corpus from Universal Dependencies (UD) to use for the exercise. Choose whichever language you find most interesting.
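UD treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (the fourth is the universal PoS tag, UPOS), comment lines starting with `#`, and a blank line between sentences. A minimal reader might look like the sketch below; the function name `read_upos` and the sample sentences are hypothetical, and tagger.py may parse the files differently.

```python
def read_upos(lines):
    """Collect the UPOS tag sequence of each sentence from CoNLL-U lines."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment / metadata
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(cols[3])  # column 4 is UPOS
    if current:
        sentences.append(current)
    return sentences


# Made-up sample in CoNLL-U layout, for illustration only.
sample = "\n".join([
    "# text = El gat dorm.",
    "1\tEl\tel\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tgat\tgat\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tdorm\tdormir\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
    "# text = del gat",
    "1-2\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tde\tde\tADP\t_\t_\t3\tcase\t_\t_",
    "2\tel\tel\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tgat\tgat\tNOUN\t_\t_\t0\troot\t_\t_",
])
sentences = read_upos(sample.splitlines())
print(sentences)
```

To read a real treebank, pass an open file object instead of the sample lines, for example `read_upos(open(path, encoding="utf-8"))`.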

Results

Run tagger.py with a training file and a test file. Example files for Catalan are included in catalan/. Use -f to recompute the probabilities file.

python3 tagger.py catalan/ca_ancora-ud-train.conllu catalan/ca_ancora-ud-test.conllu

An example of the output:

Total: 56171 words
Correct: 22128 PoS
Accuracy: 39.39399334175998 %
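The accuracy is the share of tokens whose predicted tag matches the gold tag. The sketch below shows one way to compute it, assuming the gold tag of the previous word is used as context and that the first word of each sentence is counted too; how tagger.py handles these details is not stated here, so treat both as assumptions. The `evaluate` function and the toy predictor are hypothetical.

```python
def evaluate(test_sentences, predict, start="<s>"):
    """Return (total, correct, accuracy in percent) for a previous-tag predictor.

    predict maps the previous gold tag to a predicted tag; start is an
    assumed sentence-start marker.
    """
    total = correct = 0
    for tags in test_sentences:
        prev = start
        for gold in tags:
            total += 1
            if predict(prev) == gold:
                correct += 1
            prev = gold  # condition on the gold tag, not on the prediction
    accuracy = 100 * correct / total if total else 0.0
    return total, correct, accuracy


# Toy predictor and made-up test data, for illustration only.
toy_predict = {"<s>": "DET", "DET": "NOUN", "NOUN": "VERB"}.get
toy_test = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN"]]
total, correct, accuracy = evaluate(toy_test, toy_predict)
print(f"Total: {total} words")
print(f"Correct: {correct} PoS")
print(f"Accuracy: {accuracy} %")

# The Catalan figure reported above follows from the same ratio.
reported = 100 * 22128 / 56171  # roughly 39.39
```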
