Commit

Update README.md
kbatsuren authored Apr 23, 2024
1 parent 9efa0c6 commit 2fa803f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -2,10 +2,10 @@

umLabeller is an inspection tool for characterizing the semantic compositionality of subword tokenization,
based on the morphological information retrieved from UniMorph.
-Given a word _w_ and its subword tokenization, _s_ = (_s1_, ..., _sn_) | ∀i _si_ ∈ V, umLabeller assigns one of four categories: _vocab, alien, morph, or n/a_:
+Given a word _w_ and its subword tokenization, _s_ = (_s_<sub>1</sub>, ..., _s_<sub>n</sub>) | ∀i _s_<sub>i</sub> ∈ V, umLabeller assigns one of four categories: _vocab, alien, morph, or n/a_:

- **vocabulary subword**: the given word _w_ is a subword in the vocabulary as _w_ ∈ _V_;
-- **alien composition**: the given subword sequence _s_ is an alien subword composition if we find at least two subwords _si_ and _sj_ in s that are not meaningful with respect to the meaning of _w_;
+- **alien composition**: the given subword sequence _s_ is an alien subword composition if we find at least two subwords _s_<sub>i</sub> and _s_<sub>j</sub> in s that are not meaningful with respect to the meaning of _w_;
- **morphological composition**: the subword sequence _s_ is morphological if it is neither a vocabulary nor an alien subword composition;
- **n/a**: UniMorph has no information on the word.
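
The four categories above can be read as a small decision procedure. The following is a minimal sketch of that reading, not umLabeller's actual API: the function name `label_tokenization`, its parameters, and the order of the checks are assumptions made for illustration. In particular, "meaningful with respect to the meaning of _w_" is approximated here by membership in a caller-supplied set of morphemes, a hypothetical stand-in for the UniMorph lookup.

```python
from typing import Optional, Sequence, Set


def label_tokenization(
    word: str,
    subwords: Sequence[str],
    vocab: Set[str],
    morphemes: Optional[Set[str]],
) -> str:
    """Assign one of: "vocab", "alien", "morph", "n/a".

    `morphemes` is a hypothetical stand-in for UniMorph: the set of
    segments considered meaningful for `word`, or None when no
    morphological information is available.
    """
    # vocabulary subword: the whole word is a single entry of the vocabulary
    if len(subwords) == 1 and subwords[0] == word and word in vocab:
        return "vocab"

    # n/a: no morphological information on the word
    if morphemes is None:
        return "n/a"

    # alien composition: at least two subwords are not meaningful for the word
    # (assumption: "meaningful" is simplified to set membership)
    not_meaningful = [s for s in subwords if s not in morphemes]
    if len(not_meaningful) >= 2:
        return "alien"

    # morphological composition: neither vocabulary nor alien
    return "morph"


# Toy data, invented for illustration only
toy_vocab = {"jogging", "jog", "jogg", "ing", "jo", "gg"}
toy_morphemes = {"jog", "jogg", "ing"}

print(label_tokenization("jogging", ["jogging"], toy_vocab, toy_morphemes))
print(label_tokenization("jogging", ["jogg", "ing"], toy_vocab, toy_morphemes))
print(label_tokenization("jogging", ["jo", "gg", "ing"], toy_vocab, toy_morphemes))
print(label_tokenization("jogging", ["jogg", "ing"], toy_vocab, None))
```

The vocabulary check is placed first in this sketch because it depends only on the tokenizer's vocabulary; whether the tool itself checks UniMorph coverage before or after that step is not stated in the text above.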
