Update README.md
kbatsuren authored Apr 23, 2024
1 parent 2fa803f commit 1e62530
Showing 1 changed file with 12 additions and 10 deletions.
22 changes: 12 additions & 10 deletions README.md
@@ -11,6 +11,18 @@ Given a word _w_ and its subword tokenization, _s_ = (_s_<sub>1</sub>, ..., _s_<

umLabeller can characterize over half a million English words and is compatible with most modern tokenizers.

## Examples

| input word | subword tokenization | output label |
|----------------|-------------------------|-----------------|
| jogging | _j ogging | alien |
| neutralised | _neutral ised | morph |
| stepstones | _steps tones | alien |
| swappiness | _sw appiness | alien |
| swappiness | _swap pi ness | morph |
| jogging | _jogging | vocab |


## Installation

To install from the source, please use the following commands:
@@ -43,16 +55,6 @@ Output:
```
alien
```
## Examples

| input word | subword tokenization | output label |
|----------------|-------------------------|-----------------|
| jogging | _j ogging | alien |
| neutralised | _neutral ised | morph |
| stepstones | _steps tones | alien |
| swappiness | _sw appiness | alien |
| swappiness | _swap pi ness | morph |
| jogging | _jogging | vocab |

## References
More details can be read in the following article: