TICCL-unk: do something with "(maai)beheer"? #28

Open
egpbos opened this issue Jul 23, 2018 · 1 comment

egpbos commented Jul 23, 2018

Tokens such as "(maai)beheer" currently end up in the .punct file, but they could instead be counted as two separate words at the same position.
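
To illustrate what counting such a token as two words at the same position could look like, here is a minimal Python sketch. It is not TICCL-unk code: the function name `expand_optional_prefix` and the regular expression are hypothetical, and reading "(maai)beheer" as the two words "maaibeheer" and "beheer" is an assumption about the intended result.

```python
import re

# Hypothetical pattern: a parenthesized optional part glued to the front of
# a word, e.g. "(maai)beheer". Letters only. This is an assumption for
# illustration, not a rule taken from TICCL-unk.
OPTIONAL_PREFIX = re.compile(r"^\(([^\W\d_]+)\)([^\W\d_]+)$")


def expand_optional_prefix(token):
    """Return the word variants for a token like "(maai)beheer".

    Tokens that do not match the pattern come back unchanged as a
    single-item list.
    """
    match = OPTIONAL_PREFIX.match(token)
    if match is None:
        return [token]
    optional, base = match.groups()
    # Both readings would be counted at the same position in the text.
    return [optional + base, base]


print(expand_optional_prefix("(maai)beheer"))  # ['maaibeheer', 'beheer']
print(expand_optional_prefix("beheer"))        # ['beheer']
```

The sketch only handles a leading parenthesized part; any other placement of the parentheses would need its own rule.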

kosloot (Collaborator) commented Jul 30, 2018

Well, at the moment this is by design.
Maybe we could add some extra heuristics to the system to handle this case.
The big danger is that we end up including a full-blown tokenizer in TICCL-unk, which would also make it language dependent.
I leave it to @martinreynaert to decide on this.


3 participants