I was also hoping to get the original text for the annotated files. Is that available? I would use it alongside the tokens from the NER dataset to build a tokenizer model.
We build most of our tokenizers from Universal Dependencies datasets, but there is none for Bengali. My impression is that Bengali is generally tokenized on whitespace, with punctuation characters split off as separate tokens, but it would still be useful to build such a dataset.
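For reference, here is a minimal sketch of the whitespace-plus-punctuation baseline I have in mind. The function name `baseline_tokenize` and the punctuation set are my own assumptions (the danda `।`, the double danda `॥`, and a few common ASCII marks), not anything taken from the dataset, and a trained model would presumably handle cases this misses.

```python
import re

# Assumed punctuation set (hypothetical, not from the dataset):
# the Bengali danda and double danda plus a few common ASCII marks.
PUNCT = "।॥,;:!?()\"'"

# A token is either a run of characters that are neither whitespace nor
# punctuation, or a single punctuation character on its own.
TOKEN_RE = re.compile(r"[^\s{0}]+|[{0}]".format(re.escape(PUNCT)))


def baseline_tokenize(text):
    """Split on whitespace and emit each punctuation mark as its own token."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(baseline_tokenize("আমি ভাত খাই।"))
```

With the original text available, the output of a baseline like this could be compared against the NER tokens to find where the annotation departs from plain whitespace splitting.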
Thanks!