
Tokenization problem #3

Open
mcavdar opened this issue Mar 1, 2018 · 4 comments

mcavdar commented Mar 1, 2018

The spaCy models should be adapted to the medical corpus: the default tokenizer over-splits regulatory codes. For example:
tokens['train'][0:10]: [['EMEA', '/', 'H', '/', 'C', '/', '551', 'PRIALT']...
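
One possible direction, sketched below on the assumption that a stock spaCy model is in use, is to customize the tokenizer through spaCy's documented hooks rather than post-processing the tokens. The model name `fr_core_news_sm`, the slash-filtering heuristic, and the expected output are assumptions, not the fix actually used in this repo:

```python
import spacy
from spacy.symbols import ORTH
from spacy.util import compile_infix_regex

nlp = spacy.load("fr_core_news_sm")  # assumed model; replace with the one used here

# Option 1: register an exact special case so this particular code is never split.
nlp.tokenizer.add_special_case("EMEA/H/C/551", [{ORTH: "EMEA/H/C/551"}])

# Option 2 (coarser): drop every default infix pattern that mentions "/",
# so slashes between letters/digits no longer trigger a split at all.
infixes = [p for p in nlp.Defaults.infixes if "/" not in p]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

print([t.text for t in nlp("EMEA/H/C/551 PRIALT")])
# expected (under these assumptions): ['EMEA/H/C/551', 'PRIALT']
```

Option 1 only covers codes that are listed explicitly; option 2 changes the infix rules globally, so it should be checked against the rest of the corpus (e.g. dates or fractions written with slashes) before being adopted.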

@mcavdar mcavdar added the bug Something isn't working label Mar 1, 2018
@mcavdar mcavdar self-assigned this Mar 1, 2018

mcavdar commented Mar 10, 2018

The Quaero French Medical Corpus: A Ressource for Medical Entity Recognition and Normalization:
"Note that due to time constraints, the documents were supplied to the human annotators without prior tokenization."

Etiquetage morpho-syntaxique en domaine de spécialité: le domaine médical:
"Le tokenizer utilisé lors de la première phase d’annotation est un outil maison suivant une segmentation proche de celui du FTB. Il n’est pas adapté au domaine médical, il a été important de modifier la segmentation manuellement afin d’obtenir une annotation morpho-syntaxique de qualité optimale." "Ces observations indiquent qu’avec un tokeniseur adapté, l’utilisation d’une pré-annotation permet un gain de temps significatif"


mcavdar commented Mar 10, 2018


mcavdar commented May 10, 2018
