DOC : Add example for the loading and pre-processing of corpus #16

Thibeb · 2024-02-09T09:33:18Z

Example for loading and pre-processing the QUAERO, E3C and CASM2 corpora.

Each corpus is first converted into medkit text documents using a custom conversion function.
Each text document is then passed through a pipeline that splits it into sentences and cuts it into multiple smaller documents ("mini docs"), which are easier to use for later training.
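
A minimal sketch of these two steps, assuming each raw record is a text plus a list of (label, start, end) character-offset annotations. It is not the converter or pipeline from this PR: Doc and Entity are hypothetical, simplified stand-ins for medkit's text document and entity classes, a naive regex sentence splitter replaces medkit's own segmentation operations, and to_doc, split_sentences, split_doc and the max_sentences parameter are illustrative names.

```python
import re
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for medkit's text document and entity
# classes, so that the sketch runs without medkit installed.
@dataclass
class Entity:
    label: str
    start: int  # character offset in the document text (inclusive)
    end: int    # character offset in the document text (exclusive)


@dataclass
class Doc:
    text: str
    entities: list = field(default_factory=list)


def to_doc(text, annotations):
    """Hypothetical converter: raw text + (label, start, end) tuples -> Doc."""
    return Doc(text=text, entities=[Entity(lbl, s, e) for lbl, s, e in annotations])


# Naive sentence pattern: a run of non-terminators followed by any '.', '!' or '?'.
# It will wrongly split on abbreviations and decimal numbers such as "3.5".
_SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")


def split_sentences(text):
    """Return (start, end) character spans of sentences, whitespace-trimmed."""
    spans = []
    for match in _SENTENCE_RE.finditer(text):
        start, end = match.start(), match.end()
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:  # skip whitespace-only fragments
            spans.append((start, end))
    return spans


def split_doc(doc, max_sentences=2):
    """Cut a Doc into mini docs of at most max_sentences sentences each.

    Entities fully inside a chunk are kept with offsets shifted to the mini
    doc's text; entities crossing a chunk boundary are dropped.
    """
    sentences = split_sentences(doc.text)
    mini_docs = []
    for i in range(0, len(sentences), max_sentences):
        group = sentences[i:i + max_sentences]
        chunk_start, chunk_end = group[0][0], group[-1][1]
        entities = [
            Entity(ent.label, ent.start - chunk_start, ent.end - chunk_start)
            for ent in doc.entities
            if chunk_start <= ent.start and ent.end <= chunk_end
        ]
        mini_docs.append(Doc(text=doc.text[chunk_start:chunk_end], entities=entities))
    return mini_docs


# Example usage on a toy three-sentence French clinical text.
text = "Le patient a 45 ans. Il présente une fièvre. Traitement par paracétamol."
annotations = [
    ("DISO", text.index("fièvre"), text.index("fièvre") + len("fièvre")),
    ("CHEM", text.index("paracétamol"), text.index("paracétamol") + len("paracétamol")),
]
for mini in split_doc(to_doc(text, annotations), max_sentences=2):
    print(repr(mini.text), [(e.label, mini.text[e.start:e.end]) for e in mini.entities])
```

With max_sentences=2, the toy text yields two mini docs: the first two sentences (carrying the DISO entity) and the last sentence (carrying the CHEM entity). Dropping entities that straddle a chunk boundary is a simplification of this sketch; a real pipeline may need a different policy.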

The corpus_specs function takes a list of medkit text documents and analyses them to compute metrics such as the number of original documents, sentences and entities, as well as the distribution of entities.
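
A possible shape for such a function, sketched with a hypothetical MiniDoc stand-in rather than real medkit documents. Each mini doc is assumed to record the id of the original document it was cut from (source_id), its sentence count and its entity labels; these fields and the returned keys are illustrative assumptions, not the actual output of corpus_specs.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical stand-in for a (mini) medkit document after pre-processing.
@dataclass
class MiniDoc:
    text: str
    source_id: str  # assumed: id of the original document this mini doc came from
    n_sentences: int
    entity_labels: list = field(default_factory=list)


def corpus_specs(docs):
    """Sketch of corpus statistics over a list of MiniDoc objects."""
    label_counts = Counter(label for doc in docs for label in doc.entity_labels)
    n_entities = sum(label_counts.values())
    return {
        "n_original_docs": len({doc.source_id for doc in docs}),
        "n_docs": len(docs),
        "n_sentences": sum(doc.n_sentences for doc in docs),
        "n_entities": n_entities,
        "entity_counts": dict(label_counts),
        # Share of each label among all entities; empty when there are no
        # entities, so the division below never runs with n_entities == 0.
        "entity_distribution": {
            label: count / n_entities for label, count in label_counts.items()
        },
    }
```

Keeping a source_id on each mini doc is what lets the number of original documents still be counted after the splitting step.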

a0bf332 · DOC : Add example for the loading and pre-processing of the QUAERO, E3C and CASM2 corpus

ghisvail mentioned this pull request on Apr 2, 2024 in #35, "Add supporting code for NER benchmark paper" (merged).

ghisvail closed this in #35 on Apr 7, 2024.
