A comprehensive NLP preprocessing package for clinical notes, covering sentence boundary detection and tokenization.
git clone https://github.com/uf-hobi-informatics-lab/NLPreprocessing
cd NLPreprocessing
pip install .
from nlpreprocessing.annotation2BIO import pre_processing, generate_BIO
txt, sents = pre_processing("./test.txt")
generate_BIO(sents, [])
from nlpreprocessing.text_process.sentence_tokenization import SentenceBoundaryDetection
processor = SentenceBoundaryDetection()
processor.sent_tokenizer("this is a test!")
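The name `generate_BIO` refers to BIO (begin/inside/outside) tagging. For readers new to the scheme, here is a minimal standalone sketch of it. It does not call or reproduce this package's implementation; the function `bio_tags`, the token-index span format, and the sample sentence are all hypothetical and chosen only for illustration.

```python
def bio_tags(tokens, entities):
    """Assign BIO tags to tokens.

    tokens:   list of token strings
    entities: list of (start, end, label) spans over token indices,
              with `end` exclusive (an assumed format for this sketch)
    """
    # Every token starts as "O" (outside any entity).
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # The first token of an entity gets a "B-" (begin) tag.
        tags[start] = f"B-{label}"
        # Remaining tokens of the same entity get "I-" (inside) tags.
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    tokens = ["Patient", "denies", "chest", "pain", "."]
    entities = [(2, 4, "problem")]  # "chest pain"
    for token, tag in zip(tokens, bio_tags(tokens, entities)):
        print(token, tag)
```

With an empty entity list, as in the `generate_BIO(sents, [])` call above, every token in this sketch is tagged `O`.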
Python version >= 3.6
Most new features are implemented in the dev branch. They still need comprehensive testing before being merged into master, so use them at your own risk.