This repo contains code for training a Named Entity Recognition (NER) model on the NCBI disease dataset by fine-tuning DistilBERT.
You can use your preferred environment manager (conda, pyenv, pipenv, etc.) and install the dependencies from requirements.txt.
- NOTE: requirements.txt does not include torch. I'm using pipenv, which breaks with CUDA 11+ builds, so I have to install torch manually using:
- "pipenv run pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html"
- So please install the torch build that matches your own specs: https://pytorch.org/get-started/locally/
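Since torch is installed separately, it can help to confirm that the environment actually sees it before opening the notebooks. The following is a minimal sketch, not part of the repo: the helper name `describe_torch_install` is hypothetical, and it only relies on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install():
    """Report whether torch is importable and, if so, its version and CUDA status.

    Hypothetical helper for illustration; it is not shipped with this repo.
    """
    # find_spec returns None when the package is not installed,
    # so the check does not fail in an environment without torch.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` is False on a machine with a GPU, the installed wheel is probably a CPU-only build or does not match the local CUDA version.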
The repo is meant to be read via the Jupyter notebooks inside the notebooks directory. If you'd like to just play around with predictions for an arbitrary string, you can either:
- run the notebook "notebooks/Playground.ipynb"
- run the FastAPI app via "python app.py"
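When reading predictions, it may help to see how per-token tags turn into entity mentions. The sketch below assumes the model emits BIO-style tags (`O`, `B-Disease`, `I-Disease`), which is a common labeling for the NCBI disease dataset. The exact output format of the notebook and API is not described here, and `group_entities` is a hypothetical helper, not a function from this repo.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, label) pairs.

    Assumes tags look like "O", "B-<label>", "I-<label>".
    An I- tag that does not continue an open entity is ignored.
    """
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # An I- tag with the same label extends the open entity.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag ends any open entity.
            flush()
            current_tokens, current_label = [], None

    flush()
    return entities


if __name__ == "__main__":
    tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."]
    tags = ["O", "O", "O", "O", "B-Disease", "I-Disease", "O"]
    print(group_entities(tokens, tags))  # [('breast cancer', 'Disease')]
```

Word-level tokens are used here for readability. DistilBERT works on subword pieces, so a real pipeline would also need to merge pieces back into words before or after this step.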