N2C2 data preprocessing #38

Open
mnishant2 opened this issue Mar 27, 2024 · 5 comments

@mnishant2

Hello,
The brat2bio.ipynb notebook does not work for the n2c2 2018 dataset. Do you know whether any changes are needed to make it work for n2c2?

@bugface
Contributor

bugface commented Mar 27, 2024

Can you post the actual errors?

@mnishant2
Author

> Can you post the actual errors?

There is no error; it just doesn't seem to work. A lot of Drug/Reason entities go undetected after a certain point. Also, please confirm that `sent_offset += (len(line.strip()) + 1)` (with +1, not +2) works for n2c2.
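
For context, my understanding is that the `+1` accounts for the stripped `\n`, and files with `\r\n` endings would need `+2` instead. A minimal sketch of what I mean (hypothetical, not from the notebook):

```python
# Hypothetical sketch: advance the running offset by the raw line length,
# which is correct for both "\n" (+1) and "\r\n" (+2) terminators.
sent_offset = 0
with open("record.txt", newline="") as f:  # newline="" keeps \r\n intact
    for line in f:
        text = line.strip()
        # ... process `text` here; BRAT offsets index into the raw file ...
        sent_offset += len(line)  # raw length already includes the terminator
```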

@bugface
Contributor

bugface commented Mar 28, 2024

Can you try https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER/blob/master/tutorial/pipeline_preprocessing_model_training_prediction.ipynb? We use https://github.com/uf-hobi-informatics-lab/NLPreprocessing for preprocessing, which we used in all of our previous work.

Also, in the n2c2 2018 dataset, some ADE and Reason annotations overlap. What we did before was to keep three separate copies of the annotations, one each for Drug, ADE, and Reason.
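
A rough sketch of that split (not our exact script; it assumes standard BRAT `.ann` files and the n2c2 entity names):

```python
from pathlib import Path

def split_ann_by_type(ann_path, out_dir, types=("Drug", "ADE", "Reason")):
    """Write one .ann copy per entity type so overlapping ADE/Reason
    spans never collide inside a single file. Relation (R*) and
    attribute lines are dropped, which is fine for NER training."""
    lines = Path(ann_path).read_text().splitlines()
    for etype in types:
        kept = [ln for ln in lines
                if ln.startswith("T") and "\t" in ln
                and ln.split("\t")[1].split(" ")[0] == etype]
        out = Path(out_dir) / etype / Path(ann_path).name
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("\n".join(kept) + "\n")
```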

Lastly, I recommend checking our project that can handle overlapping entities: https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC

@mnishant2
Author

Thanks, that worked. I have questions about the hyperparameter values/tuning needed to reproduce the BERT-general and RoBERTa-general results on all the datasets in Table 2 of the paper; I am unable to reproduce the exact numbers. It would be really nice if you could point me to those. I have also contacted the corresponding author by email.

@bugface
Contributor

bugface commented May 23, 2024

> Thanks, that worked. I have questions about the hyperparameter values/tuning needed to reproduce the BERT-general and RoBERTa-general results on all the datasets in Table 2 of the paper; I am unable to reproduce the exact numbers. It would be really nice if you could point me to those. I have also contacted the corresponding author by email.

How far off are you? If it is within 0.002, it should be OK. We used random seed = 42, batch size = 4, and learning rate = 1e-5.
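
For reference, here is how those values would map onto a generic Hugging Face fine-tuning setup (a sketch only; our own run script's flag names may differ):

```python
from transformers import TrainingArguments

# Sketch: the reported settings expressed in generic Hugging Face form.
args = TrainingArguments(
    output_dir="./ner_out",         # hypothetical output path
    seed=42,                        # random seed = 42
    per_device_train_batch_size=4,  # batch size = 4
    learning_rate=1e-5,             # learning rate = 1e-5
)
```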
