N2C2 data preprocessing #38

Open
mnishant2 opened this issue Mar 27, 2024 · 5 comments

@mnishant2

Hello,
The brat2bio.ipynb notebook does not work for the n2c2 2018 dataset. Do you know whether any changes are needed to make it work for n2c2?

@bugface
Contributor

bugface commented Mar 27, 2024

Can you post the actual errors?

@mnishant2
Author

> Can you post the actual errors?

There is no error; it just doesn't seem to work. A lot of Drug/Reason entities go undetected after a certain point. Also, please confirm that `sent_offset += (len(line.strip()) + 1)` (with +1, not +2) works for n2c2.
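
For context, my understanding is that the `+1` accounts for the stripped `\n`, and files with `\r\n` endings would need `+2` instead. A minimal sketch of what I mean (hypothetical, not from the notebook):

```python
# Hypothetical sketch: advance the running offset by the raw line length,
# which is correct for both "\n" (+1) and "\r\n" (+2) terminators.
sent_offset = 0
with open("record.txt", newline="") as f:  # newline="" keeps \r\n intact
    for line in f:
        text = line.strip()
        # ... process `text` here; BRAT offsets index into the raw file ...
        sent_offset += len(line)  # raw length already includes the terminator
```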

@bugface
Contributor

bugface commented Mar 28, 2024

Can you try https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER/blob/master/tutorial/pipeline_preprocessing_model_training_prediction.ipynb? We use https://github.com/uf-hobi-informatics-lab/NLPreprocessing for preprocessing, which we used in all of our previous work.

Also, in the n2c2 2018 dataset, some ADE and Reason annotations overlap. What we did before was to keep three separate copies of the annotations, one each for Drug, ADE, and Reason.
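
A rough sketch of that split (not our exact script; it assumes standard BRAT `.ann` files and the n2c2 entity names):

```python
from pathlib import Path

def split_ann_by_type(ann_path, out_dir, types=("Drug", "ADE", "Reason")):
    """Write one .ann copy per entity type so overlapping ADE/Reason
    spans never collide inside a single file. Relation (R*) and
    attribute lines are dropped, which is fine for NER training."""
    lines = Path(ann_path).read_text().splitlines()
    for etype in types:
        kept = [ln for ln in lines
                if ln.startswith("T") and "\t" in ln
                and ln.split("\t")[1].split(" ")[0] == etype]
        out = Path(out_dir) / etype / Path(ann_path).name
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("\n".join(kept) + "\n")
```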

Lastly, I recommend checking our project that can handle overlapping entities: https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC

@mnishant2
Author

Thanks, that worked. I have questions about the hyperparameter values/tuning needed to reproduce the BERT-general and RoBERTa-general results on all the datasets in Table 2 of the paper; I am unable to reproduce the exact numbers. It would be really nice if you could point me to those. I have also contacted the corresponding author by email.

@bugface
Contributor

bugface commented May 23, 2024

> Thanks, that worked. I have questions about the hyperparameter values/tuning needed to reproduce the BERT-general and RoBERTa-general results on all the datasets in Table 2 of the paper; I am unable to reproduce the exact numbers. It would be really nice if you could point me to those. I have also contacted the corresponding author by email.

How far off are you? If it is within 0.002, it should be OK. We used random seed = 42, batch size = 4, and learning rate = 1e-5.
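
For reference, here is how those values would map onto a generic Hugging Face fine-tuning setup (a sketch only; our own run script's flag names may differ):

```python
from transformers import TrainingArguments

# Sketch: the reported settings expressed in generic Hugging Face form.
args = TrainingArguments(
    output_dir="./ner_out",         # hypothetical output path
    seed=42,                        # random seed = 42
    per_device_train_batch_size=4,  # batch size = 4
    learning_rate=1e-5,             # learning rate = 1e-5
)
```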
