Data Centric Approach to SemEval-2024 Task 2

Overview

This repository contains the code for our data-centric approach to SemEval-2024 Task 2: Biomedical Natural Language Inference in Clinical Trials.

This project was conducted by Anna Barwig, Pingjun Hong, and Shijia Zhou as part of the seminar Biomedical Natural Language Processing at LMU.

Requirements

Python >= 3.10
PyTorch >= 2.1
NumPy >= 1.23
datasets >= 2.16
scikit-learn
transformers
pandas
wandb
pickle (part of the Python standard library)
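You can check the requirements above from inside the environment before opening the notebooks. The sketch below is our own helper and is not part of the repository. It reads installed versions with the standard-library importlib.metadata, compares only the major and minor version numbers, and skips pickle because it ships with Python. It assumes the PyPI distribution names torch (for PyTorch) and scikit-learn.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum (major, minor) versions from the Requirements list; None means "any version".
# Assumption: PyPI distribution names "torch" (PyTorch) and "scikit-learn".
REQUIREMENTS = {
    "torch": (2, 1),
    "numpy": (1, 23),
    "datasets": (2, 16),
    "scikit-learn": None,
    "transformers": None,
    "pandas": None,
    "wandb": None,
}


def release_tuple(version_string, width=2):
    """Turn e.g. '2.1.0+cu118' into (2, 1), keeping leading numeric parts only."""
    parts = []
    for piece in version_string.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts[:width])


def check_environment(requirements=REQUIREMENTS):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.10")
    for package, minimum in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if minimum is not None and release_tuple(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{package} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Comparing only (major, minor) avoids tripping over local build tags such as +cu118 on CUDA builds of PyTorch.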

Models

Please download the necessary pre-trained models from the Hugging Face Hub:

DeBERTa-Base: MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli

Flan-T5-Base: google/flan-t5-base
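A manual download is optional, because transformers' from_pretrained fetches a checkpoint from the Hub and caches it on first use. The loading sketch below assumes that the DeBERTa checkpoint is used through AutoModelForSequenceClassification and Flan-T5 through AutoModelForSeq2SeqLM. The helper names load_deberta and load_flan are hypothetical, and DeBERTa.ipynb and flan_t5_base.ipynb may load the models differently.

```python
# Checkpoint IDs as listed above.
MODEL_IDS = {
    "deberta": "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli",
    "flan": "google/flan-t5-base",
}


def load_deberta(model_id=MODEL_IDS["deberta"]):
    """Load the NLI classifier and its tokenizer (downloaded and cached on first call)."""
    # Imported inside the function so this file can be imported without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def load_flan(model_id=MODEL_IDS["flan"]):
    """Load Flan-T5 as a sequence-to-sequence model with its tokenizer."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()
    return tokenizer, model
```

The DeBERTa-v3 tokenizer typically also requires the sentencepiece package, which is not in the requirements list.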

DeBERTa Pipeline: DeBERTa.ipynb
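The DeBERTa checkpoint is a three-way NLI classifier (entailment, neutral, contradiction), but the task uses only two labels, Entailment and Contradiction. The sketch below shows one way to bridge that gap. It assumes, on our own account and not from DeBERTa.ipynb, that the neutral score is dropped and the remaining two probabilities are renormalised. Label names are read from model.config.id2label rather than hard-coded. binary_entailment_prob and predict_deberta are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_entailment_prob(logits, id2label):
    """Collapse 3-way NLI logits into P(Entailment) over {entailment, contradiction}.

    Assumption: the neutral probability is discarded and the other two renormalised.
    """
    probs = softmax(logits)
    by_label = {id2label[i].lower(): p for i, p in enumerate(probs)}
    p_ent = by_label["entailment"]
    p_con = by_label["contradiction"]
    return p_ent / (p_ent + p_con)


def predict_deberta(tokenizer, model, premise, statement):
    """Score one (premise, statement) pair with a loaded NLI model."""
    import torch

    # The tokenizer encodes the two texts as a sentence pair, premise first.
    inputs = tokenizer(premise, statement, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return binary_entailment_prob(logits, model.config.id2label)
```

The returned value is a probability, not a label, so the decision boundary can still be chosen afterwards.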

Flan Pipeline: flan_t5_base.ipynb
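Flan-T5 is a text-to-text model, so a pipeline like flan_t5_base.ipynb has to phrase each instance as a prompt and then map the generated text back to a label. The following are all assumptions made for illustration, not the template used in the notebook: the prompt wording, the helper names, and the rule that any answer not starting with "entail" or "yes" counts as Contradiction.

```python
# Hypothetical prompt template, not necessarily the one in flan_t5_base.ipynb.
PROMPT_TEMPLATE = (
    "Premise: {premise}\n"
    "Statement: {statement}\n"
    "Does the premise entail the statement? Answer Entailment or Contradiction."
)


def build_prompt(premise, statement):
    return PROMPT_TEMPLATE.format(premise=premise, statement=statement)


def parse_label(generated):
    """Map free-form generated text to one of the two task labels."""
    text = generated.strip().lower()
    if text.startswith("entail") or text.startswith("yes"):
        return "Entailment"
    # Design choice: anything else (including unexpected output) counts as Contradiction.
    return "Contradiction"


def predict_flan(tokenizer, model, premise, statement, max_new_tokens=5):
    """Generate a short answer with a loaded Flan-T5 model and map it to a label."""
    inputs = tokenizer(build_prompt(premise, statement), truncation=True, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return parse_label(answer)
```

The Contradiction fallback ensures every instance gets a label. Counting how often the fallback fires is a quick way to spot a prompt the model does not follow.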

Evidence Retrieval Process

Evidence Retrieval Documentation

The evidence retrieval documentation is saved in the evidence retrieval folder:

Evidence Retrieval for DeBERTa: evidence_retrieval_DeBERTa.xlsx
Evidence Retrieval for Flan: evidence_retrieval_Flan.xlsx
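The spreadsheets record which parts of each clinical trial report were kept as evidence, but their column layout is not described here. The sketch below is purely illustrative. It assumes that evidence is recorded as indices of sentences within a trial-report section, and shows how that evidence could be turned into a model premise. build_premise, load_evidence_sheet and the default file path are hypothetical.

```python
def build_premise(section_sentences, evidence_indices):
    """Join the selected evidence sentences, in document order, into one premise string."""
    selected = sorted(set(evidence_indices))  # de-duplicate and restore document order
    for i in selected:
        if not 0 <= i < len(section_sentences):
            raise IndexError(f"evidence index {i} is outside the section")
    return " ".join(section_sentences[i].strip() for i in selected)


def load_evidence_sheet(path="evidence retrieval/evidence_retrieval_DeBERTa.xlsx"):
    """Read an evidence spreadsheet; inspect the columns before relying on them."""
    # Requires pandas and an .xlsx engine such as openpyxl.
    import pandas as pd

    return pd.read_excel(path)
```

Keeping sentences in document order preserves the structure of the trial report, such as the order of arms and outcome measures, even when annotators list the evidence out of order.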

Annotators

DeBERTa instances: Anna Barwig, Pingjun Hong
Flan instances: Pingjun Hong, Shijia Zhou

Data Statements Generation

Code for data selection: data_selection_for_generation_round.ipynb
Statement generation: data_statement_generation.xlsx
Statements generated by: Anna Barwig, Pingjun Hong, Shijia Zhou
Cross-checked by: Anna Barwig, Pingjun Hong, Shijia Zhou
New dataset for the model update: new_instances.json
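The generated statements in new_instances.json extend the training data for the model update. The sketch below assumes the file uses the same layout as the task's training data: a JSON object keyed by instance ID, with each record carrying a Label field. The helper names and the "train.json" path in the usage note are hypothetical. The repository also contains new_instances_UPDATED.json, so check which file the notebooks actually read.

```python
import json


def load_instances(path):
    """Read a task-style JSON file: {instance_id: {..., "Label": ...}, ...}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def merge_instances(train, new):
    """Return train plus new instances, refusing to silently overwrite an existing ID."""
    overlap = set(train) & set(new)
    if overlap:
        raise ValueError(f"{len(overlap)} duplicate instance IDs, e.g. {sorted(overlap)[0]}")
    merged = dict(train)
    merged.update(new)
    return merged


def label_counts(instances):
    """Count labels to check that the added statements keep the classes balanced."""
    counts = {}
    for record in instances.values():
        counts[record["Label"]] = counts.get(record["Label"], 0) + 1
    return counts
```

Usage (hypothetical paths): merged = merge_instances(load_instances("train.json"), load_instances("new_instances.json")), followed by label_counts(merged) to compare the class balance before and after the update.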

Results

The final predictions on the test set are saved in the predictions folder:

Predictions of Flan on the test set: predictions_Flan
Predictions of DeBERTa on the test set (without classification boundary modification): predictions_DeBERTa
Predictions of DeBERTa on the test set (classification boundary modification 3:7): predictions_DeBERTa_37
Predictions of DeBERTa on the test set (classification boundary modification 4:6): predictions_DeBERTa_46
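The boundary-modified DeBERTa runs presumably move the decision threshold on P(Entailment) away from 0.5. The exact mapping from 3:7 and 4:6 to a threshold is not stated above. One plausible reading is a boundary at 0.3 or 0.4; depending on which class the ratio refers to, it could instead be 0.7 or 0.6. The sketch assumes that probability_list_DeBERTa.pkl holds one P(Entailment) per test instance, in test-set order. It writes predictions in an {"<id>": {"Prediction": "<label>"}} JSON layout. All helper names are hypothetical.

```python
import json
import pickle


def load_probabilities(path="probability_list_DeBERTa.pkl"):
    """Assumption: the pickle holds one P(Entailment) per test instance, in test-set order."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


def apply_boundary(probabilities, threshold=0.5):
    """Label an instance Entailment when P(Entailment) reaches the threshold."""
    return ["Entailment" if p >= threshold else "Contradiction" for p in probabilities]


def to_submission(instance_ids, labels):
    """Build the {id: {"Prediction": label}} mapping used for the prediction JSON."""
    if len(instance_ids) != len(labels):
        raise ValueError("need exactly one label per instance ID")
    return {iid: {"Prediction": label} for iid, label in zip(instance_ids, labels)}


def write_predictions(path, instance_ids, probabilities, threshold):
    """Apply the boundary and write the predictions as indented JSON."""
    submission = to_submission(instance_ids, apply_boundary(probabilities, threshold))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(submission, f, indent=2)
```

Lowering the threshold trades Contradiction predictions for Entailment ones. Sweeping it on a development set, rather than the test set, keeps the choice honest.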

The corresponding scores are saved in the results folder:

Results of Flan on the test set: scores_Flan
Results of DeBERTa on the test set (without classification boundary modification): scores_DeBERTa
Results of DeBERTa on the test set (classification boundary modification 3:7): scores_DeBERTa_37
Results of DeBERTa on the test set (classification boundary modification 4:6): scores_DeBERTa_46
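The contents of the score files are not described here. To sanity-check a prediction file against gold labels, you can compute macro-averaged F1 over the two labels as shown below. This is a plain-Python sketch; scikit-learn's f1_score(gold, pred, average="macro") should give the same value when both lists contain only these two labels. The official task scorer may report additional metrics that this sketch does not cover.

```python
def f1_for_label(gold, pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=("Entailment", "Contradiction")):
    """Unweighted mean of the per-label F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)
```

Gold and predicted labels must be aligned by instance ID before scoring, for example pred = [predictions[i]["Prediction"] for i in ids] with ids taken from the gold file.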

Repository Contents

Automated_GeneratedStatements
evidence retrieval
predictions
results
DeBERTa.ipynb
DeBERTa_after_update.jpeg
DeBERTa_prior_to_update.png
FLAN_after_update.png
Flan_prior_to_update.png
GenerateStatements.ipynb
README.md
data_selection_for_generation_round.ipynb
data_statement_generation.xlsx
flan_t5_base.ipynb
new_instances.json
new_instances_UPDATED.json
probability_list_DeBERTa.pkl
probability_list_flan.pkl
problem_categories.png

GitHub repository: PingjunHong/Biomed-NLP-Data-Centric
