Official repository for the Medical Workshop at the 14th European Summer School on Information Retrieval
This repository hosts all the materials for our one-day working session on Medical Information Retrieval (Medical IR). The session introduces attendees to the field through presentations and hands-on exercises.
The workshop's primary aim is to provide participants with a thorough understanding of medical information retrieval. It achieves this by presenting a retrieval task and guiding attendees through the creation of an Information Retrieval pipeline. Additionally, the workshop explores techniques for information extraction and expansion using knowledge bases, language models (LMs), and large language models (LLMs). Practical hands-on sessions cover neural information retrieval and the development of retrieval pipelines, ensuring that participants gain a comprehensive skill set to effectively retrieve medical information.
During the workshop, our goal is to foster a spirit of collaboration. We strongly encourage participants to work closely together, brainstorm, and craft innovative solutions. This collaborative atmosphere not only enriches the learning experience but also empowers you to develop practical methods for effectively uncovering essential medical information.
The retrieval task presented as a case study in the workshop is clinical trial retrieval. Clinical trials are experiments that are crucial to the development of new medical treatments, drugs, or devices. The primary task is to match patients to these clinical trials, essentially finding eligible participants for these studies. Participants in the workshop are encouraged to apply their knowledge and skills to develop end-to-end pipelines that address this task, illustrating the practical importance of information retrieval in healthcare and medical research.
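To make the task concrete, the following is a minimal sketch of lexical ranking with BM25, written in plain Python. It is an illustration only and is not taken from the workshop notebooks: the trial identifiers, the trial texts, the patient description, and the function names `tokenize` and `bm25_rank` are all made up for this example, and the parameter values `k1=1.2` and `b=0.75` are common defaults rather than tuned settings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant, so rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy trial descriptions with made-up identifiers (not real trials).
trials = {
    "TRIAL-A": "Metformin in adults with type 2 diabetes and obesity",
    "TRIAL-B": "Physiotherapy after knee replacement surgery",
    "TRIAL-C": "Insulin dosing in children with type 1 diabetes",
}
patient = "58-year-old man with type 2 diabetes and obesity"
ranking = bm25_rank(patient, trials)
```

Here `ranking` places the trial that shares the most informative terms with the patient description first. A ranking like this only captures topical overlap; deciding whether a patient actually satisfies a trial's eligibility criteria requires further processing.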
The dataset for the TREC 2022 Clinical Trials Track comprises synthetic patient cases written by individuals with medical expertise. These cases are presented as synthetic admission notes and serve as the topics for the track. The case descriptions closely mimic real-world medical scenarios and are designed to challenge participants in retrieving pertinent clinical trial information. In this task, participants work with a corpus of clinical trial documents derived from a snapshot of ClinicalTrials.gov collected on April 27, 2021, which offers a realistic resource for information retrieval and evaluation. For more details on the task, please visit the official TREC 2022 Clinical Trials Track.
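As a rough sketch of how such topics might be loaded, the snippet below parses a small XML string into a dictionary that maps topic numbers to admission-note text. The layout used here (a `topics` root with `topic` elements carrying a `number` attribute) is an assumption made for illustration, and the two notes are invented; check the files distributed by the track for the actual format. The function name `load_topics` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample in an ASSUMED layout; the real topic file may differ.
SAMPLE_TOPICS = """<topics>
  <topic number="1">
    A 45-year-old woman presents with chest pain
    and shortness of breath.
  </topic>
  <topic number="2">
    A 7-year-old boy presents with fever and a persistent cough.
  </topic>
</topics>"""


def load_topics(xml_text):
    """Map each topic number to its text with whitespace collapsed."""
    root = ET.fromstring(xml_text)
    topics = {}
    for topic in root.iter("topic"):
        # Collapse line breaks and indentation into single spaces.
        text = " ".join((topic.text or "").split())
        topics[topic.get("number")] = text
    return topics


topics = load_topics(SAMPLE_TOPICS)
```

Each value in `topics` can then be used as a query, either verbatim or after extracting the key medical concepts from the note.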
- Introduction to the Working Session - Get to Know Each Other
  - Start the day by introducing ourselves and getting to know our fellow participants.
- Introduction to the Retrieval Task
  - Gain insights into the retrieval task we'll tackle during the workshop.
- Building an Information Retrieval Pipeline
  - Presentation and hands-on session covering the fundamental steps of constructing an Information Retrieval pipeline.
- Information Extraction/Expansion with a Knowledge Base
  - Learn about techniques for information extraction and expansion using knowledge bases and participate in a hands-on session.
- Information Extraction/Expansion with Language Models (LMs) and Large Language Models (LLMs)
  - Explore the power of LMs and LLMs in information extraction and expansion, with practical exercises.
- Neural Information Retrieval
  - Dive into the world of Neural Information Retrieval through a hands-on session.
- Development of Your Working Pipeline
  - Apply your newfound knowledge and skills to create your own functional retrieval pipeline.
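As an illustration of the knowledge-base expansion step in the agenda, here is a minimal sketch that appends synonyms to a query. The dictionary `TOY_KB` is a hypothetical stand-in for a real medical knowledge base, and `expand_query` is a name chosen for this example; neither comes from the workshop notebooks.

```python
import re

# Hypothetical mini knowledge base: surface form -> synonyms.
TOY_KB = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}


def expand_query(query, kb):
    """Append the synonyms of every knowledge-base term found in the query."""
    expansions = []
    for term, synonyms in kb.items():
        # Match the surface form as a whole phrase, ignoring case.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            for synonym in synonyms:
                if synonym not in expansions:
                    expansions.append(synonym)
    return " ".join([query] + expansions)


expanded = expand_query(
    "Patient with a heart attack and high blood pressure", TOY_KB
)
unchanged = expand_query("Patient with a broken wrist", TOY_KB)
```

The expanded query can be passed to any retriever in place of the original text. Exact phrase matching is the simplest possible linking strategy; it misses paraphrases and misspellings, which is one motivation for the LM-based and LLM-based approaches listed above.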
In the "code" directory, you'll find a collection of Colab notebooks that implement the components of the workshop agenda. Each notebook is paired with a corresponding presentation file in the "presentations" folder, so the code and its explanation can be followed side by side.
Georgios Peikos, PhD candidate from University of Milano-Bicocca
Wojciech Kusa, PhD candidate from TU Wien
Annisa Maulida Ningtyas, PhD candidate from TU Wien
Oscar E. Mendoza, PhD candidate from University of Milano-Bicocca