Given a collection of scholarly articles about coronavirus and a query written by people with some expertise in the medical domain, the goal of this project is to return to the user a ranked list of the documents considered relevant to that query. Each document's relevance to a query is assessed on three levels: not relevant, partially relevant, and relevant.
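To make the task concrete, here is a minimal sketch of turning a query into a ranked list of documents. It is not the method used in this project: the scoring rule (term frequency weighted by a smoothed inverse document frequency), the function names `tokenize` and `rank`, and the toy documents are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from highest to lowest score.

    docs maps a document id to its text. The score is a simple TF-IDF sum,
    used here only as an illustrative stand-in for a real retrieval model.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            if counts[term]:
                # Smoothed IDF, so a term found in every document still counts.
                idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
                score += counts[term] * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection, not taken from the real dataset.
toy_docs = {
    "d1": "coronavirus transmission in hospitals",
    "d2": "protein folding methods",
    "d3": "coronavirus vaccine trials and coronavirus transmission",
}
ranking = rank("coronavirus transmission", toy_docs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

In a real evaluation, each returned document would then be compared against its relevance judgment (not relevant, partially relevant, relevant) for the query.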
The dataset is available at https://ir-datasets.com/cord19.html. It is taken from: Voorhees, Ellen, et al. "TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection". ACM SIGIR Forum, vol. 54, no. 1, June 2020, pp. 1–12. https://doi.org/10.1145/3451964.3451965. Other information on the task is available at link. Other references are listed in our project report.
The data used in this project are available in the data folder. They are provided as pickle files derived from the original dataset.
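As a rough sketch of how such pickle files can be read in Python, the snippet below uses the standard `pickle` module. The file name `documents.pkl` and the dictionary it contains are hypothetical; the actual file names and object types in the data folder are not described here, so the example writes its own small file first in order to be runnable.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load and return the Python object stored in a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demo on a temporary file, so the example runs without the real data.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "documents.pkl"  # hypothetical file name
    with open(demo_path, "wb") as handle:
        pickle.dump({"doc-1": "example text"}, handle)
    loaded = load_pickle(demo_path)
```

Unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source.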
The report and the presentation for this project are available in the Relation and presentation folder.
This project is developed in Python. All the code, in the form of Google Colab notebooks, is available in the code folder.
⊜ Marco Piazza
- Current Studies: Computer Science MSc Student @ Università degli Studi di Milano-Bicocca (Unimib);
- Background: Bachelor's degree in Computer Science @ Università degli Studi di Milano-Bicocca (Unimib).
⊜ Elisa Cazzaniga
- Current Studies: Computer Science MSc Student @ Università degli Studi di Milano-Bicocca (Unimib);
- Background: Bachelor's degree in Computer Science @ Università degli Studi di Milano-Bicocca (Unimib).