From 4d13a33fd3927edc4942384b9fa71b7b177e8834 Mon Sep 17 00:00:00 2001 From: Anka Date: Fri, 14 Jun 2024 11:54:29 +0000 Subject: [PATCH] Render book --- docs/404.html | 2 +- docs/contact.html | 2 +- docs/index.html | 2 +- docs/mi2redteam.html | 2 +- "docs/mi\302\262betabit.html" | 2 +- "docs/mi\302\262cancer.html" | 2 +- "docs/mi\302\262solutions.html" | 2 +- "docs/mi\302\262space.html" | 2 +- docs/papers.html | 2 +- docs/research-grants.html | 2 +- docs/search_index.json | 2 +- docs/seminars.html | 2 +- docs/the-team.html | 2 +- docs/thesis-proposals.html | 23 +++++++++++++---------- 14 files changed, 26 insertions(+), 23 deletions(-) diff --git a/docs/404.html b/docs/404.html index 55a6964..f1e5c59 100644 --- a/docs/404.html +++ b/docs/404.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/contact.html b/docs/contact.html index b5b6682..c825a6e 100644 --- a/docs/contact.html +++ b/docs/contact.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/index.html b/docs/index.html index b5623d8..ebb015d 100644 --- a/docs/index.html +++ b/docs/index.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/mi2redteam.html b/docs/mi2redteam.html index 313d731..15ea33b 100644 --- a/docs/mi2redteam.html +++ b/docs/mi2redteam.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git "a/docs/mi\302\262betabit.html" "b/docs/mi\302\262betabit.html" index 31057ba..95def10 100644 --- "a/docs/mi\302\262betabit.html" +++ "b/docs/mi\302\262betabit.html" @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git "a/docs/mi\302\262cancer.html" "b/docs/mi\302\262cancer.html" index 94e758a..16ba1dd 100644 --- "a/docs/mi\302\262cancer.html" +++ "b/docs/mi\302\262cancer.html" @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git "a/docs/mi\302\262solutions.html" "b/docs/mi\302\262solutions.html" index 1d10d8c..412274b 100644 --- "a/docs/mi\302\262solutions.html" +++ "b/docs/mi\302\262solutions.html" @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git "a/docs/mi\302\262space.html" "b/docs/mi\302\262space.html" index 0355a5e..193e5b4 100644 --- "a/docs/mi\302\262space.html" +++ "b/docs/mi\302\262space.html" @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/papers.html b/docs/papers.html index 0041fa7..882101c 100644 --- a/docs/papers.html +++ b/docs/papers.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/research-grants.html b/docs/research-grants.html index fb0b9ec..9f70d6f 100644 --- a/docs/research-grants.html +++ b/docs/research-grants.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/search_index.json index 405d952..4e9a1f5 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "MI² MI².AI", " MI² MI².AI On a mission to responsibly build machine learning predictive models. MI².AI is a group of mathematicians and computer scientists who love to play with predictive models. We are spread between Warsaw University of Technology and University of Warsaw. Here we have workshops and seminars, here we are forging new ideas, creating tools, solving problems, doing consulting and sharing our positive attitude. Feel free to jump in. Mission Machine learning is like atomic energy. We develop leaders, skills, methods, tools and good practices so that predictive models can be deployed responsibly and sustainably. Vision MI² is a group of experts supporting global initiatives aimed at responsible and sustainable machine learning. We support the development of future leaders of responsible machine learning through internships, PhDs, postdoctoral fellowships and so on. We seek research grants and business projects to conduct both scientific and applied research. We develop and maintain the software and infrastructure necessary to build responsible and sustainable ML. We develop cooperation with international teams working on similar topics. We support companies to implement best practices related to responsible modelling in their operations. We conduct workshops and training on responsible predictive modelling. "],["the-team.html", "The Team", " The Team Members Przemysław Biecek, PhD, DSc (Team Leader) Hubert Baniecki, PhD student Mustafa Cavus, PhD Maciej Chrabąszcz, PhD student Weronika Hryniewska-Guzik, PhD student Filip Kołodziejczyk, MSc student Mateusz Krzyziński, MSc student Tymoteusz Kwieciński, BSc student Stanisław Łaniewski, PhD student Wiktoria Mieleszczenko-Kowszewicz, PhD Nuno Sepúlveda, PhD Bartek Sobieski, MSc student Mikołaj Spytek, MSc student Jakub Świstak, MSc student Paulina Tomaszewska, PhD student Piotr Wilczyński, BSc student Katarzyna Woźnica, PhD student Vladimir Zaigrajew, PhD student Collaborators Mariusz Adamek, Prof, MD Przemysław Bombiński, PhD, MD André Fonseca, PhD student Stanisław Giziński, MSc student Katarzyna Kobylińska, PhD student Piotr Komorowski, MSc Anna Kozak, MSc Marcin Luckner, PhD João Malato, PhD student Bartek Pieliński, PhD, DSc Hanna Piotrowska, MA Barbara Rychalska, PhD Elżbieta Sienkiewicz, PhD Julian Sienkiewicz, PhD Tomasz Stanisławek, PhD Adrian Stańdo, MSc student Patryk Szatkowski, PhD student, MD Emilia Wiśnios, MSc student Jakub Wiśniewski, MSc student Mateusz Wójcik, MSc student Alumni Piotr Czarnecki, MSc Alicja Gosiewska, MSc Adrianna Grudzień, BSc Mateusz Grzyb, MSc student Paulina Kaczyńska, MSc student Maria Kałuska, BSc Marcin Kosiński, MSc Adam Kozłowski, MSc Wojciech Kretowicz, BSc Michał Kuźba, MSc Szymon Maksymiuk, BSc Tomasz Mikołajczyk, PhD Katarzyna Pękala, MSc Piotr Piątyszek, BSc student Hubert Ruczyński, MSc student Adam Rydelek, BSc Bartosz Sawicki, BSc Patryk Słowakiewicz, BSc Michał Sokólski, MSc Mateusz Stączek, BSc Szymon Szmajdziński, BSc Zuzanna Trafas, BSc Hoang Thien Ly, BSc Kinga Ułasik, BSc Anna Wróblewska, PhD Paweł Wojciechowski, BSc Hanna Zdulska, BSc Artur Żółkowski, BSc Przemysław Biecek My personal mission is to enhance human capabilities by supporting them through access to data-driven and knowledge-based predictions. 
I execute it by developing methods and tools for responsible machine learning, trustworthy artificial intelligence and reliable software engineering. I work as an associate professor at Warsaw University of Technology and the University of Warsaw. I graduated in software engineering and mathematical statistics and now work on model visualisation, explanatory model analysis, predictive modelling and data science for healthcare. In 2016, I formed the research group MI², which develops methods and tools for predictive model analysis. Google Scholar: Af0O75cAAAAJ GitHub: pbiecek LinkedIn: pbiecek Mariusz Adamek I work at two Medical Universities (Silesia and Gdańsk) holding a Professorship in Medicine and Health Sciences. My interests are focused on lung cancer prevention and screening, the latter by means of low-dose computed tomography (LDCT), with special emphasis put on molecular biology methods, prediction models and image analysis aimed to enhance the performance of lung screening outcomes. Website: www.mariuszadamek.io Hubert Baniecki I’m a PhD student in Computer Science at the University of Warsaw. Previously, I did my MSc (2022) and BSc (2021) in Data Science at Warsaw University of Technology. My main research interest is explainable machine learning, with particular emphasis on adversarial attacks & explanation evaluation. Website: hbaniecki.com Mustafa Cavus I work as an assistant professor at the Eskisehir Technical University. I joined the MI² DataLab as a post-doc researcher in 2021. I work on glocal explanations and imbalanced learning. Google Scholar: I63d1WIAAAAJ GitHub: mcavus LinkedIn: mcavus Twitter: mcavus Julian Sienkiewicz I work as an assistant professor at the Faculty of Physics, WUT. My main research area is linked with sociophysics, complex networks and agent-based models. In the scope of MI² DataLab I follow my other interest - scientometrics. Google Scholar: mIwu11QAAAAJ LinkedIn: julek-sienkiewicz-873829 Maciej Chrabąszcz I am pursuing a PhD in Computer Science at Warsaw University of Technology, where I also obtained my MSc in Mathematical Statistics in 2023. My main research interests lie in the fields of responsible and explainable machine learning, with a focus on Red Teaming foundation models. GitHub: maciejchrabaszcz LinkedIn: maciej-chrabaszcz Stanisław Giziński A Research Software Engineer and student of Machine Learning at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. His work in the lab focuses on using natural language processing and network analysis to better understand the spread of AI public policies. Also interested in applying machine learning in bioinformatics. Google Scholar: Stanisław Giziński GitHub: Gizzio LinkedIn: stanislaw-gizinski Mateusz Grzyb MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence and scientific computing, but above all simply enjoys programming. GitHub: mgrzyb99 Weronika Hryniewska-Guzik PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning modelling on medical images in the context of explainability and responsible AI. Google Scholar: aJeg3IQAAAAJ GitHub: Hryniewska LinkedIn: weronikahryniewska Paulina Kaczyńska I am working towards a Master’s degree in Machine Learning at the University of Warsaw. I am interested in Natural Language Processing and ML applications in social sciences. GitHub: Kaczyniec Piotr Komorowski Master’s student in Machine Learning at the University of Warsaw. 
Mainly interested in image processing and XAI applied to medical images. GitHub: piotr-komorowski LinkedIn: Piotr-Komorowski Anna Kozak Graduated in mathematical statistics at Warsaw University of Technology. Interested in explainable artificial intelligence and data visualization. Organizes projects related to education. Google Scholar: JIrqf9kAAAAJ GitHub: kozaka93 LinkedIn: kozakanna Mateusz Krzyziński MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, with particular emphasis on XAI methods for survival analysis models and XAI applications in the medical field. Also an enthusiast of data visualization. Google Scholar: i_r7EUgAAAAJ GitHub: krzyzinskim LinkedIn: krzyzinskim Tymoteusz Kwieciński BSc student in Data Science at Warsaw University of Technology. Particularly interested in explainable artificial intelligence, computer vision and NLP. GitHub: Fersoil LinkedIn: Tymoteusz-Kwieciński Stanisław Łaniewski PhD student in Quantitative Psychology and Economics at the University of Warsaw, Machine Learning Researcher at MI2 Data Lab, MSc in Actuarial Science and Mathematical Finance at the University of Amsterdam, former Quantitative Researcher at Flow Traders. His research focuses on enhancing classical methods used in discrete choice and finance with machine learning, and on applying them to explain behavioral phenomena and heuristics. He is also keen on finding a balance between the best predictive models and their explainability. An avid gamer who applies statistical techniques to deepen the understanding of best strategies. LinkedIn: Stanisław-Łaniewski Wiktoria Mieleszczenko-Kowszewicz PhD in social science, graduated from an interdisciplinary doctoral program: information and communication technologies & psychology. Researcher interested in the use of LLMs in psychometrics and developing responsible AI solutions for positive societal impact. LinkedIn: Wiktoria Mieleszczenko-Kowszewicz Piotr Piątyszek Undergraduate Data Science student at Warsaw University of Technology. Works as a research software engineer on enhancing the accessibility and completeness of explainable AI. During the pandemic, contributed to a system for monitoring COVID variants. GitHub: piotrpiatyszek Bartosz Pieliński I am an Assistant Professor at the Faculty of Political Science and International Studies at the University of Warsaw. I am interested in applying quantitative methods to study public policies. I am a founding member of the Institutional Grammar Research Initiative, which is focused on developing a new way of analysing social rules. I have participated in several research projects covering social policy, non-profit organizations, social enterprises, and international organizations. Website: https://pielinski.info/ Google Scholar: hnWiaVEAAAAJ LinkedIn: Bartosz Pieliński Hanna Piotrowska Information designer, focusing mainly on data visualization, branding and book design, with a strong interest in Data Science and perception studies. Winner of numerous awards, including The Kantar Information Is Beautiful Awards, HOW International Design Awards, Polish Graphic Design Awards and KTR. LinkedIn: hanna-piotrowska Twitter: hannapio Behance: hannapio. Hubert Ruczyński I am working towards a Master’s degree in Data Science at Warsaw University of Technology. I am also teaching students about data exploration and visualisation. My major interests are: AutoML | Natural Language Processing | Data Visualization | Fairness. 
GitHub: HubertR21 LinkedIn: Hubert Ruczyński Barbara Rychalska PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP), recommender systems and graph-based learning. Google Scholar: Wp0wHJoAAAAJ LinkedIn: Barbara-Rychalska Bartek Sobieski MSc student in Data Science at Warsaw University of Technology. Interested in deep learning and hyperparameter optimization. GitHub: sobieskibj LinkedIn: Bartłomiej-Sobieski Mikołaj Spytek MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, data visualization and survival analysis. Google Scholar: 1u49AqYAAAAJ GitHub: mikolajsp LinkedIn: Mikołaj-Spytek Jakub Świstak MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence, NLP and computer vision. GitHub: jswistak LinkedIn: Jakub-Świstak Tomasz Stanisławek PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP). Google Scholar: gq8NY_UAAAAJ GitHub: tstanislawek LinkedIn: Tomasz-Stanisławek Paulina Tomaszewska PhD candidate in Computer Science at Warsaw University of Technology. Gained experience in AI at leading universities during the Deep Learning Summer School at Tsinghua University (China), a one-semester exchange at Nanyang Technological University (Singapore) and research internships at Gwangju Institute of Science and Technology (South Korea) and the Institute of Science and Technology (Austria). Mainly interested in Deep Learning, Computer Vision and Transfer Learning. Recently focused on digital pathology. Google Scholar: eO245iMAAAAJ LinkedIn: paulina-tomaszewska Hoang Thien Ly Bachelor’s student in Maths and Data Analysis at Warsaw University of Technology. Interested in working with data and learning explainable artificial intelligence methods. Google Scholar: JkysewYAAAAJ GitHub: lhthien09 LinkedIn: hthienly Piotr Wilczyński BSc student in Data Science at Warsaw University of Technology. Interested in Large Language Models, AI Deception and Natural Language Processing. Currently working on my thesis, which applies Computer Vision to medicine. GitHub: wi1lku LinkedIn: Piotr-Wilczyński Jakub Wiśniewski Research Software Engineer and third-year Data Science student at Warsaw University of Technology. Developer of tools for bias detection and fairness. Currently researching responsible applications of deep learning. President of the Data Science Science Club at WUT. Google Scholar: _6eQsXMAAAAJ GitHub: jakwisn LinkedIn: jakwisn Emilia Wiśnios Research Software Engineer and student of Machine Learning at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. Interested in natural language processing and reinforcement learning. GitHub: emiliawisnios LinkedIn: emilia-wisnios Paweł Wojciechowski Graduated with a bachelor’s degree in Data Science from Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision, and active learning. GitHub: p-wojciechowski LinkedIn: wojciechowski-p Katarzyna Woźnica PhD candidate in computer science at Warsaw University of Technology. Graduated in mathematical statistics. Interested in automated machine learning, especially in hyperparameter tuning for tabular data. Carries out statistical analysis and predictive modelling for healthcare. 
Google Scholar: tAQS1gQAAAAJ GitHub: woznicak LinkedIn: woznicak Vladimir Zaigrajew PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning, primarily on images, with a focus on representation learning. GitHub: WolodjaZ LinkedIn: vladimir-zaigrajew Artur Żółkowski BSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision and NLP. GitHub: arturzolkowski LinkedIn: Artur-Żółkowski Filip Kołodziejczyk MSc student in Data Science at Warsaw University of Technology. Interested primarily in Large Language Models. Currently researching Red Teaming of such models. At the same time, a DevOps professional. GitHub: FilipKolodziejczyk LinkedIn: filip-kołodziejczyk-00 "],["seminars.html", "Seminars", " Seminars We meet every Monday at 10 am, online or in MI2DataLab (room 044, Faculty of Mathematics and Information Science, Warsaw University of Technology). Join us at http://meet.drwhy.ai/ List of topics and materials from past seminars: https://github.com/MI2DataLab/MI2DataLab_Seminarium "],["papers.html", "Papers", " Papers On the Robustness of Global Feature Effect Explanations Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek ECML PKDD (2024) We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best- and worst-case scenarios of (mis)interpreting machine learning predictions globally. Red-Teaming Segment Anything Model Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek CVPR Workshops (2024) The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis of SAM. We analyze the impact of style transfer on segmentation masks. We assess whether the model can be used for attacks on privacy, such as recognizing celebrities’ faces. Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. Adversarial attacks and defenses in explainable artificial intelligence: A survey Hubert Baniecki, Przemysław Biecek Information Fusion (2024) Explanations of machine learning models can be manipulated. We introduce a unified notation and taxonomy of adversarial attacks on explanations. Adversarial examples, data poisoning, and backdoor attacks are key safety issues in XAI. Defense methods like model regularization improve the robustness of explanations. We outline the emerging research directions in adversarial XAI. 
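To make the survey's core claim concrete, below is a minimal, self-contained sketch (not code from any of the papers above; the model, data and crude noise attack are illustrative assumptions) of how poisoning the evaluation data can shift a permutation-importance explanation while the trained model itself stays fixed:

```python
# Illustrative sketch: explanation manipulation via data poisoning.
# The model is trained once; only the data used to *compute* the explanation
# is perturbed, yet the reported "most important feature" changes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def top_feature(X_eval, y_eval):
    # Permutation importance depends on the evaluation data, not only on the model.
    imp = permutation_importance(model, X_eval, y_eval, n_repeats=10, random_state=0)
    return int(np.argmax(imp.importances_mean))

best = top_feature(X, y)
print("top feature before poisoning:", best)

# Crude poisoning: replace the top-ranked feature with noise in the evaluation
# set, so its permutation importance collapses and the ranking silently shifts.
# Real attacks (e.g., the genetic and gradient algorithms studied by the group)
# optimize far subtler perturbations.
X_poisoned = X.copy()
X_poisoned[:, best] = np.random.default_rng(0).normal(size=len(X))
print("top feature after poisoning:", top_feature(X_poisoned, y))
```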
survex: an R package for explaining machine learning survival models Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N Wright, Przemysław Biecek Bioinformatics (2023) This paper demonstrates the functionalities of the survex package, which provides a comprehensive set of tools for explaining machine learning survival models. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases, thus promoting transparency and responsibility in sensitive areas. Consolidated learning: a domain-specific model-free optimization strategy with validation on metaMIMIC benchmarks Katarzyna Woźnica, Mateusz Grzyb, Zuzanna Trafas, Przemysław Biecek Machine Learning (2023) This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar datasets. We show that a carefully selected static portfolio of hyperparameter configurations yields good results for anytime optimization, while maintaining the ease of use and implementation. We demonstrate the effectiveness of this approach through an empirical study for the XGBoost algorithm and the newly created metaMIMIC benchmarks of predictive tasks extracted from the MIMIC-IV medical database. Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision-making process. For reproducibility, we open-source code and the TLOS dataset at this URL. SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. 
We provide an accessible implementation of time-dependent explanations in Python at this URL. The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper shows how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision-making. Climate Policy Tracker: Pipeline for automated analysis of public climate policies Artur Żółkowski, Mateusz Krzyziński, Piotr Wilczyński, Stanisław Giziński, Emilia Wiśnios, Bartosz Pieliński, Julian Sienkiewicz, Przemysław Biecek NeurIPS Workshop on Tackling Climate Change with Machine Learning (2022) In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of the 10-year national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. Explainable expected goal models for performance analysis in football analytics Mustafa Cavus, Przemyslaw Biecek International Conference on Data Science and Advanced Analytics (2022) In modern football, the expected goal provides a more representative measure of team and player performance than the score, as it also suits the game’s low-scoring nature. This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons between 2014-15 and 2020-21 of the top-five European football leagues. Moreover, we demonstrate a practical application of aggregated profiles to explain a group of observations on an accurate expected goal model for monitoring the team and player performance. Multi-omics disease module detection with an explainable Greedy Decision Forest Bastian Pfeifer, Hubert Baniecki, Anna Saranti, Przemyslaw Biecek, Andreas Holzinger Scientific Reports (2022) In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer. Interpretable meta-score for model performance Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek Nature Machine Intelligence (2022) We propose the Elo-based predictive power (EPP) meta-score, which is built on other performance measures and allows for interpretable comparisons of models. Differences in this score have a probabilistic interpretation and can be compared directly between data sets. 
Furthermore, this meta-score allows for an assessment of ranking fitness. We prove the properties of the Elo-based predictive power meta-score and support them with empirical results on a large-scale benchmark of 30 classification data sets. Additionally, we propose a unified benchmark ontology that provides a uniform description of benchmarks. fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces the R package fairmodels, which helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease Angela Lombardi, Domenico Diacono, Nicola Amoroso, Przemysław Biecek, Alfonso Monaco, Loredana Bellantuono, Ester Pantaleo, Giancarlo Logroscino, Roberto De Blasi, Sabina Tangaro, Roberto Bellotti Brain Informatics (2022) In this work, we present a robust framework to (i) perform a threefold classification between healthy control subjects, individuals with cognitive impairment, and subjects with dementia using different cognitive indices and (ii) analyze the variability of the SHAP explainability values associated with the decisions taken by the predictive models. We demonstrate that the SHAP values can accurately characterize how each index affects a patient’s cognitive status. Furthermore, we show that a longitudinal analysis of SHAP values can provide effective information on Alzheimer’s disease progression. LIMEcraft: handcrafted superpixel selection and inspection for Visual eXplanations Weronika Hryniewska, Adrianna Grudzień, Przemysław Biecek Machine Learning (2022) LIMEcraft enhances the process of explanation by allowing a user to interactively select semantically consistent areas and thoroughly examine the prediction for the image instance in the case of many image features. Experiments on several models show that our tool improves model safety by inspecting model fairness for image pieces that may indicate model bias. The code is available at: this URL. Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that partial dependence (PD) can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. 
It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial the context of the data distribution is in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders in the domain of responsible AI; moreover, the result highlights a vulnerability of explanations that can be exploited by an adversary. A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski et al. Cancers (2022) The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed long non-coding RNAs (lncRNAs) in the early stages of non-small-cell lung cancer (NSCLC). We established two classifiers. The first distinguished cancerous from noncancerous tissues; the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python Hubert Baniecki, Wojciech Kretowicz, Piotr Piątyszek, Jakub Wiśniewski, Przemyslaw Biecek Journal of Machine Learning Research (2021) We introduce dalex, a Python package which implements a model-agnostic interface for interactive explainability and fairness. It adopts the design crafted through the development of various tools for explainable machine learning; thus, it aims at the unification of existing solutions. This library’s source code and documentation are available under open license at this URL. Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies Weronika Hryniewska, Przemysław Bombiński, Patryk Szatkowski, Paulina Tomaszewska, Artur Przelaskowski, Przemysław Biecek Pattern Recognition (2021) Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. In this work, we overview the approaches proposed in the surveyed Machine Learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model. Towards explainable meta-learning Katarzyna Woźnica, Przemyslaw Biecek ECML PKDD Workshop on eXplainable Knowledge Discovery in Data Mining (2021) To build a new generation of meta-models we need a deeper understanding of the importance and effect of meta-features on model tunability. In this paper, we propose techniques developed for eXplainable Artificial Intelligence (XAI) to examine and extract knowledge from black-box surrogate models. To our knowledge, this is the first paper that shows how post-hoc explainability can be used to improve meta-learning. Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. 
We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts, and it is difficult to remove this bias during or after training. Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering Alicja Gosiewska, Anna Kozak, Przemysław Biecek Decision Support Systems (2021) We propose a framework that uses elastic black boxes as supervisor models to create simpler, less opaque, yet still accurate and interpretable glass box models. The new models were created using newly engineered features extracted with the help of a supervisor model. We support the analysis using a large-scale benchmark on several tabular data sets from the OpenML database. The first SARS-CoV-2 genetic variants of concern (VOC) in Poland: The concept of a comprehensive approach to monitoring and surveillance of emerging variants Radosław Charkiewicz, Jacek Nikliński, Przemysław Biecek, Joanna Kiśluk, Sławomir Pancewicz, Anna Moniuszko-Malinowska, Robert Flisiak, Adam Krętowski, Janusz Dzięcioł, Marcin Moniuszko, Rafał Gierczyński, Grzegorz Juszczyk, Joanna Reszeć Advances in Medical Sciences (2021) This study shows the first confirmed case of SARS-CoV-2 in Poland with the lineage B.1.351 (known as the 501Y.V2 South African variant), as well as another 18 cases with the epidemiologically relevant lineage B.1.1.7, known as the British variant. Responsible Prediction Making of COVID-19 Mortality (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2021) During the literature review of COVID-19-related prognosis and diagnosis, we found that most of the predictive models are not faithful to the RAI principles, which can lead to biased results and wrong reasoning. To solve this problem, we show how novel XAI techniques boost transparency, reproducibility and quality of models. Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspellings occur. We compare the robustness of deep learning models across 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. 
Its flexible and consistent grammar facilitates the validation of a large class of models. Explanations of Model Predictions with live and breakDown Packages Mateusz Staniak, Przemyslaw Biecek The R Journal (2018) Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used for explaining predictions from complex black box models and attributing parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown. We also compare their results with existing implementations of state-of-the-art solutions, namely, lime (Pedersen and Benesty, 2018) which implements Locally Interpretable Model-agnostic Explanations and iml (Molnar et al., 2018) which implements Shapley values. DALEX: Explainers for Complex Predictive Models in R Przemyslaw Biecek Journal of Machine Learning Research (2018) This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. Each explainer is a technique for exploration of a black box model. Presented approaches are model-agnostic, which means that they extract useful information from any predictive method irrespective of its internal structure. Each explainer is linked with a specific aspect of a model. Every explainer presented here works for a single model or for a collection of models. In the latter case, models can be compared against each other. Presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. archivist: An R Package for Managing, Recording and Restoring Data Analysis Results Przemyslaw Biecek, Marcin Kosiński Journal of Statistical Software (2017) Everything that exists in R is an object (Chambers 2016). This article examines what would be possible if we kept copies of all R objects that have ever been created. Not only objects but also their properties, meta-data, relations with other objects and information about the context in which they were created. We introduce archivist, an R package designed to improve the management of results of data analysis. "],["research-grants.html", "Research grants", " Research grants DeMeTeR 2024-2028 DeMeTeR: Interpreting Diffusion Models Through Representations Diffusion models have been the latest revolution in the domain of generative modelling in computer vision, surpassing the capabilities of long-reigning generative adversarial networks, and are currently being adapted to multiple other domains and modalities. However, we still lack an in-depth understanding of their inner workings from both an empirical and theoretical standpoint. Considering that, the main goals of the DeMeTeR project are: to broaden the practical and theoretical understanding of diffusion-specific latent representations and architecture-specific internal representations of diffusion models, and to develop novel methods of manipulating these representations that allow for enhancing the safety and explainability of deep learning models. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2023/50/O/ST6/00301. PvSTATEM 2023-2027 PvSTATEM: Serological testing and treatment for P. vivax: from a cluster-randomised trial in Ethiopia and Madagascar to a mobile-technology supported intervention The PvSTATEM project aims to demonstrate the efficacy and the community acceptability of P. 
vivax Serological Testing and Treatment (PvSeroTAT), a new intervention for the control and elimination of malaria, in cluster-randomised trials in Ethiopia and Madagascar. The project will also innovate new mobile technologies for the efficient implementation of PvSeroTAT in settings beyond clinical trials. The PvSeroTAT intervention includes a serological diagnostic test that measures antibodies to multiple P. vivax antigens and informs an individual-level treatment decision. However, the results from serological tests can also inform population-level surveillance of malaria. In this Hop-on project, mathematical models, machine learning tools, and digital technologies will be developed so that data generated by the clinical trials in Ethiopia and Madagascar can inform national malaria surveillance programs. Work on this project is financially supported by the HORIZON grant HORIZON-WIDERA-2022-ACCESS-07-01. GliomAI 2024 GliomAI: Artificial Intelligence for Radiogenomic Atlas of Gliomas The new 2021 WHO classification of brain tumours places more emphasis than before on genetic variation in the classification of tumour lesions. However, invasive procedures are required for genetic diagnosis, which pose risks to patients and limit access to molecular profiling. Radiomics, a non-invasive approach, allows the analysis of tumour features using imaging data such as magnetic resonance imaging (MRI), which is used to extract computational independent variables. This approach allows the analysis of heterogeneity, spatial relationships and textural patterns that characterise different tumour phenotypes, which, however, may not be graspable by human perception. The correlation of such computational variables with genetic findings is called radiogenomics. Multidimensional datasets play a key role in the development of the field of radiogenomics. However, in order to build them, it is necessary to delineate regions of interest within imaging studies - so-called masks - which are ultimately used to extract computational variables. In this project, we plan to develop a novel radiomic database containing not only clinical, genetic and imaging data, but also the previously mentioned segmentation masks of gliomas and their immediate surroundings. To this end, an interdisciplinary research team will be formed, benefiting from the synergistic impact of the two units involved in the project at our Universities. Work on this project is financially supported by Warsaw Medical University and Warsaw University of Technology within the Collaboration Initiative Programme WUM_PW INTEGRA 1. PINEAPPLE 2023-2025 PINEAPPLE: Explainable AI for hyperspectral image analysis In the PINEAPPLE project (exPlaINablE Ai for hyPersPectraL imagE analysis), we will address the important research gap of the lack of “trust” in (deep) machine learning algorithms for Earth observation (EO), through tackling two real-life EO downstream tasks (estimating soil parameters from HSI and detecting methane in such imagery) using new deep and classic machine learning algorithms empowered by new explainable AI (XAI) techniques. We believe that PINEAPPLE will be an important step not only toward “uncovering the magic” behind deep learning algorithms (hence building trust in them in EO downstream tasks), but also toward showing that XAI techniques can be effectively utilized to improve such data-driven algorithms (both classic and deep machine learning-powered), ultimately leading to better algorithms. 
Finally, we will put special effort into: unbiasing the validation of existing and emerging algorithms through ensuring their full reproducibility (both at the algorithm and at the data level), and understanding & improving the generalization of such algorithms when fundamentally different data is used for testing (e.g., noisy, with other simulated atmospheric conditions, captured in a different area/time, and so forth). Work on this project is financially supported by European Space Agency grant ESA AO/1-11524/22/I-DT. ARES 2022-2026 ARES: Attack-resistant Explanations toward Secure and trustworthy AI Machine learning explainability, fairness, robustness, and security are key elements of trustworthy Artificial Intelligence, an area of strategic importance. In this context, the main goals of the ARES project are: Develop adversarial attacks on state-of-the-art explanations to investigate vulnerabilities and limitations of the existing explainability and fairness approaches in machine learning. Introduce novel robust explanations that are stable against manipulation and intuitive to evaluate. Achieving the first goal primarily impacts various domains of research, which currently use (and explain) black-box models for knowledge discovery and decision-making, by highlighting vulnerabilities and limitations of their explanations. Achieving the second goal impacts the broader machine learning domain more, as it aims at improving the state of the art by introducing robust explanations toward secure and trustworthy AI. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2021/43/O/ST6/00347. DARLING 2022-2024 DARLING: Deep Analysis of Regulations with Language Inference, Network analysis and institutional Grammar Aim of the project: developing tools for the automated analysis of the content of legal documents, leveraging Natural Language Processing, that will help understand the dynamics of change in public policies and the variables influencing those changes. These tools will first be used to analyse the development of the policy subsystem regulating the usage of AI in the European Union. Specific goals of the project Developing and evaluating multilingual models for issue classification for legal and public policy documents. Developing embedding-based topic modeling methods for legal and public policy documents suited for analysis of the change of topics between documents. Institutional grammar-based analysis of changes in topics between different public policy documents, regulations and public consultation documents. Agent-based models predicting the diffusion of issues in public policy documents. Methodology The core of the DARLING project is the analysis of issues and topics in documents connected with the development of regulations using NLP tools. Issue analysis shall allow tracking how different options of AI operationalisation, the ways AI-connected threats are perceived, and ideas regarding AI regulations are shared among three different types of texts: scientific, expert and legal ones. The extracted issues will then be subject to complex network analysis and the institutional grammar approach. The network analysis, backed by agent-based modeling, will be used to examine the flow of issues among the documents based on their vector-formed characteristics. 
On the other hand, the Institutional Grammar (IG) will be used to analyze the modality of issues, e.g., the tendency to regulate a specific aspect of AI in a given issue, its deontic character or its conditionality. As a result, the DARLING project will result in the development of new methods to analyze legal documents connected to regulation, based on deep text processing and links among the documents. An inter-institutional and interdisciplinary team of computer scientists, political scientists and complex-systems physicists will elaborate new machine learning approaches to examine the regulation corpora, issue recognition, issue analysis by means of IG, as well as propose new methods of modeling the flow/changes of regulations based on complex network tools. X-LUNGS 2021-2024 X-LUNGS: Responsible Artificial Intelligence for Lung Diseases The aim of the project is to support the process of identification of lesions visible on CT and lung X-rays. We intend to achieve this goal by building an information system based on artificial intelligence (AI) that will support the radiologist’s work by enriching the images with additional information. The unique feature of the proposed system is a trustworthy artificial intelligence module that: will reduce the image analysis time needed to detect lesions, will make the image evaluation process more transparent, will provide image and textual explanations indicating the rationale behind the proposed recommendation, will be verified for effective collaboration with the radiologist. Work on this project is financially supported from the INFOSTRATEG-I/0022/2021-00 grant funded by Polish National Centre for Research and Development (NCBiR). HOMER 2020-2025 HOMER: Human Oriented autoMated machinE leaRning One of the biggest challenges in state-of-the-art machine learning is dealing with the complexity of predictive models. Recent techniques like deep neural networks, gradient boosting or random forests create models with thousands or even millions of parameters. This makes decisions generated by these black-box models completely opaque. Model obscurity undermines trust in model decisions, hampers model debugging, blocks model auditability, and exposes models to problems with concept drift or data drift. Recently, there has been huge progress in the area of model interpretability, which has resulted in the first generation of model explainers: methods for better understanding of the factors that drive model decisions. Despite this progress, we are still far from methods that provide deep explanations, confronted with domain knowledge, that satisfy our “Right to explanation” as listed in the General Data Protection Regulation (GDPR). In this project I am going to significantly advance the next generation of explainers for predictive models. This will be a disruptive change in the way machine learning models are created, deployed, and maintained. Currently, too much time is spent on handcrafted models produced in a tedious and laborious trial-and-error process. The proposed Human-Oriented Machine Learning will focus on the true bottleneck in the development of new algorithms, i.e. on model-human interfaces. 
The particular directions I consider are (1) developing a uniform grammar for visual model exploration, (2) establishing a methodology for contrastive explanations that describe similarities and differences among different models, (3) advancing a methodology for non-additive model explanations, (4) creating new human-model interfaces for effective communication between models and humans, (5) introducing new techniques for training of interpretable models based on elastic surrogate black-box models, (6) devising new methods for automated auditing of fairness, biases and performance of predictive models. Work on this project is financially supported from the SONATA BIS grant 2019/34/E/ST6/00052 funded by Polish National Science Centre (NCN). DeCoviD 2020-2022 DeCoviD: Detection of Covid-19-related markers of pulmonary changes using Deep Neural Network models supported by eXplainable Artificial Intelligence and Cognitive Compressed Sensing Covid-19 is an infectious respiratory disease. A coronavirus infection leaves permanent ramifications in the respiratory system and beyond. In this situation, tools supporting diagnosis and assessment of lung damage after infection and during Covid-19 treatment are crucial. Preliminary results of analysis of CT images and lung X-rays suggest that they can help to quickly assess even asymptomatic cases and facilitate prognosis of response to treatment. There are also reports of the usefulness of ultrasound images. The aim of the DeCoviD project is to develop methods and tools to support radiologists in the assessment of lung imaging data for the occurrence of changes caused by Covid-19 disease. The developed solution will allow automating the identification of pathological changes and will support the diagnosis of coexisting lung diseases as well as diseases of other organs visible on chest images. It will also allow quantifying the severity of lung damage caused by the disease. Responsible decision support for radiologists requires models based on interpretable features. Such features will be stored in a hybrid knowledge base powered by two research teams from WUT, working on the basis of two, seemingly opposite, paradigms of image data analysis. The eXplainable Artificial Intelligence (XAI) team will use trained deep networks to automatically extract features that are essential for effective disease detection. The Cognitive Compressed Sensing (CCS) team will build a set of interpretable semantic features using sparse cognitive representations agreed with a group of cooperating radiologists. Combining these two approaches will achieve high effectiveness of the constructed models, together with high transparency, clarity and stability of the solution. The DeCoviD project is a part of a broader strategy of competence development in the area of deep learning + XAI + medical applications at the Warsaw University of Technology. More information: https://github.com/MI2DataLab/DeCoviD. Work on this project is financially supported by the IDUB against COVID PW. DALEX 2018-2022 DALEX: Descriptive and model Agnostic Local EXplanations Research project objectives. Black boxes are complex machine learning models, for example deep neural networks, ensembles of trees or high-dimensional regression models. They are commonly used due to their high performance. But how to understand the structure of a black-box, a model in which decision rules are too cryptic for humans? The aim of the project is to create a methodology for such exploration. 
To address this issue we will develop methods that: (1) identify key variables that most strongly determine a model response, (2) explain a single model response in a compact visual way through local approximations, (3) enrich model diagnostic plots. Research project methodology. This project is divided into three subprojects - local approximations of complex models (called LIVE), explanations of particular model predictions (called EXPLAIN) and conditional explanations (called CONDA). Expected impact on the development of science. Explanations of black boxes have fundamental implications for the field of predictive and statistical modelling. The advent of big data imposes the usage of black boxes, which are easily able to outperform classical methods. But high performance itself does not imply that the model is appropriate. Thus, especially in applications to personalized medicine or some regulated fields, one should scrutinize the decision rules incorporated in the model. New methods and tools for exploration of black-box models are useful for quick identification of problems with the model structure and increase the interpretability of a black-box. Work on this project is financially supported from the OPUS grant 2017/27/B/ST6/01307 funded by Polish National Science Centre (NCN). MLGenSig 2017-2021 MLGenSig: Machine Learning Methods for building of Integrated Genetic Signatures Research project objectives. The main scientific goal of this project is to develop a methodology for integrated genetic signatures based on data from divergent high-throughput techniques used in molecular biology. Integrated signatures are based on ensembles of signatures for RNA-seq and DNA-seq data, as well as for methylation profiles and protein expression microarrays. The advent of high-throughput methods allows measuring tens of thousands or even millions of features on different levels like DNA / RNA / protein. And nowadays in many large-scale studies scientists use data from mRNA-seq to assess the state of the transcriptome, protein microarrays to assess the state of the proteome, and DNA-seq / bisulfite methylation to assess the genome / methylome. Research methodology. Genetic signatures are widely used in different applications, among others: to assess genes that differentiate cells that are chemoresistant vs. cells that are not, assess the stage of cell pluripotency, and define molecular cancer subtypes. For example, in the Molecular Signatures Database v5.0 one can find thousands of gene sets - genetic signatures for various conditions. There are signatures that characterize some cancer cells, pluripotent cells and other groups. But they usually contain a relatively small number of genes (around 100), results with them are hard to replicate, and they are collections of features that were found significant when independently tested. In most cases signatures are derived from measurements of the same type. Like signatures based on the expression of transcripts from data from microarrays or RNA-seq, or on methylation profile or DNA variation. We are proposing a very different approach. First we are going to use machine-learning techniques to create large collections of signatures. Such signatures, based on ensembles of small sub-signatures, are more robust and usually have higher precision. Then out of such signatures we are going to develop a methodology for meta-signatures that integrate information from different types of data (transcriptome, proteome, genome). 
Great examples of such studies are the Progenitor Cell Biology Consortium (PCBC) and The Cancer Genome Atlas (TCGA) studies. For thousands of patients in different cohorts (for PCBC, cohorts based on stemness phenotype; for TCGA, based on cancer type), measurements of mRNA, miRNA, DNA and methylation profiles are available. New, large datasets require new methods that take into account the high and dense structure of dependencies between features. The task that we are going to solve is to develop a methodology that will create genetic signatures integrating information from different levels of cell functioning. Then we are going to use data from the TCGA and PCBC projects to assess the quality of the proposed methodology. As a baseline we are going to use the following methodologies: DESeq, edgeR (for mRNA), casper (for alternative splicing), MethylKit (for RRBS data) and RPPanalyzer (for protein arrays). Here is the skeleton of our approach: (1) Use ensembles to build a genetic signature. The first step would be to use random forests to train a new signature. Ensembles of sub-signatures are built on bootstrap subsamples and they vote on whether a given sample fits a given signature or not. (2) To improve signatures, we are going to consider various normalizations of raw counts, starting with log and rank transformations. (3) To improve the process of training an ensemble, we are going to use pre-filtering of genes. (4) Another approach is to use Bayesian methods that may incorporate expert knowledge, like belief-based Gaussian modelling. Research project impact. Genetic profiling is more and more important and has a number of applications, starting from basic classification up to personalized medicine, in which patients are profiled against different signatures. Existing tools for genetic signatures have many citations. Thus, we assume that the methodology for integrated genetic profiling will be very useful for many research groups. It is hard to overestimate the impact of better genetic profiling on medicine. Moreover, we are building a team of people with knowledge in cancer genetic profiling. Work on this project is financially supported by the OPUS grant 2016/21/B/ST6/02176 funded by the Polish National Science Centre (NCN). "],["thesis-proposals.html", "Thesis proposals", " Thesis proposals The MI2.AI team is the place where you can conduct research leading to your engineering, master’s or PhD thesis. As a general rule (although there are exceptions), engineering theses focus on the development of software, master’s theses on the development of a data analysis method, and PhD theses on the solution of a larger scientific problem. We are currently working in four areas. Below are general topics on which you can build an interesting thesis. Red Teaming AI models Explaining computer vision models with diffusion models: generative models, and diffusion models in particular, offer impressive capabilities for conditional image manipulation and conditional sampling, and allow incorporating external (not seen during training) objectives into the generative process. One of the ways to advance the state of current methodologies for explaining visual classifiers would be to use diffusion models as a tool to find or synthesize explanations. Many projects with varying levels of detail and advancement are available. For an example paper from this research field, see this work developed in our lab. Feel free to contact us if this topic is of interest to you.
XAI against Cancer Analysis of the distribution of tumours in the Polish population XAI for Space TODO XAI for Education TODO "],["contact.html", "Contact", " Contact Feel free to contact Przemyslaw Biecek through the mini-pw email or the mim-uw email. Our rooms: 44 (DataLab - separate entrance in front of the main entrance) 316 (xLungs) 317 (HOMER) Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warszawa VAT: PL 5250005834 "],["mi2redteam.html", "MI²RedTeam", " MI²RedTeam MI²RedTeam analyses machine and deep learning predictive models through the lens of AI explainability, fairness, security and human trust. We develop methods and tools for explanatory model analysis and apply them in practice. MI²RedTeam is a group of researchers experienced in XAI who perform a rigorous evaluation of AI solutions in order to improve their transparency and security. We apply state-of-the-art methods and introduce new ones to tailor our analysis to the specific predictive task. We openly collaborate on various topics related to explainable and interpretable machine learning. Feel free to reach out to us with research ideas and development opportunities. We help organizations better understand the vulnerabilities of their AI systems and take steps to mitigate them. Red-Teaming SAM Red-Teaming Segment Anything Model Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek CVPR Workshops (2024) The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis of SAM. We analyze the impact of style transfer on segmentation masks. We assess whether the model can be used for attacks on privacy, such as recognizing celebrities’ faces. Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. Red-Teaming HSI Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. Adversarial attacks and defenses for XAI Adversarial attacks and defenses in explainable artificial intelligence: A survey Hubert Baniecki, Przemysław Biecek Information Fusion (2024) Explanations of machine learning models can be manipulated. We introduce a unified notation and taxonomy of adversarial attacks on explanations. Adversarial examples, data poisoning, and backdoor attacks are key safety issues in XAI. Defense methods like model regularization improve the robustness of explanations. We outline the emerging research directions in adversarial XAI.
Software: survex survex: an R package for explaining machine learning survival models Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N Wright, Przemysław Biecek Bioinformatics (2023) This paper demonstrates the functionalities of the survex package, which provides a comprehensive set of tools for explaining machine learning survival models. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, it promotes transparency and responsibility in sensitive areas. SurvSHAP(t) SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this URL. IEMA The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision making. Software: fairmodels fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. Fooling PDP Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms.
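As a much simplified illustration of this effect (not the genetic or gradient algorithm from the paper), note that a partial dependence profile is averaged over the background data, so distorting that data shifts the profile of an unchanged model; a hedged sketch with DALEX and its titanic_imputed data:

```r
# Simplified illustration of data poisoning against partial dependence:
# the model stays fixed, only the background data are distorted.
library(DALEX)

model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
clean <- titanic_imputed[, -8]
explainer <- explain(model, data = clean,
                     y = titanic_imputed$survived,
                     label = "clean", verbose = FALSE)
pd_clean <- model_profile(explainer, variables = "fare")

# "Poison" a feature correlated with the explained variable.
poisoned <- clean
poisoned$age <- poisoned$age + 3 * scale(poisoned$fare)[, 1]
explainer_p <- explain(model, data = poisoned,
                       y = titanic_imputed$survived,
                       label = "poisoned", verbose = FALSE)
pd_poisoned <- model_profile(explainer_p, variables = "fare")

plot(pd_clean, pd_poisoned)  # the two PD curves for "fare" diverge
```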
Fooling SHAP Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial the context of data distribution is in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations’ vulnerability that can be exploited by an adversary. Models in the Wild Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of deep learning models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. Software: auditor auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. Its flexible and consistent grammar facilitates the validation of a large class of models. "],["mi²cancer.html", "MI²Cancer", " MI²Cancer 2024 How is the xLungs project developed? (Polish only) 2023 Machine learning models demonstrate that clinicopathologic variables are comparable to gene expression prognostic signature in predicting survival in uveal melanoma Piotr Donizy, Mateusz Krzyzinski, Anna Markiewicz, Pawel Karpinski, Krzysztof Kotowski, Artur Kowalik, Jolanta Orlowska-Heitzman, Bozena Romanowska-Dixon, Przemyslaw Biecek, Mai P. Hoang European Journal of Cancer (2023) Molecular assays are not accessible to all uveal melanoma patients. We investigate machine learning models on clinicopathologic variables for risk stratification. Machine learning models included random survival forest and survival gradient boosting. They performed similarly or better than the gene expression prognostic signature. Readily accessible clinicopathologic variables can provide adequate prognostic information.
Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Ki67 is a better marker than PRAME in risk stratification of BAP1-positive and BAP1-loss uveal melanomas Piotr Donizy, Mikołaj Spytek, Mateusz Krzyziński, Krzysztof Kotowski, Anna Markiewicz, Bozena Romanowska-Dixon, Przemyslaw Biecek, Mai P Hoang British Journal of Ophthalmology (2023) Accurate risk stratification of uveal melanoma (UM) patients is important for determining the interval and frequency of surveillance. Loss of BAP1 expression has been shown to be strongly associated with UM-related death and metastasis. In this study of 164 enucleated UMs, we assessed the prognostic role of preferentially expressed antigen in melanoma (PRAME) expression and the Ki67 proliferation index measured by digital quantitation using the QuPath programme in patients with BAP1-positive and BAP1-loss UMs. A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski, Elzbieta Tryniszewska, Joanna Reszec, Zofia Dzieciol-Anikiej, Cezary Piwkowski, Robert Gryczka, Rodryg Ramlau Cancers (2023) Although the biological function of lncRNAs has not been fully elucidated, we know that the aberrant expression of lncRNAs can drive the cancer phenotype. Therefore, a growing area of research is focusing on lncRNAs as putative diagnostic biomarkers and therapeutic targets. The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed lncRNAs in the early stages of NSCLC. We established two classifiers. The first distinguished cancerous from noncancerous tissues, the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. Applied Molecular-Based Quality Control of Biobanked Samples for Multi-Omics Approach Anna Michalska-Falkowska, Jacek Niklinski, Hartmut Juhl, Anetta Sulewska, Joanna Kisluk, Radoslaw Charkiewicz, Michal Ciborowski, Rodryg Ramlau, Robert Gryczka, Cezary Piwkowski, Miroslaw Kozlowski, Borys Miskiewicz, Przemyslaw Biecek, Karolina Wnorowska, Zofia Dzieciol-Anikiej, Karine Sargsyan, Wojciech Naumnik, Robert Mroz, Joanna Reszec-Gielazyn Cancers (2023) This study highlights the significance of quality assurance in biobanking facilities, specifically in the context of high-throughput research and novel molecular techniques.
We established specific quality management workflows utilizing biospecimens collected from oncological patients in Polish clinics. Merkel Cell Carcinoma of Unknown Primary: Immunohistochemical and Molecular Analyses Reveal Distinct UV-Signatures Piotr Donizy, Joanna Wróblewska, Dora Dias-Santagata, Katarzyna Woznica, Przemyslaw Biecek, Mark Mochel, Cheng-Lin Wu, Janusz Kopczynski, Malgorzata Pieniazek, Janusz Ryś, Andrzej Marszalek, Mai Hoang Cancers (2023) Similar to primary cutaneous Merkel cell carcinomas, virus-negative unknown primary tumors exhibited UV signatures and frequent high tumor mutational burdens, whereas few molecular alterations were noted in virus-positive tumors. Although additional studies are warranted for the virus-positive cases, our findings are supportive of a cutaneous metastatic origin for virus-negative Merkel cell carcinomas of unknown primary. miRNA Studies in Glaucoma: A Comprehensive Review of Current Knowledge and Future Perspectives Margarita Dobrzycka, Anetta Sulewska, Przemyslaw Biecek, Radoslaw Charkiewicz, Piotr Karabowicz, Angelika Charkiewicz, Kinga Golaszewska, Patrycja Milewska, Anna Michalska-Falkowska, Karolina Nowak, Jacek Niklinski, Joanna Konopińska International Journal of Molecular Sciences (2023) miRNA research in glaucoma has provided significant insights into the molecular mechanisms of the disease, offering potential biomarkers, diagnostic tools, and therapeutic targets. However, addressing challenges such as variability and limited tissue accessibility is essential, and further investigations and validation will contribute to a deeper understanding of the functional significance of miRNAs in glaucoma. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision making process. For reproducibility, we open-source code and the TLOS dataset at this URL. 2022 Amelanotic Uveal Melanomas Evaluated by Indirect Ophthalmoscopy Reveal Better Long-Term Prognosis Than Pigmented Primary Tumours—A Single Centre Experience Anna Markiewicz, Piotr Donizy, Monika Nowa, Mateusz Krzyziński, Martyna Elas, Przemysław Płonka, Jolanta Orłowska-Heitzmann, Przemysław Biecek, Mai P. Hoang, Bożena Romanowska-Dixon Cancers (2022) Patients with amelanotic uveal melanomas (those without pigment) lived longer and the eventual spread of the neoplastic process occurred later than in patients with heavily pigmented tumours. In heavily pigmented uveal melanomas, we found features on histopathological examination that were associated with an unfavourable prognosis. In the two separate groups of uveal melanomas with different degrees of pigmentation, we observed that amelanotic tumours with a lower clinical stage had the best prognosis.
2021 Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. 2017 Molecular chaperones in the acquisition of cancer cell chemoresistance with mutated TP53 and MDM2 up-regulation Zuzanna Tracz-Gaszewska, Marta Klimczak, Przemyslaw Biecek, Marcin Herok, Marcin Kosinski, Maciej Olszewski, Patrycja Czerwińska, Milena Wiech, Maciej Wiznerowicz, Alicja Zylicz, Maciej Zylicz, Bartosz Wawrzynow Oncotarget (2017) Utilizing the TCGA PANCAN12 dataset we discovered that cancer patients with mutations in TP53 tumor suppressor and overexpression of MDM2 oncogene exhibited decreased survival post treatment. Our findings demonstrate that molecular chaperones aid cancer cells in surviving the cytotoxic effect of chemotherapeutics and may have therapeutic implications. "],["mi²space.html", "MI²Space", " MI²Space MI²Space Team develops methods, software, and systems for the validation, debugging and auditing of artificial intelligence algorithms used in space missions. The research is being conducted for the European Space Agency. Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. "],["mi²betabit.html", "MI²BetaBit", " MI²BetaBit Beta Bit is a series of books about data analysis, data visualisation and machine learning using the adventures of two scientists - mathematician Beta and computer scientist Bit. Together they have interesting experiences analysing a wide variety of data. Because data analysis is one of the most interesting adventures! Explanatory Model Analysis Explanatory Model Analysis Explore, Explain, and Examine Predictive Models. With examples in R and Python Przemysław Biecek, Tomasz Burzykowski Chapman and Hall/CRC, New York (2021) Chaos Game Chaos Game EN: Are you curious about fractals? The Chaos Game is the book for you. You will learn the mathematical basis behind these figures, find out what algorithm can be used to code them, write code in your favourite programming language (Python, R, Julia?) and also explore the bibliographies of three mathematicians associated with the development of mathematics around these shapes. This is the next book in the Beta Bit series for anyone interested in computational mathematics and data analysis. 
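Since the book invites readers to code the algorithm in their favourite programming language, here is one minimal R take on the chaos game (the classic Sierpiński triangle; an illustrative sketch written for this description, not code from the book):

```r
# Chaos game: repeatedly jump halfway toward a randomly chosen vertex
# of a triangle; the visited points trace the Sierpinski triangle.
set.seed(42)
vertices <- matrix(c(0, 0,  1, 0,  0.5, sqrt(3) / 2),
                   ncol = 2, byrow = TRUE)
n   <- 20000
pts <- matrix(NA_real_, nrow = n, ncol = 2)
p   <- c(0.25, 0.25)                 # arbitrary starting point
for (i in seq_len(n)) {
  v <- vertices[sample(3, 1), ]      # pick one of the three vertices
  p <- (p + v) / 2                   # move halfway toward it
  pts[i, ] <- p
}
plot(pts, pch = ".", asp = 1, axes = FALSE, xlab = "", ylab = "")
```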
PL: Jesteś ciekawy czym są fraktale? Gra w Chaos to książka dla Ciebie. Poznasz matematyczne podstawy tych figur, dowiesz się, jaki algorytm można wykorzystać do ich zaprogramowania, napiszesz kod w swoim ulubionym języku programowania (Python, R, Julia?), a także poznasz bibliografie trzech matematyków związanych z rozwojem matematyki wokół tych kształtów. To kolejna książka z serii Beta Bit dla wszystkich zainteresowanych matematyką obliczeniową i analizą danych. Flipbook online [ENG] Flipbook online [POL] Wykresy od kuchni Wykresy od kuchni EN: How to create good charts? Good means charts that are a pleasure to look at, that convey a lot of information, that are understandable to a wide audience, and that connoisseurs will also appreciate. Wykresy od kuchni (Charts from the Kitchen) grew out of the experience of teaching such classes. It is a collection of short lectures discussing various threads useful for a better understanding of how communication with statistical charts works. The following pages draw many analogies to preparing meals, because both in the kitchen and in the preparation of statistical charts one needs practice, knowledge of certain fundamental rules, a handful of proven recipes and a lot of enthusiasm for experimenting. Armed in this way, every student of the culinary arts is bound to succeed. PL: Jak tworzyć dobre wykresy? Dobre, czyli takie, które z przyjemnością się ogląda, z których można wyciągnąć wiele informacji, które są zrozumiałe dla szerokiego odbiorcy, a jednocześnie docenią je smakosze. Na bazie doświadczeń z prowadzenia tych zajęć powstały Wykresy od kuchni. To zbiór krótkich wykładów omawiających różne wątki przydatne w lepszym zrozumieniu tego, jak działa komunikacja z użyciem wykresów statystycznych. Na kolejnych stronach pojawi się wiele analogii do przyrządzania posiłków, ponieważ zarówno w kuchni, jak i w przygotowaniu wykresów statystycznych potrzebna jest praktyka, znajomość pewnych fundamentalnych prawideł, garść sprawdzonych przepisów i dużo zapału do eksperymentowania. Będąc tak uzbrojonym, każdy adept sztuki kulinarnej jest skazany na sukces. Flipbook online [POL] The Hitchhiker’s Guide to Responsible Machine Learning The Hitchhiker’s Guide to Responsible Machine Learning EN: A one-of-a-kind 52-page story about responsible machine learning. Beta and Bit use decision trees, random forests, and AutoML tools to build a risk model after a Covid-19 infection, and then use explainable artificial intelligence tools to analyze the behavior of that model. The description of the data analysis process is intertwined with descriptions of ML tools and code snippets. All examples are fully reproducible! PL: Jedyna w swoim rodzaju 52-stronicowa opowieść o odpowiedzialnym uczeniu maszynowym. Beta i Bit używają drzew decyzyjnych, lasów losowych i narzędzi AutoML do budowy modelu ryzyka po zakażeniu covid, a następnie używają narzędzi wyjaśnialnej sztucznej inteligencji by przeanalizować działanie tego modelu. Opis procesu analizy danych przeplata się z opisem kolejnych narzędzi i przykładami kodu. Wszystkie wyniki są całkowicie odtwarzalne! Flipbook online Przemysław Biecek, Anna Kozak, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 W pogoni za nieskończonością. Szeregi W pogoni za nieskończonością. Szeregi EN: What does hiking in the mountains have to do with the convergence of series? Quite a lot! We start with the paradoxes related to infinity, but step by step we learn the techniques of geometric series. In this book, the conditions for convergence are explained, together with numerous examples. The comic ends with a collection of exercises with different levels of difficulty. PL: Co wspólnego ma chodzenie po górach ze zbieżnością szeregów? Otóż całkiem sporo! Zaczynamy od paradoksów związanych z nieskończonością, ale krok po kroku poznajemy techniki szeregów geometrycznych. W tej pozycji wyjaśnione są warunki zbieżności wraz z licznymi przykładami. Komiks kończy zbiór zadań o różnych poziomach trudności. Flipbook online Przemysław Biecek, Łukasz Maciejewski, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 Przewodnik po pakiecie R Przewodnik po pakiecie R EN: The Guide to the R package was the first published Polish book focused on the R language.
The current fourth edition consists of four parts: Basics of using R (+tidyverse, shiny, knitr and other goodies), Programming in R (object-oriented, package development, class system), Statistics with R (statistical tests, models, exploration techniques) and Visualization with R (graphics, lattice and ggplot2 packages). PL: Przewodnik po pakiecie R był pierwszą wydaną polskojęzyczną książką poświęconą językowi R. Aktualne czwarte wydanie składa się z czterech części: Podstaw posługiwania się językiem R (+tidyverse, shiny, knitr i inne smaczki), Programowanie w R (obiektowe, tworzenie pakietów, system klas), Statystyka z R (testy statystyczne, modele, techniki eksploracji) i Wizualizacja z R (pakiety graphics, lattice i ggplot2). Wersja online, Książka w księgarni. Przemysław Biecek Wydawnictwo GiS. 2008-2021 Analiza danych z programem R Analiza danych z programem R EN: An academic textbook describing estimation and testing topics for linear models with fixed effects, random effects and mixed effects. The theoretical introduction is complemented by numerous examples for one-way and multivariate ANOVA, one and multiple random components. The examples focus on biological and medical applications and are based on real analyses of real data. PL: Podręcznik akademicki opisujący zagadnienia estymacji i testowania dla modeli liniowych z efektami stałymi, losowymi i mieszanymi. Wprowadzenie teoretyczne jest uzupełnione o liczne przykłady dla jednokierunkowej i wielokierunkowej ANOVA, jednym i wieloma komponentami losowymi. Przykłady dotyczą głównie zastosowań biologicznych i medycznych i bazują na prawdziwych analizach rzeczywistych danych. Książka w księgarni. Przemysław Biecek Wydawnictwo Naukowe PWN 2013-2018 Eseje o sztuce wizualizacji danych Eseje o sztuce wizualizacji danych EN: Discover! Reveal! Explain! These three roles can be fulfilled by good statistical graphics. Good means understandable, faithful to the data, aesthetic. How to create such graphics? A collection of essays on the art of displaying data systematises knowledge useful in designing and producing good data visualisations. It is not easy. On the one hand, we can fall into the trap of a colourful mush full of numbers, which is sometimes proudly called infographics. On the other hand, we can fall into the trap of graphics that perfectly reproduce the complexity of numbers, and are thus completely incomprehensible. Somewhere in the middle is a graphic that explains, that informs, that is aesthetically pleasing and informative. PL: Odkrywać! Ujawniać! Objaśniać! Te trzy role może spełniać dobra grafika statystyczna. Dobra czyli zrozumiała, wierna danym, estetyczna. Jak tworzyć taką grafikę? Zbiór esejów o sztuce pokazywania danych systematyzuje wiedzę przydatną do projektowania i wykonania dobrej wizualizacji danych. Nie jest to proste. Z jednej strony możemy wpaść w pułapkę pstrokatej papki najeżonej liczbami, którą czasem dumnie nazywa się infografiką. Z drugiej strony wpaść można w pułapkę grafiki idealnie odwzorowującej złożoność liczb a przez to zupełnie niezrozumiałej. Gdzieś pośrodku jest grafika, która wyjaśnia, która informuje, która jest estetyczna i informatywna. Książka online, Książka w księgarni. Przemysław Biecek Wydawnictwo SmarterPoland 2008-2021 Pogromcy Danych Pogromcy Danych EN: Data Crunchers is the first MOOC (Massive Open Online Course) developed in Polish for data scientists.
Two modules were developed in 2015: the first one is an introduction to R, with loading data, overview of syntax, basic data types, descriptive statistics and pipelined processing. The second module is dedicated to data visualisation and statistical modelling. More than 8,000 people have registered on the Data Crunchers platform. PL: Pogromcy Danych to pierwszy MOOC (Massive Open Online Course) opracowany w języku polskim do analizy danych. W roku 2015 powstały dwa moduły: pierwszy jest wprowadzeniem do programu R, przez wczytywanie danych, omówienie składni, podstawowych typów danych, statystyk opisowych oraz przetwarzania potokowego. Drugi moduł poświęcony jest wizualizacji danych oraz modelowaniu statystycznemu. W platformie Pogromców Danych zarejestrowało się ponad 8000 osób. Przetwarzanie danych w programie R, Wizualizacja i modelowanie, Strona WWW. Przemysław Biecek ICM UW. 2015 Wykresy unplugged Wykresy unplugged EN: Can you create clear charts without any electricity? An illustrated collection of exercises showing eight of the most popular ways to visualise data, with do-it-yourself challenges. Grab your crayons and start creating fantastic charts. PL: Czy można tworzyć czytelne wykresy bez użycia prądu? Ilustrowany zbiór ćwiczeń przedstawiających osiem najpopularniejszych sposobów wizualizacji danych, wraz z zadaniami do samodzielnego wykonania. Weź kredki i zacznij tworzyć fantastyczne wykresy. Flipbook online, Komiks w księgarni. Przemysław Biecek, Ewa Baranowska, Piotr Sobczyk Fundacja Naukowa SmarterPoland.pl. 2018 W pogoni za nieskończonością W pogoni za nieskończonością EN: Two mathematicians share stories about infinity. In the first, Beta attends a lecture on the properties of prime numbers. In the second, Bit breaks into the Palace of Culture and Science. How should we talk about mathematics? PL: Dwójka matematyków wymienia się opowiadaniami o nieskończoności. W pierwszym Beta bierze udział w wykładzie o właściwościach liczb pierwszych. W drugim Bit włamuje się do Pałacu Kultury i Nauki. Jak opowiadać o matematyce? Flipbook online, Komiks w księgarni. Przemysław Biecek, Łukasz Maciejewski, Tomasz Samojlik, Sebastian Szpakowski Fundacja Naukowa SmarterPoland.pl. 2018 Jak długo żyją Muffinki? Jak długo żyją Muffinki? EN: A collection of three stories for children showing statistical relationships in the world around us. Beautifully illustrated stories about the distribution of height according to age, the life span of dogs or measuring the weight of trees. PL: Zbiór trzech opowiadań dla dzieci pokazującym zależności statystyczne w świecie wokół nas. Pięknie ilustrowane opowiadania o rozkładzie wzrostu w zależności od wieku, czasie życia psów czy pomiarze wagi drzew. Online: Jak szybko urosnę, Jak długo żyją Muffinki. Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2016 Pieczara Pietraszki Pieczara Pietraszki EN: How linear regression can help in getting home, and why it’s not worth hacking into a mad mathematician’s office. A short story describing the adventures of two teenagers, Beta and Bit, moving around historic Warsaw. PL: W jaki sposób regresja liniowa może pomóc w powrocie do domu oraz dlaczego nie warto włamywać się do pokoju szalonego matematyka? Lekkie opowiadanie opisujące przygody dwójki nastolatków Bety i Bita w historycznej Warszawie. Online: W języku polskim, In English, По-Русски. Magda Chudzian, Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2015 How to weigh a dog with a ruler? How to weigh a dog with a ruler?
EN: Workshop materials for children aged 8-10. Kids measure different parameters of their body, such as arm span or height. Then they create a graph summarizing the collected data and look for relations between the measured features. It just so happens that parts of the human body are proportional to each other and you can use a ruler to find this relationship. Part of the StatTuba project. PL: Materiały do warsztatów dla dzieci w wieku 8-10 lat. Dzieci mierzą różne parametry swojego ciała, takie jak rozpiętość ramion lub wzrost. Następnie tworzą wykres podsumowujący zebrane dane i szukają zależności pomiędzy zmierzonymi cechami. Tak się składa, że części ciała ludzkiego są do siebie proporcjonalne i można z użyciem linijki znaleźć tę relację. Część projektu StatTuba. Online: English, Polish, Chinese, Simplified Chinese, Czech, German, Spanish, Spanish (Latin America), French, Dutch, Vietnamese. Przemysław Biecek, Klaudia Korniluk Fundacja Naukowa SmarterPoland.pl. 2016-2021 "],["mi²solutions.html", "MI²Solutions", " MI²Solutions Hire a team of experienced researchers. The blue team will help you develop good predictive models and create a responsible solution tailored to your needs. The red team will help you find and analyse any weaknesses in your predictive models. It will help you confront them with domain knowledge and make sure they are resilient to future changes in the data. If you need tailor-made solutions for your individual needs, we are happy to help you too. Contact us, we can develop software for you, deploy it, provide training, discuss your needs, verify the quality of your existing solutions. Below you will find a sample offer for training or deployments. Research as a service Our team has experience not only in groundbreaking research, but also in deploying this research into business. There are many ways we can help, for example: help in the delivery of champion-challenger evaluations, in which we look for potential to increase the effectiveness of predictive models in your company; take care of the whole life cycle of the predictive models, from reproducibility of results to constant monitoring and continuous improvement of the model; audit models and analyse the sensitivity and vulnerability of the model to incorrect or unexpected behaviours. We would be happy to discuss how we could help your organisation! "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "MI² MI².AI", " MI² MI².AI On a mission to responsibly build machine learning predictive models. MI².AI is a group of mathematicians and computer scientists who love to play with predictive models. We are spread between Warsaw University of Technology and University of Warsaw. Here we have workshops and seminars, here we are forging new ideas, creating tools, solving problems, doing consulting and sharing our positive attitude. Feel free to jump in. Mission Machine learning is like atomic energy. We develop leaders, skills, methods, tools and good practices so that predictive models can be deployed responsibly and sustainably. Vision MI² is a group of experts supporting global initiatives aimed at responsible and sustainable machine learning. We support the development of future leaders of responsible machine learning through internships, PhDs, postdoctoral fellowships and so on.
We seek research grants and business projects to conduct both scientific and applied research. We develop and maintain the software and infrastructure necessary to build responsible and sustainable ML. We develop cooperation with international teams working on similar topics. We support companies in implementing best practices related to responsible modelling in their operations. We conduct workshops and training on responsible predictive modelling. "],["the-team.html", "The Team", " The Team Members Przemysław Biecek, PhD, DSc (Team Leader) Hubert Baniecki, PhD student Mustafa Cavus, PhD Maciej Chrabąszcz, PhD student Weronika Hryniewska-Guzik, PhD student Filip Kołodziejczyk, MSc student Mateusz Krzyziński, MSc student Tymoteusz Kwieciński, BSc student Stanisław Łaniewski, PhD student Wiktoria Mieleszczenko-Kowszewicz, PhD Nuno Sepúlveda, PhD Bartek Sobieski, MSc student Mikołaj Spytek, MSc student Jakub Świstak, MSc student Paulina Tomaszewska, PhD student Piotr Wilczyński, BSc student Katarzyna Woźnica, PhD student Vladimir Zaigrajew, PhD student Collaborators Mariusz Adamek, Prof, MD Przemysław Bombiński, PhD, MD André Fonseca, PhD student Stanisław Giziński, MSc student Katarzyna Kobylińska, PhD student Piotr Komorowski, MSc Anna Kozak, MSc Marcin Luckner, PhD João Malato, PhD student Bartek Pieliński, PhD, DSc Hanna Piotrowska, MA Barbara Rychalska, PhD Elżbieta Sienkiewicz, PhD Julian Sienkiewicz, PhD Tomasz Stanisławek, PhD Adrian Stańdo, MSc student Patryk Szatkowski, PhD student, MD Emilia Wiśnios, MSc student Jakub Wiśniewski, MSc student Mateusz Wójcik, MSc student Alumni Piotr Czarnecki, MSc Alicja Gosiewska, MSc Adrianna Grudzień, BSc Mateusz Grzyb, MSc student Paulina Kaczyńska, MSc student Maria Kałuska, BSc Marcin Kosiński, MSc Adam Kozłowski, MSc Wojciech Kretowicz, BSc Michał Kuźba, MSc Szymon Maksymiuk, BSc Tomasz Mikołajczyk, PhD Katarzyna Pękala, MSc Piotr Piątyszek, BSc student Hubert Ruczyński, MSc student Adam Rydelek, BSc Bartosz Sawicki, BSc Patryk Słowakiewicz, BSc Michał Sokólski, MSc Mateusz Stączek, BSc Szymon Szmajdziński, BSc Zuzanna Trafas, BSc Hoang Thien Ly, BSc Kinga Ułasik, BSc Anna Wróblewska, PhD Paweł Wojciechowski, BSc Hanna Zdulska, BSc Artur Żółkowski, BSc Przemysław Biecek My personal mission is to enhance human capabilities by supporting them through access to data-driven and knowledge-based predictions. I execute it by developing methods and tools for responsible machine learning, trustworthy artificial intelligence and reliable software engineering. I work as an associate professor at Warsaw University of Technology and the University of Warsaw. I graduated in software engineering and mathematical statistics and now work on model visualisation, explanatory model analysis, predictive modelling and data science for healthcare. In 2016, I formed the research group MI², which develops methods and tools for predictive model analysis. Google Scholar: Af0O75cAAAAJ GitHub: pbiecek LinkedIn: pbiecek Mariusz Adamek I work at two Medical Universities (Silesia and Gdańsk) holding a Professorship in Medicine and Health Sciences. My interests are focused on lung cancer prevention and screening, the latter by means of low-dose computed tomography (LDCT), with special emphasis on molecular biology methods, prediction models and image analysis aimed at enhancing the performance of lung screening outcomes. Website: www.mariuszadamek.io Hubert Baniecki I’m a PhD student in Computer Science at the University of Warsaw.
Before that, I completed my MSc (2022) and BSc (2021) in Data Science at Warsaw University of Technology. My main research interest is explainable machine learning, with particular emphasis on adversarial attacks & explanation evaluation. Website: hbaniecki.com Mustafa Cavus I work as an assistant professor at Eskisehir Technical University. I joined the MI² DataLab as a post-doc researcher in 2021. I work on glocal explanations and imbalanced learning. Google Scholar: I63d1WIAAAAJ&hl GitHub: mcavus LinkedIn: mcavus Twitter: mcavus Julian Sienkiewicz I work as an assistant professor at the Faculty of Physics, WUT. My main research area involves sociophysics, complex networks and agent-based models. Within MI² DataLab I follow my other interest - scientometrics. Google Scholar: mIwu11QAAAAJ LinkedIn: julek-sienkiewicz-873829 Maciej Chrabąszcz I am pursuing a PhD in Computer Science at Warsaw University of Technology, where I also obtained my MSc in Mathematical Statistics in 2023. My main research interests lie in the fields of responsible and explainable machine learning, with a focus on Red Teaming foundation models. GitHub: maciejchrabaszcz LinkedIn: maciej-chrabaszcz Stanisław Giziński A Research Software Engineer and student of Machine Learning at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. His work in the lab focuses on using natural language processing and network analysis to better understand the spread of AI public policies. Also interested in applying machine learning in bioinformatics. Google Scholar: Stanisław Giziński GitHub: Gizzio LinkedIn: stanislaw-gizinski Mateusz Grzyb MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence and scientific computing, but above all simply enjoys programming. GitHub: mgrzyb99 Weronika Hryniewska-Guzik PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning modelling on medical images in the context of explainability and responsible AI. Google Scholar: aJeg3IQAAAAJ GitHub: Hryniewska LinkedIn: weronikahryniewska Paulina Kaczyńska I am working towards a Master’s degree in Machine Learning at University of Warsaw. I am interested in Natural Language Processing and ML applications in social sciences. GitHub: Kaczyniec Piotr Komorowski Master’s student in Machine Learning at the University of Warsaw. Mainly interested in image processing and XAI applied to medical images. GitHub: piotr-komorowski LinkedIn: Piotr-Komorowski Anna Kozak Graduated in mathematical statistics at Warsaw University of Technology. Interested in explainable artificial intelligence and data visualization. Organizes projects related to education. Google Scholar: JIrqf9kAAAAJ GitHub: kozaka93 LinkedIn: kozakanna Mateusz Krzyziński MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, with particular emphasis on XAI methods for survival analysis models and XAI applications in the medical field. Also an enthusiast of data visualization. Google Scholar: i_r7EUgAAAAJ GitHub: krzyzinskim LinkedIn: krzyzinskim Tymoteusz Kwieciński BSc student in Data Science at Warsaw University of Technology. Particularly interested in explainable artificial intelligence, computer vision and NLP.
GitHub: Fersoil LinkedIn: Tymoteusz-Kwieciński Stanisław Łaniewski PhD student in Quantitative Psychology and Economics at University of Warsaw, Machine Learning Researcher at MI2 Data Lab, MSc in Actuarial Science and Mathematical Finance at University of Amsterdam, former Quantitative Researcher at Flow Traders. His research focuses on enhancing classical methods used in discrete choice and finance with machine learning, and on how to apply them to explain behavioral phenomena and heuristics. He is also keen on finding a balance between the best predictive models and their explainability. An avid gamer who applies statistical techniques to deepen the understanding of best strategies. LinkedIn: Stanisław-Łaniewski Wiktoria Mieleszczenko-Kowszewicz PhD in social science, graduated from an interdisciplinary doctoral program: information and communication technologies & psychology. Researcher interested in the use of LLMs in psychometrics and developing responsible AI solutions for positive societal impact. LinkedIn: Wiktoria Mieleszczenko-Kowszewicz Piotr Piątyszek Undergraduate Data Science student at Warsaw University of Technology. Works as a research software engineer on enhancing accessibility and completeness of explainable AI. During the pandemic he contributed to a system for monitoring COVID variants. Github: piotrpiatyszek Bartosz Pieliński I am an Assistant Professor at the Faculty of Political Science and International Studies at Warsaw University. I am interested in applying quantitative methods to study public policies. I am a founding member of the Institutional Grammar Research Initiative, which is focused on developing a new way of analysing social rules. I have participated in several research projects covering social policy, non-profit organizations, social enterprises, and international organizations. Website: https://pielinski.info/ Google Scholar: hnWiaVEAAAAJ LinkedIn: Bartosz Pieliński Hanna Piotrowska Information designer, focusing mainly on data visualization, branding and book design, with a strong interest in Data Science and perception studies. Winner of numerous awards, including The Kantar Information Is Beautiful Awards, HOW International Design Awards, Polish Graphic Design Awards and KTR. LinkedIn: hanna-piotrowska Twitter: hannapio Behance: hannapio. Hubert Ruczyński I am working towards a Master’s degree in Data Science at Warsaw University of Technology. I am also teaching students about data exploration and visualisation. My major interests are: AutoML | Natural Language Processing | Data Visualization | Fairness. GitHub: HubertR21 LinkedIn: Hubert Ruczyński Barbara Rychalska PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP), recommender systems and graph-based learning. Google Scholar: Wp0wHJoAAAAJ LinkedIn: Barbara-Rychalska Bartek Sobieski MSc student in Data Science at Warsaw University of Technology. Interested in deep learning and hyperparameter optimization. GitHub: sobieskibj LinkedIn: Bartłomiej-Sobieski Mikołaj Spytek MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, data visualization and survival analysis. Google Scholar: 1u49AqYAAAAJ GitHub: mikolajsp LinkedIn: Mikołaj-Spytek Jakub Świstak MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence, NLP and computer vision.
GitHub: jswistak LinkedIn: Jakub-Świstak Tomasz Stanisławek PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP). Google Scholar: gq8NY_UAAAAJ GitHub: tstanislawek LinkedIn: Tomasz-Stanisławek Paulina Tomaszewska PhD candidate in Computer Science at Warsaw University of Technology. Gained experience in AI at leading universities during: Deep Learning Summer School at Tsinghua University (China), a one-semester exchange at Nanyang Technological University (Singapore) and research internships at Gwangju Institute of Science and Technology (South Korea) and Institute of Science and Technology (Austria). Mainly interested in Deep Learning, Computer Vision and Transfer Learning. Recently, focused on digital pathology. Google Scholar: eO245iMAAAAJ LinkedIn: paulina-tomaszewska Hoang Thien Ly Bachelor’s student in Maths and Data Analysis at Warsaw University of Technology. Interested in working with data and learning explainable artificial intelligence methods. Google Scholar: JkysewYAAAAJ GitHub: lhthien09 LinkedIn: hthienly Piotr Wilczyński BSc student in Data Science at Warsaw University of Technology. Interested in Large Language Models, AI Deception and Natural Language Processing. Currently working on my thesis, which applies Computer Vision to medicine. GitHub: wi1lku LinkedIn: Piotr-Wilczyński Jakub Wiśniewski Research Software Engineer and third year Data Science student at Warsaw University of Technology. Developer of tools for bias detection and fairness. Currently researching responsible applications of deep learning. President of Data Science Science Club at WUT. Google Scholar: _6eQsXMAAAAJ GitHub: jakwisn LinkedIn: jakwisn Emilia Wiśnios Research Software Engineer and student of Machine Learning at Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. Interested in natural language processing and reinforcement learning. GitHub: emiliawisnios LinkedIn: emilia-wisnios Paweł Wojciechowski Graduated with a bachelor’s degree in Data Science from Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision, and active learning. GitHub: p-wojciechowski LinkedIn: wojciechowski-p Katarzyna Woźnica PhD candidate in computer science at Warsaw University of Technology. Graduated in mathematical statistics. Interested in automated machine learning, especially in hyperparameter tuning for tabular data. Carrying out statistical analysis and predictive modelling for healthcare. Google Scholar: tAQS1gQAAAAJ GitHub: woznicak LinkedIn: woznicak Vladimir Zaigrajew PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning, primarily on images, with a focus on representation learning. GitHub: WolodjaZ LinkedIn: vladimir-zaigrajew Artur Żółkowski BSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision and NLP. GitHub: arturzolkowski LinkedIn: Artur-Żółkowski Filip Kołodziejczyk MSc student in Data Science at Warsaw University of Technology. Interested primarily in Large Language Models. Currently researching Red Teaming of such models. At the same time, a DevOps professional. GitHub: FilipKolodziejczyk LinkedIn: filip-kołodziejczyk-00 "],["seminars.html", "Seminars", " Seminars We meet every Monday at 10 am, online or in MI2DataLab (room 044, Faculty of Mathematics and Information Science, Warsaw University of Technology).
Join us at http://meet.drwhy.ai/ List of topics and materials from past seminars: https://github.com/MI2DataLab/MI2DataLab_Seminarium "],["papers.html", "Papers", " Papers On the Robustness of Global Feature Effect Explanations Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek ECML PKDD (2024) We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally. Red-Teaming Segment Anything Model Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek CVPR Workshops (2024) The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis of SAM. We analyze the impact of style transfer on segmentation masks. We assess whether the model can be used for attacks on privacy, such as recognizing celebrities’ faces. Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. Adversarial attacks and defenses in explainable artificial intelligence: A survey Hubert Baniecki, Przemysław Biecek Information Fusion (2024) Explanations of machine learning models can be manipulated. We introduce a unified notation and taxonomy of adversarial attacks on explanations. Adversarial examples, data poisoning, and backdoor attacks are key safety issues in XAI. Defense methods like model regularization improve the robustness of explanations. We outline the emerging research directions in adversarial XAI. survex: an R package for explaining machine learning survival models Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N Wright, Przemysław Biecek Bioinformatics (2023) This paper demonstrates the functionalities of the survex package, which provides a comprehensive set of tools for explaining machine learning survival models. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, it promotes transparency and responsibility in sensitive areas.
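For illustration, a minimal sketch of this workflow (assuming the survex package and the veteran data from the survival package; written for this page, not taken from the paper):

```r
# Minimal sketch of explaining a survival model with survex
# (assumes survex and the survival package's veteran dataset).
library(survex)
library(survival)

cph <- coxph(Surv(time, status) ~ ., data = veteran,
             model = TRUE, x = TRUE)  # survex needs the stored model matrix

explainer <- explain(cph)

# Global view: which variables matter for the survival predictions?
plot(model_parts(explainer))

# Local view: time-dependent SurvSHAP(t) attributions for one patient.
plot(predict_parts(explainer, veteran[1, -c(3, 4)], type = "survshap"))
```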
Consolidated learning: a domain-specific model-free optimization strategy with validation on metaMIMIC benchmarks Katarzyna Woźnica, Mateusz Grzyb, Zuzanna Trafas, Przemysław Biecek Machine Learning (2023) This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar datasets. We show that a carefully selected static portfolio of hyperparameter configurations yields good results for anytime optimization, while maintaining the ease of use and implementation. We demonstrate the effectiveness of this approach through an empirical study for the XGBoost algorithm and the newly created metaMIMIC benchmarks of predictive tasks extracted from the MIMIC-IV medical database. Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision making process. For reproducibility, we open-source code and the TLOS dataset at this URL. SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this URL. The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. 
We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision-making. Climate Policy Tracker: Pipeline for automated analysis of public climate policies Artur Żółkowski, Mateusz Krzyziński, Piotr Wilczyński, Stanisław Giziński, Emilia Wiśnios, Bartosz Pieliński, Julian Sienkiewicz, Przemysław Biecek NeurIPS Workshop on Tackling Climate Change with Machine Learning (2022) In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of 10-year national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. Explainable expected goal models for performance analysis in football analytics Mustafa Cavus, Przemyslaw Biecek International Conference on Data Science and Advanced Analytics (2022) In modern football, the expected goal provides a more representative measure of team and player performance than the score, as it also suits the low-scoring nature of the game. This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons between 2014-15 and 2020-21 of the top-five European football leagues. Moreover, we demonstrate a practical application of aggregated profiles to explain a group of observations on an accurate expected goal model for monitoring the team and player performance. Multi-omics disease module detection with an explainable Greedy Decision Forest Bastian Pfeifer, Hubert Baniecki, Anna Saranti, Przemyslaw Biecek, Andreas Holzinger Scientific Reports (2022) In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer. Interpretable meta-score for model performance Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek Nature Machine Intelligence (2022) We propose the Elo-based predictive power (EPP) meta-score, which is built on other performance measures and allows for interpretable comparisons of models. Differences between EPP scores have a probabilistic interpretation and can be compared directly between data sets. Furthermore, this meta-score allows for an assessment of ranking fitness. We prove the properties of the Elo-based predictive power meta-score and support them with empirical results on a large-scale benchmark of 30 classification data sets. Additionally, we propose a unified benchmark ontology that provides a uniform description of benchmarks. fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in binary classification models quickly and flexibly.
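A minimal usage sketch of such a fairness check, assuming the fairness_check() interface and the german credit data bundled with fairmodels:

# validate a binary classifier against a protected attribute
library(fairmodels)
library(DALEX)

data("german")
y <- as.numeric(german$Risk) - 1
model <- glm(Risk ~ ., data = german, family = binomial(link = "logit"))
explainer <- explain(model, data = german[, -1], y = y)

fobject <- fairness_check(explainer,
                          protected = german$Sex,
                          privileged = "male")
print(fobject)   # pass/fail summary across fairness metrics
plot(fobject)    # metric ratios relative to the privileged group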
The package offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease Angela Lombardi, Domenico Diacono, Nicola Amoroso, Przemysław Biecek, Alfonso Monaco, Loredana Bellantuono, Ester Pantaleo, Giancarlo Logroscino, Roberto De Blasi, Sabina Tangaro, Roberto Bellotti Brain Informatics (2022) In this work, we present a robust framework to (i) perform a threefold classification between healthy control subjects, individuals with cognitive impairment, and subjects with dementia using different cognitive indexes and (ii) analyze the variability of the SHAP explainability values associated with the decisions taken by the predictive models. We demonstrate that the SHAP values can accurately characterize how each index affects a patient’s cognitive status. Furthermore, we show that a longitudinal analysis of SHAP values can provide effective information on Alzheimer’s disease progression. LIMEcraft: handcrafted superpixel selection and inspection for Visual eXplanations Weronika Hryniewska, Adrianna Grudzień, Przemysław Biecek Machine Learning (2022) LIMEcraft enhances the process of explanation by allowing a user to interactively select semantically consistent areas and thoroughly examine the prediction for the image instance in the case of many image features. Experiments on several models show that our tool improves model safety by inspecting model fairness for image pieces that may indicate model bias. The code is available at: this URL. Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial the context of data distribution is in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations’ vulnerability that can be exploited by an adversary. A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski et al.
Cancers (2022) The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed long non-coding RNAs (lncRNAs) in the early stages of non-small-cell lung cancer (NSCLC). We established two classifiers. The first distinguished cancerous from noncancerous tissues; the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python Hubert Baniecki, Wojciech Kretowicz, Piotr Piątyszek, Jakub Wiśniewski, Przemyslaw Biecek Journal of Machine Learning Research (2021) We introduce dalex, a Python package which implements a model-agnostic interface for interactive explainability and fairness. It adopts the design crafted through the development of various tools for explainable machine learning; thus, it aims at the unification of existing solutions. This library’s source code and documentation are available under open license at this URL. Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies Weronika Hryniewska, Przemysław Bombiński, Patryk Szatkowski, Paulina Tomaszewska, Artur Przelaskowski, Przemysław Biecek Pattern Recognition (2021) Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. In this work, we overview the approaches proposed in the surveyed Machine Learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model. Towards explainable meta-learning Katarzyna Woźnica, Przemyslaw Biecek ECML PKDD Workshop on eXplainable Knowledge Discovery in Data Mining (2021) To build a new generation of meta-models we need a deeper understanding of the importance and effect of meta-features on the model tunability. In this paper, we propose techniques developed for eXplainable Artificial Intelligence (XAI) to examine and extract knowledge from black-box surrogate models. To our knowledge, this is the first paper that shows how post-hoc explainability can be used to improve meta-learning. Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering Alicja Gosiewska, Anna Kozak, Przemysław Biecek Decision Support Systems (2021) We propose a framework that uses elastic black boxes as supervisor models to create simpler, less opaque, yet still accurate and interpretable glass box models. The new models were created using newly engineered features extracted with the help of a supervisor model.
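One simple variant of this supervisor-model idea can be sketched in a few lines (a toy illustration only; the paper's actual pipeline engineers features from model profiles rather than from raw predictions):

# toy sketch: an elastic black box supplies an engineered feature for a glass-box model
library(ranger)
library(rpart)

bb <- ranger(Species ~ ., data = iris, probability = TRUE)         # black-box supervisor
iris$bb_setosa <- predict(bb, iris)$predictions[, "setosa"]        # engineered feature
glass <- rpart(Species ~ bb_setosa + Petal.Length, data = iris)    # simple glass box
print(glass)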
We supplement the analysis with a large-scale benchmark on several tabular data sets from the OpenML database. The first SARS-CoV-2 genetic variants of concern (VOC) in Poland: The concept of a comprehensive approach to monitoring and surveillance of emerging variants Radosław Charkiewicz, Jacek Nikliński, Przemysław Biecek, Joanna Kiśluk, Sławomir Pancewicz, Anna Moniuszko-Malinowska, Robert Flisiak, Adam Krętowski, Janusz Dzięcioł, Marcin Moniuszko, Rafał Gierczyński, Grzegorz Juszczyk, Joanna Reszeć Advances in Medical Sciences (2021) This study shows the first confirmed case of SARS-CoV-2 in Poland with the lineage B.1.351 (known as the 501Y.V2 South African variant), as well as another 18 cases with the epidemiologically relevant lineage B.1.1.7, known as the British variant. Responsible Prediction Making of COVID-19 Mortality (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2021) During the literature review of COVID-19 related prognosis and diagnosis, we found that most of the predictive models are not faithful to the RAI principles, which can lead to biased results and wrong reasoning. To solve this problem, we show how novel XAI techniques boost transparency, reproducibility and quality of models. Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of deep learning models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. Its flexible and consistent grammar facilitates the validation of a large class of models. Explanations of Model Predictions with live and breakDown Packages Mateusz Staniak, Przemyslaw Biecek The R Journal (2018) Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used for explaining predictions from complex black box models and attributing parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown.
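A minimal sketch of such an audit, assuming auditor operates on a DALEX explainer and using the apartments data bundled with DALEX:

# residual-based, model-agnostic diagnostics
library(auditor)
library(DALEX)

model <- lm(m2.price ~ ., data = apartments)
explainer <- explain(model, data = apartments, y = apartments$m2.price)

mr <- model_residual(explainer)   # residual-based audit object
plot(mr, type = "residual")       # visual verification of residuals

And a minimal sketch of a breakDown attribution for a single prediction, assuming the package's broken() interface:

# attribute parts of one prediction to input features
library(breakDown)

model <- lm(Sepal.Length ~ ., data = iris)
bd <- broken(model, new_observation = iris[1, ])
plot(bd)   # waterfall plot of feature contributions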
We also compare their results with existing implementations of state-of-the-art solutions, namely, lime (Pedersen and Benesty, 2018) which implements Locally Interpretable Model-agnostic Explanations and iml (Molnar et al., 2018) which implements Shapley values. DALEX: Explainers for Complex Predictive Models in R Przemyslaw Biecek Journal of Machine Learning Research (2018) This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. Each explainer is a technique for exploration of a black box model. Presented approaches are model-agnostic, which means that they extract useful information from any predictive method irrespective of its internal structure. Each explainer is linked with a specific aspect of a model. Every explainer presented here works for a single model or for a collection of models. In the latter case, models can be compared against each other. Presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. archivist: An R Package for Managing, Recording and Restoring Data Analysis Results Przemyslaw Biecek, Marcin Kosiński Journal of Statistical Software (2017) Everything that exists in R is an object (Chambers 2016). This article examines what would be possible if we kept copies of all R objects that have ever been created. Not only objects but also their properties, meta-data, relations with other objects and information about context in which they were created. We introduce archivist, an R package designed to improve the management of results of data analysis. "],["research-grants.html", "Research grants", " Research grants DeMeTeR 2024-2028 DeMeTeR: Interpreting Diffusion Models Through Representations Diffusion models have been the latest revolution in the domain of generative modelling in computer vision, surpassing the capabilities of long-reigning generative adversarial networks, and are currently being adapted to multiple other domains and modalities. However, we still lack an in-depth understanding of their inner workings from both an empirical and theoretical standpoint. Considering that, the main goals of the DeMeTeR project are: to broaden the practical and theoretical understanding of diffusion-specific latent representations and architecture-specific internal representations of diffusion models, to develop novel methods of manipulating these representations that allow for enhancing safety and explainability of deep learning models. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2023/50/O/ST6/00301. PvSTATEM 2023-2027 PvSTATEM: Serological testing and treatment for P. Vivax: from a cluster-randomised trial in Ethiopia and Madagascar to a mobile-technology supported intervention The PvSTATEM project aims to demonstrate the efficacy and the community acceptability of P. vivax Serological Testing and Treatment (PvSeroTAT), a new intervention for the control and elimination of malaria, in cluster-randomised trials in Ethiopia and Madagascar. The project will also innovate new mobile technologies for the efficient implementation of PvSeroTAT in settings beyond clinical trials. The PvSeroTAT intervention includes a serological diagnostic test that measures antibodies to multiple P. vivax antigens and informs an individual-level treatment decision. However, the results from serological tests can also inform population-level surveillance of malaria.
In this Hop-on project, mathematical models, machine learning tools, and digital technologies will be developed so that data generated by the clinical trials in Ethiopia and Madagascar can inform national malaria surveillance programs. Work on this project is financially supported by the HORIZON grant HORIZON-WIDERA-2022-ACCESS-07-01. GliomAI 2024 GliomAI: Artificial Intelligence for Radiogenomic Atlas of Gliomas The new 2021 WHO classification of brain tumours places more emphasis than before on genetic variation in the classification of tumour lesions. However, invasive procedures are required for genetic diagnosis, which pose risks to patients and limit access to molecular profiling. Radiomics, a non-invasive approach, allows the analysis of tumour features using imaging data such as magnetic resonance imaging (MRI), which is used to extract computational independent variables. This approach allows the analysis of heterogeneity, spatial relationships and textural patterns that characterise different tumour phenotypes but may not be graspable by human perception. The correlation of such computational variables with genetic findings is called radiogenomics. Multidimensional datasets play a key role in the development of the field of radiogenomics. However, in order to build such datasets, it is necessary to delineate regions of interest within imaging studies - so-called masks - which are ultimately used to extract computational variables. In this project, we plan to develop a novel radiomic database containing not only clinical, genetic and imaging data, but also the previously mentioned segmentation masks of gliomas and their immediate surroundings. To this end, an interdisciplinary research team will be formed, benefiting from the synergistic impact of the two units involved in the project at our Universities. Work on this project is financially supported by Warsaw Medical University and Warsaw University of Technology within the Collaboration Initiative Programme WUM_PW INTEGRA 1. PINEAPPLE 2023-2025 PINEAPPLE: Explainable AI for hyperspectral image analysis In the PINEAPPLE project (exPlaINablE Ai for hyPersPectraL imagE analysis), we will address the important research gap of lack of “trust” in (deep) machine learning algorithms for EO, through tackling two real-life EO downstream tasks (estimating soil parameters from HSI and detecting methane in such imagery) using new deep and classic machine learning algorithms empowered by new explainable AI (XAI) techniques. We believe that PINEAPPLE will be an important step toward not only “uncovering the magic” behind deep learning algorithms (hence building trust in them in EO downstream tasks), but also in showing that XAI techniques can be effectively utilized to improve such data-driven algorithms (both classic and deep machine learning-powered), ultimately leading to better algorithms. Finally, we will put special effort into: unbiasing the validation of existing and emerging algorithms through ensuring their full reproducibility (both at the algorithm and at the data level), and understanding & improving the generalization of such algorithms when fundamentally different data is used for testing (e.g., noisy, with simulated other atmospheric conditions, captured in different area/time, and so forth). Work on this project is financially supported by European Space Agency grant ESA AO/1-11524/22/I-DT.
ARES 2022-2026 ARES: Attack-resistant Explanations toward Secure and trustworthy AI Machine learning explainability, fairness, robustness, and security are key elements of trustworthy Artificial Intelligence, an area of strategic importance. In this context, the main goals of the ARES project are: Develop adversarial attacks on state-of-the-art explanations to investigate vulnerabilities and limitations of the existing explainability and fairness approaches in machine learning. Introduce novel robust explanations that are stable against manipulation and intuitive to evaluate. Achieving the first goal primarily impacts various domains of research, which currently use (and explain) black-box models for knowledge discovery and decision-making, by highlighting vulnerabilities and limitations of their explanations. Achieving the second goal impacts more the broad machine learning domain as it aims at improving the state of the art by introducing robust explanations toward secure and trustworthy AI. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2021/43/O/ST6/00347. DARLING 2022-2024 DARLING: Deep Analysis of Regulations with Language Inference, Network analysis and institutional Grammar Aim of the project Developing tools for the automated analysis of the content of legal documents, leveraging Natural Language Processing, that will help understand the dynamics of change in public policies and the variables influencing those changes. These tools will first be used to analyse the development of the policy subsystem regulating the usage of AI in the European Union. Specific goals of the project Developing and evaluating multilingual models for issue classification for legal and public policy documents. Developing embedding-based topic modeling methods for legal and public policy documents suited for analysing how topics change between documents. Institutional-grammar-based analysis of changes in topics between different public policy documents, regulations and public consultation documents. Agent-based models predicting diffusion of issues in public policy documents. Methodology The core of the DARLING project is the analysis of issues and topics in documents connected with regulation development using NLP tools. Issue analysis shall allow tracking how different options for operationalising AI, perceptions of AI-related threats, and ideas regarding AI regulation are shared among three different types of texts: scientific, expert and legal. The extracted issues will then be subject to complex network analysis and an institutional grammar approach. The network analysis, backed by agent-based modeling, will be used to examine the flow of issues among the documents based on their vector-formed characteristics. On the other hand, the Institutional Grammar (IG) will be used to analyze the modality of issues, e.g., the tendency to regulate a specific aspect of AI in a given issue, its deontic character or its conditionality. As a result, the DARLING project will lead to the development of new methods for analyzing regulation-related legal documents based on deep text processing and links among the documents.
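Since the project centres on topic analysis of document corpora, a generic topic-modeling sketch may help fix ideas (illustrative only, using the tm and topicmodels packages; the project itself also develops embedding-based methods):

# a generic LDA topic-modeling sketch, not the DARLING pipeline
library(tm)
library(topicmodels)

docs <- c("climate energy emissions plan",
          "artificial intelligence regulation act",
          "energy targets renewable plan")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)))

lda <- LDA(dtm, k = 2, control = list(seed = 1))
terms(lda, 3)   # top terms per topic
topics(lda)     # most likely topic per document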
An inter-institutional and interdisciplinary team of scientists from computer science, political science and the physics of complex systems will elaborate new machine learning approaches for examining regulation corpora, recognising issues and analysing them by means of IG, as well as propose new methods of modeling the flow and changes of regulations based on complex-network tools. X-LUNGS 2021-2024 X-LUNGS: Responsible Artificial Intelligence for Lung Diseases The aim of the project is to support the process of identification of lesions visible on CT and lung X-rays. We intend to achieve this goal by building an information system based on artificial intelligence (AI) that will support the radiologist’s work by enriching the images with additional information. The unique feature of the proposed system is a trustworthy artificial intelligence module that: will reduce the image analysis time needed to detect lesions, will make the image evaluation process more transparent, will provide image and textual explanations indicating the rationale behind the proposed recommendation, will be verified for effective collaboration with the radiologist. Work on this project is financially supported from the INFOSTRATEG-I/0022/2021-00 grant funded by Polish National Centre for Research and Development (NCBiR). HOMER 2020-2025 HOMER: Human Oriented autoMated machinE leaRning One of the biggest challenges in state-of-the-art machine learning is dealing with the complexity of predictive models. Recent techniques like deep neural networks, gradient boosting or random forests create models with thousands or even millions of parameters. This makes decisions generated by these black-box models completely opaque. Model obscurity undermines trust in model decisions, hampers model debugging, blocks model auditability, exposes models to problems with concept drift or data drift. Recently, there has been huge progress in the area of model interpretability, which results in the first generation of model explainers, methods for better understanding of factors that drive model decisions. Despite this progress, we are still far from methods that provide deep explanations, confronted with domain knowledge, that satisfy our “Right to explanation” as listed in the General Data Protection Regulation (GDPR). In this project I am going to significantly advance the next generation of explainers for predictive models. This will be a disruptive change in the way machine learning models are created, deployed, and maintained. Currently, too much time is spent on handcrafted models produced in a tedious and laborious trial-and-error process. The proposed Human-Oriented Machine Learning will focus on the true bottleneck in development of new algorithms, i.e. on model-human interfaces. The particular directions I consider are (1) developing a uniform grammar for visual model exploration, (2) establishing a methodology for contrastive explanations that describe similarities and differences among different models, (3) advancing a methodology for non-additive model explanations, (4) creating new human-model interfaces for effective communication between models and humans, (5) introducing new techniques for training of interpretable models based on elastic surrogate black-box models, (6) devising new methods for automated auditing of fairness, biases and performance of predictive models. Work on this project is financially supported from the SONATA BIS grant 2019/34/E/ST6/00052 funded by Polish National Science Centre (NCN).
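The uniform grammar of model exploration pursued in HOMER is implemented, among others, in the team's DALEX package; a minimal sketch assuming its current API and the bundled titanic_imputed data:

# one explainer object feeds the whole grammar of exploration
library(DALEX)
library(ranger)

model <- ranger(survived ~ ., data = titanic_imputed, probability = TRUE)
explainer <- explain(model,
                     data = titanic_imputed[, -8],
                     y = titanic_imputed$survived)

model_performance(explainer)                    # global performance
model_parts(explainer)                          # permutation variable importance
model_profile(explainer, variables = "age")     # partial-dependence profile
predict_parts(explainer, titanic_imputed[1, ])  # local break-down attribution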
DeCoviD 2020-2022 DeCoviD: Detection of Covid-19 related markers of pulmonary changes using Deep Neural Networks models supported by eXplainable Artificial Intelligence and Cognitive Compressed Sensing Covid-19 is an infectious respiratory disease. A coronavirus infection leaves permanent ramifications in the respiratory system and beyond. In this situation, tools supporting diagnosis and assessment of lung damage after infection and during Covid-19 treatment are crucial. Preliminary results of analysis of CT images and lung X-rays suggest that they can help to quickly assess even asymptomatic cases and facilitate prognosis of response to treatment. There are also reports of usefulness of ultrasound images. The aim of the DeCoviD project is to develop methods and tools to support radiologists in the assessment of lung imaging data for the occurrence of changes caused by Covid-19 disease. The developed solution will allow automating the identification of pathological changes and will support the diagnosis of coexisting lung diseases as well as diseases of other organs visible on chest images. It will also allow quantifying the severity of lung damage caused by the disease. Responsible decision support for radiologists requires models based on interpretable features. Such features will be stored in a hybrid knowledge base powered by two research teams from WUT, working on the basis of two, seemingly opposite, paradigms of image data analysis. The eXplainable Artificial Intelligence (XAI) team will use trained deep networks to automatically extract features that are essential for effective disease detection. Cognitive Compressed Sensing (CCS) will build a set of interpretable semantic features using sparse cognitive representations agreed with a group of cooperating radiologists. Combining these two approaches will achieve high effectiveness of the constructed models, combined with high transparency, clarity and stability of the solution. The DeCoviD project is a part of a broader strategy of competence development in the area of deep learning + XAI + medical applications at the Warsaw University of Technology. More information: https://github.com/MI2DataLab/DeCoviD. Work on this project is financially supported by the IDUB against COVID PW. DALEX 2018-2022 DALEX: Descriptive and model Agnostic Local EXplanations Research project objectives. Black boxes are complex machine learning models, for example a deep neural network, an ensemble of trees, or a high-dimensional regression model. They are commonly used due to their high performance. But how to understand the structure of a black-box, a model in which decision rules are too cryptic for humans? The aim of the project is to create a methodology for such exploration. To address this issue we will develop methods that: (1) identify key variables that mostly determine a model response, (2) explain a single model response in a compact visual way through local approximations, (3) enrich model diagnostic plots. Research project methodology. This project is divided into three subprojects - local approximations of complex models (called LIVE), explanations of particular model predictions (called EXPLAIN) and conditional explanations (called CONDA). Expected impact on the development of science. Explanations of black boxes have fundamental implications for the field of predictive and statistical modelling. The advent of big data forces the usage of black boxes that are easily able to overperform classical methods.
But the high performance itself does not imply that the model is appropriate. Thus, especially in applications to personalized medicine or some regulated fields, one should scrutinize decision rules incorporated in the model. New methods and tools for exploration of black-box models are useful for quick identification of problems with the model structure and increase the interpretability of a black box. Work on this project is financially supported from the OPUS grant 2017/27/B/ST6/01307 funded by Polish National Science Centre (NCN). MLGenSig 2017-2021 MLGenSig: Machine Learning Methods for building of Integrated Genetic Signatures Research project objectives. The main scientific goal of this project is to develop a methodology for integrated genetic signatures based on data from divergent high-throughput techniques used in molecular biology. Integrated signatures are based on ensembles of signatures for RNA-seq and DNA-seq data, as well as for methylation profiles and protein expression microarrays. The advent of high throughput methods allows measuring tens of thousands or even millions of features on different levels like DNA / RNA / protein. And nowadays in many large scale studies scientists use data from mRNA-seq to assess the state of the transcriptome, protein microarrays to assess the state of the proteome, and DNA-seq / bisulfite methylation to assess the genome / methylome. Research methodology. Genetic signatures are widely used in different applications, among others: assessing genes that differentiate chemoresistant cells from those that are not, assessing the stage of cell pluripotency, and defining molecular cancer subtypes. For example, in the Molecular Signatures Database v5.0 one can find thousands of gene sets - genetic signatures for various conditions. There are signatures that characterize some cancer cells, pluripotent cells and other groups. But they usually contain a relatively small number of genes (around 100), results with them are hard to replicate, and they are collections of features that were found significant when independently tested. In most cases signatures are derived from measurements of the same type. For example, signatures based on the expression of transcripts from microarrays or RNA-seq, on methylation profiles, or on DNA variation. We are proposing a very different approach. First we are going to use machine-learning techniques to create large collections of signatures. Such signatures, based on ensembles of small sub-signatures, are more robust and usually have higher precision. Then out of such signatures we are going to develop methodology for meta-signatures that integrate information from different types of data (transcriptome, proteome, genome). Great examples of such studies are: Progenitor Cell Biology Consortium (PCBC) and The Cancer Genome Atlas (TCGA) studies. For thousands of patients in different cohorts (for PCBC cohorts based on stemness phenotype, for TCGA based on cancer type) measurements of both mRNA, miRNA, DNA and methylation profiles are available. New, large datasets require new methods that take into account the high and dense structure of dependencies between features. The task that we are going to solve is to develop methodology that will create genetic signatures that integrate information from different levels of cell functioning. Then we are going to use data from the TCGA and PCBC projects to assess the quality of the proposed methodology.
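The ensemble idea elaborated in the skeleton that follows can be sketched on simulated expression data (a toy illustration only, assuming the randomForest package; real signatures would be built on TCGA/PCBC measurements):

# toy sketch: a random forest votes whether a sample matches a phenotype
library(randomForest)

set.seed(1)
expr <- matrix(rnorm(100 * 20), nrow = 100)                      # 100 samples x 20 simulated genes
colnames(expr) <- paste0("gene", 1:20)
label <- factor(rep(c("case", "control"), each = 50))
expr[label == "case", 1:3] <- expr[label == "case", 1:3] + 1     # signal in 3 genes

rf <- randomForest(expr, label, importance = TRUE)
head(importance(rf))   # genes ranked for inclusion in the signature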
As a baseline we are going to use the following methodologies: DESeq, edgeR (for mRNA), casper (for alternative splicing), MethylKit (for RRBS data) and RPPanalyzer (for protein arrays). Here is the skeleton for our approach: (1) Use ensembles in order to build a genetic signature. The first step would be to use random forests to train a new signature. Ensembles of sub-signatures are built on bootstrap subsamples and they vote on whether a given sample fits a given signature or not. (2) In order to improve signatures we are going to consider various normalizations of raw counts. We start with log and rank transformations. (3) In order to improve the process of training an ensemble we are going to use pre-filtering of genes. (4) Another approach is to use Bayesian methods that may incorporate expert knowledge, like belief-based Gaussian modelling. Research project impact. Genetic profiling is more and more important and has a number of applications, starting from basic classification up to personalized medicine in which patients are profiled against different signatures. Existing tools for genetic signatures have many citations. Thus we assume that the methodology for integrated genetic profiling will be very useful for many research groups. It is hard to overestimate the impact of better genetic profiling on medicine. Moreover, we are building a team of people with knowledge in cancer genetic profiling. Work on this project is financially supported from the OPUS grant 2016/21/B/ST6/02176 funded by Polish National Science Centre (NCN). "],["thesis-proposals.html", "Thesis proposals", " Thesis proposals The MI2.AI team is the place where you can conduct research leading to your engineering, master’s or PhD thesis. As a general rule (although there are exceptions), engineering theses focus on the development of software, master’s theses on the development of a data analysis method, PhD theses on the solution of a larger scientific problem. We are currently working on red teaming and explainable AI. Below are general topics on which you can build an interesting thesis. Red Teaming AI models Explaining computer vision models with diffusion models: generative models, and diffusion models in particular, offer impressive capabilities for conditional image manipulation, conditional sampling, and allow incorporating external (not seen during training) objectives into the generative process. One of the ways to advance the state of current methodologies for explaining visual classifiers would be to use diffusion models as a tool to find or synthesize explanations. Many projects with varying levels of detail and advancement are available. For an example paper from this research field, see this work developed in our lab. Feel free to contact us if this topic is of interest to you. Explainable machine learning BSc thesis: Robustness of global machine learning explanations when features are dependent. Description: This project aims to directly follow our recent work with theoretical and experimental analysis on how feature dependence, i.e. correlation and interactions, impacts the robustness of global machine learning explanations, i.e. feature importance and effects, to model and data perturbation. For context, refer to these three papers: AIj 2021, NeurIPS 2023, ECML PKDD 2024. Effort: 1-2 people with interest in statistical learning for tabular data. Supervision: Hubert Baniecki and Przemysław Biecek.
XAI against Cancer Analysis of the distribution of tumours in the Polish population XAI for Space TODO "],["contact.html", "Contact", " Contact Feel free to contact Przemyslaw Biecek through mini-pw email or mim-uw email. Our rooms: 44 (DataLab - separate entrance in front of the main entrance) 316 (xLungs) 317 (HOMER) Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warszawa VAT: PL 5250005834 "],["mi2redteam.html", "MI²RedTeam", " MI²RedTeam MI²RedTeam analyses machine and deep learning predictive models through the lens of AI explainability, fairness, security and human trust. We develop methods and tools for explanatory model analysis and apply them in practice. MI²RedTeam is a group of researchers experienced in XAI who perform a rigorous evaluation of AI solutions in order to improve their transparency and security. We apply state-of-the-art methods and introduce new ones to tailor our analysis to the specific predictive task. We openly collaborate on various topics related to explainable and interpretable machine learning. Feel free to reach out to us with research ideas and development opportunities. We help organizations to better understand the vulnerabilities of their AI systems, and take steps to mitigate them. Red-Teaming SAM Red-Teaming Segment Anything Model Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek CVPR Workshops (2024) The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis of SAM. We analyze the impact of style transfer on segmentation masks. We assess whether the model can be used for attacks on privacy, such as recognizing celebrities’ faces. Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. Red-Teaming HSI Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. Adversarial attacks and defenses for XAI Adversarial attacks and defenses in explainable artificial intelligence: A survey Hubert Baniecki, Przemysław Biecek Information Fusion (2024) Explanations of machine learning models can be manipulated. We introduce a unified notation and taxonomy of adversarial attacks on explanations. Adversarial examples, data poisoning, and backdoor attacks are key safety issues in XAI. Defense methods like model regularization improve the robustness of explanations. We outline the emerging research directions in adversarial XAI.
Software: survex survex: an R package for explaining machine learning survival models Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N Wright, Przemysław Biecek Bioinformatics (2023) This paper demonstrates the functionalities of the survex package, which provides a comprehensive set of tools for explaining machine learning survival models. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, it promotes transparency and responsibility in sensitive areas. SurvSHAP(t) SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this URL. IEMA The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper shows how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision-making. Software: fairmodels fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. Fooling PDP Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms.
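To see why such manipulation is possible at all, recall that partial dependence at a grid point is an average of model predictions over the background data, so poisoning that data bends the curve; a toy base-R illustration (not the paper's genetic or gradient algorithm):

# partial dependence depends on the (poisonable) background data
f <- function(x1, x2) x1 * x2                   # toy model with an interaction
pd <- function(z, x2_bg) mean(f(z, x2_bg))      # PD of x1 at grid point z

grid <- c(-1, 0, 1)
honest   <- sapply(grid, pd, x2_bg = c(-1, 1))  # symmetric data: flat PD (0, 0, 0)
poisoned <- sapply(grid, pd, x2_bg = c(1, 1))   # shifted data: steep PD (-1, 0, 1)
rbind(honest, poisoned)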
Fooling SHAP Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial the context of data distribution is in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations’ vulnerability that can be exploited by an adversary. Models in the Wild Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of deep learning models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. Software: auditor auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. Its flexible and consistent grammar facilitates the validation of a large class of models. "],["mi²cancer.html", "MI²Cancer", " MI²Cancer 2024 How is the xLungs project developed? (Polish only) 2023 Machine learning models demonstrate that clinicopathologic variables are comparable to gene expression prognostic signature in predicting survival in uveal melanoma Piotr Donizy, Mateusz Krzyzinski, Anna Markiewicz, Pawel Karpinski, Krzysztof Kotowski, Artur Kowalik, Jolanta Orlowska-Heitzman, Bozena Romanowska-Dixon, Przemyslaw Biecek, Mai P. Hoang European Journal of Cancer (2023) Molecular assays are not accessible to all uveal melanoma patients. We investigate machine learning models on clinicopathologic variables for risk stratification. Machine learning models included random survival forest and survival gradient boosting. They performed similarly or better than gene expression prognostic signature. Readily accessible clinicopathologic variables can provide adequate prognostic information.
Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Ki67 is a better marker than PRAME in risk stratification of BAP1-positive and BAP1-loss uveal melanomas Piotr Donizy, Mikołaj Spytek, Mateusz Krzyziński, Krzysztof Kotowski, Anna Markiewicz, Bozena Romanowska-Dixon, Przemyslaw Biecek, Mai P Hoang British Journal of Ophthalmology (2023) Accurate risk stratification of uveal melanoma (UM) patients is important for determining the interval and frequency of surveillance. Loss of BAP1 expression has been shown to be strongly associated with UM-related death and metastasis. In this study of 164 enucleated UMs, we assessed the prognostic role of preferentially expressed antigen in melanoma (PRAME) expression and Ki67 proliferation index measured by digital quantitation using the QuPath programme in patients with BAP1-positive and BAP1-loss UMs. A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski, Elzbieta Tryniszewska, Joanna Reszec, Zofia Dzieciol-Anikiej, Cezary Piwkowski, Robert Gryczka, Rodryg Ramlau Cancers (2023) Although the biological function of lncRNAs has not been fully elucidated, we know that the aberrant expression of lncRNAs can drive the cancer phenotype. Therefore, a growing area of research is focusing on lncRNAs as putative diagnostic biomarkers and therapeutic targets. The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed lncRNAs in the early stages of NSCLC. We established two classifiers. The first distinguished cancerous from noncancerous tissues; the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. Applied Molecular-Based Quality Control of Biobanked Samples for Multi-Omics Approach Anna Michalska-Falkowska, Jacek Niklinski, Hartmut Juhl, Anetta Sulewska, Joanna Kisluk, Radoslaw Charkiewicz, Michal Ciborowski, Rodryg Ramlau, Robert Gryczka, Cezary Piwkowski, Miroslaw Kozlowski, Borys Miskiewicz, Przemyslaw Biecek, Karolina Wnorowska, Zofia Dzieciol-Anikiej, Karine Sargsyan, Wojciech Naumnik, Robert Mroz, Joanna Reszec-Gielazyn Cancers (2023) This study highlights the significance of quality assurance in biobanking facilities, specifically in the context of high-throughput research and novel molecular techniques.
We established specific quality management workflows utilizing biospecimens collected from oncological patients in Polish clinics. Merkel Cell Carcinoma of Unknown Primary: Immunohistochemical and Molecular Analyses Reveal Distinct UV-Signatures Piotr Donizy, Joanna Wróblewska, Dora Dias-Santagata, Katarzyna Woznica, Przemyslaw Biecek, Mark Mochel, Cheng-Lin Wu, Janusz Kopczynski, Malgorzata Pieniazek, Janusz Ryś, Andrzej Marszalek, Mai Hoang Cancers (2023) Similar to primary cutaneous Merkel cell carcinomas, virus-negative unknown primary tumors exhibited UV signatures and frequent high tumor mutational burdens, whereas few molecular alterations were noted in virus-positive tumors. Although additional studies are warranted for the virus-positive cases, our findings are supportive of a cutaneous metastatic origin for virus-negative Merkel cell carcinomas of unknown primary. miRNA Studies in Glaucoma: A Comprehensive Review of Current Knowledge and Future Perspectives Margarita Dobrzycka, Anetta Sulewska, Przemyslaw Biecek, Radoslaw Charkiewicz, Piotr Karabowicz, Angelika Charkiewicz, Kinga Golaszewska, Patrycja Milewska, Anna Michalska-Falkowska, Karolina Nowak, Jacek Niklinski, Joanna Konopińska International Journal of Molecular Sciences (2023) miRNA research in glaucoma has provided significant insights into the molecular mechanisms of the disease, offering potential biomarkers, diagnostic tools, and therapeutic targets. However, addressing challenges such as variability and limited tissue accessibility is essential, and further investigations and validation will contribute to a deeper understanding of the functional significance of miRNAs in glaucoma. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision making process. For reproducibility, we open-source code and the TLOS dataset at this URL. 2022 Amelanotic Uveal Melanomas Evaluated by Indirect Ophthalmoscopy Reveal Better Long-Term Prognosis Than Pigmented Primary Tumours—A Single Centre Experience Anna Markiewicz, Piotr Donizy, Monika Nowa, Mateusz Krzyziński, Martyna Elas, Przemysław Płonka, Jolanta Orłowska-Heitzmann, Przemysław Biecek, Mai P. Hoang, Bożena Romanowska-Dixon Cancers (2022) Patients with amelanotic uveal melanomas (those without pigment) lived longer and the eventual spread of the neoplastic process occurred later than in patients with heavily pigmented tumours. In heavily pigmented uveal melanomas, we found features on histopathological examination that were associated with an unfavourable prognosis. In the two separate groups of uveal melanomas with different degrees of pigmentation, we observed that amelanotic tumours with a lower clinical stage had the best prognosis.
2021 Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. 2017 Molecular chaperones in the acquisition of cancer cell chemoresistance with mutated TP53 and MDM2 up-regulation Zuzanna Tracz-Gaszewska, Marta Klimczak, Przemyslaw Biecek, Marcin Herok, Marcin Kosinski, Maciej Olszewski, Patrycja Czerwińska, Milena Wiech, Maciej Wiznerowicz, Alicja Zylicz, Maciej Zylicz, Bartosz Wawrzynow Oncotarget (2017) Utilizing the TCGA PANCAN12 dataset we discovered that cancer patients with mutations in TP53 tumor suppressor and overexpression of MDM2 oncogene exhibited decreased survival post treatment. Our findings demonstrate that molecular chaperones aid cancer cells in surviving the cytotoxic effect of chemotherapeutics and may have therapeutic implications. "],["mi²space.html", "MI²Space", " MI²Space MI²Space Team develops methods, software, and systems for the validation, debugging and auditing of artificial intelligence algorithms used in space missions. The research is being conducted for the European Space Agency. Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew, Agata M. Wijata, Jakub Nalepa, Nicolas Longépé, Przemyslaw Biecek ICLR Workshops (2024) Remote sensing applications require machine learning models that are reliable and robust, highlighting the importance of red teaming for uncovering flaws and biases. We introduce a novel red teaming approach for hyperspectral image analysis, specifically for soil parameter estimation in the Hyperview challenge. Utilizing SHAP for red teaming, we enhanced the top-performing model based on our findings. Additionally, we introduced a new visualization technique to improve model understanding in the hyperspectral domain. "],["mi²betabit.html", "MI²BetaBit", " MI²BetaBit Beta Bit is a series of books about data analysis, data visualisation and machine learning using the adventures of two scientists - mathematician Beta and computer scientist Bit. Together they have interesting experiences analysing a wide variety of data. Because data analysis is one of the most interesting adventures! Explanatory Model Analysis Explanatory Model Analysis Explore, Explain, and Examine Predictive Models. With examples in R and Python Przemysław Biecek, Tomasz Burzykowski Chapman and Hall/CRC, New York (2021) Chaos Game Chaos Game EN: Are you curious about fractals? The Chaos Game is the book for you. You will learn the mathematical basis behind these figures, find out what algorithm can be used to code them, write code in your favourite programming language (Python, R, Julia?) and also explore the biographies of three mathematicians associated with the development of mathematics around these shapes. This is the next book in the Beta Bit series for anyone interested in computational mathematics and data analysis.
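The chaos game that gives the book its title fits in a few lines of R; a minimal sketch of the classic three-vertex rule that draws the Sierpinski triangle:

# chaos game: repeatedly jump halfway toward a randomly chosen triangle vertex
verts <- matrix(c(0, 0, 1, 0, 0.5, sqrt(3) / 2), ncol = 2, byrow = TRUE)
pts <- matrix(NA_real_, nrow = 50000, ncol = 2)
p <- c(0.5, 0.25)                       # arbitrary starting point
for (i in seq_len(nrow(pts))) {
  v <- verts[sample(3, 1), ]            # pick a random vertex
  p <- (p + v) / 2                      # jump halfway toward it
  pts[i, ] <- p
}
plot(pts, pch = ".", asp = 1, axes = FALSE, xlab = "", ylab = "")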
PL: Jesteś ciekawy, czym są fraktale? Gra w Chaos to książka dla Ciebie. Poznasz matematyczne podstawy tych figur, dowiesz się, jaki algorytm można wykorzystać do ich zaprogramowania, napiszesz kod w swoim ulubionym języku programowania (Python, R, Julia?), a także poznasz biografie trzech matematyków związanych z rozwojem matematyki wokół tych kształtów. To kolejna książka z serii Beta Bit dla wszystkich zainteresowanych matematyką obliczeniową i analizą danych. Flipbook online [ENG] Flipbook online [POL] Wykresy od kuchni EN: How do you create good charts? Good ones, that is, charts that are a pleasure to look at, that convey a lot of information, that are understandable to a wide audience and yet will be appreciated by connoisseurs. Wykresy od kuchni grew out of the experience of teaching such a course. It is a collection of short lectures discussing various threads useful for a better understanding of how communication with statistical charts works. The following pages draw many analogies to preparing meals, because both in the kitchen and in the preparation of statistical charts you need practice, knowledge of some fundamental rules, a handful of proven recipes and plenty of enthusiasm for experimentation. Armed in this way, every student of the culinary arts is bound to succeed. PL: Jak tworzyć dobre wykresy? Dobre, czyli takie, które z przyjemnością się ogląda, z których można wyciągnąć wiele informacji, które są zrozumiałe dla szerokiego odbiorcy, a jednocześnie docenią je smakosze. Na bazie doświadczeń z prowadzenia tych zajęć powstały Wykresy od kuchni. To zbiór krótkich wykładów omawiających różne wątki przydatne w lepszym zrozumieniu tego, jak działa komunikacja z użyciem wykresów statystycznych. Na kolejnych stronach pojawi się wiele analogii do przyrządzania posiłków, ponieważ zarówno w kuchni, jak i w przygotowaniu wykresów statystycznych potrzebna jest praktyka, znajomość pewnych fundamentalnych prawideł, garść sprawdzonych przepisów i dużo zapału do eksperymentowania. Będąc tak uzbrojonym, każdy adept sztuki kulinarnej jest skazany na sukces. Flipbook online [POL] The Hitchhiker’s Guide to Responsible Machine Learning EN: A one-of-a-kind 52-page story about responsible machine learning. Beta and Bit use decision trees, random forests, and AutoML tools to build a risk model after a covid infection, and then use explainable artificial intelligence tools to analyze the behavior of that model. The description of the data analysis process is intertwined with descriptions of ML tools and code snippets. All examples are fully reproducible! PL: Jedyna w swoim rodzaju 52-stronicowa opowieść o odpowiedzialnym uczeniu maszynowym. Beta i Bit używają drzew decyzyjnych, lasów losowych i narzędzi AutoML do budowy modelu ryzyka po zakażeniu covid, a następnie używają narzędzi wyjaśnialnej sztucznej inteligencji, by przeanalizować działanie tego modelu. Opis procesu analizy danych przeplata się z opisem kolejnych narzędzi i przykładami kodu. Wszystkie wyniki są całkowicie odtwarzalne! Flipbook online Przemysław Biecek, Anna Kozak, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 
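The Hitchhiker’s Guide described above trains tree ensembles and then inspects them with explainable-AI tools. A minimal sketch of such a workflow, assuming synthetic data in place of the book's covid dataset and the dalex package for explanations (the book's own code may differ):

```python
# Toy version of the book's workflow: fit a random-forest risk model,
# then inspect it with XAI tools. Synthetic data replaces the covid data.
import numpy as np
import pandas as pd
import dalex as dx
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=500),
    "bmi": rng.normal(27, 5, size=500),
    "comorbidities": rng.integers(0, 4, size=500),
})
logit = 0.06 * (X["age"] - 50) + 0.5 * X["comorbidities"]
y = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
explainer = dx.Explainer(model, X, y, label="rf risk model")

print(explainer.model_parts().result)               # permutation importance
print(explainer.predict_parts(X.iloc[[0]]).result)  # break-down for one case
```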
W pogoni za nieskończonością. Szeregi EN: What does hiking in the mountains have to do with the convergence of series? Quite a lot! We start with the paradoxes related to infinity, but step by step we learn the techniques of geometric series. In this book, the conditions for convergence are explained, together with numerous examples. The comic ends with a collection of exercises with different levels of difficulty. PL: Co wspólnego ma chodzenie po górach ze zbieżnością szeregów? Otóż całkiem sporo! Zaczynamy od paradoksów związanych z nieskończonością, ale krok po kroku poznajemy techniki szeregów geometrycznych. W tej pozycji wyjaśnione są warunki zbieżności wraz z licznymi przykładami. Komiks kończy zbiór zadań o różnych poziomach trudności. Flipbook online Przemysław Biecek, Łukasz Maciejewski, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 Przewodnik po pakiecie R EN: The Guide to the R package was the first published Polish book focused on the R language. The current fourth edition consists of four parts: Basics of using R (+tidyverse, shiny, knitr and other goodies), Programming in R (object-oriented programming, package development, the class system), Statistics with R (statistical tests, models, exploration techniques) and Visualization with R (the graphics, lattice and ggplot2 packages). PL: Przewodnik po pakiecie R był pierwszą wydaną polskojęzyczną książką poświęconą językowi R. Aktualne czwarte wydanie składa się z czterech części: Podstawy posługiwania się językiem R (+tidyverse, shiny, knitr i inne smaczki), Programowanie w R (obiektowe, tworzenie pakietów, system klas), Statystyka z R (testy statystyczne, modele, techniki eksploracji) i Wizualizacja z R (pakiety graphics, lattice i ggplot2). Wersja online, Książka w księgarni. Przemysław Biecek Wydawnictwo GiS. 2008-2021 Analiza danych z programem R EN: An academic textbook describing estimation and testing topics for linear models with fixed effects, random effects and mixed effects. The theoretical introduction is complemented by numerous examples of one-way and multi-way ANOVA, and of models with one or many random components. The examples focus on biological and medical applications and are based on real analyses of real data. PL: Podręcznik akademicki opisujący zagadnienia estymacji i testowania dla modeli liniowych z efektami stałymi, losowymi i mieszanymi. Wprowadzenie teoretyczne jest uzupełnione o liczne przykłady dla jednokierunkowej i wielokierunkowej ANOVA oraz modeli z jednym i wieloma komponentami losowymi. Przykłady dotyczą głównie zastosowań biologicznych i medycznych i bazują na prawdziwych analizach rzeczywistych danych. Książka w księgarni. Przemysław Biecek Wydawnictwo Naukowe PWN 2013-2018 Eseje o sztuce wizualizacji danych EN: Discover! Reveal! Explain! These three roles can be fulfilled by good statistical graphics. Good means understandable, faithful to the data, aesthetic. How to create such graphics? A collection of essays on the art of displaying data systematises knowledge useful in designing and producing good data visualisations. It is not easy. On the one hand, we can fall into the trap of a colourful mush full of numbers, which is sometimes proudly called infographics. On the other hand, we can fall into the trap of graphics that perfectly reproduce the complexity of numbers and are thus completely incomprehensible. Somewhere in the middle is a graphic that explains, that informs, that is aesthetically pleasing and informative. PL: Odkrywać! Ujawniać! Objaśniać! Te trzy role może spełniać dobra grafika statystyczna. Dobra, czyli zrozumiała, wierna danym, estetyczna. Jak tworzyć taką grafikę? Zbiór esejów o sztuce pokazywania danych systematyzuje wiedzę przydatną do projektowania i wykonania dobrej wizualizacji danych. Nie jest to proste. Z jednej strony możemy wpaść w pułapkę pstrokatej papki najeżonej liczbami, którą czasem dumnie nazywa się infografiką. Z drugiej strony wpaść można w pułapkę grafiki idealnie odwzorowującej złożoność liczb, a przez to zupełnie niezrozumiałej. Gdzieś pośrodku jest grafika, która wyjaśnia, która informuje, która jest estetyczna i informatywna. Książka online, Książka w księgarni. Przemysław Biecek Wydawnictwo SmarterPoland 2008-2021 Pogromcy Danych EN: Data Crunchers is the first MOOC (Massive Open Online Course) developed in Polish for data scientists. 
Two modules were developed in 2015: the first one is an introduction to R, covering loading data, an overview of the syntax, basic data types, descriptive statistics and pipelined processing. The second module is dedicated to data visualisation and statistical modelling. More than 8,000 people have registered on the Data Crunchers platform. PL: Pogromcy Danych to pierwszy MOOC (Massive Open Online Course) opracowany w języku polskim do analizy danych. W roku 2015 powstały dwa moduły: pierwszy jest wprowadzeniem do programu R, przez wczytywanie danych, omówienie składni, podstawowych typów danych, statystyk opisowych oraz przetwarzania potokowego. Drugi moduł poświęcony jest wizualizacji danych oraz modelowaniu statystycznemu. Na platformie Pogromców Danych zarejestrowało się ponad 8000 osób. Przetwarzanie danych w programie R, Wizualizacja i modelowanie, Strona WWW. Przemysław Biecek ICM UW. 2015 Wykresy unplugged EN: Can you create clear charts without any electricity? An illustrated collection of exercises showing eight of the most popular ways to visualise data, with do-it-yourself challenges. Grab your crayons and start creating fantastic charts. PL: Czy można tworzyć czytelne wykresy bez użycia prądu? Ilustrowany zbiór ćwiczeń przedstawiających osiem najpopularniejszych sposobów wizualizacji danych, wraz z zadaniami do samodzielnego wykonania. Weź kredki i zacznij tworzyć fantastyczne wykresy. Flipbook online, Komiks w księgarni. Przemysław Biecek, Ewa Baranowska, Piotr Sobczyk Fundacja Naukowa SmarterPoland.pl. 2018 W pogoni za nieskończonością EN: Two mathematicians share stories about infinity. In the first, Beta attends a lecture on the properties of prime numbers. In the second, Bit breaks into the Palace of Culture and Science. How should we talk about mathematics? PL: Dwójka matematyków wymienia się opowiadaniami o nieskończoności. W pierwszym Beta bierze udział w wykładzie o właściwościach liczb pierwszych. W drugim Bit włamuje się do Pałacu Kultury i Nauki. Jak opowiadać o matematyce? Flipbook online, Komiks w księgarni. Przemysław Biecek, Łukasz Maciejewski, Tomasz Samojlik, Sebastian Szpakowski Fundacja Naukowa SmarterPoland.pl. 2018 Jak długo żyją Muffinki? EN: A collection of three stories for children showing statistical relationships in the world around us. Beautifully illustrated stories about the distribution of height depending on age, the life span of dogs and measuring the weight of trees. PL: Zbiór trzech opowiadań dla dzieci pokazujących zależności statystyczne w świecie wokół nas. Pięknie ilustrowane opowiadania o rozkładzie wzrostu w zależności od wieku, czasie życia psów czy pomiarze wagi drzew. Online: Jak szybko urosnę, Jak długo żyją Muffinki. Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2016 Pieczara Pietraszki EN: How linear regression can help in getting home, and why it’s not worth hacking into a mad mathematician’s office. A short story describing the adventures of two teenagers, Beta and Bit, moving around historic Warsaw. PL: W jaki sposób regresja liniowa może pomóc w powrocie do domu oraz dlaczego nie warto włamywać się do pokoju szalonego matematyka? Lekkie opowiadanie opisujące przygody dwójki nastolatków, Bety i Bita, w historycznej Warszawie. Online: W języku polskim, In English, По-русски. Magda Chudzian, Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2015 How to weigh a dog with a ruler? EN: Workshop materials for children aged 8-10. Kids measure different parameters of their body, such as arm span or height. Then they create a graph summarizing the collected data and look for relations between the measured features. It just so happens that parts of the human body are proportional to each other, and you can use a ruler to find this relationship. Part of the StatTub project. PL: Materiały do warsztatów dla dzieci w wieku 8-10 lat. Dzieci mierzą różne parametry swojego ciała, takie jak rozpiętość ramion lub wzrost. Następnie tworzą wykres podsumowujący zebrane dane i szukają zależności pomiędzy zmierzonymi cechami. Tak się składa, że części ciała ludzkiego są do siebie proporcjonalne i można z użyciem linijki znaleźć tę relację. Część projektu StatTuba. Online: English, Polish, Chinese, Simplified Chinese, Czech, German, Spanish, Spanish (Latin America), French, Dutch, Vietnamese. Przemysław Biecek, Klaudia Korniluk Fundacja Naukowa SmarterPoland.pl. 2016-2021 
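The proportionality idea behind this workshop can also be checked in a few lines of code; a toy sketch with invented measurements, where a least-squares line stands in for the ruler-and-paper exercise:

```python
# Toy check of the workshop's idea: arm span grows roughly in proportion
# to height, so a straight line fits the measurements well. Data is invented.
import numpy as np

height = np.array([128, 131, 135, 138, 142, 145, 150])    # cm
arm_span = np.array([126, 130, 136, 137, 143, 144, 151])  # cm

slope, intercept = np.polyfit(height, arm_span, deg=1)
print(f"arm_span = {slope:.2f} * height + {intercept:.1f}")
# A slope close to 1 reflects the 'proportional body parts' relationship.
```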
"],["mi²solutions.html", "MI²Solutions", " MI²Solutions Hire a team of experienced researchers. The blue team will help you develop good predictive models and create a responsible solution tailored to your needs. The red team will help you find and analyse any weaknesses in your predictive models, confront them with domain knowledge and make sure they are resilient to future changes in the data. If you need tailor-made solutions for your individual needs, we are happy to help you too. Contact us: we can develop software for you, deploy it, provide training, discuss your needs and verify the quality of your existing solutions. Below you will find a sample offer for training or deployments. Research as a service Our team has experience not only in groundbreaking research, but also in deploying this research into business. There are many ways we can help. For example, we can deliver champion-challenger evaluations in which we look for potential to increase the effectiveness of predictive models in your company; take care of the whole life cycle of predictive models, from reproducibility of results to constant monitoring and continuous improvement; and audit models, analysing their sensitivity and vulnerability to incorrect or unexpected behaviours. We would be happy to discuss how we could help your organisation! "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs/seminars.html b/docs/seminars.html index a21224a..903c79e 100644 --- a/docs/seminars.html +++ b/docs/seminars.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/the-team.html b/docs/the-team.html index 8180bfe..5d66929 100644 --- a/docs/the-team.html +++ b/docs/the-team.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • diff --git a/docs/thesis-proposals.html b/docs/thesis-proposals.html index beadf08..9f6dc81 100644 --- a/docs/thesis-proposals.html +++ b/docs/thesis-proposals.html @@ -109,9 +109,9 @@
  • Thesis proposals
  • Contact
  • @@ -146,12 +146,18 @@

    Thesis proposals

    The MI2.AI team is the place where you can conduct research leading to your engineering, master’s or PhD thesis. As a general rule (although there are exceptions), engineering theses focus on the development of software, master’s theses on the development of a data analysis method, and PhD theses on the solution of a larger scientific problem.

    -

    We are currently working in four areas. Below are general topics on which you can build an interesting thesis

    +

    We are currently working on red teaming and explainable AI. Below are general topics on which you can build an interesting thesis.

    Red Teaming AI models

    -
      +
      1. Explaining computer vision models with diffusion models: generative models, and diffusion models in particular, offer impressive capabilities for conditional image manipulation and conditional sampling, and make it possible to incorporate external objectives (not seen during training) into the generative process. One way to advance the current methodologies for explaining visual classifiers would be to use diffusion models as a tool to find or synthesize explanations; see the toy sketch below this list. Many projects with varying levels of detail and advancement are available. For an example paper from this research field, see this work developed in our lab. Feel free to contact us if this topic is of interest to you.
      2. -
    + +
    +
    +
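To make topic 1 above concrete, here is a toy sketch of the general idea: synthesize controlled image variations with an off-the-shelf diffusion model and watch how a classifier's prediction responds. This is not the method from the cited lab work; the model identifiers and prompts are assumptions, and a GPU with downloaded weights is assumed:

```python
# Toy illustration: probe an image classifier with images synthesized by a
# diffusion model under controlled prompt changes. Placeholder models/prompts.
import torch
from diffusers import StableDiffusionPipeline
from torchvision.models import resnet18, ResNet18_Weights

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

weights = ResNet18_Weights.DEFAULT
classifier = resnet18(weights=weights).eval()
preprocess = weights.transforms()

# Vary one semantic attribute in the prompt and compare predictions.
for prompt in ["a photo of a dog in the snow", "a photo of a dog on the beach"]:
    image = pipe(prompt).images[0]
    with torch.no_grad():
        logits = classifier(preprocess(image).unsqueeze(0))
    label = weights.meta["categories"][logits.argmax().item()]
    print(prompt, "->", label)
```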

    Explainable machine learning

    +
      +
    1. BSc thesis: Robustness of global machine learning explanations when features are dependent. Description: This project directly follows our recent work with a theoretical and experimental analysis of how feature dependence (i.e. correlation and interactions) impacts the robustness of global machine learning explanations (i.e. feature importance and effects) to model and data perturbation; a toy experiment is sketched after this list. For context, refer to these three papers: AIj 2021, NeurIPS 2023, ECML PKDD 2024. Effort: 1-2 people with an interest in statistical learning for tabular data. Supervision: Hubert Baniecki and Przemysław Biecek.
    2. +
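A minimal experiment in the spirit of proposal 1 above (synthetic data and illustrative settings, not taken from the cited papers): when a feature is nearly duplicated, permutation importance splits credit between the copies and varies across refits:

```python
# Minimal robustness experiment: with two nearly duplicated features,
# permutation importance divides credit between them and varies with the seed.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # almost a copy of x1
x3 = rng.normal(size=n)               # independent feature
X = np.column_stack([x1, x2, x3])
y = x1 + 0.5 * x3 + 0.1 * rng.normal(size=n)

for seed in range(3):
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=10, random_state=seed)
    print(f"seed {seed}: importances =", np.round(imp.importances_mean, 3))
```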

    XAI against Cancer

    @@ -164,12 +170,9 @@

    XAI for Space -

    XAI for Education

    -
      -
    • TODO
    • -
    +