From eacdfd6e9a22d4640d9f3555b93886493ae447ec Mon Sep 17 00:00:00 2001
From: runner
Date: Wed, 4 Oct 2023 20:00:29 +0000
Subject: [PATCH] Render book

---
 docs/search_index.json | 2 +-
 docs/the-team.html     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/search_index.json b/docs/search_index.json
index 3c19446..2ad28f4 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1 +1 @@
-[["index.html", "MI² MI².AI", " MI² MI².AI MI².AI is a group of mathematicians and computer scientists that love to play with predictive models. We are spread between Warsaw University of Technology and University of Warsaw. Here we have workshops, seminars, here we are forging new ideas, creating tools, solving problems, doing consulting and sharing our positive attitude. Feel free to jump in. Mission Machine learning is like atomic energy. We develop leaders, skills, methods, tools and good practices so that predictive models can be deployed responsibly and sustainably. Vision MI² is a group of experts supporting global initiatives aimed at responsible and sustainable machine learning. We support the development of future leaders of responsible machine learning through internships, PhDs, postdoctoral fellowships and so on. We seek research grants and business projects to conduct both scientific and applied research. We develop and maintain software and infrastructure necessary to build responsible and sustainable ML. We develop cooperation with international teams working on similar topics. We support companies to implement best practices related to responsible modelling in their operation. We conduct workshops and training on responsible predictive modelling. 
"],["the-team.html", "The Team", " The Team Members Przemysław Biecek, PhD, DSc (Team Leader) Hubert Baniecki, PhD student Mustafa Cavus, PhD Maciej Chrabąszcz, MSc student Mateusz Grzyb, MSc student Stanisław Giziński, MSc student Weronika Hryniewska, PhD student Piotr Komorowski, MSc student Mateusz Krzyziński, MSc student Tymoteusz Kwieciński, BSc student Stanisław Łaniewski, PhD student Piotr Piątyszek, BSc student Hubert Ruczyński, MSc student Barbara Rychalska, PhD student Nuno Sepúlveda, PhD Bartek Sobieski, MSc student Mikołaj Spytek, MSc student Tomasz Stanisławek, PhD Hoang Thien Ly, BSc student Paulina Tomaszewska, PhD student Piotr Wilczyński, BSc student Emilia Wiśnios, MSc student Paweł Wojciechowski, BSc Katarzyna Woźnica, PhD student Vladimir Zaigrajew, PhD student Artur Żółkowski, BSc student Collaborators Mariusz Adamek, Prof, MD Przemysław Bombiński, PhD, MD André Fonseca, PhD student Katarzyna Kobylińska, PhD student Anna Kozak, MSc Marcin Luckner, PhD João Malato, PhD student Bartek Pieliński, PhD, DSc Hanna Piotrowska, MA Elżbieta Sienkiewicz, PhD Julian Sienkiewicz, PhD Adrian Stańdo, MSc student Patryk Szatkowski, PhD student, MD Jakub Wiśniewski, MSc student Alumni Piotr Czarnecki, MSc Alicja Gosiewska, MSc Adrianna Grudzień, BSc Maria Kałuska, BSc Marcin Kosiński, MSc Adam Kozłowski, MSc Wojciech Kretowicz, BSc Michał Kuźba, MSc Szymon Maksymiuk, BSc Tomasz Mikołajczyk, PhD Katarzyna Pękala, MSc Adam Rydelek, BSc Bartosz Sawicki, BSc Patryk Słowakiewicz, BSc Michał Sokólski, MSc Mateusz Stączek, BSc Szymon Szmajdziński, BSc Zuzanna Trafas, BSc Kinga Ułasik, BSc Anna Wróblewska, PhD Hanna Zdulska, BSc Przemysław Biecek My personal mission is to enhance human capabilities by supporting them through access to data-driven and knowledge-based predictions. I execute it by developing methods and tools for responsible machine learning, trustworthy artificial intelligence and reliable software engineering. 
I work as an associate professor at Warsaw University of Technology and the University of Warsaw. I graduated in software engineering and mathematical statistics and now work on model visualisation, explanatory model analysis, predictive modelling and data science for healthcare. In 2016, I formed the research group MI² which develops methods and tools for predictive model analysis. Google Scholar: Af0O75cAAAAJ GitHub: pbiecek LinkedIn: pbiecek Mariusz Adamek I work at two Medical Universities (Silesia and Gdańsk) holding a Professorship in Medicine and Health Sciences. My interests are focused on lung cancer prevention and screening, the latter by means of low-dose computed tomography (LDCT) with special emphasis put on molecular biology methods, prediction models and image analysis aimed to enhance the performance of lung screening outcomes. Website: www.mariuszadamek.io Hubert Baniecki I’m a PhD student in Computer Science at the University of Warsaw. I previously did my MSc (2022) and BSc (2021) in Data Science at Warsaw University of Technology. My main research interest is explainable machine learning, with particular emphasis on adversarial attacks & evaluation of explanations. I care about human-model interaction with applications in biomedicine. I support the development and maintenance of several open-source Python & R packages for building predictive models responsibly. Website: hbaniecki.com Mustafa Cavus I work as an assistant professor at Warsaw University of Technology and the Eskisehir Technical University. I joined the MI² DataLab as a post-doc researcher in 2021. I work on explainable artificial intelligence and AutoML. Google Scholar: I63d1WIAAAAJ&hl GitHub: mcavus LinkedIn: mcavus Twitter: mcavus Julian Sienkiewicz I work as an assistant professor at Faculty of Physics, WUT. My main research area links with sociophysics, complex networks and agent-based models. In the scope of MI² DataLab I follow my other interest - scientometrics. 
Google Scholar: mIwu11QAAAAJ LinkedIn: julek-sienkiewicz-873829 Maciej Chrabąszcz Master’s student in mathematical statistics at Warsaw University of Technology. Interested in deep learning on text and images, explainable and responsible AI. GitHub: maciejchrabaszcz LinkedIn: maciej-chrabaszcz Stanisław Giziński A Research Software Engineer and student of Machine Learning at Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. His work in the lab focuses on using natural language processing and network analysis to better understand the spread of AI public policies. Interested also in applying machine learning in bioinformatics. Google Scholar: Stanisław Giziński GitHub: Gizzio LinkedIn: stanislaw-gizinski Mateusz Grzyb MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence and scientific computing, but above all simply enjoys programming. GitHub: mgrzyb99 Weronika Hryniewska PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning modelling on medical images in the context of explainability and responsible AI. Google Scholar: aJeg3IQAAAAJ GitHub: Hryniewska LinkedIn: weronikahryniewska Piotr Komorowski Master’s student in Machine Learning at the University of Warsaw. Mainly interested in image processing and XAI applied to medical images. GitHub: piotr-komorowski LinkedIn: Piotr-Komorowski Anna Kozak Graduated in mathematical statistics at Warsaw University of Technology. Interested in explainable artificial intelligence and data visualization. Organizes projects related to education. Google Scholar: JIrqf9kAAAAJ GitHub: kozaka93 LinkedIn: kozakanna Mateusz Krzyziński MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, with particular emphasis on XAI methods for survival analysis models and XAI applications in the medical field. Also an enthusiast of data visualization. 
Google Scholar: i_r7EUgAAAAJ GitHub: krzyzinskim LinkedIn: krzyzinskim Tymoteusz Kwieciński BSc student in Data Science at Warsaw University of Technology. Particularly interested in explainable artificial intelligence, computer vision and NLP. GitHub: Fersoil LinkedIn: Tymoteusz-Kwieciński Stanisław Łaniewski PhD student in Quantitative Psychology and Economics at University of Warsaw, Machine Learning Researcher at MI2 Data Lab, MSc in Actuarial Science and Mathematical Finance at University of Amsterdam, former Quantitative Researcher at Flow Traders. His research focuses on enhancing classical methods used in discrete choice and finance with machine learning and on applying them to explain behavioral phenomena and heuristics. He is also keen on finding a balance between the best predictive models and their explainability. Avid gamer who applies statistical techniques to deepen the understanding of best strategies. LinkedIn: Stanisław-Łaniewski Piotr Piątyszek Undergraduate Data Science student at Warsaw University of Technology. Works as a research software engineer on enhancing accessibility and completeness of explainable AI. During the pandemic, contributes to a system for monitoring COVID variants. GitHub: piotrpiatyszek Bartosz Pieliński I am an Assistant Professor at the Faculty of Political Science and International Studies at Warsaw University. I am interested in applying quantitative methods to study public policies. I am a founding member of the Institutional Grammar Research Initiative, which is focused on developing a new way of analysing social rules. I have participated in several research projects covering social policy, non-profit organizations, social enterprises, and international organizations. Website: https://pielinski.info/ Google Scholar: hnWiaVEAAAAJ LinkedIn: Bartosz Pieliński Hanna Piotrowska Information designer, focusing mainly on data visualization, branding and book design, with a strong interest in Data Science and perception studies. 
Winner of numerous awards, including The Kantar Information Is Beautiful Awards, HOW International Design Awards, Polish Graphic Design Awards and KTR. LinkedIn: hanna-piotrowska Twitter: hannapio Behance: hannapio. Hubert Ruczyński I am working towards a Master’s degree in Data Science at Warsaw University of Technology. I am also teaching students about data exploration and visualisation. My major interests are: AutoML | Natural Language Processing | Data Visualization | Fairness. GitHub: HubertR21 LinkedIn: Hubert Ruczyński Barbara Rychalska PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP), recommender systems and graph-based learning. Google Scholar: Wp0wHJoAAAAJ LinkedIn: Barbara-Rychalska Bartek Sobieski MSc student in Data Science at Warsaw University of Technology. Interested in deep learning and hyperparameter optimization. GitHub: sobieskibj LinkedIn: Bartłomiej-Sobieski Mikołaj Spytek MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, data visualization and survival analysis. Google Scholar: 1u49AqYAAAAJ GitHub: mikolajsp LinkedIn: Mikołaj-Spytek Tomasz Stanisławek PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP). Google Scholar: gq8NY_UAAAAJ GitHub: tstanislawek LinkedIn: Tomasz-Stanisławek Paulina Tomaszewska PhD candidate in Computer Science at Warsaw University of Technology. Gained experience in AI at leading universities during: Deep Learning Summer School at Tsinghua University (China), one-semester exchange at Nanyang Technological University (Singapore) and research internships at Gwangju Institute of Science and Technology (South Korea) and Institute of Science and Technology (Austria). Mainly interested in Deep Learning, Computer Vision and Transfer Learning. 
Recently, focused on digital pathology. Google Scholar: eO245iMAAAAJ LinkedIn: paulina-tomaszewska Hoang Thien Ly Bachelor student in Maths and Data Analysis at Warsaw University of Technology. Interested in working with data, and learning explainable artificial intelligence methods. Google Scholar: JkysewYAAAAJ GitHub: lhthien09 LinkedIn: hthienly Piotr Wilczyński BSc student in Data Science at Warsaw University of Technology. Interested in ontologies, semantic similarity, hyperparameter optimization and NLP. GitHub: wi1lku LinkedIn: Piotr-Wilczyński Jakub Wiśniewski Research Software Engineer and third year Data Science student at Warsaw University of Technology. Developer of tools for bias detection and fairness. Currently researching responsible applications of deep learning. President of Data Science Science Club at WUT. Google Scholar: _6eQsXMAAAAJ GitHub: jakwisn LinkedIn: jakwisn Emilia Wiśnios Research Software Engineer and student of Machine Learning at Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. Interested in natural language processing and reinforcement learning. GitHub: emiliawisnios LinkedIn: emilia-wisnios Paweł Wojciechowski Graduated with a bachelor’s degree in Data Science from Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision, and active learning. GitHub: p-wojciechowski LinkedIn: wojciechowski-p Katarzyna Woźnica PhD candidate in computer science at Warsaw University of Technology. Graduated in mathematical statistics. Interested in automated machine learning especially in hyperparameter tuning for tabular data. Carrying statistical analysis and predictive modelling for healthcare. Google Scholar: tAQS1gQAAAAJ GitHub: woznicak LinkedIn: woznicak Vladimir Zaigrajew PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning, primarily on images, with a focus on representation learning. 
GitHub: WolodjaZ LinkedIn: vladimir-zaigrajew Artur Żółkowski BSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision and NLP. GitHub: arturzolkowski LinkedIn: Artur-Żółkowski "],["open-positions.html", "Open Positions", " Open Positions We have an open call for the HOMER and xLungs projects for the following positions. If you are interested in any of them, please send your CV and Motivation Letter to przemyslaw.biecek at pw.edu.pl. We reserve the right to contact only selected candidates. PhD positions in 2023 Call for PhD Student: Research Opportunity in Foundation Model Redteaming We are seeking a highly motivated and talented PhD student to join the MI2.AI research team in the exciting field of foundation model redteaming. As the development of AI models advances (especially transformer-based models with attention mechanisms), it becomes crucial to ensure the robustness and security of the foundation models that underlie these advancements. Project Description: The selected PhD student will work on investigating potential vulnerabilities and biases in foundation models, such as language models (one PhD position) and 2D/3D computer vision models (one PhD position). Redteaming involves conducting adversarial assessments to identify weaknesses, uncover potential attack vectors, and develop defensive strategies for foundation models/transformer models. The research will explore various aspects, including data poisoning attacks, adversarial samples generation, and the development of countermeasures. Requirements: Proficiency in programming languages commonly used in AI research, such as Python, R or Julia. Familiarity with deep learning frameworks and libraries, such as PyTorch. Solid understanding of foundation models, including their architecture and training processes. Excellent problem-solving skills and ability to work independently and as part of a team. 
Good communication skills and proficiency in academic writing. Interested candidates are requested to submit the following documents to przemyslaw.biecek at pw.edu.pl. We look forward to receiving your applications and welcoming an enthusiastic and dedicated PhD student to contribute to our foundation model redteaming research endeavors. Apply by the end of the day on June 30 (June 10 if you are interested in MIM UW). Deep Learning Engineer Required: Background in Computer Science, Mathematics, Statistics or similar. Experience in Deep Learning for 2d/3d image data (*torch is a plus) Interest in medical applications Scope of work: Training of machine learning models for tabular and image data Responsible ML solutions for the healthcare domain Offer: Excellent atmosphere for work in a young and very active lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Flexible working hours that can be combined with studies Research Software Engineer Required: Background in Computer Science, Mathematics, Statistics or similar. Experience in Scientific Programming (R and/or Python and/or C++) Interest in applications, machine learning and explainable artificial intelligence. Scope of work: Training of machine learning models for tabular and image data Interpretable solutions for tabular and image data Interactive interfaces Responsible ML solutions for the healthcare domain Offer: Excellent atmosphere for work in a young and very active lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Flexible working hours that can be combined with studies Post-doc I am looking for a post-doc to join the MI2.AI team for one or two years within the HOMER project. The ideal fit is someone with experience in Medical Image Analysis with deep learning models OR experience in AutoML for tabular data. Experience or interest in XAI / fairness will always be a huge plus in our team. 
We have a young and very energetic team focused on growth in AI. Full-time position. Salary above academic average. Little or no teaching duties. Strong focus on doing things that are meaningful. Required: PhD in Computer Science, Mathematics, Statistics or similar. (Different background? We encourage cross-domain short research visits). Experience in Scientific Programming (R and/or Python and/or C++) Good scientific track record Scope of work: Automated model exploration Interpretable measures for model performance Meta/transfer learning in automated model development Automated model validation Experiments with XAI for deep learning Offer: Excellent atmosphere for work in a young and very active ML lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Full-time job at Warsaw University of Technology plus results-driven extras contract for 6 months (short visit) / 12 months or 24 months (long visit) "],["contact.html", "Contact", " Contact Our rooms: 44 (DataLab - separate entrance in front of the main entrance) 316 (xLungs) 317 (HOMER) Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warszawa VAT: PL 5250005834 "],["mi2redteam.html", "MI²RedTeam", " MI²RedTeam MI²RedTeam analyses machine and deep learning predictive models through the lens of AI explainability, fairness, security and human trust. We develop methods and tools for explanatory model analysis and apply them in practice. MI²RedTeam is a group of researchers experienced in XAI who perform a rigorous evaluation of AI solutions in order to improve their transparency and security. We apply state-of-the-art methods and introduce new ones to tailor our analysis to the specific predictive task. We openly collaborate on various topics related to explainable and interpretable machine learning. Feel free to reach out to us with research ideas and development opportunities. 
We help organizations to better understand the vulnerabilities of their AI systems, and take steps to mitigate them. Our current core research topics of interest include: [ARES] Attack-Resistant Explanations towards Secure AI, i.e. a critical evaluation of the state-of-the-art analysis techniques [xSurvival] Explanatory analysis of machine learning survival models [Large Model Analysis] Explanatory analysis of large models, e.g. transformers Methods and methodologies introduced by our team: Evaluating explanations of vision transformers InteractiveEMA towards human-model interaction in explainable machine learning for tabular data SurvSHAP(t) for time-dependent analysis of machine learning survival models LIMEcraft for human-guided visual explanations of deep neural networks Fooling PD & Manipulating SHAP for stress-testing widely-applied explanation methods Checklist towards responsible deep learning on medical images SAFE for lifting interpretability-performance trade-off via automated feature engineering WildNLP for stress-testing deep learning models in NLP Explanatory Model Analysis towards comprehensive examination of predictive models Tools developed by our team: DALEX, breakDown, auditor & modelStudio for explainable machine learning in R dalex for explainable and fair machine learning in Python survex dedicated to explaining machine learning survival models fairmodels for fairness analysis of machine learning classification models Applications supported by our team: In medicine, we analyzed hundreds of models predicting among others: survival in uveal melanoma eye cancer, survival in sepsis, type of lung cancer, lung cancer risk in screening, lung cancer mortality, COVID-19 mortality, hospital length of stay, progression of Alzheimer’s disease. In credit scoring, we analyzed the transparency, auditability, and explainability of machine learning models. In football analytics, we analyzed expected goal models for performance analysis. 
… This initiative is generously supported by the following institutions. "],["mi²research.html", "MI²Research", " MI²Research On a mission to responsibly build machine learning predictive models. Responsible and sustainable predictive modelling is still a new and developing area. We are conducting a number of studies in this domain that examine predictive models applied to tabular data, computer vision or natural language processing models. We investigate the stability and robustness of various methods, work on explainability and transparency for simple and complex models. As part of this effort, we develop open source software packages (usually in R and Python) for model explanatory analysis, publish scientific articles describing new methods or investigating properties of already known methods, and create educational materials, recommendations and examples of application in specific domains. If you want to find out more about what we are working on, check out our seminar, which is always open to those interested in responsible and sustainable data science. "],["papers.html", "Papers", " Papers Consolidated learning: a domain-specific model-free optimization strategy with validation on metaMIMIC benchmarks Katarzyna Woźnica, Mateusz Grzyb, Zuzanna Trafas, Przemysław Biecek Machine Learning (2023) This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar datasets. We show that a carefully selected static portfolio of hyperparameter configurations yields good results for anytime optimization, while maintaining the ease of use and implementation. We demonstrate the effectiveness of this approach through an empirical study for the XGBoost algorithm and the newly created metaMIMIC benchmarks of predictive tasks extracted from the MIMIC-IV medical database. 
Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision making process. For reproducibility, we open-source code and the TLOS dataset at this URL. SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. 
We provide an accessible implementation of time-dependent explanations in Python at this URL. The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision making. Climate Policy Tracker: Pipeline for automated analysis of public climate policies Artur Żółkowski, Mateusz Krzyziński, Piotr Wilczyński, Stanisław Giziński, Emilia Wiśnios, Bartosz Pieliński, Julian Sienkiewicz, Przemysław Biecek NeurIPS Workshop on Tackling Climate Change with Machine Learning (2022) In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of 10-years of national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. Explainable expected goal models for performance analysis in football analytics Mustafa Cavus, Przemyslaw Biecek International Conference on Data Science and Advanced Analytics (2022) The expected goal provides a more representative measure of the team and player performance which also suit the low-scoring nature of football instead of the score in modern football. 
This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons between 2014-15 and 2020-21 of the top-five European football leagues. Moreover, we demonstrate a practical application of aggregated profiles to explain a group of observations on an accurate expected goal model for monitoring the team and player performance. Multi-omics disease module detection with an explainable Greedy Decision Forest Bastian Pfeifer, Hubert Baniecki, Anna Saranti, Przemyslaw Biecek, Andreas Holzinger Scientific Reports (2022) In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer. Interpretable meta-score for model performance Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek Nature Machine Intelligence (2022) Elo-based predictive power (EPP) meta-score that is built on other performance measures and allows for interpretable comparisons of models. Differences between this score have a probabilistic interpretation and can be compared directly between data sets. Furthermore, this meta-score allows for an assessment of ranking fitness. We prove the properties of the Elo-based predictive power meta-score and support them with empirical results on a large-scale benchmark of 30 classification data sets. Additionally, we propose a unified benchmark ontology that provides a uniform description of benchmarks. 
fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease Angela Lombardi, Domenico Diacono, Nicola Amoroso, Przemysław Biecek, Alfonso Monaco, Loredana Bellantuono, Ester Pantaleo, Giancarlo Logroscino, Roberto De Blasi, Sabina Tangaro, Roberto Bellotti Brain Informatics (2022) In this work, we present a robust framework to (i) perform a threefold classification between healthy control subjects, individuals with cognitive impairment, and subjects with dementia using different cognitive indexes and (ii) analyze the variability of the explainability SHAP values associated with the decisions taken by the predictive models. We demonstrate that the SHAP values can accurately characterize how each index affects a patient’s cognitive status. Furthermore, we show that a longitudinal analysis of SHAP values can provide effective information on Alzheimer’s disease progression. 
LIMEcraft: handcrafted superpixel selection and inspection for Visual eXplanations Weronika Hryniewska, Adrianna Grudzień, Przemysław Biecek Machine Learning (2022) LIMEcraft enhances the process of explanation by allowing a user to interactively select semantically consistent areas and thoroughly examine the prediction for the image instance in case of many image features. Experiments on several models show that our tool improves model safety by inspecting model fairness for image pieces that may indicate model bias. The code is available at: this URL. Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial is the context of data distribution in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations’ vulnerability that can be exploited by an adversary. 
A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski et al. Cancers (2022) The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed long non-coding RNAs (lncRNAs) in the early stages of non-small-cell lung cancer (NSCLC). We established two classifiers. The first recognized cancerous from noncancerous tissues, the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python Hubert Baniecki, Wojciech Kretowicz, Piotr Piątyszek, Jakub Wiśniewski, Przemyslaw Biecek Journal of Machine Learning Research (2021) We introduce dalex, a Python package which implements a model-agnostic interface for interactive explainability and fairness. It adopts the design crafted through the development of various tools for explainable machine learning; thus, it aims at the unification of existing solutions. This library’s source code and documentation are available under open license at this URL. Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies Weronika Hryniewska, Przemysław Bombiński, Patryk Szatkowski, Paulina Tomaszewska, Artur Przelaskowski, Przemysław Biecek Pattern Recognition (2021) Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. In this work, we overview the approaches proposed in the surveyed Machine Learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. 
The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model. Towards explainable meta-learning Katarzyna Woźnica, Przemyslaw Biecek ECML PKDD Workshop on eXplainable Knowledge Discovery in Data Mining (2021) To build a new generation of meta-models we need a deeper understanding of the importance and effect of meta-features on the model tunability. In this paper, we propose techniques developed for eXplainable Artificial Intelligence (XAI) to examine and extract knowledge from black-box surrogate models. To our knowledge, this is the first paper that shows how post-hoc explainability can be used to improve the meta-learning. Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering Alicja Gosiewska, Anna Kozak, Przemysław Biecek Decision Support Systems (2021) We propose a framework that uses elastic black boxes as supervisor models to create simpler, less opaque, yet still accurate and interpretable glass box models. The new models were created using newly engineered features extracted with the help of a supervisor model. We supply the analysis using a large-scale benchmark on several tabular data sets from the OpenML database. 
The first SARS-CoV-2 genetic variants of concern (VOC) in Poland: The concept of a comprehensive approach to monitoring and surveillance of emerging variants Radosław Charkiewicz, Jacek Nikliński, Przemysław Biecek, Joanna Kiśluk, Sławomir Pancewicz, Anna Moniuszko-Malinowska, Robert Flisiak, Adam Krętowski, Janusz Dzięcioł, Marcin Moniuszko, Rafał Gierczyński, Grzegorz Juszczyk, Joanna Reszeć Advances in Medical Sciences (2021) This study shows the first confirmed case of SARS-CoV-2 in Poland with the lineage B.1.351 (known as 501Y.V2 South African variant), as well as another 18 cases with epidemiologically relevant lineage B.1.1.7, known as British variant. Responsible Prediction Making of COVID-19 Mortality (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2021) During the literature review of COVID-19 related prognosis and diagnosis, we found out that most of the predictive models are not faithful to the RAI principles, which can lead to biassed results and wrong reasoning. To solve this problem, we show how novel XAI techniques boost transparency, reproducibility and quality of models. Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of deep learning models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. 
In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. Its flexible and consistent grammar facilitates the validation of a large class of models. Explanations of Model Predictions with live and breakDown Packages Mateusz Staniak, Przemyslaw Biecek The R Journal (2018) Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used for explaining predictions from complex black box models and attributing parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown. We also compare their results with existing implementations of state-of-the-art solutions, namely, lime (Pedersen and Benesty, 2018) which implements Locally Interpretable Model-agnostic Explanations and iml (Molnar et al., 2018) which implements Shapley values. DALEX: Explainers for Complex Predictive Models in R Przemyslaw Biecek Journal of Machine Learning Research (2018) This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. 
Each explainer is a technique for exploration of a black box model. Presented approaches are model-agnostic, what means that they extract useful information from any predictive method irrespective of its internal structure. Each explainer is linked with a specific aspect of a model. Every explainer presented here works for a single model or for a collection of models. In the latter case, models can be compared against each other. Presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. archivist: An R Package for Managing, Recording and Restoring Data Analysis Results Przemyslaw Biecek, Marcin Kosiński Journal of Statistical Software (2017) Everything that exists in R is an object (Chambers 2016). This article examines what would be possible if we kept copies of all R objects that have ever been created. Not only objects but also their properties, meta-data, relations with other objects and information about context in which they were created. We introduce archivist, an R package designed to improve the management of results of data analysis. "],["software.html", "Software", " Software DALEX XAI with DALEX for R and Python survex Explainable machine learning in survival analysis Arena Interactive tool for the exploration and comparison of models’ explanations fairmodels Fairness with fairmodels COVID-19 COVID-19 Risk Score archivist Model governance with R (MLOps) "],["books.html", "Books", " Books Explanatory Model Analysis Explore, Explain, and Examine Predictive Models. With examples in R and Python Przemysław Biecek, Tomasz Burzykowski Chapman and Hall/CRC, New York (2021) Odkrywać! Ujawniać! Objaśniać! Zbiór esejów o sztuce prezentowania danych Essays on the art of data visualisation Przemysław Biecek University of Warsaw Press (2016) Analiza danych z programem R. Modele liniowe z efektami stałymi, losowymi i mieszanymi Data analysis in R. 
Linear models with fixed, random and mixed effects Przemysław Biecek Polish Scientific Publishers PWN (2013) "],["seminars.html", "Seminars", " Seminars We meet every Monday, at 10 am online or in MI2DataLab (room 044, Faculty of Mathematics and Information Science, Warsaw University of Technology). Join us at https://meet.google.com/nno-okiz-bxy (or http://meet.drwhy.ai/) List of topics and materials from past seminars: https://github.com/MI2DataLab/MI2DataLab_Seminarium "],["research-grants.html", "Research grants", " Research grants ARES 2022-2026 ARES: Attack-resistant Explanations toward Secure and trustworthy AI Machine learning explainability, fairness, robustness, and security are key elements of trustworthy Artificial Intelligence, an area of strategic importance. In this context, the main goals of the ARES project are: Develop adversarial attacks on state-of-the-art explanations to investigate vulnerabilities and limitations of the existing explainability and fairness approaches in machine learning. Introduce novel robust explanations that are stable against manipulation and intuitive to evaluate. Achieving the first goal primarily impacts various domains of research, which currently use (and explain) black-box models for knowledge discovery and decision-making, by highlighting vulnerabilities and limitations of their explanations. Achieving the second goal impacts more the broad machine learning domain as it aims at improving state-of-the-art by introducing robust explanations toward secure and trustworthy AI. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2021/43/O/ST6/00347. 
DARLING 2022-2024 DARLING: Deep Analysis of Regulations with Language Inference, Network analysis and institutional Grammar Aim of the project Developing the tools for automated analysis of content of legal documents leveraging Natural Language Processing, that will help understand the dynamic of change in public policies and variables influencing those changes. Those tools will be firstly used to analyse the case of development of policy subsystem regulating usage of AI in the European Union. Specific goals of the project Developing and evaluating multilingual models for issue classification for legal and public policy documents. Developing embedding-based topic modeling methods for legal and public policy documents suited for analysis of change of the topics between documents. Institutional grammar based analysis of changes in topics between different public policy documents, regulations and public consultation documents. Agent-based models predicting diffusion of issues in public policy documents. Methodology The core of the DARLING project is the issues and topic analysis in documents connected with regulations development using NLP tools. Issues analysis shall allow tracking how different options of AI operationalisation, ways the AI-connected threats are perceived as well as ideas regarding AI regulations are shared among three different types of texts: scientific, expert and legal ones. The extracted issues will then be subject to complex networks analysis and institutional grammar approach. The network analysis, backed by agent-based modeling, will be used to examine the flow of issues among the documents based on their vector-formed characteristics. On the other hand, the Institutional Grammar (IG) will be used to analyze the modality of issues, e.g., the tendency to regulate a specific aspect of AI in a given issue, its deontic character or its conditionality. 
As a result, the DARLING project will lead to the development of new methods to analyze legal documents connected to regulation based on deep text processing and links among the documents. An inter-institutional and interdisciplinary team of computer scientists, political scientists and complex-systems physicists will develop new machine learning approaches to examine the regulation corpora, issues recognition, issues analysis by the means of IG as well as propose new methods of modeling the flow/changes of regulations based on complex networks tools. X-LUNGS 2021-2024 X-LUNGS: Responsible Artificial Intelligence for Lung Diseases The aim of the project is to support the process of identification of lesions visible on CT and lung x-rays. We intend to achieve this goal by building an information system based on artificial intelligence (AI) that will support the radiologist’s work by enriching the images with additional information. The unique feature of the proposed system is a trustworthy artificial intelligence module that: will reduce the image analysis time needed to detect lesions, will make the image evaluation process more transparent, will provide image and textual explanations indicating the rationale behind the proposed recommendation, will be verified for effective collaboration with the radiologist. Work on this project is financially supported from the INFOSTRATEG-I/0022/2021-00 grant funded by Polish National Centre for Research and Development (NCBiR). HOMER 2020-2025 HOMER: Human Oriented autoMated machinE leaRning One of the biggest challenges in the state-of-the-art machine learning is dealing with the complexity of predictive models. Recent techniques like deep neural networks, gradient boosting or random forests create models with thousands or even millions of parameters. This makes decisions generated by these black-box models completely opaque. 
Model obscurity undermines trust in model decisions, hampers model debugging, blocks model auditability, and exposes models to problems with concept drift or data drift. Recently, there has been huge progress in the area of model interpretability, which has resulted in the first generation of model explainers, methods for better understanding of factors that drive model decisions. Despite this progress, we are still far from methods that provide deep explanations, confronted with domain knowledge that satisfies our “Right to explanation” as listed in the General Data Protection Regulation (GDPR). In this project I am going to significantly advance the next generation of explainers for predictive models. This will be a disruptive change in the way machine learning models are created, deployed, and maintained. Currently too much time is spent on handcrafted models produced in a tedious and laborious trial-and-error process. The proposed Human-Oriented Machine Learning will focus on the true bottleneck in development of new algorithms, i.e. on model-human interfaces. The particular directions I consider are (1) developing a uniform grammar for visual model exploration, (2) establishing a methodology for contrastive explanations that describe similarities and differences among different models, (3) advancing a methodology for non-additive model explanations, (4) creating new human-model interfaces for effective communication between models and humans, (5) introducing new techniques for training of interpretable models based on elastic surrogate black-box models, (6) developing new methods for automated auditing of fairness, biases and performance of predictive models. Work on this project is financially supported from the SONATA BIS grant 2019/34/E/ST6/00052 funded by Polish National Science Centre (NCN). 
DeCoviD 2020-2022 DeCoviD: Detection of Covid-19 related markers of pulmonary changes using Deep Neural Networks models supported by eXplainable Artificial Intelligence and Cognitive Compressed Sensing Covid-19 is an infectious respiratory disease. A coronavirus infection leaves permanent ramifications in the respiratory system and beyond. In this situation, tools supporting diagnosis and assessment of lung damage after infection and during Covid-19 treatment are crucial. Preliminary results of analysis of CT images and lung x-rays suggest that they can help to quickly assess even asymptomatic cases and facilitate prognosis of response to treatment. There are also reports of usefulness of ultrasound images. The aim of the DeCoviD project is to develop methods and tools to support radiologists in the assessment of lung imaging data for the occurrence of changes caused by Covid-19 disease. The developed solution will make it possible to automate the identification of pathological changes and will support the diagnosis of coexisting lung diseases as well as diseases of other organs visible on chest images. It will also make it possible to quantify the severity of lung damage caused by the disease. Responsible decision support for radiologists requires models based on interpretable features. Such features will be stored in a hybrid knowledge base powered by two research teams from WUT, working on the basis of two, seemingly opposite, paradigms of image data analysis. The eXplainable Artificial Intelligence (XAI) team will use trained deep networks to automatically extract features that are essential for effective disease detection. Cognitive Compressed Sensing (CCS) will build a set of interpretable semantic features using sparse cognitive representations agreed with a group of cooperating radiologists. Combining these two approaches will achieve high effectiveness of the constructed models, combined with high transparency, clarity and stability of the solution. 
The DeCoviD project is a part of a broader strategy of competence development in the area of deep learning + XAI + medical applications at the Warsaw University of Technology. More information: https://github.com/MI2DataLab/DeCoviD. Work on this project is financially supported by the IDUB against COVID PW. DALEX 2018-2022 DALEX: Descriptive and model Agnostic Local EXplanations Research project objectives. Black boxes are complex machine learning models, for example a deep neural network, an ensemble of trees or a high-dimensional regression model. They are commonly used due to their high performance. But how to understand the structure of a black-box, a model in which decision rules are too cryptic for humans? The aim of the project is to create a methodology for such exploration. To address this issue we will develop methods that: (1) identify key variables that mostly determine a model response, (2) explain a single model response in a compact visual way through local approximations, (3) enrich model diagnostic plots. Research project methodology. This project is divided into three subprojects - local approximations of complex models (called LIVE), explanations of particular model predictions (called EXPLAIN) and conditional explanations (called CONDA). Expected impact on the development of science. Explanations of black boxes have fundamental implications for the field of predictive and statistical modelling. The advent of big data forces the use of black boxes that can easily outperform classical methods. But the high performance itself does not imply that the model is appropriate. Thus, especially in applications to personalized medicine or some regulated fields, one should scrutinize decision rules incorporated in the model. 
New methods and tools for exploration of black-box models are useful for quick identification of problems with the model structure and increase the interpretability of a black-box. Work on this project is financially supported from the OPUS grant 2017/27/B/ST6/01307 funded by Polish National Science Centre (NCN). MLGenSig 2017-2021 MLGenSig: Machine Learning Methods for building of Integrated Genetic Signatures Research project objectives. The main scientific goal of this project is to develop a methodology for integrated genetic signatures based on data from divergent high-throughput techniques used in molecular biology. Integrated signatures are based on ensembles of signatures for RNA-seq and DNA-seq data, as well as for methylation profiles and protein expression microarrays. The advent of high throughput methods makes it possible to measure tens of thousands or even millions of features on different levels like DNA / RNA / protein. And nowadays in many large scale studies scientists use data from mRNA seq to assess the state of transcriptome, protein microarrays to assess the state of proteome and DNA-seq / bisulfite methylation to assess genome / methylome. Research methodology. Genetic signatures are widely used in different applications, among others: for assessing genes that differentiate cells that are chemo resistant vs. cells that are not, assess the stage of cell pluripotency, define molecular cancer subtypes. For example, in the Molecular Signatures Database v5.0 one can find thousands of gene sets - genetic signatures for various conditions. There are signatures that characterize some cancer cells, pluripotent cells and other groups. But they usually contain a relatively small number of genes (around 100), results with them are hard to replicate and they are collections of features that were found significant when independently tested. In most cases signatures are derived from measurements of the same type. 
Examples are signatures based on the expression of transcripts measured with microarrays or RNA-seq, on methylation profiles, or on DNA variation. We are proposing a very different approach. First we are going to use machine-learning techniques to create large collections of signatures. Such signatures, based on ensembles of small sub-signatures, are more robust and usually have higher precision. Then out of such signatures we are going to develop methodology for meta-signatures, that integrate information from different types of data (transcriptome, proteome, genome). Great examples of such studies are: Progenitor Cell Biology Consortium (PCBC) and The Cancer Genome Atlas (TCGA) studies. For thousands of patients in different cohorts (for PCBC cohorts based on stemness phenotype, for TCGA based on cancer type) measurements of mRNA, miRNA, DNA and methylation profiles are available. New, large datasets require new methods that take into account high and dense structure of dependencies between features. The task that we are going to solve is to develop methodology that will create genetic signatures that integrate information from different levels of cell functioning. Then we are going to use data from TCGA and PCBC project to assess the quality of proposed methodology. As a baseline we are going to use following methodologies: DESeq, edgeR (for mRNA), casper (for alternative splicing), MethylKit (for RRBS data) and RPPanalyzer for protein arrays. Here is the skeleton for our approach: (1) Use ensembles in order to build a genetic signature. The first step would be to use random forests to train a new signature. Ensembles of sub-signatures are built on bootstrap subsamples and they vote on whether a given sample fits a given signature or not. (2) In order to improve signatures we are going to consider various normalization of raw counts. We start with log and rank transformation. (3) In order to improve the process of training an ensemble we are going to use pre-filtering of genes. 
(4) Another approach is to use Bayesian based methods, that may incorporate the expert knowledge, like belief-based gaussian modelling. Research project impact. Genetic profiling is more and more important and has a number of applications starting from basic classification up to personalized medicine in which patients are profiled against different signatures. Existing tools for genetic signatures have many citations. Thus we assume that the methodology for integrated genetic profiling will be very useful for many research groups. It is hard to overestimate the impact of better genetic profiling on medicine. Moreover we build a team of people with knowledge in cancer genetic profiling. Work on this project is financially supported from the OPUS grant 2016/21/B/ST6/02176 funded by Polish National Science Centre (NCN). "],["mi²solutions.html", "MI²Solutions", " MI²Solutions Hire a team of experienced researchers. The blue team will help you develop good predictive models, create a responsible solution tailored to your needs. The red team will help you find and analyse any weaknesses in your predictive models. It will help you confront them with domain knowledge and make sure they are resilient to future changes in the data. If you need tailor-made solutions for your individual needs, we are happy to help you too. Contact us, we can develop software for you, deploy it, provide training, discuss your needs, verify the quality of your existing solutions. Below you will find a sample offer for trainings or deployments. Research as a service Our team has experience not only in groundbreaking research, but also in deploying this research in business. There are many ways we can help, for example help in the delivery of champion-challenger evaluations in which we look for potential to increase the effectiveness of predictive models in your company. 
take care of the whole life cycle of the predictive models, from reproducibility of results to constant monitoring and continuous improvement of the model. audit models and analyse the sensitivity and vulnerability of the model to incorrect or unexpected behaviours. We would be happy to discuss how we could help your organisation! Trainings Based on our experience in the area of Responsible Machine Learning, we developed a unique two-day hands-on training. Jump into the topic of eXplainable Artificial Intelligence with our trainers. Responsible Machine Learning images/training_xai The training is conducted once a month in small groups online. Small groups encourage questions and the interactions within the group. Language Depending on the group’s preference, the hands-on part can be carried out in R (mlr3 + DALEX) or Python (scikit-learn + dalex). The methodology part does not depend on the language. Book your training To book a training please contact trainings(at)solutions42.ai. "],["mi²education.html", "MI²Education", " MI²Education The demand for predictive modelling skills is growing at a furious rate. Part of our mission is to develop human capital so that predictive modelling is applied responsibly and safely. We take social responsibility seriously and as part of our activities we support the development of data analysis skills among pupils, students and senior professionals alike. 
"],["teaching.html", "Teaching", " Teaching Programming R 22/23 Summer Programming and data analysis advanced in R lectures, labs, projects - Anna Kozak Exploratory Data Analysis 22/23 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Hubert Ruczyński, Bartłomiej Sobieski Data Visualization 22/23 Winter Data Visualization Techniques for Data Science studies lectures, labs, projects - Anna Kozak labs, projects - Mateusz Krzyziński, Hubert Ruczyński, Mikołaj Spytek Exploratory Data Analysis 21/22 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Katatzyna Woźnica Interpretable Machine Learning 21/22 Summer Interpretable Machine Learning lectures, projects - Przemysław Biecek Case Studies 21/22 Summer Case Studies for Data Science studies lectures - Weronika Hryniewska ML-1 - labs, projects - Anna Kozak ML-2 - labs, projects - Bartłomiej Eljasiak XAI-tabular - labs, projects - Mustafa Cavus AutoML - labs, projects - Katarzyna Woźnica XIC - labs, projects - Hubert Baniecki TL - labs, projects - Paulina Tomaszewska Data - labs, projects - Weronika Hryniewska NLP - labs, projects - Stanisław Giziński Data Visualization 21/22 Winter Data Visualization Techniques for Data Science studies lectures, labs, projects - Anna Kozak labs, projects - Hubert Baniecki Exploratory Data Analysis 20/21 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Krzysztof Spaliński Case Studies 20/21 Summer Case Studies for Data Science studies lectures - Katarzyna Woźnica XAI1 - labs, projects - Anna Kozak XAI2 - labs, projects - Szymon Maksymiuk DL1 - labs, projects - Weronika Hryniewska DL2 - labs, projects - Paulina Tomaszewska ML - labs, projects - Hubert Baniecki RashomonML - labs, projects - Katarzyna Woźnica Interpretable Machine 
Learning 20/21 Summer Interpretable Machine Learning for Data Science studies XAI stories 2 lectures, projects - Przemysław Biecek Data Visualization 20/21 Winter Data Visualization Techniques for Data Science studies lectures, labs - Alicja Gosiewska projects - Hubert Baniecki Case Studies 19/20 Summer Case Studies for Data Science studies lectures - Alicja Gosiewska Imputation - labs, projects - Katarzyna Woźnica Reproducibility of scientific papers - labs, projects - Alicja Gosiewska Interpretability - labs, projects - Katarzyna Kobylińska Interpretable Machine Learning 19/20 Summer Interpretable Machine Learning for Data Science studies lectures, projects - Przemysław Biecek Data Visualization 19/20 Summer Data Visualization for Data Science studies lectures, labs, projects - Michał Burdukiewicz "],["beta-bit.html", "Beta Bit", " Beta Bit Chaos Game EN: Are you curious about fractals? The Chaos Game is the book for you. You will learn the mathematical basis behind these figures, find out what algorithm can be used to code them, write code in your favourite programming language (Python, R, Julia?) and also explore the bibliographies of three mathematicians associated with the development of mathematics around these shapes. This is the next book in the Beta Bit series for anyone interested in computational mathematics and data analysis. PL: Jesteś ciekawy czym są fraktale? Gra w Chaos to książka dla Ciebie. Poznasz matematyczne podstawy tych figur, dowiesz się, jaki algorytm można wykorzystać do ich zaprogramowania, napiszesz kod w swoim ulubionym języku programowania (Python, R, Julia?), a także poznasz bibliografie trzech matematyków związanych z rozwojem matematyki wokół tych kształtów. To kolejna książka z serii Beta Bit dla wszystkich zainteresowanych matematyką obliczeniową i analizą danych. Flipbook online [ENG] Flipbook online [POL] Wykresy od kuchni PL: Jak tworzyć dobre wykresy? 
Dobre, czyli takie, które z przyjemnością się ogląda, z których można wyciągnąć wiele informacji, które są zrozumiałe dla szerokiego odbiorcy, a jednocześnie docenią je smakosze. Na bazie doświadczeń z prowadzenia tych zajęć powstały Wykresy od kuchni. To zbiór krótkich wykładów omawiających różne wątki przydatne w lepszym zrozumieniu tego, jak działa komunikacja z użyciem wykresów statystycznych. Na kolejnych stronach pojawi się wiele analogii do przyrządzania posiłków, ponieważ zarówno w kuchni, jak i w przygotowaniu wykresów statystycznych potrzebna jest praktyka, znajomość pewnych fundamentalnych prawideł, garść sprawdzonych przepisów i dużo zapału do eksperymentowania. Będąc tak uzbrojonym, każdy adept sztuki kulinarnej jest skazany na sukces. Flipbook online [POL] The Hitchhiker’s Guide to Responsible Machine Learning EN: A one-of-a-kind 52-page story about responsible machine learning. Beta and Bit use decision trees, random forests, and AutoML tools to build a risk model after a covid infection, and then use explainable artificial intelligence tools to analyze the behavior of that model. The description of the data analysis process is intertwined with descriptions of ML tools and code snippets. All examples are fully reproducible! PL: Jedyna w swoim rodzaju 52-stronicowa opowieść o odpowiedzialnym uczeniu maszynowym. Beta i Bit używają drzew decyzyjnych, lasów losowych i narzędzi AutoML do budowy modelu ryzyka po zakażeniu covid, a następnie używają narzędzi wyjaśnialnej sztucznej inteligencji by przeanalizować działanie tego modelu. Opis procesu analizy danych przeplata się na opisem kolejnych narzędzi i przykładami kodu. Wszystkie wyniki są całkowicie odtwarzalne! Flipbook online Przemysław Biecek, Anna Kozak, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 W pogoni za nieskończonością. Szeregi EN: What does hiking in the mountains have to do with the convergence of series? Quite a lot! 
We start with the paradoxes related to infinity, but step by step we learn the techniques of geometric series. In this book, the conditions for convergence are explained, together with numerous examples. The comic ends with a collection of exercises with different levels of difficulty. PL: Co wspólnego ma chodzenie po górach ze zbieżnością szeregów? Otóż całkiem sporo! Zaczynamy od paradoksów związanych z nieskończonością, ale krok po kroku poznajemy techniki szeregów geometrycznych. W tej pozycji wyjaśnione są warunki zbieżności wraz z licznymi przykładami. Komiks kończy zbiór zadań o różnych poziomach trudności. Flipbook online Przemysław Biecek, Łukasz Maciejewski, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 Przewodnik po pakiecie R EN: The Guide to the R package was the first published Polish book focused on the R language. The current fourth edition consists of four parts: Basics of using R (+tidyverse, shiny, knitr and other goodies), Programming in R (object-oriented, package development, class system), Statistics with R (statistical tests, models, exploration techniques) and Visualization with R (graphics, lattice and ggplot2 packages). PL: Przewodnik po pakiecie R był pierwszą wydaną polskojęzyczną książką poświęconą językowi R. Aktualne czwarte wydanie składa się z czterech części: Podstaw posługiwania się językiem R (+tidyverse, shiny, knitr i inne smaczki), Programowanie w R (obiektowe, tworzenie pakietów, system klas), Statystyka z R (testy statystyczne, modele, techniki eksploracji) i Wizualizacja z R (pakiety graphics, lattice i ggplot2). Wersja online, Książka w księgarnii. Przemysław Biecek Wydawnictwo GiS. 2008-2021 Analiza danych z programem R EN: An academic textbook describing estimation and testing topics for linear models with fixed effects, random effects and mixed effects. The theoretical introduction is complemented by numerous examples for one-way and multivariate ANOVA, one and multiple random components. 
The examples focus on biological and medical applications and are based on real analyses of real data. PL: Podręcznik akademicki opisujący zagadnienia estymacji i testowania dla modeli liniowych z efektami stałymi, losowymi i mieszanymi. Wprowadzenie teoretyczne jest uzupełnione o liczne przykłady dla jednokierunkowej i wielokierunkowej ANOVA, jednym i wieloma komponentami losowymi. Przykłady dotyczą głównie zastosowań biologicznych i medycznych i bazują na prawdziwych analizach rzeczywistych danych. Książka w księgarnii. Przemysław Biecek Wydawnictwo Naukowe PWN 2013-2018 Eseje o sztuce wizualizacji danych EN: Discover! Reveal! Explain! These three roles can be fulfilled by good statistical graphics. Good means understandable, faithful to the data, aesthetic. How to create such graphics? A collection of essays on the art of displaying data systematises knowledge useful in designing and producing good data visualisations. It is not easy. On the one hand, we can fall into the trap of a colourful mush full of numbers, which is sometimes proudly called infographics. On the other hand, we can fall into the trap of graphics that perfectly reproduce the complexity of numbers, and thus completely incomprehensible. Somewhere in the middle is a graphic that explains, that informs, that is aesthetically pleasing and informative. PL: Odkrywać! Ujawniać! Objaśniać! Te trzy role może spełniać dobra grafika statystyczna. Dobra czyli zrozumiała, wierna danym, estetyczna. Jak tworzyć taką grafikę? Zbiór esejów o sztuce pokazywania danych systematyzuje wiedzę przydatną do projektowania i wykonania dobrej wizualizacji danych. Nie jest to proste. Z jednej strony możemy wpaść w pułapkę pstrokatej papki najeżonej liczbami, którą czasem dumnie nazywa się infografiką. Z drugiej strony wpaść można w pułapkę grafiki idealnie odwzorowującej złożoność liczb a przez to zupełnie niezrozumiałej. 
Gdzieś po środku jest grafika, która wyjaśnia, która informuje, która jest estetyczna i informatywna. Książka online, Książka w księgarnii. Przemysław Biecek Wydawnictwo SmarterPoland 2008-2021 Pogromcy Danych EN: Data Crunchers is the first MOOC (Massive Open Online Course) developed in Polish for data scientists. Two modules were developed in 2015: the first one is an introduction to R, with loading data, overview of syntax, basic data types, descriptive statistics and pipelined processing. The second module is dedicated to data visualisation and statistical modelling. More than 8,000 people have registered on the Data Crunchers platform. PL: Pogromcy Danych to pierwszy MOOC (Massive Open Online Course) opracowany w języku polskim do analizy danych. W roku 2015 powstały dwa moduły: pierwszy jest wprowadzeniem do programu R, przez wczytywanie danych, omówienie składni, podstawowych typów danych, statystyk opisowych oraz przetwarzania potokowego. Drugi moduł poświęcony jest wizualizacji danych oraz modelowaniu statystycznemu. W platformie Pogromców Danych zarejestrowało się ponad 8000 osób. Przetwarzanie danych w programie R, Wizualizacja i modelowanie, Strona WWW. Przemysław Biecek ICM UW. 2015 Wykresy unplugged EN: Can you create clear charts without any electricity? An illustrated collection of exercises showing eight of the most popular ways to visualise data, with do-it-yourself challenges. Grab your crayons and start creating fantastic charts. PL: Czy można tworzyć czytelne wykresy bez użycia prądu? Ilustrowany zbiór ćwiczeń przedstawiających osiem najpopularniejszych sposobów wizualizacji danych, wraz z zadaniami do samodzielnego wykonania. Weź kredki i zacznij tworzyć fantastyczne wykresy. Flipbook online, Komiks w księgarnii. Przemysław Biecek, Ewa Baranowska, Piotr Sobczyk Fundacja Naukowa SmarterPoland.pl. 2018 W pogoni za nieskończonością EN: Two mathematicians share stories about infinity. 
In the first Beta attends a lecture on the properties of prime numbers. In the second, Bit breaks into the Palace of Culture and Science. How should we talk about mathematics? PL: Dwójka matematyków wymienia się opowiadaniami o nieskończoności. W pierwszym Beta bierze udział w wykładzie o właściwościach liczb pierwszych. W drugim Bit włamuje się do Pałacu Kultury i Nauki. Jak opowiadać o matematyce? Flipbook online, Komiks w księgarnii. Przemysław Biecek, Łukasz Maciejewski, Tomasz Samojlik, Sebastian Szpakowski Fundacja Naukowa SmarterPoland.pl. 2018 Jak długo żyją Muffinki? EN: A collection of three stories for children showing statistical relationships in the world around us. Beautifully illustrated stories about the distribution of height according to age, the life span of dogs or measuring the weight of trees. PL: Zbiór trzech opowiadań dla dzieci pokazującym zależności statystyczne w świecie wokół nas. Pięknie ilustrowane opowiadania o rozkładzie wzrostu w zależności od wieku, czasie życia psów czy pomiarze wagi drzew. Online: Jak szybko urosnę, Jak długo żyją Muffinki. Przemysław Biecek Fundacja Naukowa SmarterPoland.pl.2016 Pieczara Pietraszki EN: How linear regression can help in getting home, and why it’s not worth hacking into a mad mathematician’s office. A short story describing the adventures of two teenagers Beta and Bit moving around historic Warsaw. PL: W jaki sposób regresja liniowa może pomóc w powrocie do domu, oraz dlaczego nie warto włamywać się do pokoju szalonego matematyka? Lekkie opowiadanie opisujące przygody dwójki nastolatków Bety i Bita w historycznej Warszawie. Online: W jezyku Polskim, In English, По-Русски. Magda Chudzian, Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2015 How to weight a dog with a ruler? EN: Workshop materials for children aged 8-10. Kids measure different parameters of their body, such as arm span or height. 
Then they create a graph summarizing the collected data and look for relations between the measured features. It just so happens that parts of the human body are proportional to each other and you can use a ruler to find this relationship. Part of the StatTub project. PL: Materiały do warsztaty dla dzieci w wieku 8-10. Dzieci mierzą różne parametry swojego ciała, takie jak rozpiętość ramion lub wzrost. Następnie tworzą wykres podsumowujący zebrane dane i szukają zależności pomiędzy zmierzonymi cechami. Tak się składa, że części ciała ludzkiego są do siebie proporcjonalne i można z użyciem linijki znaleźć tę relację. Część projektu StatTuba. Online: English, Polish, Chinese, Simplified Chinese, Czech, German, Spanish, Spanish (Latin America), French, Dutch, Vietnamese. Przemysław Biecek, Klaudia Korniluk Fundacja Naukowa SmarterPoland.pl.2016-2021 "],["responsibleml-blog.html", "ResponsibleML Blog", " ResponsibleML Blog Read more about the research, solutions and education on our blog: Tools for Explainable, Fair and Responsible ML BASIC XAI with DALEX— Part 1: Introduction Anna Kozak In this post, we will take a closer look at some algorithms used in explainable artificial intelligence. You will find here an introduction to methods of global and local model evaluation. Each description will include a technical introduction, example analysis, and code in R and Python. R packages for eXplainable Artificial Intelligence Przemysław Biecek We have prepared an overview of the most popular R-packages, which can be used to build interpretable models or to explore complex ones. Examples of knitr notebooks for more than 30 packages are available at http://xai-tools.drwhy.ai/. Adversarial attacks on Explainable AI Hubert Baniecki There are various adversarial attacks on machine learning models; hence, ways of defending, e.g. by using Explainable AI methods. Nowadays, attacks on model explanations come to light, so does the defense to such adversary. 
Here, we introduce fundamental concepts related to the domain. A further reference list is available at https://github.com/hbaniecki/adversarial-explainable-ai. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "MI² MI².AI", " MI² MI².AI MI².AI is a group of mathematicians and computer scientists that love to play with predictive models. We are spread between Warsaw University of Technology and University of Warsaw. Here we have workshops, seminars, here we are forging new ideas, creating tools, solving problems, doing consulting and sharing our positive attitude. Feel free to jump in. Mission Machine learning is like atomic energy. We develop leaders, skills, methods, tools and good practices so that predictive models can be deployed responsibly and sustainably. Vision MI² is a group of experts supporting global initiatives aimed at responsible and sustainable machine learning. We support the development of future leaders of responsible machine learning through internships, PhDs, postdoctoral fellowships and so on. We seek for research grants and business projects to conduct both scientific and applied research. We develop and maintain software and infrastructure necessary to build responsible and sustainable ML. We develop cooperation with international teams working on similar topics. We support companies to implement best practices related to responsible modelling in their operation. We conduct workshops and training on responsible predictive modelling. 
"],["the-team.html", "The Team", " The Team Members Przemysław Biecek, PhD, DSc (Team Leader) Hubert Baniecki, PhD student Mustafa Cavus, PhD Maciej Chrabąszcz, MSc student Mateusz Grzyb, MSc student Stanisław Giziński, MSc student Weronika Hryniewska-Guzik, PhD student Piotr Komorowski, MSc student Mateusz Krzyziński, MSc student Tymoteusz Kwieciński, BSc student Stanisław Łaniewski, PhD student Piotr Piątyszek, BSc student Hubert Ruczyński, MSc student Barbara Rychalska, PhD student Nuno Sepúlveda, PhD Bartek Sobieski, MSc student Mikołaj Spytek, MSc student Tomasz Stanisławek, PhD Hoang Thien Ly, BSc student Paulina Tomaszewska, PhD student Piotr Wilczyński, BSc student Emilia Wiśnios, MSc student Paweł Wojciechowski, BSc Katarzyna Woźnica, PhD student Vladimir Zaigrajew, PhD student Artur Żółkowski, BSc student Collaborators Mariusz Adamek, Prof, MD Przemysław Bombiński, PhD, MD André Fonseca, PhD student Katarzyna Kobylińska, PhD student Anna Kozak, MSc Marcin Luckner, PhD João Malato, PhD student Bartek Pieliński, PhD, DSc Hanna Piotrowska, MA Elżbieta Sienkiewicz, PhD Julian Sienkiewicz, PhD Adrian Stańdo, MSc student Patryk Szatkowski, PhD student, MD Jakub Wiśniewski, MSc student Alumni Piotr Czarnecki, MSc Alicja Gosiewska, MSc Adrianna Grudzień, BSc Maria Kałuska, BSc Marcin Kosiński, MSc Adam Kozłowski, MSc Wojciech Kretowicz, BSc Michał Kuźba, MSc Szymon Maksymiuk, BSc Tomasz Mikołajczyk, PhD Katarzyna Pękala, MSc Adam Rydelek, BSc Bartosz Sawicki, BSc Patryk Słowakiewicz, BSc Michał Sokólski, MSc Mateusz Stączek, BSc Szymon Szmajdziński, BSc Zuzanna Trafas, BSc Kinga Ułasik, BSc Anna Wróblewska, PhD Hanna Zdulska, BSc Przemysław Biecek My personal mission is to enhance human capabilities by supporting them through access to data-driven and knowledge-based predictions. I execute it by developing methods and tools for responsible machine learning, trustworthy artificial intelligence and reliable software engineering. 
I work as an associate professor at Warsaw University of Technology and the University of Warsaw. I graduated in software engineering and mathematical statistics and now work on model visualisation, explanatory model analysis, predictive modelling and data science for healthcare. In 2016, I formed the research group MI² which develops methods and tools for predictive model analysis. Google Scholar: Af0O75cAAAAJ GitHub: pbiecek LinkedIn: pbiecek Mariusz Adamek I work at two Medical Universities (Silesia and Gdańsk) holding a Professorship in Medicine and Health Sciences. My interests are focused on lung cancer prevention and screening, the latter by means of low-dose computed tomography (LDCT) with special emphasis put on molecular biology methods, prediction models and image analysis aimed to enhance the performance of lung screening outcomes. Website: www.mariuszadamek.io Hubert Baniecki I’m a PhD student in Computer Science at the University of Warsaw. I previously did my MSc (2022) and BSc (2021) in Data Science at Warsaw University of Technology. My main research interest is explainable machine learning, with particular emphasis on adversarial attacks & evaluation of explanations. I care about human-model interaction with applications in biomedicine. I support the development and maintenance of several open-source Python & R packages for building predictive models responsibly. Website: hbaniecki.com Mustafa Cavus I work as an assistant professor at Warsaw University of Technology and the Eskisehir Technical University. I joined the MI² DataLab as a post-doc researcher in 2021. I work on explainable artificial intelligence and AutoML. Google Scholar: I63d1WIAAAAJ&hl GitHub: mcavus LinkedIn: mcavus Twitter: mcavus Julian Sienkiewicz I work as an assistant professor at Faculty of Physics, WUT. My main research area links with sociophysics, complex networks and agent-based models. In the scope of MI² DataLab I follow my other interest - scientometrics. 
Google Scholar: mIwu11QAAAAJ LinkedIn: julek-sienkiewicz-873829 Maciej Chrabąszcz Master’s student in mathematical statistics at Warsaw University of Technology. Interested in deep learning on text and images, explainable and responsible AI. GitHub: maciejchrabaszcz LinkedIn: maciej-chrabaszcz Stanisław Giziński A Research Software Engineer and student of Machine Learning at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. His work in the lab focuses on using natural language processing and network analysis to better understand the spread of AI public policies. Also interested in applying machine learning in bioinformatics. Google Scholar: Stanisław Giziński GitHub: Gizzio LinkedIn: stanislaw-gizinski Mateusz Grzyb MSc student in Data Science at Warsaw University of Technology. Interested in artificial intelligence and scientific computing, but above all simply enjoys programming. GitHub: mgrzyb99 Weronika Hryniewska-Guzik PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning modelling on medical images in the context of explainability and responsible AI. Google Scholar: aJeg3IQAAAAJ GitHub: Hryniewska LinkedIn: weronikahryniewska Piotr Komorowski Master’s student in Machine Learning at the University of Warsaw. Mainly interested in image processing and XAI applied to medical images. GitHub: piotr-komorowski LinkedIn: Piotr-Komorowski Anna Kozak Graduated in mathematical statistics at Warsaw University of Technology. Interested in explainable artificial intelligence and data visualization. Organizes projects related to education. Google Scholar: JIrqf9kAAAAJ GitHub: kozaka93 LinkedIn: kozakanna Mateusz Krzyziński MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, with particular emphasis on XAI methods for survival analysis models and XAI applications in the medical field. Also an enthusiast of data visualization. 
Google Scholar: i_r7EUgAAAAJ GitHub: krzyzinskim LinkedIn: krzyzinskim Tymoteusz Kwieciński BSc student in Data Science at Warsaw University of Technology. Particularly interested in explainable artificial intelligence, computer vision and NLP. GitHub: Fersoil LinkedIn: Tymoteusz-Kwieciński Stanisław Łaniewski PhD student in Quantitative Psychology and Economics at University of Warsaw, Machine Learning Researcher at MI2 Data Lab, MSc in Actuarial Science and Mathematical Finance at University of Amsterdam, former Quantitative Researcher at Flow Traders. His research focuses on enhancing classical methods used in discrete choice and finance with machine learning and on applying them to explain behavioral phenomena and heuristics. He is also keen on finding a balance between the best predictive models and their explainability. Avid gamer who applies statistical techniques to deepen the understanding of best strategies. LinkedIn: Stanisław-Łaniewski Piotr Piątyszek Undergraduate Data Science student at Warsaw University of Technology. Works as a research software engineer on enhancing accessibility and completeness of explainable AI. During the pandemic, contributes to a system for monitoring covid variants. GitHub: piotrpiatyszek Bartosz Pieliński I am an Assistant Professor at the Faculty of Political Science and International Studies at Warsaw University. I am interested in applying quantitative methods to study public policies. I am a founding member of the Institutional Grammar Research Initiative, which is focused on developing a new way of analysing social rules. I have participated in several research projects covering social policy, non-profit organizations, social enterprises, and international organizations. Website: https://pielinski.info/ Google Scholar: hnWiaVEAAAAJ LinkedIn: Bartosz Pieliński Hanna Piotrowska Information designer, focusing mainly on data visualization, branding and book design, with a strong interest in Data Science and perception studies. 
Winner of numerous awards, including The Kantar Information Is Beautiful Awards, HOW International Design Awards, Polish Graphic Design Awards and KTR. LinkedIn: hanna-piotrowska Twitter: hannapio Behance: hannapio. Hubert Ruczyński I am working towards a Master’s degree in Data Science at Warsaw University of Technology. I am also teaching students about data exploration and visualisation. My major interests are: AutoML | Natural Language Processing | Data Visualization | Fairness. GitHub: HubertR21 LinkedIn: Hubert Ruczyński Barbara Rychalska PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP), recommender systems and graph-based learning. Google Scholar: Wp0wHJoAAAAJ LinkedIn: Barbara-Rychalska Bartek Sobieski MSc student in Data Science at Warsaw University of Technology. Interested in deep learning and hyperparameter optimization. GitHub: sobieskibj LinkedIn: Bartłomiej-Sobieski Mikołaj Spytek MSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, data visualization and survival analysis. Google Scholar: 1u49AqYAAAAJ GitHub: mikolajsp LinkedIn: Mikołaj-Spytek Tomasz Stanisławek PhD candidate in computer science at Warsaw University of Technology. Mainly interested in deep learning for natural language processing (NLP). Google Scholar: gq8NY_UAAAAJ GitHub: tstanislawek LinkedIn: Tomasz-Stanisławek Paulina Tomaszewska PhD candidate in Computer Science at Warsaw University of Technology. Gained experience in AI at leading universities during: Deep Learning Summer School at Tsinghua University (China), one-semester exchange at Nanyang Technological University (Singapore) and research internships at Gwangju Institute of Science and Technology (South Korea) and Institute of Science and Technology (Austria). Mainly interested in Deep Learning, Computer Vision and Transfer Learning. 
Recently, focused on digital pathology. Google Scholar: eO245iMAAAAJ LinkedIn: paulina-tomaszewska Hoang Thien Ly Bachelor student in Maths and Data Analysis at Warsaw University of Technology. Interested in working with data, and learning explainable artificial intelligence methods. Google Scholar: JkysewYAAAAJ GitHub: lhthien09 LinkedIn: hthienly Piotr Wilczyński BSc student in Data Science at Warsaw University of Technology. Interested in Large Language Models, AI Deception and Natural Language Processing. Currently working on my thesis, which applies Computer Vision to medicine. GitHub: wi1lku LinkedIn: Piotr-Wilczyński Jakub Wiśniewski Research Software Engineer and third year Data Science student at Warsaw University of Technology. Developer of tools for bias detection and fairness. Currently researching responsible applications of deep learning. President of Data Science Science Club at WUT. Google Scholar: _6eQsXMAAAAJ GitHub: jakwisn LinkedIn: jakwisn Emilia Wiśnios Research Software Engineer and student of Machine Learning at Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. Interested in natural language processing and reinforcement learning. GitHub: emiliawisnios LinkedIn: emilia-wisnios Paweł Wojciechowski Graduated with a bachelor’s degree in Data Science from Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision, and active learning. GitHub: p-wojciechowski LinkedIn: wojciechowski-p Katarzyna Woźnica PhD candidate in computer science at Warsaw University of Technology. Graduated in mathematical statistics. Interested in automated machine learning, especially in hyperparameter tuning for tabular data. Carries out statistical analysis and predictive modelling for healthcare. Google Scholar: tAQS1gQAAAAJ GitHub: woznicak LinkedIn: woznicak Vladimir Zaigrajew PhD candidate in computer science at Warsaw University of Technology. 
Interested in deep learning, primarily on images, with a focus on representation learning. GitHub: WolodjaZ LinkedIn: vladimir-zaigrajew Artur Żółkowski BSc student in Data Science at Warsaw University of Technology. Interested in explainable artificial intelligence, computer vision and NLP. GitHub: arturzolkowski LinkedIn: Artur-Żółkowski "],["open-positions.html", "Open Positions", " Open Positions We have an open call for the HOMER and xLungs projects for the following positions. If you are interested in any of them, please send your CV and Motivation Letter to przemyslaw.biecek at pw.edu.pl. We reserve the right to contact only selected candidates. PhD positions in 2023 Call for PhD Student: Research Opportunity in Foundation Model Redteaming We are seeking a highly motivated and talented PhD student to join the MI2.AI research team in the exciting field of foundation model redteaming. As AI models advance (especially transformer-based models with attention mechanisms), it becomes crucial to ensure the robustness and security of the foundation models that underlie these advancements. Project Description: The selected PhD student will work on investigating potential vulnerabilities and biases in foundation models, such as language models (one PhD position) and 2D/3D computer vision models (one PhD position). Redteaming involves conducting adversarial assessments to identify weaknesses, uncover potential attack vectors, and develop defensive strategies for foundation models/transformer models. The research will explore various aspects, including data poisoning attacks, adversarial sample generation, and the development of countermeasures. Requirements: Proficiency in programming languages commonly used in AI research, such as Python, R or Julia. Familiarity with deep learning frameworks and libraries, such as PyTorch. Solid understanding of foundation models, including their architecture and training processes. 
Excellent problem-solving skills and ability to work independently and as part of a team. Good communication skills and proficiency in academic writing. Interested candidates are requested to submit the following documents to przemyslaw.biecek at pw.edu.pl. We look forward to receiving your applications and welcoming an enthusiastic and dedicated PhD student to contribute to our foundation model redteaming research endeavors. Apply by the end of the day on June 30 (June 10 if you are interested in MIM UW). Deep Learning Engineer Required: Background in Computer Science, Mathematics, Statistics or similar. Experience in Deep Learning for 2d/3d image data (*torch is a plus) Interest in medical applications Scope of work: Training of machine learning models for tabular and image data Responsible ML solutions for the healthcare domain Offer: Excellent atmosphere for work in a young and very active lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Flexible working hours that can be combined with studies Research Software Engineer Required: Background in Computer Science, Mathematics, Statistics or similar. Experience in Scientific Programming (R and/or Python and/or C++) Interest in applications, machine learning and explainable artificial intelligence. Scope of work: Training of machine learning models for tabular and image data Interpretable solutions for tabular and image data Interactive interfaces Responsible ML solutions for the healthcare domain Offer: Excellent atmosphere for work in a young and very active lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Flexible working hours that can be combined with studies Post-doc I am looking for a post-doc to join the MI2.AI team for one or two years within the HOMER project. 
The ideal fit is someone with (experience in Medical Image Analysis with deep learning models) OR (experience in AutoML for tabular data). Experience or interest in XAI / fairness will always be a huge plus in our team. We have a young and very energetic team focused on growth in AI. Full time position. Salary above academic average. Little or no didactics. Strong focus on doing things that are meaningful. Required: PhD in Computer Science, Mathematics, Statistics or similar. (Different background? We encourage cross-domain short research visits). Experience in Scientific Programming (R and/or Python and/or C++) Good scientific track record Scope of work: Automated model exploration Interpretable measures for model performance Meta/transfer learning in automated model development Automated model validation Experiments with XAI for deep learning Offer: Excellent atmosphere for work in a young and very active ML lab Conferences and training budget Short visits to cooperating groups abroad Access to CPU / GPU clusters Full-time job at Warsaw University of Technology plus results-driven extras contract for 6 months (short visit) / 12 months or 24 months (long visit) "],["contact.html", "Contact", " Contact Our rooms: 44 (DataLab - separate entrance in front of the main entrance) 316 (xLungs) 317 (HOMER) Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warszawa VAT: PL 5250005834 "],["mi2redteam.html", "MI²RedTeam", " MI²RedTeam MI²RedTeam analyses machine and deep learning predictive models through the lens of AI explainability, fairness, security and human trust. We develop methods and tools for explanatory model analysis and apply them in practice. MI²RedTeam is a group of researchers experienced in XAI who perform a rigorous evaluation of AI solutions in order to improve their transparency and security. 
We apply state-of-the-art methods and introduce new ones to tailor our analysis to the specific predictive task. We openly collaborate on various topics related to explainable and interpretable machine learning. Feel free to reach out to us with research ideas and development opportunities. We help organizations to better understand the vulnerabilities of their AI systems, and take steps to mitigate them. Our current core research topics of interest include: [ARES] Attack-Resistant Explanations towards Secure AI, i.e. a critical evaluation of the state-of-the-art analysis techniques [xSurvival] Explanatory analysis of machine learning survival models [Large Model Analysis] Explanatory analysis of large models, e.g. transformers Methods and methodologies introduced by our team: Evaluating explanations of vision transformers InteractiveEMA towards human-model interaction in explainable machine learning for tabular data SurvSHAP(t) for time-dependent analysis of machine learning survival models LIMEcraft for human-guided visual explanations of deep neural networks Fooling PD & Manipulating SHAP for stress-testing widely-applied explanation methods Checklist towards responsible deep learning on medical images SAFE for lifting interpretability-performance trade-off via automated feature engineering WildNLP for stress-testing deep learning models in NLP Explanatory Model Analysis towards comprehensive examination of predictive models Tools developed by our team: DALEX, breakDown, auditor & modelStudio for explainable machine learning in R dalex for explainable and fair machine learning in Python survex dedicated to explaining machine learning survival models fairmodels for fairness analysis of machine learning classification models Applications supported by our team: In medicine, we analyzed hundreds of models predicting among others: survival in uveal melanoma eye cancer, survival in sepsis, type of lung cancer, lung cancer risk in screening, lung cancer mortality, 
COVID-19 mortality, hospital length of stay, progression of Alzheimer’s disease. In credit scoring, we analyzed the transparency, auditability, and explainability of machine learning models. In football analytics, we analyzed expected goal models for performance analysis. … This initiative is generously supported by the following institutions. "],["mi²research.html", "MI²Research", " MI²Research On a mission to responsibly build machine learning predictive models. Responsible and sustainable predictive modelling is still a new and developing area. We are conducting a number of studies in this domain that examine predictive models applied to tabular data, computer vision or natural language processing models. We investigate the stability and robustness of various methods, and work on explainability and transparency for simple and complex models. As part of this effort, we develop open source software packages (usually in R and Python) for model explanatory analysis, publish scientific articles describing new methods or investigating properties of already known methods, and create educational materials, recommendations and examples of application in specific domains. If you want to find out more about what we are working on, check out our seminar, which is always open to those interested in responsible and sustainable data science. "],["papers.html", "Papers", " Papers Consolidated learning: a domain-specific model-free optimization strategy with validation on metaMIMIC benchmarks Katarzyna Woźnica, Mateusz Grzyb, Zuzanna Trafas, Przemysław Biecek Machine Learning (2023) This paper proposes a new formulation of the tuning problem, called consolidated learning, more suited to practical challenges faced by model developers, in which a large number of predictive models are created on similar datasets. 
We show that a carefully selected static portfolio of hyperparameter configurations yields good results for anytime optimization, while maintaining the ease of use and implementation. We demonstrate the effectiveness of this approach through an empirical study for the XGBoost algorithm and the newly created metaMIMIC benchmarks of predictive tasks extracted from the MIMIC-IV medical database. Towards Evaluating Explanations of Vision Transformers for Medical Imaging Piotr Komorowski, Hubert Baniecki, Przemysław Biecek CVPR Workshop on Explainable AI for Computer Vision (2023) This paper investigates the performance of various interpretation methods on a Vision Transformer (ViT) applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Hospital Length of Stay Prediction Based on Multi-modal Data towards Trustworthy Human-AI Collaboration in Radiomics Hubert Baniecki, Bartlomiej Sobieski, Przemysław Bombiński, Patryk Szatkowski, Przemysław Biecek International Conference on Artificial Intelligence in Medicine (2023) To what extent can the patient’s length of stay in a hospital be predicted using only an X-ray image? We answer this question by comparing the performance of machine learning survival models on a novel multi-modal dataset created from 1235 images with textual radiology reports annotated by humans. We introduce time-dependent model explanations into the human-AI decision making process. For reproducibility, we open-source code and the TLOS dataset at this URL. 
SurvSHAP(t): Time-dependent explanations of machine learning survival models Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek Knowledge-Based Systems (2023) In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at this URL. The grammar of interactive explanatory model analysis Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek Data Mining and Knowledge Discovery (2023) This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe human-model interaction. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model may increase the accuracy and confidence of human decision making. Climate Policy Tracker: Pipeline for automated analysis of public climate policies Artur Żółkowski, Mateusz Krzyziński, Piotr Wilczyński, Stanisław Giziński, Emilia Wiśnios, Bartosz Pieliński, Julian Sienkiewicz, Przemysław Biecek NeurIPS Workshop on Tackling Climate Change with Machine Learning (2022) In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of 10-years of national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. 
We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. Explainable expected goal models for performance analysis in football analytics Mustafa Cavus, Przemyslaw Biecek International Conference on Data Science and Advanced Analytics (2022) The expected goal provides a more representative measure of the team and player performance which also suit the low-scoring nature of football instead of the score in modern football. This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons between 2014-15 and 2020-21 of the top-five European football leagues. Moreover, we demonstrate a practical application of aggregated profiles to explain a group of observations on an accurate expected goal model for monitoring the team and player performance. Multi-omics disease module detection with an explainable Greedy Decision Forest Bastian Pfeifer, Hubert Baniecki, Anna Saranti, Przemyslaw Biecek, Andreas Holzinger Scientific Reports (2022) In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest (GDF) with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Our proposed explainable approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer. Interpretable meta-score for model performance Alicja Gosiewska, Katarzyna Woźnica, Przemysław Biecek Nature Machine Intelligence (2022) Elo-based predictive power (EPP) meta-score that is built on other performance measures and allows for interpretable comparisons of models. 
Differences between this score have a probabilistic interpretation and can be compared directly between data sets. Furthermore, this meta-score allows for an assessment of ranking fitness. We prove the properties of the Elo-based predictive power meta-score and support them with empirical results on a large-scale benchmark of 30 classification data sets. Additionally, we propose a unified benchmark ontology that provides a uniform description of benchmarks. fairmodels: a Flexible Tool for Bias Detection, Visualization, and Mitigation in Binary Classification Models Jakub Wiśniewski, Przemyslaw Biecek The R Journal (2022) This article introduces an R package fairmodels that helps to validate fairness and eliminate bias in binary classification models quickly and flexibly. It offers a model-agnostic approach to bias detection, visualization, and mitigation. The implemented functions and fairness metrics enable model fairness validation from different perspectives. In addition, the package includes a series of methods for bias mitigation that aim to diminish the discrimination in the model. The package is designed to examine a single model and facilitate comparisons between multiple models. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease Angela Lombardi, Domenico Diacono, Nicola Amoroso, Przemysław Biecek, Alfonso Monaco, Loredana Bellantuono, Ester Pantaleo, Giancarlo Logroscino, Roberto De Blasi, Sabina Tangaro, Roberto Bellotti Brain Informatics (2022) In this work, we present a robust framework to (i) perform a threefold classification between healthy control subjects, individuals with cognitive impairment, and subjects with dementia using different cognitive indexes and (ii) analyze the variability of the explainability SHAP values associated with the decisions taken by the predictive models. 
We demonstrate that the SHAP values can accurately characterize how each index affects a patient’s cognitive status. Furthermore, we show that a longitudinal analysis of SHAP values can provide effective information on Alzheimer’s disease progression. LIMEcraft: handcrafted superpixel selection and inspection for Visual eXplanations Weronika Hryniewska, Adrianna Grudzień, Przemysław Biecek Machine Learning (2022) LIMEcraft enhances the process of explanation by allowing a user to interactively select semantically consistent areas and thoroughly examine the prediction for the image instance in case of many image features. Experiments on several models show that our tool improves model safety by inspecting model fairness for image pieces that may indicate model bias. The code is available at: this URL. Fooling Partial Dependence via Data Poisoning Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek ECML PKDD (2022) We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. Manipulating SHAP via Adversarial Data Perturbations (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2022) We introduce a model-agnostic algorithm for manipulating SHapley Additive exPlanations (SHAP) with perturbation of tabular data. It is evaluated on predictive tasks from healthcare and financial domains to illustrate how crucial is the context of data distribution in interpreting machine learning models. Our method supports checking the stability of the explanations used by various stakeholders apparent in the domain of responsible AI; moreover, the result highlights the explanations’ vulnerability that can be exploited by an adversary. 
A Signature of 14 Long Non-Coding RNAs (lncRNAs) as a Step towards Precision Diagnosis for NSCLC Anetta Sulewska, Jacek Niklinski, Radoslaw Charkiewicz, Piotr Karabowicz, Przemyslaw Biecek, Hubert Baniecki, Oksana Kowalczuk, Miroslaw Kozlowski, Patrycja Modzelewska, Piotr Majewski et al. Cancers (2022) The aim of the study was the appraisal of the diagnostic value of 14 differentially expressed long non-coding RNAs (lncRNAs) in the early stages of non-small-cell lung cancer (NSCLC). We established two classifiers. The first recognized cancerous from noncancerous tissues, the second successfully discriminated NSCLC subtypes (LUAD vs. LUSC). Our results indicate that the panel of 14 lncRNAs can be a promising tool to support a routine histopathological diagnosis of NSCLC. dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python Hubert Baniecki, Wojciech Kretowicz, Piotr Piątyszek, Jakub Wiśniewski, Przemyslaw Biecek Journal of Machine Learning Research (2021) We introduce dalex, a Python package which implements a model-agnostic interface for interactive explainability and fairness. It adopts the design crafted through the development of various tools for explainable machine learning; thus, it aims at the unification of existing solutions. This library’s source code and documentation are available under open license at this URL. Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies Weronika Hryniewska, Przemysław Bombiński, Patryk Szatkowski, Paulina Tomaszewska, Artur Przelaskowski, Przemysław Biecek Pattern Recognition (2021) Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. In this work, we overview the approaches proposed in the surveyed Machine Learning articles and indicate typical errors emerging from the lack of deep understanding of the radiography domain. 
The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model. Towards explainable meta-learning Katarzyna Woźnica, Przemyslaw Biecek ECML PKDD Workshop on eXplainable Knowledge Discovery in Data Mining (2021) To build a new generation of meta-models we need a deeper understanding of the importance and effect of meta-features on the model tunability. In this paper, we propose techniques developed for eXplainable Artificial Intelligence (XAI) to examine and extract knowledge from black-box surrogate models. To our knowledge, this is the first paper that shows how post-hoc explainability can be used to improve the meta-learning. Prevention is better than cure: a case study of the abnormalities detection in the chest Weronika Hryniewska, Piotr Czarnecki, Jakub Wiśniewski, Przemysław Bombiński, Przemysław Biecek CVPR Workshop on “Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision” (2021) In this paper, we analyze in detail a single use case - a Kaggle competition related to the detection of abnormalities in X-ray lung images. We demonstrate how a series of simple tests for data imbalance exposes faults in the data acquisition and annotation process. Complex models are able to learn such artifacts and it is difficult to remove this bias during or after the training. Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering Alicja Gosiewska, Anna Kozak, Przemysław Biecek Decision Support Systems (2021) We propose a framework that uses elastic black boxes as supervisor models to create simpler, less opaque, yet still accurate and interpretable glass box models. The new models were created using newly engineered features extracted with the help of a supervisor model. We supply the analysis using a large-scale benchmark on several tabular data sets from the OpenML database. 
The first SARS-CoV-2 genetic variants of concern (VOC) in Poland: The concept of a comprehensive approach to monitoring and surveillance of emerging variants Radosław Charkiewicz, Jacek Nikliński, Przemysław Biecek, Joanna Kiśluk, Sławomir Pancewicz, Anna Moniuszko-Malinowska, Robert Flisiak, Adam Krętowski, Janusz Dzięcioł, Marcin Moniuszko, Rafał Gierczyński, Grzegorz Juszczyk, Joanna Reszeć Advances in Medical Sciences (2021) This study shows the first confirmed case of SARS-CoV-2 in Poland with the lineage B.1.351 (known as 501Y.V2 South African variant), as well as another 18 cases with epidemiologically relevant lineage B.1.1.7, known as British variant. Responsible Prediction Making of COVID-19 Mortality (Student Abstract) Hubert Baniecki, Przemyslaw Biecek AAAI Conference on Artificial Intelligence (2021) During the literature review of COVID-19 related prognosis and diagnosis, we found out that most of the predictive models are not faithful to the RAI principles, which can lead to biassed results and wrong reasoning. To solve this problem, we show how novel XAI techniques boost transparency, reproducibility and quality of models. Models in the Wild: On Corruption Robustness of Neural NLP Systems Barbara Rychalska, Dominika Basaj, Alicja Gosiewska, Przemyslaw Biecek International Conference on Neural Information Processing (2019) In this paper we introduce WildNLP - a framework for testing model stability in a natural setting where text corruptions such as keyboard errors or misspelling occur. We compare robustness of deep learning models from 4 popular NLP tasks: Q&A, NLI, NER and Sentiment Analysis by testing their performance on aspects introduced in the framework. In particular, we focus on a comparison between recent state-of-the-art text representations and non-contextualized word embeddings. 
In order to improve robustness, we perform adversarial training on selected aspects and check its transferability to the improvement of models with various corruption types. We find that the high performance of models does not ensure sufficient robustness, although modern embedding techniques help to improve it. auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics Alicja Gosiewska, Przemyslaw Biecek The R Journal (2019) This paper describes methodology and tools for model-agnostic auditing. It provides functions for assessing and comparing the goodness of fit and performance of models. In addition, the package may be used for analysis of the similarity of residuals and for identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. The code presented in this paper is implemented in the auditor package. Its flexible and consistent grammar facilitates the validation of a large class of models. Explanations of Model Predictions with live and breakDown Packages Mateusz Staniak, Przemyslaw Biecek The R Journal (2018) Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used for explaining predictions from complex black box models and attributing parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown. We also compare their results with existing implementations of state-of-the-art solutions, namely, lime (Pedersen and Benesty, 2018) which implements Locally Interpretable Model-agnostic Explanations and iml (Molnar et al., 2018) which implements Shapley values. DALEX: Explainers for Complex Predictive Models in R Przemyslaw Biecek Journal of Machine Learning Research (2018) This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. 
Each explainer is a technique for exploration of a black box model. Presented approaches are model-agnostic, which means that they extract useful information from any predictive method irrespective of its internal structure. Each explainer is linked with a specific aspect of a model. Every explainer presented here works for a single model or for a collection of models. In the latter case, models can be compared against each other. Presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. archivist: An R Package for Managing, Recording and Restoring Data Analysis Results Przemyslaw Biecek, Marcin Kosiński Journal of Statistical Software (2017) Everything that exists in R is an object (Chambers 2016). This article examines what would be possible if we kept copies of all R objects that have ever been created. Not only objects but also their properties, meta-data, relations with other objects and information about context in which they were created. We introduce archivist, an R package designed to improve the management of results of data analysis. "],["software.html", "Software", " Software DALEX XAI with DALEX for R and Python survex Explainable machine learning in survival analysis Arena Interactive tool for the exploration and comparison of models’ explanations fairmodels Fairness with fairmodels COVID-19 COVID-19 Risk Score archivist Model governance with R (MLOps) "],["books.html", "Books", " Books Explanatory Model Analysis Explore, Explain, and Examine Predictive Models. With examples in R and Python Przemysław Biecek, Tomasz Burzykowski Chapman and Hall/CRC, New York (2021) Odkrywać! Ujawniać! Objaśniać! Zbiór esejów o sztuce prezentowania danych Essays on the art of data visualisation Przemysław Biecek University of Warsaw Press (2016) Analiza danych z programem R. Modele liniowe z efektami stałymi, losowymi i mieszanymi Data analysis in R. 
Linear models with fixed, random and mixed effects Przemysław Biecek Polish Scientific Publishers PWN (2013) "],["seminars.html", "Seminars", " Seminars We meet every Monday, at 10 am online or in MI2DataLab (room 044, Faculty of Mathematics and Information Science, Warsaw University of Technology). Join us at https://meet.google.com/nno-okiz-bxy (or http://meet.drwhy.ai/) List of topics and materials from past seminars: https://github.com/MI2DataLab/MI2DataLab_Seminarium "],["research-grants.html", "Research grants", " Research grants ARES 2022-2026 ARES: Attack-resistant Explanations toward Secure and trustworthy AI Machine learning explainability, fairness, robustness, and security are key elements of trustworthy Artificial Intelligence, an area of strategic importance. In this context, the main goals of the ARES project are: Develop adversarial attacks on state-of-the-art explanations to investigate vulnerabilities and limitations of the existing explainability and fairness approaches in machine learning. Introduce novel robust explanations that are stable against manipulation and intuitive to evaluate. Achieving the first goal primarily impacts various domains of research, which currently use (and explain) black-box models for knowledge discovery and decision-making, by highlighting vulnerabilities and limitations of their explanations. Achieving the second goal impacts more the broad machine learning domain as it aims at improving state-of-the-art by introducing robust explanations toward secure and trustworthy AI. Work on this project is financially supported by the Polish National Science Centre PRELUDIUM BIS grant 2021/43/O/ST6/00347. 
DARLING 2022-2024 DARLING: Deep Analysis of Regulations with Language Inference, Network analysis and institutional Grammar Aim of the project Developing the tools for automated analysis of content of legal documents leveraging Natural Language Processing, that will help understand the dynamic of change in public policies and variables influencing those changes. Those tools will be firstly used to analyse the case of development of policy subsystem regulating usage of AI in the European Union. Specific goals of the project Developing and evaluating multilingual models for issue classification for legal and public policy documents. Developing embedding-based topic modeling methods for legal and public policy documents suited for analysis of change of the topics between documents. Institutional grammar based analysis of changes in topics between different public policy documents, regulations and public consultation documents. Agent-based models predicting diffusion of issues in public policy documents. Methodology The core of the DARLING project is the issues and topic analysis in documents connected with regulations development using NLP tools. Issues analysis shall allow tracking how different options of AI operationalisation, ways the AI-connected threats are perceived as well as ideas regarding AI regulations are shared among three different types of texts: scientific, expert and legal ones. The extracted issues will then be subject to complex networks analysis and institutional grammar approach. The network analysis, backed by agent-based modeling, will be used to examine the flow of issues among the documents based on their vector-formed characteristics. On the other hand, the Institutional Grammar (IG) will be used to analyze the modality of issues, e.g., the tendency to regulate a specific aspect of AI in a given issue, its deontic character or its conditionality. 
As a result, the DARLING project will lead to the development of new methods to analyze legal documents connected to regulation based on deep text processing and links among the documents. An inter-institutional and interdisciplinary team of computer scientists, political scientists and complex-systems physicists will elaborate new machine learning approaches to examine the regulation corpora, issues recognition, issues analysis by the means of IG as well as propose new methods of modeling the flow/changes of regulations based on complex networks tools. X-LUNGS 2021-2024 X-LUNGS: Responsible Artificial Intelligence for Lung Diseases The aim of the project is to support the process of identification of lesions visible on CT and lung x-rays. We intend to achieve this goal by building an information system based on artificial intelligence (AI) that will support the radiologist’s work by enriching the images with additional information. The unique feature of the proposed system is a trustworthy artificial intelligence module that: will reduce the image analysis time needed to detect lesions, will make the image evaluation process more transparent, will provide image and textual explanations indicating the rationale behind the proposed recommendation, will be verified for effective collaboration with the radiologist. Work on this project is financially supported from the INFOSTRATEG-I/0022/2021-00 grant funded by Polish National Centre for Research and Development (NCBiR). HOMER 2020-2025 HOMER: Human Oriented autoMated machinE leaRning One of the biggest challenges in state-of-the-art machine learning is dealing with the complexity of predictive models. Recent techniques like deep neural networks, gradient boosting or random forests create models with thousands or even millions of parameters. This makes decisions generated by these black-box models completely opaque. 
Model obscurity undermines trust in model decisions, hampers model debugging, blocks model auditability, and exposes models to problems with concept drift or data drift. Recently, there has been huge progress in the area of model interpretability, which has resulted in the first generation of model explainers, methods for better understanding of factors that drive model decisions. Despite this progress, we are still far from methods that provide deep explanations, confronted with domain knowledge that satisfies our “Right to explanation” as listed in the General Data Protection Regulation (GDPR). In this project I am going to significantly advance the next generation of explainers for predictive models. This will be a disruptive change in the way machine learning models are created, deployed, and maintained. Currently too much time is spent on handcrafted models produced in a tedious and laborious trial-and-error process. The proposed Human-Oriented Machine Learning will focus on the true bottleneck in development of new algorithms, i.e. on model-human interfaces. The particular directions I consider are (1) developing a uniform grammar for visual model exploration, (2) establishing a methodology for contrastive explanations that describe similarities and differences among different models, (3) advancing a methodology for non-additive model explanations, (4) creating new human-model interfaces for effective communication between models and humans, (5) introducing new techniques for training of interpretable models based on elastic surrogate black-box models, (6) proposing new methods for automated auditing of fairness, biases and performance of predictive models. Work on this project is financially supported from the SONATA BIS grant 2019/34/E/ST6/00052 funded by Polish National Science Centre (NCN). 
DeCoviD 2020-2022 DeCoviD: Detection of Covid-19 related markers of pulmonary changes using Deep Neural Networks models supported by eXplainable Artificial Intelligence and Cognitive Compressed Sensing Covid-19 is an infectious respiratory disease. A coronavirus infection leaves permanent ramifications in the respiratory system and beyond. In this situation, tools supporting diagnosis and assessment of lung damage after infection and during Covid-19 treatment are crucial. Preliminary results of analysis of CT images and lung x-rays suggest that they can help to quickly assess even asymptomatic cases and facilitate prognosis of response to treatment. There are also reports of usefulness of ultrasound images. The aim of the DeCoviD project is to develop methods and tools to support radiologists in the assessment of lung imaging data for the occurrence of changes caused by Covid-19 disease. The developed solution will make it possible to automate the identification of pathological changes and will support the diagnosis of coexisting lung diseases as well as diseases of other organs visible on chest images. It will also make it possible to quantify the severity of lung damage caused by the disease. Responsible decision support for radiologists requires models based on interpretable features. Such features will be stored in a hybrid knowledge base powered by two research teams from WUT, working on the basis of two, seemingly opposite, paradigms of image data analysis. The eXplainable Artificial Intelligence (XAI) team will use trained deep networks to automatically extract features that are essential for effective disease detection. Cognitive Compressed Sensing (CCS) will build a set of interpretable semantic features using sparse cognitive representations agreed with a group of cooperating radiologists. Combining these two approaches will achieve high effectiveness of the constructed models, combined with high transparency, clarity and stability of the solution. 
The DeCoviD project is a part of a broader strategy of competence development in the area of deep learning + XAI + medical applications at the Warsaw University of Technology. More information: https://github.com/MI2DataLab/DeCoviD. Work on this project is financially supported by the IDUB against COVID PW. DALEX 2018-2022 DALEX: Descriptive and model Agnostic Local EXplanations Research project objectives. Black boxes are complex machine learning models, for example a deep neural network, an ensemble of trees or a high-dimensional regression model. They are commonly used due to their high performance. But how to understand the structure of a black-box, a model in which decision rules are too cryptic for humans? The aim of the project is to create a methodology for such exploration. To address this issue we will develop methods that: (1) identify key variables that mostly determine a model response, (2) explain a single model response in a compact visual way through local approximations, (3) enrich model diagnostic plots. Research project methodology. This project is divided into three subprojects - local approximations of complex models (called LIVE), explanations of particular model predictions (called EXPLAIN) and conditional explanations (called CONDA). Expected impact on the development of science. Explanations of black boxes have fundamental implications for the field of predictive and statistical modelling. The advent of big data imposes the usage of black boxes that can easily outperform classical methods. But the high performance itself does not imply that the model is appropriate. Thus, especially in applications to personalized medicine or some regulated fields, one should scrutinize decision rules incorporated in the model. 
New methods and tools for exploration of black-box models are useful for quick identification of problems with the model structure and increase the interpretability of a black-box. Work on this project is financially supported from the OPUS grant 2017/27/B/ST6/01307 funded by Polish National Science Centre (NCN). MLGenSig 2017-2021 MLGenSig: Machine Learning Methods for building of Integrated Genetic Signatures Research project objectives. The main scientific goal of this project is to develop a methodology for integrated genetic signatures based on data from divergent high-throughput techniques used in molecular biology. Integrated signatures are based on ensembles of signatures for RNA-seq and DNA-seq data, as well as for methylation profiles and protein expression microarrays. The advent of high-throughput methods makes it possible to measure tens of thousands or even millions of features on different levels like DNA / RNA / protein. And nowadays in many large scale studies scientists use data from mRNA seq to assess the state of transcriptome, protein microarrays to assess the state of proteome and DNA-seq / bisulfite methylation to assess genome / methylome. Research methodology. Genetic signatures are widely used in different applications, among others: for assessing genes that differentiate cells that are chemo resistant vs. cells that are not, assess the stage of cell pluripotency, define molecular cancer subtypes. For example, in the Molecular Signatures Database v5.0 one can find thousands of gene sets - genetic signatures for various conditions. There are signatures that characterize some cancer cells, pluripotent cells and other groups. But they usually contain a relatively small number of genes (around 100), results with them are hard to replicate and they are collections of features that were found significant when independently tested. In most cases signatures are derived from measurements of the same type. 
Examples include signatures based on the expression of transcripts measured with microarrays or RNA-seq, on methylation profiles, or on DNA variation. We are proposing a very different approach. First we are going to use machine-learning techniques to create large collections of signatures. Such signatures, based on ensembles of small sub-signatures, are more robust and usually have higher precision. Then out of such signatures we are going to develop a methodology for meta-signatures that integrate information from different types of data (transcriptome, proteome, genome). Great examples of such studies are the Progenitor Cell Biology Consortium (PCBC) and The Cancer Genome Atlas (TCGA) studies. For thousands of patients in different cohorts (for PCBC cohorts based on stemness phenotype, for TCGA based on cancer type) measurements of mRNA, miRNA, DNA and methylation profiles are available. New, large datasets require new methods that take into account the high and dense structure of dependencies between features. The task that we are going to solve is to develop a methodology that will create genetic signatures that integrate information from different levels of cell functioning. Then we are going to use data from the TCGA and PCBC projects to assess the quality of the proposed methodology. As a baseline we are going to use the following methodologies: DESeq, edgeR (for mRNA), casper (for alternative splicing), MethylKit (for RRBS data) and RPPanalyzer for protein arrays. Here is the skeleton for our approach: (1) Use ensembles to build a genetic signature. The first step would be to use random forests to train a new signature. Ensembles of sub-signatures are built on bootstrap subsamples, and they vote on whether a given sample fits a given signature. (2) In order to improve signatures we are going to consider various normalizations of raw counts. We start with log and rank transformations. (3) In order to improve the process of training an ensemble we are going to use pre-filtering of genes. 
(4) Another approach is to use Bayesian methods that may incorporate expert knowledge, like belief-based Gaussian modelling. Research project impact. Genetic profiling is more and more important and has a number of applications, from basic classification up to personalized medicine, in which patients are profiled against different signatures. Existing tools for genetic signatures have many citations. Thus we assume that the methodology for integrated genetic profiling will be very useful for many research groups. It is hard to overestimate the impact of better genetic profiling on medicine. Moreover, we are building a team of people with knowledge of cancer genetic profiling. Work on this project is financially supported from the OPUS grant 2016/21/B/ST6/02176 funded by Polish National Science Centre (NCN). "],["mi²solutions.html", "MI²Solutions", " MI²Solutions Hire a team of experienced researchers. The blue team will help you develop good predictive models, create a responsible solution tailored to your needs. The red team will help you find and analyse any weaknesses in your predictive models. It will help you confront them with domain knowledge and make sure they are resilient to future changes in the data. If you need tailor-made solutions for your individual needs, we are happy to help you too. Contact us, we can develop software for you, deploy it, provide training, discuss your needs, verify the quality of your existing solutions. Below you will find a sample offer for trainings or deployments. Research as a service Our team has experience not only in groundbreaking research, but also in deploying this research in business. There are many ways we can help, for example help in delivery of champion-challenger evaluations in which we look for potential to increase the effectiveness of predictive models in your company. 
take care of the whole life cycle of the predictive models, from reproducibility of results to constant monitoring and continuous improvement of the model. audit models and analyse the sensitivity and vulnerability of the model to incorrect or unexpected behaviours. We would be happy to discuss how we could help your organisation! Trainings Based on our experience in the area of Responsible Machine Learning, we developed a unique two-day hands-on training. Jump into the topic of eXplainable Artificial Intelligence with our trainers. Responsible Machine Learning images/training_xai The training is conducted once a month in small groups online. Small groups encourage questions and interaction within the group. Language Depending on the group’s preference, the hands-on part can be carried out in R (mlr3 + DALEX) or Python (scikit-learn + dalex). The methodology part does not depend on the language. Book your training To book a training please contact trainings(at)solutions42.ai. "],["mi²education.html", "MI²Education", " MI²Education The demand for predictive modelling skills is growing at a furious rate. Part of our mission is to develop human capital so that predictive modelling is applied responsibly and safely. We take social responsibility seriously and as part of our activities we support the development of data analysis skills among pupils, students and senior professionals alike. 
"],["teaching.html", "Teaching", " Teaching Programming R 22/23 Summer Programming and data analysis advanced in R lectures, labs, projects - Anna Kozak Exploratory Data Analysis 22/23 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Hubert Ruczyński, Bartłomiej Sobieski Data Visualization 22/23 Winter Data Visualization Techniques for Data Science studies lectures, labs, projects - Anna Kozak labs, projects - Mateusz Krzyziński, Hubert Ruczyński, Mikołaj Spytek Exploratory Data Analysis 21/22 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Katatzyna Woźnica Interpretable Machine Learning 21/22 Summer Interpretable Machine Learning lectures, projects - Przemysław Biecek Case Studies 21/22 Summer Case Studies for Data Science studies lectures - Weronika Hryniewska ML-1 - labs, projects - Anna Kozak ML-2 - labs, projects - Bartłomiej Eljasiak XAI-tabular - labs, projects - Mustafa Cavus AutoML - labs, projects - Katarzyna Woźnica XIC - labs, projects - Hubert Baniecki TL - labs, projects - Paulina Tomaszewska Data - labs, projects - Weronika Hryniewska NLP - labs, projects - Stanisław Giziński Data Visualization 21/22 Winter Data Visualization Techniques for Data Science studies lectures, labs, projects - Anna Kozak labs, projects - Hubert Baniecki Exploratory Data Analysis 20/21 Summer Introduction to exploratory data analysis for Mathematics and data analysis studies lectures, labs, projects - Anna Kozak labs - Krzysztof Spaliński Case Studies 20/21 Summer Case Studies for Data Science studies lectures - Katarzyna Woźnica XAI1 - labs, projects - Anna Kozak XAI2 - labs, projects - Szymon Maksymiuk DL1 - labs, projects - Weronika Hryniewska DL2 - labs, projects - Paulina Tomaszewska ML - labs, projects - Hubert Baniecki RashomonML - labs, projects - Katarzyna Woźnica Interpretable Machine 
Learning 20/21 Summer Interpretable Machine Learning for Data Science studies XAI stories 2 lectures, projects - Przemysław Biecek Data Visualization 20/21 Winter Data Visualization Techniques for Data Science studies lectures, labs - Alicja Gosiewska projects - Hubert Baniecki Case Studies 19/20 Summer Case Studies for Data Science studies lectures - Alicja Gosiewska Imputation - labs, projects - Katarzyna Woźnica Reproducibility of scientific papers - labs, projects - Alicja Gosiewska Interpretability - labs, projects - Katarzyna Kobylińska Interpretable Machine Learning 19/20 Summer Interpretable Machine Learning for Data Science studies lectures, projects - Przemysław Biecek Data Visualization 19/20 Summer Data Visualization for Data Science studies lectures, labs, projects - Michał Burdukiewicz "],["beta-bit.html", "Beta Bit", " Beta Bit Chaos Game EN: Are you curious about fractals? The Chaos Game is the book for you. You will learn the mathematical basis behind these figures, find out what algorithm can be used to code them, write code in your favourite programming language (Python, R, Julia?) and also explore the bibliographies of three mathematicians associated with the development of mathematics around these shapes. This is the next book in the Beta Bit series for anyone interested in computational mathematics and data analysis. PL: Jesteś ciekawy czym są fraktale? Gra w Chaos to książka dla Ciebie. Poznasz matematyczne podstawy tych figur, dowiesz się, jaki algorytm można wykorzystać do ich zaprogramowania, napiszesz kod w swoim ulubionym języku programowania (Python, R, Julia?), a także poznasz bibliografie trzech matematyków związanych z rozwojem matematyki wokół tych kształtów. To kolejna książka z serii Beta Bit dla wszystkich zainteresowanych matematyką obliczeniową i analizą danych. Flipbook online [ENG] Flipbook online [POL] Wykresy od kuchni PL: Jak tworzyć dobre wykresy? 
Dobre, czyli takie, które z przyjemnością się ogląda, z których można wyciągnąć wiele informacji, które są zrozumiałe dla szerokiego odbiorcy, a jednocześnie docenią je smakosze. Na bazie doświadczeń z prowadzenia tych zajęć powstały Wykresy od kuchni. To zbiór krótkich wykładów omawiających różne wątki przydatne w lepszym zrozumieniu tego, jak działa komunikacja z użyciem wykresów statystycznych. Na kolejnych stronach pojawi się wiele analogii do przyrządzania posiłków, ponieważ zarówno w kuchni, jak i w przygotowaniu wykresów statystycznych potrzebna jest praktyka, znajomość pewnych fundamentalnych prawideł, garść sprawdzonych przepisów i dużo zapału do eksperymentowania. Będąc tak uzbrojonym, każdy adept sztuki kulinarnej jest skazany na sukces. Flipbook online [POL] The Hitchhiker’s Guide to Responsible Machine Learning EN: A one-of-a-kind 52-page story about responsible machine learning. Beta and Bit use decision trees, random forests, and AutoML tools to build a risk model after a covid infection, and then use explainable artificial intelligence tools to analyze the behavior of that model. The description of the data analysis process is intertwined with descriptions of ML tools and code snippets. All examples are fully reproducible! PL: Jedyna w swoim rodzaju 52-stronicowa opowieść o odpowiedzialnym uczeniu maszynowym. Beta i Bit używają drzew decyzyjnych, lasów losowych i narzędzi AutoML do budowy modelu ryzyka po zakażeniu covid, a następnie używają narzędzi wyjaśnialnej sztucznej inteligencji by przeanalizować działanie tego modelu. Opis procesu analizy danych przeplata się na opisem kolejnych narzędzi i przykładami kodu. Wszystkie wyniki są całkowicie odtwarzalne! Flipbook online Przemysław Biecek, Anna Kozak, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 W pogoni za nieskończonością. Szeregi EN: What does hiking in the mountains have to do with the convergence of series? Quite a lot! 
We start with the paradoxes related to infinity, but step by step we learn the techniques of geometric series. In this book, the conditions for convergence are explained, together with numerous examples. The comic ends with a collection of exercises with different levels of difficulty. PL: Co wspólnego ma chodzenie po górach ze zbieżnością szeregów? Otóż całkiem sporo! Zaczynamy od paradoksów związanych z nieskończonością, ale krok po kroku poznajemy techniki szeregów geometrycznych. W tej pozycji wyjaśnione są warunki zbieżności wraz z licznymi przykładami. Komiks kończy zbiór zadań o różnych poziomach trudności. Flipbook online Przemysław Biecek, Łukasz Maciejewski, Aleksander Zawada Fundacja Naukowa SmarterPoland.pl. 2022 Przewodnik po pakiecie R EN: The Guide to the R package was the first published Polish book focused on the R language. The current fourth edition consists of four parts: Basics of using R (+tidyverse, shiny, knitr and other goodies), Programming in R (object-oriented, package development, class system), Statistics with R (statistical tests, models, exploration techniques) and Visualization with R (graphics, lattice and ggplot2 packages). PL: Przewodnik po pakiecie R był pierwszą wydaną polskojęzyczną książką poświęconą językowi R. Aktualne czwarte wydanie składa się z czterech części: Podstaw posługiwania się językiem R (+tidyverse, shiny, knitr i inne smaczki), Programowanie w R (obiektowe, tworzenie pakietów, system klas), Statystyka z R (testy statystyczne, modele, techniki eksploracji) i Wizualizacja z R (pakiety graphics, lattice i ggplot2). Wersja online, Książka w księgarnii. Przemysław Biecek Wydawnictwo GiS. 2008-2021 Analiza danych z programem R EN: An academic textbook describing estimation and testing topics for linear models with fixed effects, random effects and mixed effects. The theoretical introduction is complemented by numerous examples for one-way and multivariate ANOVA, one and multiple random components. 
The examples focus on biological and medical applications and are based on real analyses of real data. PL: Podręcznik akademicki opisujący zagadnienia estymacji i testowania dla modeli liniowych z efektami stałymi, losowymi i mieszanymi. Wprowadzenie teoretyczne jest uzupełnione o liczne przykłady dla jednokierunkowej i wielokierunkowej ANOVA, jednym i wieloma komponentami losowymi. Przykłady dotyczą głównie zastosowań biologicznych i medycznych i bazują na prawdziwych analizach rzeczywistych danych. Książka w księgarnii. Przemysław Biecek Wydawnictwo Naukowe PWN 2013-2018 Eseje o sztuce wizualizacji danych EN: Discover! Reveal! Explain! These three roles can be fulfilled by good statistical graphics. Good means understandable, faithful to the data, aesthetic. How to create such graphics? A collection of essays on the art of displaying data systematises knowledge useful in designing and producing good data visualisations. It is not easy. On the one hand, we can fall into the trap of a colourful mush full of numbers, which is sometimes proudly called infographics. On the other hand, we can fall into the trap of graphics that perfectly reproduce the complexity of numbers, and thus completely incomprehensible. Somewhere in the middle is a graphic that explains, that informs, that is aesthetically pleasing and informative. PL: Odkrywać! Ujawniać! Objaśniać! Te trzy role może spełniać dobra grafika statystyczna. Dobra czyli zrozumiała, wierna danym, estetyczna. Jak tworzyć taką grafikę? Zbiór esejów o sztuce pokazywania danych systematyzuje wiedzę przydatną do projektowania i wykonania dobrej wizualizacji danych. Nie jest to proste. Z jednej strony możemy wpaść w pułapkę pstrokatej papki najeżonej liczbami, którą czasem dumnie nazywa się infografiką. Z drugiej strony wpaść można w pułapkę grafiki idealnie odwzorowującej złożoność liczb a przez to zupełnie niezrozumiałej. 
Gdzieś po środku jest grafika, która wyjaśnia, która informuje, która jest estetyczna i informatywna. Książka online, Książka w księgarnii. Przemysław Biecek Wydawnictwo SmarterPoland 2008-2021 Pogromcy Danych EN: Data Crunchers is the first MOOC (Massive Open Online Course) developed in Polish for data scientists. Two modules were developed in 2015: the first one is an introduction to R, with loading data, overview of syntax, basic data types, descriptive statistics and pipelined processing. The second module is dedicated to data visualisation and statistical modelling. More than 8,000 people have registered on the Data Crunchers platform. PL: Pogromcy Danych to pierwszy MOOC (Massive Open Online Course) opracowany w języku polskim do analizy danych. W roku 2015 powstały dwa moduły: pierwszy jest wprowadzeniem do programu R, przez wczytywanie danych, omówienie składni, podstawowych typów danych, statystyk opisowych oraz przetwarzania potokowego. Drugi moduł poświęcony jest wizualizacji danych oraz modelowaniu statystycznemu. W platformie Pogromców Danych zarejestrowało się ponad 8000 osób. Przetwarzanie danych w programie R, Wizualizacja i modelowanie, Strona WWW. Przemysław Biecek ICM UW. 2015 Wykresy unplugged EN: Can you create clear charts without any electricity? An illustrated collection of exercises showing eight of the most popular ways to visualise data, with do-it-yourself challenges. Grab your crayons and start creating fantastic charts. PL: Czy można tworzyć czytelne wykresy bez użycia prądu? Ilustrowany zbiór ćwiczeń przedstawiających osiem najpopularniejszych sposobów wizualizacji danych, wraz z zadaniami do samodzielnego wykonania. Weź kredki i zacznij tworzyć fantastyczne wykresy. Flipbook online, Komiks w księgarnii. Przemysław Biecek, Ewa Baranowska, Piotr Sobczyk Fundacja Naukowa SmarterPoland.pl. 2018 W pogoni za nieskończonością EN: Two mathematicians share stories about infinity. 
In the first Beta attends a lecture on the properties of prime numbers. In the second, Bit breaks into the Palace of Culture and Science. How should we talk about mathematics? PL: Dwójka matematyków wymienia się opowiadaniami o nieskończoności. W pierwszym Beta bierze udział w wykładzie o właściwościach liczb pierwszych. W drugim Bit włamuje się do Pałacu Kultury i Nauki. Jak opowiadać o matematyce? Flipbook online, Komiks w księgarnii. Przemysław Biecek, Łukasz Maciejewski, Tomasz Samojlik, Sebastian Szpakowski Fundacja Naukowa SmarterPoland.pl. 2018 Jak długo żyją Muffinki? EN: A collection of three stories for children showing statistical relationships in the world around us. Beautifully illustrated stories about the distribution of height according to age, the life span of dogs or measuring the weight of trees. PL: Zbiór trzech opowiadań dla dzieci pokazującym zależności statystyczne w świecie wokół nas. Pięknie ilustrowane opowiadania o rozkładzie wzrostu w zależności od wieku, czasie życia psów czy pomiarze wagi drzew. Online: Jak szybko urosnę, Jak długo żyją Muffinki. Przemysław Biecek Fundacja Naukowa SmarterPoland.pl.2016 Pieczara Pietraszki EN: How linear regression can help in getting home, and why it’s not worth hacking into a mad mathematician’s office. A short story describing the adventures of two teenagers Beta and Bit moving around historic Warsaw. PL: W jaki sposób regresja liniowa może pomóc w powrocie do domu, oraz dlaczego nie warto włamywać się do pokoju szalonego matematyka? Lekkie opowiadanie opisujące przygody dwójki nastolatków Bety i Bita w historycznej Warszawie. Online: W jezyku Polskim, In English, По-Русски. Magda Chudzian, Przemysław Biecek Fundacja Naukowa SmarterPoland.pl. 2015 How to weight a dog with a ruler? EN: Workshop materials for children aged 8-10. Kids measure different parameters of their body, such as arm span or height. 
Then they create a graph summarizing the collected data and look for relations between the measured features. It just so happens that parts of the human body are proportional to each other and you can use a ruler to find this relationship. Part of the StatTub project. PL: Materiały do warsztaty dla dzieci w wieku 8-10. Dzieci mierzą różne parametry swojego ciała, takie jak rozpiętość ramion lub wzrost. Następnie tworzą wykres podsumowujący zebrane dane i szukają zależności pomiędzy zmierzonymi cechami. Tak się składa, że części ciała ludzkiego są do siebie proporcjonalne i można z użyciem linijki znaleźć tę relację. Część projektu StatTuba. Online: English, Polish, Chinese, Simplified Chinese, Czech, German, Spanish, Spanish (Latin America), French, Dutch, Vietnamese. Przemysław Biecek, Klaudia Korniluk Fundacja Naukowa SmarterPoland.pl.2016-2021 "],["responsibleml-blog.html", "ResponsibleML Blog", " ResponsibleML Blog Read more about the research, solutions and education on our blog: Tools for Explainable, Fair and Responsible ML BASIC XAI with DALEX— Part 1: Introduction Anna Kozak In this post, we will take a closer look at some algorithms used in explainable artificial intelligence. You will find here an introduction to methods of global and local model evaluation. Each description will include a technical introduction, example analysis, and code in R and Python. R packages for eXplainable Artificial Intelligence Przemysław Biecek We have prepared an overview of the most popular R-packages, which can be used to build interpretable models or to explore complex ones. Examples of knitr notebooks for more than 30 packages are available at http://xai-tools.drwhy.ai/. Adversarial attacks on Explainable AI Hubert Baniecki There are various adversarial attacks on machine learning models; hence, ways of defending, e.g. by using Explainable AI methods. Nowadays, attacks on model explanations come to light, so does the defense to such adversary. 
Here, we introduce fundamental concepts related to the domain. A further reference list is available at https://github.com/hbaniecki/adversarial-explainable-ai. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs/the-team.html b/docs/the-team.html index 1da4a10..8dd8db1 100644 --- a/docs/the-team.html +++ b/docs/the-team.html @@ -334,7 +334,7 @@

Mateusz GrzybGitHub: mgrzyb99

-

Weronika Hryniewska

+

Weronika Hryniewska-Guzik

PhD candidate in computer science at Warsaw University of Technology. Interested in deep learning modelling on medical images in the context of explainability and responsible AI.

Google Scholar: aJeg3IQAAAAJ

@@ -480,8 +480,8 @@

Hoang Thien Ly

Piotr Wilczyński

-

BSc student in Data Science at Warsaw University of Technology. Interested in ontologies, semantic similarity, hyperparameter optimization and NLP.

-

GitHub: wi1lku

+

BSc student in Data Science at Warsaw University of Technology. Interested in Large Language Models, AI Deception and Natural Language Processing. Currently working on my thesis, which applies Computer Vision to medicine.

+

GitHub: wi1lku

LinkedIn: Piotr-Wilczyński