This class covers the following topics: Introduction to smart healthcare; health decision support system; wearable medical sensors and deep neural network based disease detection; continual learning based multi-headed neural networks for multi-disease detection; interpretability through differentiable logic networks; interpretability through conformal predictions; medical images and convolutional neural network based disease detection; natural language processing for healthcare; foundation models for healthcare; counterfactual reasoning based personalized medical decision-making.
All assignments are built on papers you will read for the class. Each coding task illustrates the paper discussed that week and reproduces a simplified version of the smart healthcare framework it describes. The assignments introduce you to recent advances in machine learning broadly, and deep learning in particular.
Useful reference for ML concepts: ML Glossary from Google
- Instructor: Niraj K. Jha
- TAs: Margarita Belova, Jiaxin Xiao
- Lectures: M/W 11:00-12:20pm
- Office hours:
  - Margarita Belova: Tu 3-4pm, Th 10-11am (EQuad B321)
  - Jiaxin Xiao: Mo 10-11am, We 2-3pm (EQuad B321)
Each assignment is worth 20 points.
The data for the assignments is available in a shared folder on Google Drive.
Assignment 1: ML classifiers
For this project, you will train and compare several classifiers (decision tree, k-nearest neighbor, Naive Bayes, and logistic regression) to determine whether a patient has breast cancer. We will use the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository (see details here). This first assignment reviews the basic ML classifiers, Python data science libraries (NumPy, pandas, scikit-learn), and ML concepts (training-validation-test splits, training accuracy, validation accuracy) that we will draw upon in later assignments.
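The comparison described above can be sketched as follows. This is a minimal illustration, not the assignment solution: it assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) in place of the course data, and the 60/20/20 split, `max_depth=4`, and `n_neighbors=5` are arbitrary choices made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled copy stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)

# 60/20/20 training/validation/test split (split sizes are an assumption).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

# k-NN and logistic regression are scale-sensitive, so they get a scaler.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": model.score(X_train, y_train),
        "val": model.score(X_val, y_val),
    }
    print(f"{name:20s} train={results[name]['train']:.3f} "
          f"val={results[name]['val']:.3f}")
```

The test split is held back on purpose: model selection uses the validation accuracy, and the test set is scored once for the chosen model.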
Assignment 2: SCANN
For this project, you will train compact neural networks with a mutating architecture that adapts to data during the training process. You will work with an arrhythmia dataset.
Your goal is to implement SCANN Scheme A, which is a constructive approach. We start with a seed architecture that has a small number of hidden neurons, then iteratively apply connection growth and neuron growth to enlarge the network. The network may not exceed a maximum number of neurons, which is one of the hyperparameters.
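The constructive loop can be illustrated with a small NumPy sketch. It is not the SCANN implementation: it assumes a single hidden layer with a binary connection mask, activates the dormant connections with the largest gradient magnitude, and initializes new neurons randomly. The toy data and all function names are hypothetical; the growth rules in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W1, M1, W2):
    H = np.tanh(X @ (W1 * M1))            # masked input-to-hidden weights
    return H, 1.0 / (1.0 + np.exp(-(H @ W2)))

def gradients(X, y, W1, M1, W2):
    H, p = forward(X, W1, M1, W2)
    d_logits = (p - y) / len(y)           # binary cross-entropy gradient
    gW2 = H.T @ d_logits
    dH = (d_logits @ W2.T) * (1.0 - H ** 2)
    gW1 = X.T @ dH                        # includes dormant connections
    return gW1, gW2

def train(X, y, W1, M1, W2, steps=200, lr=0.5):
    for _ in range(steps):
        gW1, gW2 = gradients(X, y, W1, M1, W2)
        W1 -= lr * gW1 * M1               # update active connections only
        W2 -= lr * gW2
    return W1, W2

def grow_connections(X, y, W1, M1, W2, k):
    """Activate up to k dormant connections with the largest |gradient|."""
    gW1, _ = gradients(X, y, W1, M1, W2)
    score = np.abs(gW1) * (1.0 - M1)
    idx = np.argsort(score, axis=None)[::-1][:k]
    idx = idx[score.ravel()[idx] > 0]
    rows, cols = np.unravel_index(idx, M1.shape)
    M1[rows, cols] = 1.0
    return M1

def grow_neuron(W1, M1, W2, max_neurons):
    """Add one sparsely connected hidden neuron unless the budget is reached."""
    if W1.shape[1] >= max_neurons:
        return W1, M1, W2
    d = W1.shape[0]
    W1 = np.hstack([W1, rng.normal(0, 0.1, (d, 1))])
    M1 = np.hstack([M1, (rng.random((d, 1)) < 0.3).astype(float)])
    W2 = np.vstack([W2, rng.normal(0, 0.1, (1, 1))])
    return W1, M1, W2

# Toy binary task standing in for the arrhythmia data.
X = rng.normal(size=(200, 8))
y = (X[:, :1] + X[:, 1:2] - X[:, 2:3] > 0).astype(float)

max_neurons = 6                           # hyperparameter: neuron budget
W1 = rng.normal(0, 0.1, (8, 2))           # seed architecture: 2 hidden neurons
M1 = (rng.random((8, 2)) < 0.3).astype(float)
W2 = rng.normal(0, 0.1, (2, 1))

history = []
for _ in range(5):
    M1 = grow_connections(X, y, W1, M1, W2, k=4)
    W1, M1, W2 = grow_neuron(W1, M1, W2, max_neurons)
    W1, W2 = train(X, y, W1, M1, W2)
    _, p = forward(X, W1, M1, W2)
    acc = float(((p > 0.5) == (y > 0.5)).mean())
    history.append((W1.shape[1], int(M1.sum()), acc))
    print(f"neurons={W1.shape[1]} connections={int(M1.sum())} acc={acc:.3f}")
```

Keeping dense weight storage with a separate mask makes growth cheap: a dormant connection already has a weight and a gradient, so activating it only flips a mask entry.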
Assignment 3: TUTOR
For this project, you will implement the TUTOR framework to train accurate deep neural networks with limited data and optimized computational resources. The workflow includes generating synthetic data using KDE and GMM methods, validating it with semantic integrity classifiers, labeling synthetic data with a random forest classifier, and training baseline models on real data. You will then apply Scheme A: pretraining on synthetic data followed by final training on real data, to demonstrate the benefits of this approach.
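A rough sketch of this workflow with scikit-learn is shown below. It rests on several assumptions: `make_classification` output stands in for the real data, the semantic integrity check is omitted, and `MLPClassifier` with `warm_start=True` is used as a simple way to pretrain on synthetic data and then continue training on real data. The bandwidth, component count, and network size are illustrative values, not those of the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Placeholder "real" data (assumption: the course data is tabular like this).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X = StandardScaler().fit_transform(X)
X_real, X_test, y_real, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1. Generate synthetic feature vectors with KDE and GMM.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_real)
X_kde = kde.sample(500, random_state=0)
gmm = GaussianMixture(n_components=4, random_state=0).fit(X_real)
X_gmm, _ = gmm.sample(500)                # sample() also returns component ids
X_syn = np.vstack([X_kde, X_gmm])

# 2. Label the synthetic data with a random forest trained on real data.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_real, y_real)
y_syn = rf.predict(X_syn)

# 3. Scheme A: pretrain on synthetic data, then finish training on real data.
#    With warm_start=True the second fit() starts from the pretrained weights.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200, warm_start=True,
                    random_state=0)
net.fit(X_syn, y_syn)
net.fit(X_real, y_real)

# Baseline trained on real data only, for comparison.
baseline = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200, random_state=0)
baseline.fit(X_real, y_real)

acc_scheme_a = net.score(X_test, y_test)
acc_baseline = baseline.score(X_test, y_test)
print(f"Scheme A: {acc_scheme_a:.3f}  baseline: {acc_baseline:.3f}")
```

Whether pretraining helps depends on how little real data is available; shrinking `X_real` in this sketch is one way to explore that.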
Assignment 4: CovidDeep
For this project, you will train deep neural networks on data obtained from COVID-19 patients to predict if the patient is healthy or has COVID-19. This includes three steps: (a) Generate synthetic data using the TUTOR methodology. (b) Use synthetic data to pre-train a neural network, then finish training on real data. (c) Apply the SCANN "grow-and-prune" neural network synthesis paradigm to compress the original network.
After grow-and-prune synthesis, you should obtain a compact neural network that is more accurate than the network you started with before this step.
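The pruning half of grow-and-prune can be illustrated with a connection mask. The sketch below assumes magnitude-based pruning, in which the active connections with the smallest absolute weights are removed; the function name is hypothetical, and the exact pruning criterion used by SCANN should be taken from the paper.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Return a new mask with the given fraction of active connections removed,
    dropping those whose weights have the smallest magnitude."""
    active = np.flatnonzero(mask)
    n_prune = int(fraction * active.size)
    new_mask = mask.copy().ravel()
    if n_prune > 0:
        order = np.argsort(np.abs(weights.ravel()[active]))
        new_mask[active[order[:n_prune]]] = 0
    return new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4))
mask = np.ones_like(weights)

pruned_mask = magnitude_prune(weights, mask, fraction=0.5)
kept = np.abs(weights[pruned_mask == 1])
dropped = np.abs(weights[pruned_mask == 0])
print(f"active connections: {int(mask.sum())} -> {int(pruned_mask.sum())}")
```

In practice pruning is followed by further training with the mask held fixed, so the remaining weights can compensate for the removed connections.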
Assignment 5: SoDA
For this project, you will implement a framework for stress detection based on physiological signals. The experimental data were collected from 32 individuals using wearable medical sensors. The dataset features are ECG, Galvanic Skin Response (GSR), Respiration, Blood Oximeter, and Blood Pressure. To reduce the data size, you will employ Principal Component Analysis (PCA), an unsupervised dimensionality reduction technique. With the reduced dataset, you will train and evaluate k-NN and SVM models, the latter with a radial basis function (RBF) kernel, for binary stress classification.
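The PCA-then-classify pipeline can be sketched with scikit-learn as below. The data here is a synthetic placeholder from `make_classification`, not the sensor dataset, and the number of components, `n_neighbors`, and SVM settings are assumed values for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for the physiological features (assumption: 20 numeric columns).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def make_model(classifier, n_components=5):
    # Scaling comes first: PCA directions are dominated by high-variance
    # features unless the inputs are standardized.
    return Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
        ("clf", classifier),
    ])

models = {
    "knn": make_model(KNeighborsClassifier(n_neighbors=5)),
    "svm_rbf": make_model(SVC(kernel="rbf", C=1.0, gamma="scale")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)           # PCA is fit on training data only
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Wrapping the scaler and PCA in the pipeline keeps the test data out of the fitted projection, which avoids leaking test statistics into training.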
Assignment 6: DOCTOR
DOCTOR is a framework that enables continual learning for disease detection based on wearable medical sensor data.
In this exercise, you will recreate some of the experiments described in the DOCTOR paper. You will perform domain-, class-, and task-incremental learning with replay-based continual learning methods, using a multilayer perceptron (MLP) model that detects diabetes and mental health disorders.
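The idea behind replay can be shown with a small domain-incremental sketch. It assumes a plain random-sample replay buffer and two synthetic domains with shifted inputs; `make_domain`, `train_epochs`, and the buffer size are hypothetical, and the replay strategy in the DOCTOR paper may differ.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def make_domain(shift, n=400):
    """Hypothetical binary task; each domain shifts the input distribution."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 6))
    y = ((X[:, 0] - shift) + (X[:, 1] - shift) > 0).astype(int)
    return X, y

def train_epochs(model, X, y, epochs=30):
    """Incremental training: partial_fit keeps the existing weights."""
    for _ in range(epochs):
        order = rng.permutation(len(X))
        model.partial_fit(X[order], y[order], classes=[0, 1])

domains = [make_domain(0.0), make_domain(3.0)]
model = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)

buffer_X = np.empty((0, 6))
buffer_y = np.empty((0,), dtype=int)
buffer_per_domain = 50                    # samples preserved from each domain

for X, y in domains:
    # Mix the new domain's data with samples replayed from earlier domains.
    X_mix = np.vstack([X, buffer_X])
    y_mix = np.concatenate([y, buffer_y])
    train_epochs(model, X_mix, y_mix)

    # Preserve a random subset of this domain for future replay.
    keep = rng.choice(len(X), size=buffer_per_domain, replace=False)
    buffer_X = np.vstack([buffer_X, X[keep]])
    buffer_y = np.concatenate([buffer_y, y[keep]])

# Accuracy on each domain's own training data after seeing all domains.
accs = [model.score(X, y) for X, y in domains]
print("accuracy per domain:", [round(a, 3) for a in accs])
```

Setting `buffer_per_domain = 0` turns replay off, which is a quick way to observe how much accuracy on the first domain is lost without it.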
BSD-3-Clause. Copyright (c) 2025, JHA-Lab. All rights reserved.
See License file for more details.