Arrhythmia Detection and Classification

This was a partner project for the AI and Healthcare course in the University of Washington's Professional Master's Program, taught by Dr. Karthik Mohan.

Introduction

This project covers data preprocessing, data generation, and model training and comparison (including a SOTA CNN model) for heartbeat classification, with a focus on class 'A' for arrhythmia. A test-set F1 of 0.964 was achieved using the SOTA CNN with 1 feature, the MLII readings.

Dataset

The dataset can be found here in the mitbih_database directory. The data folder contains 44 CSV files, each with a corresponding TXT file that annotates it. Using the annotated R-peaks from the TXT files, we broke the CSV files down into individual heartbeats for each patient. However, only 42 of the patients had the 'MLII' ECG reading, so we opted to use just that one feature for the majority of the work.

The data was preprocessed into a dataframe of 98,312 x 360. Here the 98,132 refers to the total number of heartbeats we extracted from the 42 patients, and 360 represents the MLII values for each heartbeat. Each heartbeat is created by taking the 180 values to the left of the R-peak and the 179 values to the right of it. Together with the R-peak sample itself, this gives a 360-d vector that represents one ECG sensor reading for one heartbeat. The database was collected using a two-channel ambulatory ECG between 1975 and 1979, and the R-peaks were hand-annotated by cardiologists after digitization.
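The windowing described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `segment_heartbeats` and the choice to skip R-peaks that sit too close to either end of a recording are assumptions.

```python
import numpy as np

def segment_heartbeats(signal, r_peaks, left=180, right=179):
    """Cut one fixed-length window per annotated R-peak.

    Each window holds `left` samples before the peak, the peak sample
    itself, and `right` samples after it (180 + 1 + 179 = 360 values).
    """
    signal = np.asarray(signal, dtype=float)
    beats = []
    for peak in r_peaks:
        start, stop = peak - left, peak + right + 1
        # Assumption: peaks without a full window around them are dropped.
        if start < 0 or stop > len(signal):
            continue
        beats.append(signal[start:stop])
    if not beats:
        return np.empty((0, left + right + 1))
    return np.vstack(beats)
```

Given one patient's MLII column and the R-peak sample indices from the annotation file, the result is an array with one 360-value row per heartbeat.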

So that each patient's heartbeats are scaled relative to that patient's own ECG data, we normalized per patient before concatenating each patient's heartbeats into the larger dataframe containing all patients.
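A minimal sketch of normalizing per patient before concatenation is shown here. The README does not say which normalization was used, so the z-score (zero mean, unit standard deviation) is an assumption, as are the names `normalize_patient` and `build_dataset`.

```python
import numpy as np

def normalize_patient(beats, eps=1e-8):
    """Z-score one patient's heartbeats using that patient's own statistics.

    Assumption: z-score scaling; the project may have used another scheme.
    """
    beats = np.asarray(beats, dtype=float)
    return (beats - beats.mean()) / (beats.std() + eps)

def build_dataset(beats_by_patient):
    """Normalize each patient separately, then stack all heartbeats."""
    normalized = [normalize_patient(b) for b in beats_by_patient.values()]
    return np.vstack(normalized)
```

Normalizing before stacking keeps one patient's baseline or amplitude from dominating the scale of the combined dataframe.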

Examples of Heartbeats from each class parsed during preprocessing:

Data Imbalance

Normal heartbeats are far more common than any other type in a patient's ECG, which creates a large class imbalance in the dataset, as shown below. Notice there are 6 classes, with 'N' representing a normal heartbeat.

Autoencoders for Data Generation

To overcome the massive class imbalance in the dataset, my partner and I tried a basic autoencoder and a variational autoencoder, and found the results of the two to be very comparable.

Using the autoencoder, we boosted the number of samples in the 5 under-represented classes. Even so, each of those classes ended up with only about 3/5 as many samples as the N class.

Using the variational autoencoder, we also boosted the under-represented classes, but this time brought them level with the N class.
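The two balancing targets above (about 3/5 of the N class, and level with the N class) come down to simple arithmetic on the class counts. The sketch below shows only that arithmetic, not the autoencoders themselves; the helper name `synthetic_samples_needed` and its parameters are hypothetical.

```python
from collections import Counter

def synthetic_samples_needed(labels, majority="N", target_ratio=1.0):
    """Return how many generated samples each minority class needs.

    target_ratio is the desired class size as a fraction of the majority
    class, e.g. 3/5 for the first setup above and 1.0 for the second.
    """
    counts = Counter(labels)
    target = round(counts[majority] * target_ratio)
    return {
        cls: max(target - n, 0)  # classes already at the target need none
        for cls, n in counts.items()
        if cls != majority
    }
```

For example, with 100 'N' beats and 10 'A' beats, a ratio of 3/5 calls for 50 generated 'A' beats, while a ratio of 1.0 calls for 90.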

Shown below are the tables of the metrics used to compare each model. As you can see, the basic autoencoder actually produced better end results than the variational autoencoder did.

Denoising

We also implemented a data denoising step, which cleans each signal so that the deep learning algorithms focus on the overall shape of the fluctuations in the data and not on the small jitters that don't contribute to the type of heartbeat.
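The README does not state which denoising method was used. As one simple illustration of smoothing out small jitters while keeping the overall shape, here is a moving-average filter; the technique, the function name `moving_average_denoise`, and the default window size are all assumptions, not the project's implementation.

```python
import numpy as np

def moving_average_denoise(beat, window=5):
    """Smooth a 1-D signal with a centered moving average.

    Assumption: a moving average stands in for the project's denoiser.
    The signal is edge-padded so the output has the same length.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    beat = np.asarray(beat, dtype=float)
    half = window // 2
    padded = np.pad(beat, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")
```

A larger window removes more jitter but also flattens sharp features such as the R-peak, so the window size is a trade-off.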

Models

Random Forest:

depth: 20
estimators: 25
min_samples_split: 2
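The settings above map naturally onto scikit-learn's `RandomForestClassifier`. This is a sketch under two assumptions: that scikit-learn was the library used (the parameter name `min_samples_split` suggests it), and that "depth" and "estimators" correspond to `max_depth` and `n_estimators`. The `seed` argument is added here only for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier

def make_random_forest(seed=0):
    """Build a random forest with the hyperparameters listed above."""
    return RandomForestClassifier(
        n_estimators=25,      # "estimators: 25"
        max_depth=20,         # "depth: 20" (assumed to mean max_depth)
        min_samples_split=2,  # "min_samples_split: 2"
        random_state=seed,    # assumption: not stated in the README
    )
```

The returned classifier can then be trained with `fit(X, y)` on the 360-value heartbeat rows and their class labels.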

Feed Forward Neural Network:

API: Keras Sequential Model
Layer Count: 6
Activation: ReLU
Dropout: 0.6
Loss: Categorical Cross Entropy
Optimizer: Adam
Target: Label Binarizer of the 6 classes N,L,R,A,V,U
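To show what a binarized target looks like for the 6 classes, here is a small NumPy sketch of one-hot encoding. It is illustrative only: the function name `binarize_labels` is hypothetical, and the column order follows the list above, whereas scikit-learn's `LabelBinarizer` orders its columns by sorted class label.

```python
import numpy as np

CLASSES = ["N", "L", "R", "A", "V", "U"]

def binarize_labels(labels, classes=CLASSES):
    """Turn class letters into one-hot rows, one column per class."""
    index = {cls: i for i, cls in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded
```

Each row has a single 1 in the column of its class, which is the target format that categorical cross entropy expects.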

Convolutional Neural Network:

We used the CNN architecture proposed in X. Xu and H. Liu, "ECG Heartbeat Classification Using Convolutional Neural Networks," IEEE Access, vol. 8, pp. 8614-8619, 2020, doi: 10.1109/ACCESS.2020.2964749.

API: PyTorch
1D-Convolutional Layers: 4
Pooling Layers: 2
Linear Layers: 3
Loss: Cross Entropy
Optimizer: Adam

Training

With the above specs, we trained the NN and the CNN for 9 and 8 epochs respectively. The CNN had much smoother accuracy and loss curves and proved to be the better model of the two.

Results

To measure the correctness of the model, we looked at the metrics by class: precision, recall and F1. With upsampling we were able to achieve a much higher accuracy for class 'A', the arrhythmic heartbeat, but it was still significantly worse than for the other classes. The CNN paper claims 99.4% accuracy, whereas we achieved 97% accuracy.
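For reference, the per-class metrics named above can be computed from true and predicted labels as in the sketch below. The function name `per_class_metrics` is hypothetical, and treating a metric with a zero denominator as 0.0 is a convention chosen here, not something taken from the notebook.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Compute precision, recall and F1 for each class, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        # Convention: report 0.0 when a denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

Looking at these numbers per class matters here because, with 'N' dominating the dataset, overall accuracy can stay high even when class 'A' is classified poorly.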

After creating a new dataframe with only the patients that have both MLII and V1 readings, the CNN achieved a higher F1 score than it had with only one feature, namely MLII.

The code

Please check out the notebook for more intermediary results and full preprocessing, model creation, training and analysis functions / cells. Enjoy!
