An open source benchmark for semi-supervised change point detection of meals in type 1 diabetes continuous glucose monitor (CGM) time series data. Originally created for a presentation at PyData Global 2024 in association with sktime and skchange.
NOTE: The benchmark is still under heavy development and is subject to change. Consider its current state alpha v0.0.1. Version numbers are incremented as follows:
- 0.0.1 -> Patch/bug fixes
- 0.1.0 -> Minor data set updates where we add new patients, cgms, insulin pumps, or meal timing regimens.
- 1.0.0 -> Major data set updates where we add new modeling tasks, like new transfer learning settings.
property | value |
---|---|
name | T1D Semi-Supervised Change Point Detection Benchmark |
url | https://github.com/Blood-Glucose-Control/t1d-change-point-detection-benchmark |
sameAs | https://github.com/Blood-Glucose-Control/t1d-change-point-detection-benchmark |
description | |
citation | |
license | MIT License |
This repository contains three main data directories: `raw`, `processed`, and `obfuscated`, each serving a different purpose in the data pipeline.
Contains data generated directly from the `simglucose` simulator.
Characteristics:
- Duration per patient: 90 days
- 30 patients (10 adults, 10 children and 10 adolescents)
- Source: Jinyu Xie. Simglucose v0.2.1 (2018)
- Reference: https://github.com/jxx123/simglucose
Contains processed data derived from `data/raw`.
Pattern: `{patientNum}_{cgmName}_{insulinPumpName}_{startDate}_{endDate}.csv`
Example: `ado001_Dexcom_Cozmo_2024-02-01_2024-04-30.csv`
Component | Description | Example |
---|---|---|
patientNum | Concatenation of first 3 and last 3 characters from patient name | ado001 (adolescent#001) |
cgmName | CGM device name | Dexcom |
insulinPumpName | Insulin pump device name | Cozmo |
startDate | First day of generated data | 2024-02-01 |
endDate | Last day of generated data | 2024-04-30 |
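
The naming pattern above can be sketched in a few lines of Python. This is only an illustration: the helper names `patient_num` and `processed_filename` are hypothetical and not part of the package, and the patient name format `adolescent#001` is taken from the example column of the table.

```python
def patient_num(patient_name: str) -> str:
    """Concatenate the first 3 and last 3 characters of a patient name."""
    return patient_name[:3] + patient_name[-3:]


def processed_filename(patient_name, cgm_name, insulin_pump_name, start_date, end_date):
    """Build a file name following the documented pattern (hypothetical helper)."""
    return (
        f"{patient_num(patient_name)}_{cgm_name}_{insulin_pump_name}"
        f"_{start_date}_{end_date}.csv"
    )


print(processed_filename("adolescent#001", "Dexcom", "Cozmo", "2024-02-01", "2024-04-30"))
# ado001_Dexcom_Cozmo_2024-02-01_2024-04-30.csv
```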
Contains data obfuscated from `data/processed` to simulate human behavior.
Pattern: `{patientNum}_{cgmName}_{insulinPumpName}_{startDate}_{endDate}_{loggingBehaviour}_{loggingTiming}.csv`
Example: `ado001_Dexcom_Cozmo_2024-02-01_2024-04-30_all_normal.csv`
Component | Description | Example |
---|---|---|
patientNum | Concatenation of first 3 and last 3 characters from patient name | ado001 (adolescent#001) |
cgmName | CGM device name | Dexcom |
insulinPumpName | Insulin pump device name | Cozmo |
startDate | First day of generated data | 2024-02-01 |
endDate | Last day of generated data | 2024-04-30 |
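
Going the other way, an obfuscated file name can be split back into its components. The sketch below is not part of the package: `parse_obfuscated_filename` is a hypothetical helper, and the regular expression assumes that device names and indicators contain no underscores.

```python
import re
from datetime import date

# Regex mirroring the documented pattern.
# Assumption: no component other than the separators contains an underscore.
OBFUSCATED_NAME = re.compile(
    r"^(?P<patientNum>[^_]+)_(?P<cgmName>[^_]+)_(?P<insulinPumpName>[^_]+)"
    r"_(?P<startDate>\d{4}-\d{2}-\d{2})_(?P<endDate>\d{4}-\d{2}-\d{2})"
    r"_(?P<loggingBehaviour>[^_]+)_(?P<loggingTiming>[^_.]+)\.csv$"
)


def parse_obfuscated_filename(filename: str) -> dict:
    """Split an obfuscated file name into its named components."""
    match = OBFUSCATED_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    parts = match.groupdict()
    # Convert the ISO dates so they can be compared or subtracted.
    parts["startDate"] = date.fromisoformat(parts["startDate"])
    parts["endDate"] = date.fromisoformat(parts["endDate"])
    return parts


info = parse_obfuscated_filename("ado001_Dexcom_Cozmo_2024-02-01_2024-04-30_all_normal.csv")
print(info["patientNum"], info["loggingBehaviour"], info["loggingTiming"])
# ado001 all normal
```

For the example file, `(info["endDate"] - info["startDate"]).days + 1` evaluates to 90, which matches the 90-day duration per patient.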
Filename Indicator | Type | Description | Distribution |
---|---|---|---|
all | All meals | Logs every meal | 20% |
top2 | Multiple meals per day | Logs 1-2 largest meals (on average 1.8 logs/day) | 25% |
once | Once per day | Logs largest meal only | 20% |
weekly | A few times per week | Irregular logging (on average 3 logs/week) | 20% |
none | Never | No logging | 15% |
Note: Distribution percentages are subject to change
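
One way to read the distribution column is as the probability that a simulated patient is assigned each logging behaviour. The sketch below draws assignments with those weights using only the standard library. Whether the benchmark assigns behaviours by independent sampling like this is an assumption, and `assign_logging_behaviour` is a hypothetical name.

```python
import random
from collections import Counter

# Shares copied from the table above (subject to change, per the note).
LOGGING_BEHAVIOUR_SHARE = {
    "all": 0.20,
    "top2": 0.25,
    "once": 0.20,
    "weekly": 0.20,
    "none": 0.15,
}


def assign_logging_behaviour(n_patients, seed=None):
    """Draw one logging behaviour per patient, weighted by the table shares."""
    rng = random.Random(seed)
    labels = list(LOGGING_BEHAVIOUR_SHARE)
    weights = list(LOGGING_BEHAVIOUR_SHARE.values())
    return rng.choices(labels, weights=weights, k=n_patients)


assignments = assign_logging_behaviour(n_patients=30, seed=0)
print(Counter(assignments))
```

With only 30 patients the observed counts will deviate from the nominal shares; the percentages hold in expectation only.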
Filename Indicator | Pattern | Description | Distribution |
---|---|---|---|
late | Left skewed | Forgetful loggers (gamma distribution) | 38% |
early | Right skewed | Hasty loggers (gamma distribution) | 23% |
average | Normal Distribution | Centered around meal start time | 28% |
punctual | Unchanged | Logs exactly at meal start | 11% |
Note: Distribution percentages are subject to change
Note: Parameters for the gamma distribution are subject to change
Each graph contains 50 randomly generated curves
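
The logging timing patterns can be illustrated by sampling an offset between the true meal start and the logged time. This is a minimal sketch, not the benchmark's implementation: the shape, scale, and standard deviation values are hypothetical, the unit (minutes) is an assumption, and so is the sign convention in which `late` gives a positive offset and `early` a negative one.

```python
import random

# Hypothetical parameters, in minutes; the benchmark's real values are not stated here.
GAMMA_SHAPE, GAMMA_SCALE = 2.0, 10.0
NORMAL_STD = 10.0


def logging_offset_minutes(timing, rng):
    """Offset of the logged time from the true meal start (positive = logged late)."""
    if timing == "late":
        return rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE)
    if timing == "early":
        return -rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE)
    if timing == "average":
        return rng.gauss(0.0, NORMAL_STD)
    if timing == "punctual":
        return 0.0
    raise ValueError(f"Unknown logging timing: {timing!r}")


rng = random.Random(0)
for timing in ("late", "early", "average", "punctual"):
    offsets = [logging_offset_minutes(timing, rng) for _ in range(5)]
    print(timing, [round(o, 1) for o in offsets])
```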
A benchmark dataset for evaluating change point detection algorithms on Type 1 Diabetes data.
You can install the package using pip:

```bash
pip install t1d-cpd-benchmark
```
The package provides three main functions to load different types of data:
- `load_raw_data()`: Load raw CGM data
- `load_processed_data()`: Load processed CGM data
- `load_obfuscated_data()`: Load obfuscated CGM data
Each function can load either a single patient's data by index or all patients' data.

```python
from t1d_cpd_benchmark.datasets import load_raw_data, load_processed_data, load_obfuscated_data

# Load first patient's raw data
# index for load_raw_data can be 0 to 29
data = load_raw_data(index=0)
print(data.head())

# Load first patient's processed data
# index for load_processed_data can be 0 to 179
data = load_processed_data(index=0)
print(data.head())

# Load first patient's obfuscated data
# index for load_obfuscated_data can be 0 to 59
data = load_obfuscated_data(index=0)
print(data.head())

# Load all patients' data
# Returns concatenated DataFrame of all patients
all_data = load_raw_data()
```

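- Raw Data: CGM data generated directly by the simglucose simulator
- Processed Data: Cleaned and preprocessed CGM data derived from the raw data
- Obfuscated Data: Processed data with meal logging altered to simulate human logging behaviour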
This project is licensed under the MIT License - see the LICENSE file for details.