Model Development - Data Generation Script #90

Closed
9 tasks done
RobotPsychologist opened this issue Oct 26, 2024 · 3 comments
Labels
modeldev Developing modeling pipelines for meal annotation task.

Comments

RobotPsychologist commented Oct 26, 2024

This issue captures both #108 and #109.

See the README for a better understanding of where files should go.

This ticket relates to #91 - Data Cleaning Script and #96 - Transformations Script.

The dataset_generator.py script should specify which settings to use for dataset creation and should generally write the resulting dataset to data/interim. It calls all of the specified data wrangling, processing, and cleaning utilities that should happen outside of sktime's API.

The data stored in data/interim is then used by the data_transformations.py functions to apply time-series-machine-learning-specific transformations, the final data processing stage before modelling. The transformed data should be stored in data/processed. For more information, see the sktime documentation on transformations.
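As a rough illustration of that hand-off, here is a minimal sketch of what a data_transformations.py step could look like. The function name, the `date`/`bgl` column names, and the choice of sktime's Imputer are assumptions for the example, not part of this spec:

```python
# data_transformations.py -- illustrative sketch only; function and column
# names and the chosen sktime transformer are assumptions.
from pathlib import Path

import pandas as pd
from sktime.transformations.series.impute import Imputer


def apply_transformations(interim_path: Path, processed_path: Path) -> pd.DataFrame:
    """Load an interim dataset, apply sktime transformations, save to data/processed."""
    df = pd.read_csv(interim_path, parse_dates=["date"], index_col="date")

    # Example sktime transformer: fill gaps in the blood glucose series.
    imputer = Imputer(method="ffill")
    df["bgl"] = imputer.fit_transform(df["bgl"])

    processed_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(processed_path)
    return df


if __name__ == "__main__":
    apply_transformations(
        Path("data/interim/example_interim.csv"),
        Path("data/processed/example_processed.csv"),
    )
```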

dataset_processing.py

Purpose: Handles data loading, saving, and file naming.

Location: 0_meal_identification/meal_identification/meal_identification/datasets/dataset_processing.py

Functions:

  • get_root_dir: finds the root directory of the project.
  • load_data: a general data loading utility function that can load data from any data directory.
  • save_data: a general data saving utility function that can store data in either data/interim or data/processed.
  • dataset_labeler: an auto-labeller that takes in the configurations from the data processing, cleaning, and generation steps to create a labelled dataset that should give the user a good understanding of how the dataset was generated (a sketch of these helpers follows).
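A rough sketch of the signatures these helpers could take; the project-marker file, directory layout, and label format are assumptions, not the final interface:

```python
# dataset_processing.py -- signature sketch only; parameter names and the
# label format are assumptions.
from pathlib import Path

import pandas as pd


def get_root_dir(marker: str = "pyproject.toml") -> Path:
    """Walk upwards from this file until a project marker file is found."""
    path = Path(__file__).resolve()
    for parent in path.parents:
        if (parent / marker).exists():
            return parent
    raise FileNotFoundError(f"Could not find project root containing {marker}")


def load_data(data_dir: str, filename: str) -> pd.DataFrame:
    """Load a CSV from any data directory (e.g. 'raw', 'interim', 'processed')."""
    return pd.read_csv(get_root_dir() / "data" / data_dir / filename)


def save_data(df: pd.DataFrame, data_dir: str, filename: str) -> Path:
    """Save a DataFrame to data/interim or data/processed."""
    out_dir = get_root_dir() / "data" / data_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    df.to_csv(out_path, index=False)
    return out_path


def dataset_labeler(config: dict) -> str:
    """Build a filename label that records how the dataset was generated."""
    return "_".join(f"{key}-{value}" for key, value in sorted(config.items()))
```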

dataset_cleaning.py

Purpose: Focuses on cleaning and preprocessing utilities for the dataset, such as handling overlaps and selecting top meals.

Location: 0_meal_identification/meal_identification/meal_identification/datasets/dataset_cleaning.py

Functions:

  • coerce_time - a function that allows for time coercion/resampling;
    • it should be designed to allow for various resampling techniques, not just the original one I developed,
    • likely compute-intensive (bottleneck), so it's important to try to optimize this one as much as possible.
  • erase_meal_overlap - a function that erases meal overlaps,
    • Often, multiple 'ANNOUNCE_MEAL' events will occur in quick succession, but for our modelling task we want to combine them into the initial meal start time.
      • This is because the combined event characterizes a period of high BGL variability.
    • This is another potential high compute bottleneck.
  • keep_top_n_carb_meals - keeps only the top-n carbohydrate meals per day; for our modelling task we will want to assess model performance under different top-carb-meal settings, typically identifying 2 or 3 meals per day (see the sketch below).
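A sketch of how these three utilities could be structured, assuming a DataFrame with a datetime index, a 'msg_type' column carrying 'ANNOUNCE_MEAL' events, and 'bgl'/'food_g' columns; all of these names and the exact overlap rule are assumptions:

```python
# dataset_cleaning.py -- illustrative sketch; column names and the overlap
# rule are assumptions, and the real implementations should be optimized.
import pandas as pd


def coerce_time(df: pd.DataFrame, freq: str = "5min") -> pd.DataFrame:
    """Resample to a regular time grid; the aggregation rules are pluggable."""
    return df.resample(freq).agg({"bgl": "mean", "food_g": "sum", "msg_type": "first"})


def erase_meal_overlap(df: pd.DataFrame, window: pd.Timedelta) -> pd.DataFrame:
    """Merge announce-meal events within `window` of a prior meal into the
    initial meal start time."""
    df = df.copy()
    meal_times = df.index[df["msg_type"] == "ANNOUNCE_MEAL"]
    last_kept = None
    for ts in meal_times:
        if last_kept is not None and ts - last_kept < window:
            # Fold the carbs into the initial meal and drop the duplicate event.
            df.loc[last_kept, "food_g"] += df.loc[ts, "food_g"]
            df.loc[ts, ["msg_type", "food_g"]] = ["", 0]
        else:
            last_kept = ts
    return df


def keep_top_n_carb_meals(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Within each day, keep only the n meals with the most carbohydrates."""
    df = df.copy()
    meals = df[df["msg_type"] == "ANNOUNCE_MEAL"]
    keep = meals.groupby(meals.index.date)["food_g"].nlargest(n).index.get_level_values(1)
    drop = meals.index.difference(keep)
    df.loc[drop, ["msg_type", "food_g"]] = ["", 0]
    return df
```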

dataset_generator.py

Purpose: Handles only the dataset creation process by leveraging functions from both dataset_processing.py and dataset_cleaning.py; it should generally write the pre-transform dataset to data/interim.

Location: 0_meal_identification/meal_identification/meal_identification/datasets/dataset_generator.py

Functions:

  • create_dataset
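A sketch of how create_dataset could wire the pieces together; the config keys, the 'date' column, and the helper signatures mirror the sketches above and are assumptions, not the final API:

```python
# dataset_generator.py -- sketch only; config keys and helper signatures
# follow the sketches above and are assumptions.
import pandas as pd

from .dataset_cleaning import coerce_time, erase_meal_overlap, keep_top_n_carb_meals
from .dataset_processing import dataset_labeler, load_data, save_data


def create_dataset(config: dict) -> pd.DataFrame:
    """Build the pre-transform dataset and write it to data/interim."""
    df = load_data("raw", config["raw_filename"])
    df = df.set_index(pd.to_datetime(df["date"])).sort_index()

    df = coerce_time(df, freq=config.get("freq", "5min"))
    df = erase_meal_overlap(df, window=pd.Timedelta(config.get("meal_window", "2h")))
    df = keep_top_n_carb_meals(df, n=config.get("top_n_meals", 3))

    # The filename label records the settings used, so the interim dataset is
    # self-describing.
    label = dataset_labeler(config)
    save_data(df, "interim", f"{label}.csv")
    return df
```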

plots.py

Purpose: Contains a variety of plotting functions that we will frequently reuse for various tasks, usually related to assessing model performance.

Location: 0_meal_identification/meal_identification/meal_identification/plots.py

Functions:

  • plot_announce_meal_histogram
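A sketch of the histogram helper, assuming the same datetime-indexed DataFrame with a 'msg_type' column as above; the hour-of-day binning is an assumption about what the plot should show:

```python
# plots.py -- sketch of a reusable plotting helper; column names and the
# hour-of-day binning are assumptions.
import matplotlib.pyplot as plt
import pandas as pd


def plot_announce_meal_histogram(df: pd.DataFrame, ax=None):
    """Histogram of the hour of day at which ANNOUNCE_MEAL events occur."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    meal_hours = df.index[df["msg_type"] == "ANNOUNCE_MEAL"].hour
    ax.hist(meal_hours, bins=24, range=(0, 24), edgecolor="black")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Number of announced meals")
    ax.set_title("Distribution of announced meal times")
    return ax
```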
@andytubeee
Contributor

.

RobotPsychologist added a commit that referenced this issue Nov 7, 2024
Adding the dataset_processing.py file for #90.
RobotPsychologist added a commit that referenced this issue Nov 7, 2024
Adding the dataset_cleaning.py script specified in #90
@RobotPsychologist
Owner Author

@andytubeee @Tony911029

Hopefully, this is enough to get you started.

@RobotPsychologist
Owner Author

Closing this now because I think all the requirements have been fulfilled. New changes to the data generation script will either be enhancements or bug fixes.
