
Model Development - Transformations Script #96

Open
RobotPsychologist opened this issue Oct 30, 2024 · 3 comments · Fixed by #176
Assignees
Labels
modeldev Developing modeling pipelines for meal annotation task.

Comments

@RobotPsychologist
Owner

RobotPsychologist commented Oct 30, 2024

@y-mx @aryavkin

The idea for this ticket is to implement a function that takes the data produced from the data generation pipeline:

  • 0_meal_identification/meal_identification/meal_identification/datasets/dataset_cleaner.py
  • 0_meal_identification/meal_identification/meal_identification/datasets/dataset_generator.py
  • 0_meal_identification/meal_identification/meal_identification/datasets/dataset_operations.py

The above scripts are intended to facilitate the data generation and cleaning that occurs outside of the sktime library.

The transformations script will operate as the connection point between the data generated above and the training pipeline. The training pipeline should be able to call the transformation script in a loop for extended training runs, where we loop through a dictionary of sktime transformation pipelines.

Inside the transformation function itself there should be a looping mechanism that iterates through a list of provided datasets and applies the transformations to each. For example, we could have two otherwise identical datasets, one with a three-hour meal window and one with a five-hour meal window, and want to apply the same transformations to both for our experiments.
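As a rough sketch of that control flow (plain callables below stand in for sktime transformation pipelines; all labels and values are made up for illustration):

```python
# Outer loop: the training pipeline iterates over a dictionary of
# transformation pipelines. Inner loop: the transformation function iterates
# over the provided datasets, applying the same pipeline to each.
# The lambdas stand in for sktime pipelines; labels are illustrative.

pipelines = {
    "halve": lambda xs: [x / 2 for x in xs],
    "halve_then_shift": lambda xs: [x / 2 + 1 for x in xs],
}
datasets = {
    "meal_window_3hr": [54.0, 71.0],  # e.g. three-hour meal window
    "meal_window_5hr": [60.0, 88.0],  # same data, five-hour meal window
}

transformed = {}
for pipe_label, pipeline in pipelines.items():    # training pipeline's loop
    for ds_label, data in datasets.items():       # transformation script's loop
        transformed[(pipe_label, ds_label)] = pipeline(data)
```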

So one loop of the transformation script should:

  1. Check if the transformed data set already exists in: 0_meal_identification/meal_identification/data/processed
  2. If it does not exist, load the specified data set from: 0_meal_identification/meal_identification/data/interim
  3. Apply the transformation pipeline
  4. Store the transformed data for caching if specified. For a given training run, we could create a new subdirectory with the training run's label, containing a directory for each transformation pipeline applied, e.g. 0_meal_identification/meal_identification/data/processed/{training run label}/{pipeline label}

Once the looping is complete, it should:

  • Return a dictionary of the transformed data set(s) if specified
  • Record logs of the transformed data (perhaps via an external log recorder function; we don't need to write this right now, just have the function set up to call an external logger).
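A minimal sketch of one loop iteration (steps 1–4) plus the post-loop return and logging hooks, assuming datasets are stored as CSV files. The function names (`transform_one`, `log_transform_run`) and all labels are illustrative stand-ins, not the project's actual API:

```python
import csv
import tempfile
from pathlib import Path

def log_transform_run(record):
    # Placeholder hook: to be replaced by the external log recorder function.
    pass

def transform_one(dataset_label, pipeline_label, apply_pipeline,
                  interim_dir, processed_dir, run_label, cache=True):
    out_path = Path(processed_dir) / run_label / pipeline_label / f"{dataset_label}.csv"
    if out_path.exists():                                  # 1. cached copy exists?
        with out_path.open(newline="") as f:
            rows = list(csv.reader(f))
    else:
        src = Path(interim_dir) / f"{dataset_label}.csv"   # 2. load interim dataset
        with src.open(newline="") as f:
            rows = apply_pipeline(list(csv.reader(f)))     # 3. apply the pipeline
        if cache:                                          # 4. store for caching
            out_path.parent.mkdir(parents=True, exist_ok=True)
            with out_path.open("w", newline="") as f:
                csv.writer(f).writerows(rows)
    log_transform_run({"dataset": dataset_label, "pipeline": pipeline_label,
                       "n_rows": len(rows)})
    return rows

# Usage with throwaway directories standing in for data/interim and data/processed:
tmp = Path(tempfile.mkdtemp())
interim, processed = tmp / "interim", tmp / "processed"
interim.mkdir()
with (interim / "meal_3hr.csv").open("w", newline="") as f:
    csv.writer(f).writerows([["bgl"], ["54"], ["71"]])

halve = lambda rows: [rows[0]] + [[str(int(r[0]) // 2)] for r in rows[1:]]
transformed = {"meal_3hr": transform_one("meal_3hr", "halve", halve,
                                         interim, processed, "run_01")}
```

The returned dictionary of transformed dataset(s) is what the training pipeline would consume; on a second call with the same labels, step 1 short-circuits and the cached CSV is read back instead of recomputed.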

Please also write tests using pydantic, like the data team did; you can reach out to them for guidance on conforming to the standards they have been using, for consistency. Reach out to @Tony911029 @andytubeee @Phiruby if you have questions regarding this.
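For the pydantic-based tests, a minimal sketch might look like the following; the model and field names are assumptions for illustration, and the data team's actual conventions should take precedence:

```python
from pydantic import BaseModel, ValidationError

# Illustrative record describing one transformed dataset; the schema is a
# hypothetical stand-in, not the data team's actual model.
class TransformedDatasetRecord(BaseModel):
    dataset_label: str
    pipeline_label: str
    n_rows: int

def test_valid_record_parses():
    rec = TransformedDatasetRecord(dataset_label="meal_3hr",
                                   pipeline_label="baseline", n_rows=1440)
    assert rec.n_rows == 1440

def test_invalid_row_count_rejected():
    try:
        TransformedDatasetRecord(dataset_label="meal_3hr",
                                 pipeline_label="baseline", n_rows="not-a-number")
        assert False, "expected a ValidationError"
    except ValidationError:
        pass

test_valid_record_parses()
test_invalid_row_count_rejected()
```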

@fkiraly Please let me know if this makes sense or if there are any other clarifications required.

@RobotPsychologist RobotPsychologist converted this from a draft issue Oct 30, 2024
@RobotPsychologist RobotPsychologist added the modeldev Developing modeling pipelines for meal annotation task. label Oct 30, 2024
@aryavkin
Contributor

aryavkin commented Nov 6, 2024

interested

@y-mx
Contributor

y-mx commented Nov 6, 2024

add me please

@RobotPsychologist
Owner Author

@Tony911029 and @andytubeee help with unit tests.
