feat: split_gaze_data into trial #859

Open
wants to merge 20 commits into
base: main

Member

@SiQube SiQube commented Oct 23, 2024

The newly added gaze files in the CopCo dataset are stored on a per-subject basis. Unfortunately, each file contains many trials, so this (slightly hacky) feature lets a user split the data into trial-level GazeDataFrames.

@github-actions github-actions bot added the enhancement New feature or request label Oct 23, 2024

codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (2c4a63e) to head (4751e41).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #859   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           74        74           
  Lines         3419      3441   +22     
  Branches       613       614    +1     
=========================================
+ Hits          3419      3441   +22     


@SiQube SiQube force-pushed the split-gaze-files-into-trial-dataframes branch from c557a78 to 16b7ae7 Compare October 23, 2024 06:46
@SiQube SiQube force-pushed the split-gaze-files-into-trial-dataframes branch from 16b7ae7 to 1c6c769 Compare October 23, 2024 10:54
dkrako and others added 19 commits November 17, 2024 19:50
Some modules were missing from the html documentation.
This PR adds these.
* docs: Add missing EyeTracker class to html docs

* the eye tracker class was not correctly integrated somehow
updates:
- [github.com/asottile/pyupgrade: v3.18.0 → v3.19.0](asottile/pyupgrade@v3.18.0...v3.19.0)
- [github.com/pre-commit/mirrors-mypy: v1.12.1 → v1.13.0](pre-commit/mirrors-mypy@v1.12.1...v1.13.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
- [github.com/kynan/nbstripout: 0.7.1 → 0.8.0](kynan/nbstripout@0.7.1...0.8.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…9.6 (#777)

Updates the requirements on [nbsphinx](https://github.com/spatialaudio/nbsphinx) to permit the latest version.
- [Release notes](https://github.com/spatialaudio/nbsphinx/releases)
- [Changelog](https://github.com/spatialaudio/nbsphinx/blob/master/NEWS.rst)
- [Commits](spatialaudio/nbsphinx@0.8.8...0.9.5)

---
updated-dependencies:
- dependency-name: nbsphinx
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
updates:
- [github.com/nbQA-dev/nbQA: 1.8.7 → 1.9.1](nbQA-dev/nbQA@1.8.7...1.9.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Updates the requirements on [setuptools-git-versioning](https://github.com/dolfinus/setuptools-git-versioning) to permit the latest version.
- [Release notes](https://github.com/dolfinus/setuptools-git-versioning/releases)
- [Changelog](https://github.com/dolfinus/setuptools-git-versioning/blob/master/CHANGELOG.rst)
- [Commits](dolfinus/setuptools-git-versioning@v0.0.1...v2.0.0)

---
updated-dependencies:
- dependency-name: setuptools-git-versioning
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated io file

* updated test file

* Add tests for metadata parsing from ASC file

* Squashed commit of the following:

commit 14d047c
Author: Faizan Ansari <[email protected]>
Date:   Thu Oct 24 22:02:30 2024 +0200

    Remove files from remote directory

commit aa78078
Author: Faizan Ansari <[email protected]>
Date:   Thu Oct 24 21:53:35 2024 +0200

    updated code

commit cae54cc
Author: Faizan Ansari <[email protected]>
Date:   Thu Oct 24 15:40:16 2024 +0200

    changes in io.py file

* Fix formatting

* Fix indentation

* Fix circular imports

* 2 test passed

* Fix attribute name

* Refactor metadata checks, add tests

* Fix f-strings

* Fix tests

* Address comments

* Improve test coverage

* Add comment about screen resolution

* Fix metadata conflict check

* Fix test coverage

* Fix type hint

* Trigger codecov

* rebase me

* Upgrade codecov action

* Revert codecov action upgrade

---------

Co-authored-by: Faizan Ansari <[email protected]>
Co-authored-by: SiQube <[email protected]>
Contributor

@dkrako dkrako left a comment

Thank you so much for your work!

I think there are still some rough edges we need to iron out. There's probably some misunderstanding about how the column specifiers are treated during init.

Adding proper tests of GazeDataFrame.split() will probably resolve these issues.

A list of new GazeDataFrame instances, each containing a partition of the
original data with all metadata and configurations preserved.
"""
by = [by] if isinstance(by, str) else by
Contributor

the by argument to partition_by() can be of type str or list[str], so this conversion shouldn't be needed

self.auto_column_detect = auto_column_detect
self.time_column = time_column
self.time_unit = time_unit
self.pixel_columns = pixel_columns
Contributor

why should we store these? the pixel_columns are merged into a single column named pixel. after __init__() finishes, the pixel_columns won't exist anymore in the dataframe.

trial_columns=self.trial_columns,
time_column=self.time_column,
time_unit=self.time_unit,
position_columns=self.position_columns,
Contributor

are you sure this works? the position_columns won't exist anymore in self.frame

auto_column_detect=self.auto_column_detect,
trial_columns=self.trial_columns,
time_column=self.time_column,
time_unit=self.time_unit,
Contributor

the time_unit is already converted to a datetime type in the time column of self.frame, so you don't need to pass the value from the original init

GazeDataFrame(
new_frame,
experiment=self.experiment,
auto_column_detect=self.auto_column_detect,
Contributor

the columns are already detected in self.frame

@@ -285,6 +285,14 @@ def __init__(

# Remove this attribute once #893 is fixed
self._metadata: dict[str, Any] | None = None
self.auto_column_detect = auto_column_detect
Contributor

this is only needed as a flag for autodetecting pixel, velocity, etc. columns. you don't need to store this.

self.position_columns = position_columns
self.velocity_columns = velocity_columns
self.acceleration_columns = acceleration_columns
self.distance_column = distance_column
Contributor

the distance column is called distance after initialization. if it was named differently before, it is renamed.

experiment=self.experiment,
auto_column_detect=self.auto_column_detect,
trial_columns=self.trial_columns,
time_column=self.time_column,
Contributor

the time column is called time and will be autodetected

Contributor

can you add tests to gaze_dataframe_test.py?

please check the resulting splits in a similar way to #879, e.g. check that the by-column holds a single value within each split and that this value differs from all other splits.

@@ -231,6 +231,32 @@ def load_precomputed_reading_measures(self) -> None:
self.paths,
)

def _split_gaze_data(
Contributor

you can probably remove the leading _

Labels
enhancement New feature or request

3 participants