Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse TRIAL_VARs #990

Open
saeub opened this issue Mar 5, 2025 · 6 comments
Open

Parse TRIAL_VARs #990

saeub opened this issue Mar 5, 2025 · 6 comments
Assignees
Labels
enhancement New feature or request essential important parsing

Comments

@saeub
Copy link
Collaborator

saeub commented Mar 5, 2025

Description of the problem

EDF files usually contain DataViewer-compatible messages with trial variables (https://www.sr-research.com/support/thread-83.html). Example from RaCCooNS:

MSG     5141156 TRIALID 8
...
MSG     5149993 !V TRIAL_VAR TRIAL_INDEX 8
MSG     5149994 !V TRIAL_VAR itemnr 199
MSG     5149995 !V TRIAL_VAR sentence Zodra de weg weer vrij is, hebben tient
allen bussen met soldaten voorrang.
MSG     5149996 !V TRIAL_VAR question Is de weg geblokkeerd?
MSG     5149997 !V TRIAL_VAR answer .
MSG     5149997 !V TRIAL_VAR KEY_PRESSED .
MSG     5149998 !V TRIAL_VAR BLOCK 2
MSG     5149999 !V TRIAL_VAR sequence E
...
MSG     5150000 TRIAL_RESULT 0

TRIALID and TRIAL_RESULT denote the start and end of a trial, this can be parsed with the patterns argument in from_asc() without problems. But currently the TRIAL_VAR messages cannot easily be parsed and associated with the corresponding trial, since they can occur at any point between TRIALID and TRIAL_RESULT. In RaCCooNS and MultiplEYE, they occur at the end of the trial (right before TRIAL_RESULT).

Since this is a standard way of encoding trial starts, ends, and variables, I think it would be nice if these would be parsed by default, and trial variables are added as additional_columns automatically.

Description of a solution

By default (or optionally?), samples between TRIALID and TRIAL_RESULT messages should get additional column values:

  • The trial ID from the TRIALID message
  • The trial result from the TRIAL_RESULT message
  • Any trial variables set using !V TRIAL_VAR messages

The values defined by the TRIAL_RESULT and !V TRIAL_VAR messages would have to be "retroactively" added to previous samples as the ASC file is being parsed -- this is probably the main challenge, but shouldn't be too difficult.

Minimum acceptance criteria

  • Trial IDs, trial results, and trial variables from DataViewer messages are parsed (either by default or optionally) into additional columns of the gaze (and event) dataframe
@saeub saeub added the enhancement New feature or request label Mar 5, 2025
@SiQube
Copy link
Member

SiQube commented Mar 5, 2025

looks like an excellent addition to pymovements, would you be interested in providing an initial PR?

@saeub saeub self-assigned this Mar 5, 2025
@saeub saeub added the parsing label Mar 6, 2025
@dkrako
Copy link
Contributor

dkrako commented Mar 6, 2025

Autoparsing trial vars is a great idea!

But wouldn't be adding these as trial columns overkill for the memory consumption? This would add a string like sentence Zodra de weg weer vrij is, hebben tient allen bussen met soldaten voorrang. from your example to every sample. I fear this would blow up.

Instead, let's separate these two functionalities:

  1. auto parsing trial vars into a message data structure
  2. implement functionality to add specific messages to gaze or event frames according to their timestamp

(2.) would be quite helpful outside of parsing and would give the user some flexibility without adding many additional arguments to the from_asc() signature.

@dkrako dkrako added the essential important label Mar 6, 2025
@saeub
Copy link
Collaborator Author

saeub commented Mar 8, 2025

Thanks for your feedback! I agree that adding all trial variables by default would be too much. But I also think that adding another data structure and a method for adding trial variables after loading might just make it more confusing to use, because it's not obvious and people would have to look for it in the documentation. Maybe we could have an optional argument like from_asc(trial_vars=["TRIAL_INDEX", "itemnr"]) so that we can at least load the most essential trial variables by default in the dataset definitions?

Data files exported with SMI's BeGaze also have built-in messages for trial IDs (but no other trial variables?), so I personally think adding trial IDs by default would be nice.

I'll come back to this after dealing with #945 and #1013.

@dkrako
Copy link
Contributor

dkrako commented Mar 8, 2025

I think your proposal is a great compromise between usability and performance.
Even the name of the argument could be used as is.

Probably the reason why I'm continually drawn towards a message data structure is the eyelinker package. It can fit data that does not belong anywhere else. It won't be the last time I propose this, but next time I'll consider better alternatives first.

@saeub
Copy link
Collaborator Author

saeub commented Mar 8, 2025

Probably the reason why I'm continually drawn towards a message data structure is the eyelinker package.

@dkrako I just looked it up, and that does look like a nice and simple structure for arbitrary messages -- much easier to handle than the custom metadata patterns introduced in #767 (I kind of regret that PR now 😅 I think the behavior is too specific and unintuitive, and #907 would just make it more complex). Maybe not a good fit for the trial variables, since they apply to an entire block and not a single timestamp, but I think a simple way to store user-defined messages would indeed be useful. 🤔

@dkrako
Copy link
Contributor

dkrako commented Mar 12, 2025

Ah don't worry, we're always smarter in hindsight. We will take care of the metadata dict step by step.

I think both approaches can be supported.

Your idea with the trial_vars argument would be probably most intuitive and would cover most use cases.

A message datastructure and a way to add specific messages to columns would then be a complementing feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request essential important parsing
Projects
None yet
Development

No branches or pull requests

3 participants