Segment at negative timestamp breaks #127

Open
jamieforth wants to merge 4 commits into main

Conversation

jamieforth (Contributor)

When dejittering, also take into account negative timestamps when
determining segment boundaries.

@agricolab (Member) commented Dec 8, 2024

Thanks. If I understand this correctly, this would catch streams where the timestamp progression is negative (i.e. the sampling rate was negative), not necessarily streams with negative timestamps, right?

Because with a simple diff, negative timestamps would work:
np.diff([-4, -2, -1, 0, 1, 2]) -> array([2, 1, 1, 1, 1])

but reversed time wouldn't:
np.diff([2, 1, 0, -1, -2, -4]) -> array([-1, -1, -1, -1, -2])

nor would glitches:
np.diff([-2, -3, -2, -1, 0, 1, 2]) -> array([-1, 1, 1, 1, 1, 1])

unless we take the abs as you suggest.

Is that what you wanted? Or should we handle glitches, but throw an error for negative time?
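
For illustration only, here is a minimal sketch of what an abs-based check would do; the function name and the 2-second threshold are made up, and this is not pyxdf's actual _detect_breaks:

```python
import numpy as np

def detect_breaks_abs(timestamps, threshold_seconds=2.0):
    """Sketch: indices at which a new segment would start, judged by the
    magnitude of the interval rather than its signed value."""
    diffs = np.diff(timestamps)
    return np.where(np.abs(diffs) > threshold_seconds)[0] + 1

forward = [0, 1, 2, 10, 11, 12]   # 8 s gap between 2 and 10
backward = forward[::-1]          # same gap, but the diff is -8

print(detect_breaks_abs(forward))   # [3] -- caught with or without abs
print(detect_breaks_abs(backward))  # [3] -- only caught because of abs
```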

@agricolab agricolab self-assigned this Dec 8, 2024
@agricolab agricolab self-requested a review December 8, 2024 03:38
@agricolab (Member) left a comment

I did a refactoring and added tests (see https://github.com/agricolab/xdf-Python/tree/pr/fix/expose-segments) to see whether it would cause any breaking changes. It passes the new tests, and this implementation now additionally works for negative time and glitches, too.

* Segment when timestamp intervals exceed threshold (previous
  behaviour)

* Always segment at non-positive timestamp intervals (irrespective of
  threshold)

* Additional tests for non-monotonicity
@jamieforth (Contributor, Author)

> If I understand this correctly, this would catch streams where the timestamp progression is negative (i.e. the sampling rate was negative), not necessarily streams with negative timestamps, right?

Exactly, negative time intervals would have been a better way to put it, sorry.

> unless we take the abs as you suggest.

> Is that what you wanted? Or should we handle glitches, but throw an error for negative time?

Thank you for the tests, this really helped clarify things.

I don't see a problem with handling negative time or negative intervals per se; it is then up to the user to decide what to do with them. However, I now think there might be a better way to do it.

So I retract the abs suggestion in favour of explicitly handling non-positive time intervals. The original logic for positive time intervals then remains the same.

The strategy I've gone for here is to always segment at a non-positive time interval – regardless of the thresholds (i.e. segments are by definition monotonic).

I can't really imagine a use case for non-monotonic segments. But if you think it's too risky a breaking change, perhaps something like _detect_breaks(monotonic=False) could preserve the existing default behaviour (in conjunction with abs).
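
A rough sketch of that idea; the function name, the threshold value, and the default of the monotonic kwarg are hypothetical, not the actual pyxdf code:

```python
import numpy as np

def detect_breaks_sketch(timestamps, threshold_seconds=2.0, monotonic=True):
    """Sketch: break wherever the interval exceeds the threshold and, when
    monotonic=True, also wherever the interval is non-positive, so every
    segment is strictly increasing by construction."""
    diffs = np.diff(timestamps)
    breaks = diffs > threshold_seconds
    if monotonic:
        breaks |= diffs <= 0
    return np.where(breaks)[0] + 1

# A small backwards jump is below the threshold, but still starts a segment.
print(detect_breaks_sketch([0, 1, 2, 1.5, 2.5, 3.5]))  # [3]
```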

* Revert to existing dejitter segmentation logic

* Handles cases where data is recorded out-of-order
@jamieforth (Contributor, Author)

Another rethink... An underlying problem to all of the above is data being recorded out of order (which I don't think is necessarily a problem in itself, just something more likely to happen on busy/WiFi networks, etc.).

So ensuring the time-series data is sorted prior to synchronisation and dejittering avoids the problem!
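
A minimal sketch of the sorting idea (this is just the concept, not the PR's actual diff; the arrays are made-up data):

```python
import numpy as np

# Made-up out-of-order chunk: the samples at t = 3 and 4 arrived late
# (e.g. over WiFi) and were written after those at t = 5 and 6.
timestamps = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 3.0, 4.0])
timeseries = np.arange(len(timestamps)).reshape(-1, 1)  # dummy samples

# Sort both arrays by timestamp before clock sync / dejittering, keeping
# every sample paired with its own timestamp.
order = np.argsort(timestamps, kind="stable")
timestamps = timestamps[order]
timeseries = timeseries[order]

print(timestamps)  # [0. 1. 2. 3. 4. 5. 6.]
```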

@agricolab (Member) commented Dec 9, 2024

Thanks for the effort and the additional work.

I agree that non-monotonicity is clearly indicative of a segmentation. It could stem from network issues, but also from clock resets or a bad implementation of the LSL outlet (some apps use custom timestamps).

Now, practically, e.g. [0, 1, 2, 5, 6, 3, 4] would currently only catch the second segment [5, 6] but not the last [3, 4].

Sorting this prior to segmentation could be an option, but I would not make it the default, for the following reasons.

First, unsorted (i.e. non-monotonic) timestamps would in principle be caught as multiple segments once we catch negative diffs in _detect_breaks. The difference between sorting before versus after segmentation is that, because streams are dejittered segment-wise, they are dejittered differently. Therefore, sorted-first and sorted-last will share the identical time series, but the respective timestamps will be different. That might or might not be what the user wants, which means we need to be able to set the option.

Second, pyxdf's policy is to keep hands off the time series. We discussed this a lot during the PR for syncing the stamps of different streams. The idea is to be hands-off unless the user explicitly tells pyxdf to manipulate the time series (and not just the timestamps).

In conclusion, my suggestion would be to catch b_breaks = np.logical_or(diffs < 0, diffs > np.max((threshold_seconds, threshold_samples * stream.tdiff))) in _detect_breaks and make sorting before segmentation an optional kwarg to load_xdf. I think it is sensible to warn the user, though, if the stream has non-monotonic timestamps. Many users ignore timestamps anyway, and wouldn't notice without an explicit warning being printed.
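
A small numeric sketch of that condition, applied to the [0, 1, 2, 5, 6, 3, 4] example above; the threshold values and tdiff below are stand-ins for the corresponding stream parameters:

```python
import numpy as np

threshold_seconds = 1.0
threshold_samples = 500
tdiff = 1 / 512          # stand-in for stream.tdiff (nominal sample interval)

timestamps = np.array([0, 1, 2, 5, 6, 3, 4], dtype=float)
diffs = np.diff(timestamps)

b_breaks = np.logical_or(
    diffs < 0,
    diffs > np.max((threshold_seconds, threshold_samples * tdiff)),
)
print(np.where(b_breaks)[0] + 1)  # [3 5] -> segments [0,1,2], [5,6], [3,4]
```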

Additionally, I think that non-monotonic timestamps might be linked to clock resets, so we should be aware that this issue is also handled by _clock_sync.
