
mismatch in event timestamp length #700

Open
dougollerenshaw opened this issue Dec 22, 2020 · 15 comments

@dougollerenshaw (Contributor)

The length of the timestamp array in the dataset events dataframe does not match the length of the filtered_events array.

For example:

from visual_behavior.data_access import loading
oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140014

len(dataset.events.iloc[0]['filtered_events'])

140012

Note that the length of the ophys_timestamps attribute matches the length of the filtered_events attribute, so it would seem that the 'timestamps' attribute of the events dataframe is the outlier.

len(dataset.ophys_timestamps)

140012

@dougollerenshaw (Contributor, Author)

I just checked another session and it's also off by two:

from visual_behavior.data_access import loading
oeid = 958435448
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140058

len(dataset.events.iloc[0]['filtered_events'])

140056

len(dataset.ophys_timestamps)

140056

@dougollerenshaw (Contributor, Author)

One more clue: the timestamps align at the beginning of the array, but not at the end. That would seem to imply that the extra two timestamps in dataset.events.iloc[0]['timestamps'] are at the end:

[image: overlaid traces showing the timestamps aligned at the start of the array but diverging at the end]
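The "aligns at the head, not the tail" observation can be checked numerically. The sketch below uses small synthetic arrays (not the real dataset) with two extra samples appended at the end of the longer array, mirroring the pattern described above:

```python
import numpy as np

# Synthetic stand-ins: 'short' plays the role of ophys_timestamps,
# 'long_' plays the role of events['timestamps'] with two extras at the end.
short = np.linspace(0.0, 1.0, 8)
long_ = np.concatenate([short, [1.1, 1.2]])

heads_match = np.allclose(long_[:len(short)], short)   # compare from the start
tails_match = np.allclose(long_[-len(short):], short)  # compare from the end

print(heads_match, tails_match)  # → True False
```

If the extras were instead at the start, the pattern would flip: tails would match and heads would not.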

@dougollerenshaw (Contributor, Author)

@matchings are those timestamps inherited directly from the array that @ledochowitsch is saving to disk?

@matchings (Collaborator)

@dougollerenshaw Yes. For now, don't use the timestamps in the events df. Use dataset.ophys_timestamps; those are the ground truth from the SDK. I am not sure what could be causing the timestamps in the event detection output to be off.

@dougollerenshaw (Contributor, Author)

More evidence that the extra two timestamps at the end are extraneous:

from visual_behavior.data_access import loading
import matplotlib.pyplot as plt

oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

Plotting with last two timestamps trimmed off:

ax.plot(
    dataset.events.iloc[4]['timestamps'][:-2], #trim off last two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

gives us two aligned traces:

[image: the two traces overlap when the last two timestamps are trimmed]

But trimming off the first two gives us misaligned traces:

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

ax.plot(
    dataset.events.iloc[4]['timestamps'][2:], # trim off first two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

[image: the traces are misaligned when the first two timestamps are trimmed]

@ledochowitsch commented Dec 22, 2020 via email

@matchings (Collaborator)

I believe the timestamps Doug is talking about are the npz['ts'] ones, not the ones in the event_dict.

@ledochowitsch commented Dec 22, 2020 via email

@dougollerenshaw (Contributor, Author)

Thanks @ledochowitsch. Sorry for being unclear about the underlying issue; I was initially struggling to understand it myself, so this issue got a little muddied.

But the fundamental issue is this: the timestamps associated with the events are two values longer than the events arrays themselves.

For the same oeid I initially referenced in this issue, here's what happens when I go back to the cached events file:

import numpy as np
events_file = '//allen/programs/braintv/workgroups/nc-ophys/visual_behavior/event_detection/953443028.npz'
f = np.load(events_file, allow_pickle=True)

Get the length of the timestamps array:
len(f['ts'])

140014

Get the length of an events trace:

event_dict = f['event_dict'].item()
cell_roi_ids = list(event_dict.keys())
len(event_dict[cell_roi_ids[0]]['event_trace'])

140012

Above you said:

there is npz[‘ts’], which should be identical to what you get from the SDK

When I go directly back to the SDK, I get this:

from allensdk.brain_observatory.behavior.behavior_ophys_session import BehaviorOphysSession
oeid = 953443028
session = BehaviorOphysSession.from_lims(oeid)
len(session.ophys_timestamps)

140012

But you went on to say:

...because it’s just the result of

dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]

Checking that myself, I see:

from visual_behavior.data_access import loading
dataset = loading.get_ophys_dataset(oeid, include_invalid_rois=False)
len(dataset.timestamps.ophys_frames['timestamps'])

140014

So it'd seem that the dataset.timestamps.ophys_frames['timestamps'] attribute is the source of the confusion here. @matchings, do you know where that attribute is coming from and why it would be two elements longer than ophys_timestamps?
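To keep the three sources straight, here is a small illustrative helper (not part of visual_behavior or the AllenSDK) that tabulates each array's length and its offset from the shortest one; the labels mirror the attributes discussed above, and the lengths reproduce the off-by-two pattern with plain ranges rather than real data:

```python
def report_lengths(sources):
    """sources: dict mapping a label to any sized sequence.

    Returns {label: (length, excess over the shortest source)}.
    """
    lengths = {label: len(arr) for label, arr in sources.items()}
    baseline = min(lengths.values())
    return {label: (n, n - baseline) for label, n in lengths.items()}

# Synthetic lengths matching the values reported in this thread:
summary = report_lengths({
    "events['timestamps']": range(140014),
    "ophys_timestamps": range(140012),
    "npz['ts']": range(140014),
})
print(summary)
```

Any source with a nonzero excess is a candidate for the extraneous trailing samples.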

@ledochowitsch commented Dec 23, 2020 via email

@matchings (Collaborator)

dataset.timestamps.ophys_frames['timestamps'] is computed directly from the sync file, and it is what is used to create dataset.ophys_timestamps for mesoscope experiments, because the SDK does not yet do the proper time resampling for mesoscope (or at least it didn't in the version we are using). For Scientifica, dataset.ophys_timestamps is pulled directly from the SDK. If the SDK is doing some truncation of frames, that could lead to a discrepancy between dataset.ophys_timestamps and dataset.timestamps.ophys_frames['timestamps']. But that should be specific to Scientifica, because mesoscope uses the same array for both. I hope that makes sense.
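The branching described above can be sketched as follows. This is an assumed simplification for illustration, not the actual visual_behavior implementation:

```python
def pick_ophys_timestamps(rig, sync_timestamps, sdk_timestamps):
    """Illustrative sketch of the timestamp-selection logic described above.

    rig: "mesoscope" or "scientifica" (hypothetical labels).
    sync_timestamps: timestamps computed from the sync file.
    sdk_timestamps:  timestamps returned by the SDK (possibly truncated).
    """
    if rig == "mesoscope":
        # Mesoscope: the sync-file timestamps back BOTH attributes,
        # so their lengths always agree.
        return sync_timestamps
    # Scientifica: the SDK array is used, which may have trailing
    # frames truncated relative to the sync-file array.
    return sdk_timestamps
```

Under this sketch, a length mismatch between the two attributes can only arise on the Scientifica branch, which matches the observations in this thread.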

@dougollerenshaw (Contributor, Author) commented Dec 23, 2020

Thanks @matchings. It looks like you're correct that this is specific to Scientifica sessions. Here's an example from mesoscope showing that the SDK ophys_timestamps and the VBA dataset ophys_frames['timestamps'] vectors are the same length:

[image: mesoscope session with matching SDK and VBA timestamp lengths]

And here's a different 2P3 (Scientifica) session with the same off-by-two error as above:

[image: Scientifica session showing the same off-by-two discrepancy]

So does this mean that the problem is with the SDK? If so, we should submit an SDK issue to solve it. These discrepancies will undoubtedly confuse other users in the future.

@matchings (Collaborator)

I'm guessing that the SDK truncates the timestamps to match the ophys traces, which is probably desired behavior; otherwise we would have mismatches all over the place. I believe the Scientificas are known to give out a few extra TTL pulses at the end of the session (or at least MPE says it's at the end; it's nice that you just validated that here), which we want to remove so that everything is aligned. It surprises me that you are always seeing an off-by-exactly-two issue, though, because I thought the number of those extra pulses at the end was variable.
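The presumed truncation is easy to express. This is a hedged sketch of what the SDK is assumed to do (drop trailing sync timestamps so they match the trace length), not its actual code:

```python
import numpy as np

def truncate_to_traces(timestamps, trace_len):
    """Drop trailing timestamps (e.g. extra Scientifica TTL pulses)
    so the timestamp array matches the fluorescence trace length."""
    if len(timestamps) < trace_len:
        raise ValueError("fewer timestamps than trace samples")
    return timestamps[:trace_len]

# Reproducing the lengths from this thread with a synthetic array:
ts = np.arange(140014)
print(len(truncate_to_traces(ts, 140012)))  # → 140012
```

Note this silently assumes the extras are always at the end, which is exactly what the earlier plotting experiment supports.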

@alexpiet (Collaborator)

I'm always paranoid about convolutions. Should the mode be "valid" instead of "full" (default)?

this_trace_filtered = np.convolve(this_trace, filt)[:len(this_trace)]

https://numpy.org/doc/stable/reference/generated/numpy.convolve.html
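To make the mode question concrete: for a trace of length N and a filter of length K, np.convolve returns N + K − 1 samples in 'full' (the default), N in 'same', and N − K + 1 in 'valid'. A toy check, using hypothetical sizes rather than the real trace and filter:

```python
import numpy as np

trace = np.ones(10)       # N = 10
filt = np.ones(3) / 3     # K = 3, a simple boxcar filter

print(len(np.convolve(trace, filt)))                 # 'full': 10 + 3 - 1 = 12
print(len(np.convolve(trace, filt, mode='same')))    # 'same': 10
print(len(np.convolve(trace, filt, mode='valid')))   # 'valid': 10 - 3 + 1 = 8

# The line quoted above slices the 'full' output back down to the trace
# length, so the RESULT length is fine; the open question is whether the
# kept samples carry the intended filter alignment.
assert len(np.convolve(trace, filt)[:len(trace)]) == len(trace)
```

So the slicing in the quoted line cannot by itself produce an off-by-two length, which points back at the timestamp arrays rather than the filtering step.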

@ledochowitsch commented Dec 23, 2020 via email
