
mismatch in event timestamp length #700

Open
dougollerenshaw opened this issue Dec 22, 2020 · 15 comments

@dougollerenshaw (Contributor)

The length of the timestamp array in the dataset events dataframe does not match the length of the filtered_events array.

For example:

from visual_behavior.data_access import loading
oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140014

len(dataset.events.iloc[0]['filtered_events'])

140012

Note that the length of the ophys_timestamps attribute matches the length of the filtered_events attribute, so it would seem that the 'timestamps' attribute of the events dataframe is the outlier.

len(dataset.ophys_timestamps)

140012

@dougollerenshaw (Contributor, Author)

I just checked another session and it's also off by two:

from visual_behavior.data_access import loading
oeid = 958435448
dataset = loading.get_ophys_dataset(oeid)
len(dataset.events.iloc[0]['timestamps'])

140058

len(dataset.events.iloc[0]['filtered_events'])

140056

len(dataset.ophys_timestamps)

140056

@dougollerenshaw (Contributor, Author)

One more clue: the timestamps align at the beginning of the array, but not at the end. That would seem to imply that the extra two timestamps in dataset.events.iloc[0]['timestamps'] are at the end:

[image: overlaid traces showing the timestamps aligned at the start of the array but diverging at the end]
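The "aligns at the head, not the tail" observation can be checked numerically. The sketch below uses small synthetic arrays (not the real dataset) with two extra samples appended at the end of the longer array, mirroring the pattern described above:

```python
import numpy as np

# Synthetic stand-ins: 'short' plays the role of ophys_timestamps,
# 'long_' plays the role of events['timestamps'] with two extras at the end.
short = np.linspace(0.0, 1.0, 8)
long_ = np.concatenate([short, [1.1, 1.2]])

heads_match = np.allclose(long_[:len(short)], short)   # compare from the start
tails_match = np.allclose(long_[-len(short):], short)  # compare from the end

print(heads_match, tails_match)  # → True False
```

If the extras were instead at the start, the pattern would flip: tails would match and heads would not.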

@dougollerenshaw (Contributor, Author)

@matchings are those timestamps inherited directly from the array that @ledochowitsch is saving to disk?

@matchings (Collaborator)

@dougollerenshaw Yes. For now, don't use the timestamps in the events df. Use dataset.ophys_timestamps; those are the ground truth from the SDK. I am not sure what could be causing the timestamps in the event detection output to be off.

@dougollerenshaw (Contributor, Author)

More evidence that the extra two timestamps at the end are extraneous:

from visual_behavior.data_access import loading
import matplotlib.pyplot as plt

oeid = 953443028
dataset = loading.get_ophys_dataset(oeid)

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

Plotting with last two timestamps trimmed off:

ax.plot(
    dataset.events.iloc[4]['timestamps'][:-2], #trim off last two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

gives us two aligned traces:

[image: the two traces overlap when the last two timestamps are trimmed]

But trimming off the first two gives us misaligned traces:

fig,ax=plt.subplots()
ax.plot(
    dataset.ophys_timestamps,
    dataset.events.iloc[4]['filtered_events']
)

ax.plot(
    dataset.events.iloc[4]['timestamps'][2:], # trim off first two
    dataset.events.iloc[4]['filtered_events'],
    linestyle = ':',
    linewidth = 3
)

ax.set_xlim(352,353)
ax.set_ylim(-0.01,0.05)

[image: the traces are misaligned when the first two timestamps are trimmed]

@ledochowitsch commented Dec 22, 2020 via email

@matchings (Collaborator)

I believe the timestamps Doug is talking about are the npz['ts'] ones, not the ones in the event_dict.

@ledochowitsch commented Dec 22, 2020 via email

@dougollerenshaw (Contributor, Author)

Thanks @ledochowitsch. Sorry for being unclear about the underlying issue; I was initially struggling to understand it myself, so this issue got a little muddied.

But the fundamental issue is this: the timestamps associated with the events are two values longer than the events arrays themselves.

For the same oeid I initially referenced in this issue, here's what happens when I go back to the cached events file:

import numpy as np
events_file = '//allen/programs/braintv/workgroups/nc-ophys/visual_behavior/event_detection/953443028.npz'
f = np.load(events_file, allow_pickle=True)

Get the length of the timestamps array:
len(f['ts'])

140014

Get the length of an events trace:

event_dict = f['event_dict'].item()
cell_roi_ids = list(event_dict.keys())
len(event_dict[cell_roi_ids[0]]['event_trace'])

140012

Above you said:

there is npz[‘ts’], which should be identical to what you get from the SDK

When I go directly back to the SDK, I get this:

from allensdk.brain_observatory.behavior.behavior_ophys_session import BehaviorOphysSession
oeid = 953443028
session = BehaviorOphysSession.from_lims(oeid)
len(session.ophys_timestamps)

140012

But you went on to say:

...because it’s just the result of

dataset = loading.get_ophys_dataset(eval(exp_id), include_invalid_rois=False)
ts = dataset.timestamps.ophys_frames.values[0]

Checking that myself, I see:

from visual_behavior.data_access import loading
dataset = loading.get_ophys_dataset(oeid, include_invalid_rois=False)
len(dataset.timestamps.ophys_frames['timestamps'])

140014

So it'd seem that the dataset.timestamps.ophys_frames['timestamps'] attribute is the source of the confusion here. @matchings, do you know where that attribute is coming from and why it would be two elements longer than ophys_timestamps?
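To keep the three sources straight, here is a small illustrative helper (not part of visual_behavior or the AllenSDK) that tabulates each array's length and its offset from the shortest one; the labels mirror the attributes discussed above, and the lengths reproduce the off-by-two pattern with plain ranges rather than real data:

```python
def report_lengths(sources):
    """sources: dict mapping a label to any sized sequence.

    Returns {label: (length, excess over the shortest source)}.
    """
    lengths = {label: len(arr) for label, arr in sources.items()}
    baseline = min(lengths.values())
    return {label: (n, n - baseline) for label, n in lengths.items()}

# Synthetic lengths matching the values reported in this thread:
summary = report_lengths({
    "events['timestamps']": range(140014),
    "ophys_timestamps": range(140012),
    "npz['ts']": range(140014),
})
print(summary)
```

Any source with a nonzero excess is a candidate for the extraneous trailing samples.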

@ledochowitsch commented Dec 23, 2020 via email

@matchings (Collaborator)

dataset.timestamps.ophys_frames['timestamps'] is computed directly from the sync file, and it is what is used to create dataset.ophys_timestamps for mesoscope experiments, because the SDK does not yet do the proper time resampling for mesoscope (or at least it didn't in the version we are using). For Scientifica, dataset.ophys_timestamps is pulled directly from the SDK. If the SDK is doing some truncation of frames, that could lead to a discrepancy between dataset.ophys_timestamps and dataset.timestamps.ophys_frames['timestamps']. But that should be specific to Scientifica, because mesoscope uses the same array for both. I hope that makes sense.
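The branching described above can be sketched as follows. This is an assumed simplification for illustration, not the actual visual_behavior implementation:

```python
def pick_ophys_timestamps(rig, sync_timestamps, sdk_timestamps):
    """Illustrative sketch of the timestamp-selection logic described above.

    rig: "mesoscope" or "scientifica" (hypothetical labels).
    sync_timestamps: timestamps computed from the sync file.
    sdk_timestamps:  timestamps returned by the SDK (possibly truncated).
    """
    if rig == "mesoscope":
        # Mesoscope: the sync-file timestamps back BOTH attributes,
        # so their lengths always agree.
        return sync_timestamps
    # Scientifica: the SDK array is used, which may have trailing
    # frames truncated relative to the sync-file array.
    return sdk_timestamps
```

Under this sketch, a length mismatch between the two attributes can only arise on the Scientifica branch, which matches the observations in this thread.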

@dougollerenshaw (Contributor, Author) commented Dec 23, 2020

Thanks @matchings. It looks like you're correct that this is specific to Scientifica sessions. Here's an example from mesoscope showing that the SDK ophys_timestamps and the VBA dataset ophys_frames['timestamps'] vectors are the same length:

[image: mesoscope session with matching SDK and VBA timestamp lengths]

And here's a different 2P3 (Scientifica) session with the same off-by-two error as above:

[image: Scientifica session showing the same off-by-two discrepancy]

So does this mean that the problem is with the SDK? If so, we should submit an SDK issue to solve it. These discrepancies will undoubtedly confuse other users in the future.

@matchings (Collaborator)

I'm guessing that the SDK truncates the timestamps to match the ophys traces, which is probably desired behavior; otherwise we would have mismatches all over the place. I believe the Scientificas are known to give out a few extra TTL pulses at the end of the session (or at least MPE says it's at the end; it's nice that you just validated that here), which we want to remove so that everything is aligned. It surprises me that you are always seeing an off-by-exactly-two issue, though, because I thought the number of those extra pulses at the end was variable.
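The presumed truncation is easy to express. This is a hedged sketch of what the SDK is assumed to do (drop trailing sync timestamps so they match the trace length), not its actual code:

```python
import numpy as np

def truncate_to_traces(timestamps, trace_len):
    """Drop trailing timestamps (e.g. extra Scientifica TTL pulses)
    so the timestamp array matches the fluorescence trace length."""
    if len(timestamps) < trace_len:
        raise ValueError("fewer timestamps than trace samples")
    return timestamps[:trace_len]

# Reproducing the lengths from this thread with a synthetic array:
ts = np.arange(140014)
print(len(truncate_to_traces(ts, 140012)))  # → 140012
```

Note this silently assumes the extras are always at the end, which is exactly what the earlier plotting experiment supports.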

@alexpiet (Collaborator)

I'm always paranoid about convolutions. Should the mode be "valid" instead of "full" (default)?

this_trace_filtered = np.convolve(this_trace, filt)[:len(this_trace)]

https://numpy.org/doc/stable/reference/generated/numpy.convolve.html
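To make the mode question concrete: for a trace of length N and a filter of length K, np.convolve returns N + K − 1 samples in 'full' (the default), N in 'same', and N − K + 1 in 'valid'. A toy check, using hypothetical sizes rather than the real trace and filter:

```python
import numpy as np

trace = np.ones(10)       # N = 10
filt = np.ones(3) / 3     # K = 3, a simple boxcar filter

print(len(np.convolve(trace, filt)))                 # 'full': 10 + 3 - 1 = 12
print(len(np.convolve(trace, filt, mode='same')))    # 'same': 10
print(len(np.convolve(trace, filt, mode='valid')))   # 'valid': 10 - 3 + 1 = 8

# The line quoted above slices the 'full' output back down to the trace
# length, so the RESULT length is fine; the open question is whether the
# kept samples carry the intended filter alignment.
assert len(np.convolve(trace, filt)[:len(trace)]) == len(trace)
```

So the slicing in the quoted line cannot by itself produce an off-by-two length, which points back at the timestamp arrays rather than the filtering step.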

@ledochowitsch commented Dec 23, 2020 via email
