
[Bug report] - ArrowNotImplementedError: Support for codec 'snappy' not built #212

Closed
Marsataheri opened this issue Mar 18, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@Marsataheri

I'm not able to load clusters for any of the repeated-site recordings. This may be related to issue #175, but I'm not sure.

Specifically, when I try to run the brain_region_ephys_variability_between_labs.py script in the paper-reproducible-ephys repository, I get a KeyError because the DataFrame "rep_site" is empty. After looking into this, the problem seems to be in the following line:
spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)

When I run this line for any of the repeated site recordings, I get messages such as this one, resulting in an empty "clusters" variable:

2021-03-18 00:58:13.759 INFO [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01. Downloading....
2021-03-18 00:58:19.433 WARNING [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01. Clusters for probe01 will return an empty dict
2021-03-18 00:58:19.713 INFO [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe00. Downloading....
2021-03-18 00:58:28.42 WARNING [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe00. Clusters for probe00 will return an empty dict
2021-03-18 00:58:33.284 INFO [one.py:92] Channel locations for probe01 have been resolved. Channel and cluster locations obtained from ephys aligned histology track.
2021-03-18 00:58:35.771 INFO [one.py:147] Channel locations for probe00 have not been resolved. Channel and cluster locations obtained from histology track.
2021-03-18 00:58:39.740 WARNING [one.py:355] Either clusters or channels does not have key probe01, could not merge
2021-03-18 00:58:39.740 WARNING [one.py:355] Either clusters or channels does not have key probe00, could not merge

I appreciate any help on what's causing this issue and how to resolve it. Thank you.

@Marsataheri Marsataheri added the bug Something isn't working label Mar 18, 2021
@GaelleChapuis
Collaborator

I tried the following code snippet and it ran fine:

"""
trying to reproduce Marsa's error

spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)
mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01

{'subject': 'SWC_054', 'start_time': '2020-10-05T19:15:05.974514',
'number': 1, 'lab': 'mrsicflogellab', 'project': 'ibl_neuropixel_brainwide_01',
'url': 'https://alyx.internationalbrainlab.org/sessions/56b57c38-2699-4091-90a8-aba35103155e',
'task_protocol': '_iblrig_tasks_ephysChoiceWorld6.4.2'}

"""

import brainbox.io.one as bbone
from oneibl.one import ONE
one = ONE()
# sess=one.alyx.rest('sessions', 'list', subject='SWC_054', date='2020-10-05')
eid = '56b57c38-2699-4091-90a8-aba35103155e'
spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)

here's the output I obtain:

Connected to https://alyx.internationalbrainlab.org as Gaelle
2021-03-18 12:06:27.287 INFO     [one.py:271] Did not find local files for spikes and clusters for /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001 and probe01. Downloading....
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe00/clusters.depths.eac33465-e1c7-4a2a-a338-5545cb48d353.npy Bytes: 3792
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe00/clusters.channels.75f03ce1-6702-4bf5-96f3-ad0a606e640b.npy Bytes: 3792
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/clusters.depths.a5215f77-5bff-4f19-bd53-07a36d755dd5.npy Bytes: 5912
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probes.description.86780020-69b6-4726-88c1-077e7704136e.json Bytes: 468
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/clusters.channels.f37332f5-f8de-4f17-afb4-55ae3db76ec3.npy Bytes: 5912
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 2021-03-18 12:06:33.130 INFO     [one.py:288] Local files for spikes and clusters for /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001 and probe00 found. To re-download set force=True
2021-03-18 12:06:33.619 INFO     [one.py:92] Channel locations for probe01 have been resolved. Channel and cluster locations obtained from ephys aligned histology track.
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/channels.rawInd.739842e5-7e4a-42e2-b05e-69dc2cd7c50b.npy Bytes: 3120
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/channels.localCoordinates.39c75686-82ba-4b30-8ad5-cd8d420c21e2.npy Bytes: 6064
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/channels.brainLocationIds_ccf_2017.97cf4bc0-9230-42e7-9a1b-e97bc7197449.npy Bytes: 3120
Downloading: /Users/gaelle/Downloads/FlatIron/mrsicflogellab/Subjects/SWC_054/2020-10-05/001/alf/probe01/channels.mlapdv.4b1b283b-21d5-4b8c-845c-44ddb10f8fef.npy Bytes: 4616
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 2021-03-18 12:06:35.644 INFO     [one.py:147] Channel locations for probe00 have not been resolved. Channel and cluster locations obtained from histology track.

Could you please confirm you are using the same eid in your query @Marsataheri ?

@Marsataheri
Author

Marsataheri commented Mar 18, 2021

Thanks, Gaelle. When I run the piece of code you shared above, with the eid you have, I get the following error:

2021-03-18 08:19:09.642 INFO [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01. Downloading....
Traceback (most recent call last):

File "", line 6, in
spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\brainbox\io\one.py", line 377, in load_spike_sorting_with_channel
dic_spk_bunch, dic_clus = load_spike_sorting(eid, one=one, probe=probe,

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\brainbox\io\one.py", line 273, in load_spike_sorting
one.load(eid, dataset_types=dtypes, download_only=True)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 126, in wrapper
return method(self, id, *args, **kwargs)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 395, in load
return self._load_recursive(eid, dataset_types=dataset_types, dclass_output=dclass_output,

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 507, in _load_recursive
return self._load(eid, **kwargs)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 168, in _load
dc = self._make_dataclass(eid_str, dataset_types, **kwargs)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 580, in _make_dataclass
self._update_cache(ses, dataset_types=dataset_types)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\oneibl\one.py", line 995, in _update_cache
parquet.save(self._cache_file, self._cache)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\brainbox\io\parquet.py", line 27, in save
pq.write_table(pa.Table.from_pandas(table), file)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\pyarrow\parquet.py", line 1798, in write_table
writer.write_table(table, row_group_size=row_group_size)

File "C:\Users\hamel\anaconda3\envs\iblenv\lib\site-packages\pyarrow\parquet.py", line 651, in write_table
self.writer.write_table(table, row_group_size=row_group_size)

File "pyarrow\_parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.write_table

File "pyarrow\error.pxi", line 105, in pyarrow.lib.check_status

ArrowNotImplementedError: Support for codec 'snappy' not built

This is the exact code I copy and pasted:

import brainbox.io.one as bbone
from oneibl.one import ONE
one = ONE()
# sess=one.alyx.rest('sessions', 'list', subject='SWC_054', date='2020-10-05')
eid = '56b57c38-2699-4091-90a8-aba35103155e'
spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)

Now, if I immediately run just the last line again (spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one)), I get the following:

2021-03-18 08:25:24.552 INFO [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01. Downloading....
2021-03-18 08:25:28.669 WARNING [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe01. Clusters for probe01 will return an empty dict
2021-03-18 08:25:28.852 INFO [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe00. Downloading....
2021-03-18 08:25:33.86 WARNING [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\mrsicflogellab\Subjects\SWC_054\2020-10-05\001 and probe00. Clusters for probe00 will return an empty dict
2021-03-18 08:25:37.209 INFO [one.py:92] Channel locations for probe01 have been resolved. Channel and cluster locations obtained from ephys aligned histology track.
2021-03-18 08:25:38.621 INFO [one.py:147] Channel locations for probe00 have not been resolved. Channel and cluster locations obtained from histology track.
2021-03-18 08:25:41.793 WARNING [one.py:355] Either clusters or channels does not have key probe01, could not merge
2021-03-18 08:25:41.795 WARNING [one.py:355] Either clusters or channels does not have key probe00, could not merge

@Marsataheri
Author

Marsataheri commented Mar 18, 2021

I just tried the following and still ran into the same problem; perhaps it provides some additional insight.

I removed the FlatIron\danlab\ folder from my local computer and ran spikes, clusters, channels = bbone.load_spike_sorting_with_channel(eid, one=one), this time for eid d23a44ef-1402-4ed7-97f5-47e9a7a504d9. I got the following output:

2021-03-18 10:24:54.480 INFO     [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001 and probe01. Downloading....
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probes.description.37414ffd-1388-4a0b-9b01-ad0716c1de66.json Bytes: 466
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\clusters.metrics.1e802055-884e-4a60-ac58-1965f50f2680.pqt Bytes: 63962
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\spikes.clusters.cd0bb1ac-63ad-4f2e-a4e4-4a4fde0caaa2.npy Bytes: 33703456
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\spikes.times.6c7e51b0-93a2-46b3-b352-36a21f89fff3.npy Bytes: 67406784
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\clusters.depths.c0b67572-98f0-4fb8-b3ad-a84e8c646e0b.npy Bytes: 3984
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\clusters.channels.3c5336ae-0e26-4724-9b22-1cceea666afb.npy Bytes: 3984
 |███████████████████████████████████████████████████████████████████████████████████████████████████-| 99.6%  2021-03-18 10:33:59.63 WARNING  [one.py:276] Could not load spikes datasets for session C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001 and probe01. Spikes for None will return an empty dict
2021-03-18 10:33:59.63 WARNING  [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001 and probe01. Clusters for probe01 will return an empty dict
2021-03-18 10:33:59.104 INFO     [one.py:271] Did not find local files for spikes and clusters for C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001 and probe00. Downloading....
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 2021-03-18 10:34:03.601 WARNING  [one.py:788]  local md5 or size mismatch, re-downloading C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\spikes.clusters.npy
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\spikes.clusters.cd0bb1ac-63ad-4f2e-a4e4-4a4fde0caaa2.npy Bytes: 33703456
 |███████████████████████████████████████████████████████████████████████████████████████████████████-| 99.6% 2021-03-18 10:36:43.957 WARNING  [one.py:282] Could not load clusters datasets for session C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001 and probe00. Clusters for probe00 will return an empty dict
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 2021-03-18 10:36:48.336 INFO     [one.py:147] Channel locations for probe01 have not been resolved. Channel and cluster locations obtained from histology track.
2021-03-18 10:36:51.757 INFO     [one.py:92] Channel locations for probe00 have been resolved. Channel and cluster locations obtained from ephys aligned histology track.
2021-03-18 10:36:53.336 WARNING  [one.py:355] Either clusters or channels does not have key probe01, could not merge
2021-03-18 10:36:53.337 WARNING  [one.py:355] Either clusters or channels does not have key probe00, could not merge
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\channels.brainLocationIds_ccf_2017.4c9aaf58-87c1-407c-b3ed-bc30136e454b.npy Bytes: 3120Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\channels.rawInd.0a774fd0-b912-4f59-aa8c-77036251ca47.npy Bytes: 3120

Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\channels.localCoordinates.75ba8f1e-6ac8-4380-9321-2e49bf5b09ce.npy Bytes: 6064
Downloading: C:\Users\hamel\Documents\FlatIron\danlab\Subjects\DY_016\2020-09-12\001\alf\probe00\channels.mlapdv.d92c2a32-bc4a-43e3-b9e6-8b34fd17bced.npy Bytes: 4616
 |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 

@oliche oliche changed the title [Bug report] - Cannot load clusters datasets for repeated site sessions [Bug report] - ArrowNotImplementedError: Support for codec 'snappy' not built Mar 18, 2021
@oliche
Member

oliche commented Mar 18, 2021

After some analysis, this comes from an environment issue with the latest pyarrow conda package:

ArrowNotImplementedError: Support for codec 'snappy' not built

ContinuumIO/anaconda-issues#12164
https://stackoverflow.com/questions/66017811/python-error-using-pyarrow-arrownotimplementederror-support-for-codec-snappy
AnacondaRecipes/pyarrow-feedstock#2

The error message was cryptic because of an except clause in brainbox that swallowed the error without printing it. This has been fixed.
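For illustration only (this is a sketch, not the actual brainbox code; load_datasets and the log message are hypothetical), the silent-except pattern and its repair look roughly like this:

```python
import logging

logger = logging.getLogger(__name__)

def load_datasets(path):
    """Hypothetical loader standing in for the real download/load call."""
    raise NotImplementedError("stand-in for the underlying pyarrow failure")

def safe_load(path):
    # Before the fix, the except clause returned an empty dict silently.
    # Logging the exception surfaces the real cause (here, the snappy
    # codec error) instead of hiding it behind an empty result.
    try:
        return load_datasets(path)
    except Exception as exc:
        logger.warning("Could not load datasets for %s: %s", path, exc)
        return {}
```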

The package appears to be installed from PyPI in the default channel. So far we have tried forcing the conda-forge channel as follows:
conda uninstall pyarrow
conda install pyarrow -c conda-forge

Awaiting feedback

@oliche
Member

oliche commented Mar 18, 2021

So just uninstalling and re-installing pyarrow works.
The package shows as version 3.0.0 from the pypi channel, but it now works.

@oliche oliche closed this as completed Mar 18, 2021
@oluwafemi2016

I am having a similar problem. I followed the suggestion of uninstalling and re-installing pyarrow, but the issue persists. Below is the error message I received. Any suggestions? Thanks.


OSError Traceback (most recent call last)
in
----> 1 raw_data = pd.read_parquet(DATAFILE_PATH, engine='auto')

~\anaconda3\lib\site-packages\pandas\io\parquet.py in read_parquet(path, engine, columns, use_nullable_dtypes, **kwargs)
457 """
458 impl = get_engine(engine)
--> 459 return impl.read(
460 path, columns=columns, use_nullable_dtypes=use_nullable_dtypes, **kwargs
461 )

~\anaconda3\lib\site-packages\pandas\io\parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
219 )
220 try:
--> 221 return self.api.parquet.read_table(
222 path_or_handle, columns=columns, **kwargs
223 ).to_pandas(**to_pandas_kwargs)

~\anaconda3\lib\site-packages\pyarrow\parquet.py in read_table(source, columns, use_threads, metadata, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes)
1729 memory_map=memory_map, buffer_size=buffer_size)
1730
-> 1731 return dataset.read(columns=columns, use_threads=use_threads,
1732 use_pandas_metadata=use_pandas_metadata)
1733

~\anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, use_threads, use_pandas_metadata)
1606 use_threads = False
1607
-> 1608 table = self._dataset.to_table(
1609 columns=columns, filter=self._filter_expression,
1610 use_threads=use_threads

~\anaconda3\lib\site-packages\pyarrow\_dataset.pyx in pyarrow._dataset.Dataset.to_table()

~\anaconda3\lib\site-packages\pyarrow\_dataset.pyx in pyarrow._dataset.Scanner.to_table()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

OSError: NotImplemented: Support for codec 'snappy' not built

@flevobi

flevobi commented Oct 20, 2021

I can confirm: uninstalling and re-installing didn't help, unfortunately. Same stack trace as oluwafemi2016.

pyarrow version after re-install: 4.0.1

@k1o0
Contributor

k1o0 commented Nov 2, 2021

Did you two install via conda, conda-forge, or pip?

@flevobi

flevobi commented Nov 2, 2021

It's the re-install via conda that didn't resolve the problem.
Actually, after conda uninstall pyarrow and a subsequent python -m pip install pyarrow, the problem was gone...

@k1o0
Contributor

k1o0 commented Dec 21, 2021

Actually, after a 'conda uninstall pyarrow' and subsequent 'python -m pip install pyarrow' the problem was gone...

@oluwafemi2016 Did this work for you too?

@vanHueNis

I had the same issue; I also tried conda install pyarrow and the conda-forge channel, and both failed. But with python -m pip install pyarrow it works fine now, thanks.
