Merge pull request #117 from int-brain-lab/v2.7.0
V2.7.0
k1o0 authored Mar 25, 2024
2 parents 22f2972 + 960e7a9 commit dc41a27
Showing 23 changed files with 638 additions and 189 deletions.
37 changes: 37 additions & 0 deletions .github/workflows/python-publish.yaml
@@ -0,0 +1,37 @@
# Reference for this action:
# https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
name: Publish to PyPI

on:
push:
tags:
- 'v*'

permissions:
contents: read

jobs:
deploy:
name: Build and publish Python distributions to PyPI
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- uses: actions/setup-python@v4
with:
python-version: '3.x'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel
- name: Build package
run: python setup.py sdist bdist_wheel

- name: Publish package
# GitHub recommends pinning 3rd party actions to a commit SHA.
uses: pypa/gh-action-pypi-publish@37f50c210e3d2f9450da2cd423303d6a14a6e29f
with:
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
28 changes: 25 additions & 3 deletions CHANGELOG.md
@@ -1,12 +1,34 @@
# Changelog
## [Latest](https://github.com/int-brain-lab/ONE/commits/main) [2.6.0]
## [Latest](https://github.com/int-brain-lab/ONE/commits/main) [2.7.0]
This version of ONE adds support for Alyx 2.0.0 and pandas 3.0.0, along with dataset QC filters. It no longer supports the 'data' search filter.

### Added

- support for Alyx v2.0.0
- support for pandas v3.0.0
- one.alf.spec.QC enumeration
- ONE_HTTP_DL_THREADS environment variable allows the user to specify the maximum number of threads to use (see the sketch below)
- GitHub workflow for releasing to PyPI
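
As a rough illustration of the QC enumeration and the new environment variable (a minimal sketch: the thread count is arbitrary, and it is assumed the variable is read from the environment when downloads are dispatched):

```python
import os
from one.alf.spec import QC  # new QC enumeration

# Cap the number of parallel HTTP download threads (value is illustrative).
os.environ['ONE_HTTP_DL_THREADS'] = '4'

# QC levels can be referenced by attribute or looked up by name when building filters.
print(QC.PASS, QC['WARNING'])
```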

### Modified

- support 'qc' category field in dataset cache table
- One.search supports the `dataset_qc_lte` filter
- One.list_datasets supports the `dataset_qc_lte` and `ignore_qc_not_set` filters
- one.alf.io.iter_sessions pattern arg added to improve performance

### Removed

- One.search no longer supports 'data' filter: kwarg must be 'dataset'

## [2.6.0]

### Modified
- `one.load_dataset`

- One.load_dataset
- add an option to skip computing hash for existing files when loading datasets `check_hash=False`
- check filesize before computing hash for performance


## [2.5.5]

### Modified
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@
[![Coverage Status](https://coveralls.io/repos/github/int-brain-lab/ONE/badge.svg?branch=main)](https://coveralls.io/github/int-brain-lab/ONE?branch=main)
![CI workflow](https://github.com/int-brain-lab/ONE/actions/workflows/main.yaml/badge.svg?branch=main)

The Open Neurophysiology Environment is a scheme for sharing neurophysiology data in a standardized manner. It is a Python API for searching and loading ONE-standardized data, stored either on a users local machine or on a remote server.
The Open Neurophysiology Environment is a scheme for sharing neurophysiology data in a standardized manner. It is a Python API for searching and loading ONE-standardized data, stored either on a user's local machine or on a remote server.

Please [Click here](https://int-brain-lab.github.io/ONE/) for the main documentation page. For a quick primer on the file naming convention we use, [click here](https://github.com/int-brain-lab/ONE/blob/main/docs/Open_Neurophysiology_Environment_Filename_Convention.pdf).

17 changes: 17 additions & 0 deletions docs/FAQ.md
@@ -194,3 +194,20 @@ or provided a different tag (see [this question](#how-do-i-download-the-datasets
Second, there are minor differences between the default/local modes and remote mode. Namely that in remote mode
queries are generally case-insensitive. See the 'gotcha' section of
'[Searching with ONE](notebooks/one_search/one_search.html#Gotchas)' for more information.

## How do I load datasets that pass quality control?
You can first filter sessions to those in which the supplied datasets have a QC level of WARNING or less:

```python
one = ONE()
# In local and auto mode
eids = one.search(dataset=['trials', 'spikes'], dataset_qc_lte='WARNING')
# In remote mode
eids = one.search(datasets=['trials.table.pqt', 'spikes.times.npy'], dataset_qc_lte='WARNING')
```

You can then load the datasets with `list_datasets` and `load_datasets`:
```python
dsets = one.list_datasets(eid, qc='WARNING', ignore_qc_not_set=True)
data, info = one.load_datasets(eid, dsets)
```
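
Under the hood these filters compare QC levels via the new `one.alf.spec.QC` enumeration. The sketch below assumes the enumeration is ordered and that NOT_SET sits at or below WARNING, which is what the `ignore_qc_not_set` flag guards against:

```python
from one.alf.spec import QC

# 'WARNING or less' means a dataset's QC value is at or below QC.WARNING,
# so PASS datasets qualify, and so do NOT_SET ones unless ignore_qc_not_set=True.
print(QC.PASS <= QC.WARNING)     # expected: True
print(QC.NOT_SET <= QC.WARNING)  # expected: True, hence the ignore_qc_not_set filter
print(QC['WARNING'])             # levels can also be looked up by name, as in the examples above
```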
18 changes: 18 additions & 0 deletions docs/contributing.md
@@ -23,3 +23,21 @@ python ./make-script.py -d

The HTML files are placed in `docs/_build/html/`.

# Contributing to code

Always branch off `main` before committing changes, then push to the remote and open a PR into `main`.
A developer will then approve the PR and release.

## Releasing (developers only)

Note that in order to trigger a PyPI release, the tag must begin with 'v', e.g. `v2.8.0`.

```shell
git checkout -b release/X.X.X origin/<branch>
git checkout main
git merge release/X.X.X
git tag vX.X.X
git push origin --tags
git push origin main
git branch -d release/X.X.X
```
15 changes: 11 additions & 4 deletions docs/notebooks/one_list/one_list.ipynb
@@ -244,7 +244,13 @@
"collections = one.list_collections(eid, filename='*spikes*')\n",
"\n",
"# All datasets with 'raw' in the name:\n",
"datasets = one.list_datasets(eid, '*raw*')\n"
"datasets = one.list_datasets(eid, '*raw*')\n",
"\n",
"# All datasets with a QC value less than or equal to 'WARNING' (i.e. includes 'PASS', 'NOT_SET' also):\n",
"datasets = one.list_datasets(eid, qc='WARNING')\n",
"\n",
"# All QC'd datasets with a value less than or equal to 'WARNING' (i.e. 'WARNING' or 'PASS'):\n",
"datasets = one.list_datasets(eid, qc='WARNING', ignore_qc_not_set=True)"
],
"metadata": {
"collapsed": false,
@@ -384,7 +390,8 @@
"source": [
"## Combining with load methods\n",
"The list methods are useful in combination with the load methods. For example, the output of\n",
"the `list_datasets` method can be a direct input of the `load_datasets` method:"
"the `list_datasets` method can be a direct input of the `load_datasets` method. Here we load all\n",
"spike and cluster datasets where the QC is either PASS or NOT_SET:"
],
"metadata": {
"collapsed": false
@@ -403,7 +410,7 @@
}
],
"source": [
"datasets = one.list_datasets(eid, ['*spikes*', '*clusters*'])\n",
"datasets = one.list_datasets(eid, ['*spikes*', '*clusters*'], qc='PASS', ignore_qc_not_set=False)\n",
"data, records = one.load_datasets(eid, datasets)"
],
"metadata": {
@@ -537,4 +544,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
18 changes: 11 additions & 7 deletions docs/notebooks/one_search/one_search.ipynb
@@ -573,15 +573,19 @@
"As mentioned above, different search terms perform differently. Below are the search terms and their\n",
"approximate SQL equivalents:\n",
"\n",
"| Term | Lookup |\n",
"|--------------|-----------|\n",
"| dataset | LIKE AND |\n",
"| number | EXACT |\n",
"| date_range | BETWEEN |\n",
"| subject, etc.| LIKE OR |\n",
"| Term | Lookup |\n",
"|-----------------|----------|\n",
"| dataset | LIKE AND |\n",
"| dataset_qc_lte | <= |\n",
"| number | EXACT |\n",
"| date_range | BETWEEN |\n",
"| subject, etc. | LIKE OR |\n",
"\n",
"Combinations of terms form a logical AND, for example `one.search(subject=['foo', 'bar'], project='baz')`\n",
"searches for sessions where the subject name contains foo OR bar, AND the project contains baz.\n",
"NB: When `dataset_qc_lte` which is provided with `dataset(s)`, sessions are returned where ALL matching datasets\n",
"have a less than or equal QC value. When `dataset_qc_lte` is provided alone, sessions are returned where\n",
"ANY of the datasets have a less than or equal QC value.\n",
"\n",
"#### Difference between remote mode search terms\n",
"Many search terms perform differently between auto/local mode and [remote mode](../one_modes.html),\n",
@@ -591,7 +595,7 @@
"In remote mode there are three ways to search for datasets:\n",
"\n",
"* **dataset** - a partial, case-insensitive match of a single dataset (multiple datasets not supported).\n",
"* **datasets** - an exact, case-sensitive match of one or more datasets. All datasets must be present.\n",
"* **datasets** - an exact, case-sensitive match of one or more datasets. All datasets must be present. If `dataset_qc` provided, this criterion applies only to these datasets.\n",
"* **dataset_type** - an exact, case-sensitive match of one or more [dataset types](../datasets_and_types.html#Dataset-types). All dataset types must be present.\n",
"\n",
"#### Regex systems between modes\n",
Expand Down
2 changes: 1 addition & 1 deletion one/__init__.py
@@ -1,2 +1,2 @@
"""The Open Neurophysiology Environment (ONE) API."""
__version__ = '2.6.0'
__version__ = '2.7.0'
41 changes: 22 additions & 19 deletions one/alf/cache.py
@@ -30,6 +30,7 @@
from one.alf.io import iter_sessions, iter_datasets
from one.alf.files import session_path_parts, get_alf_path
from one.converters import session_record2path
from one.util import QC_TYPE

__all__ = ['make_parquet_db', 'remove_missing_datasets', 'DATASETS_COLUMNS', 'SESSIONS_COLUMNS']
_logger = logging.getLogger(__name__)
@@ -40,12 +41,12 @@

SESSIONS_COLUMNS = (
'id', # int64
'lab',
'subject',
'lab', # str
'subject', # str
'date', # datetime.date
'number', # int
'task_protocol',
'projects',
'task_protocol', # str
'projects', # str
)

DATASETS_COLUMNS = (
@@ -56,6 +57,7 @@
'file_size', # file size in bytes
'hash', # sha1/md5, computed in load function
'exists', # bool
'qc', # one.util.QC_TYPE
)


@@ -64,7 +66,7 @@
# -------------------------------------------------------------------------------------------------

def _ses_str_id(session_path):
"""Returns a str id from a session path in the form '(lab/)subject/date/number'"""
"""Returns a str id from a session path in the form '(lab/)subject/date/number'."""
return Path(*filter(None, session_path_parts(session_path, assert_valid=True))).as_posix()


@@ -91,7 +93,8 @@ def _get_dataset_info(full_ses_path, rel_dset_path, ses_eid=None, compute_hash=F
'rel_path': Path(rel_dset_path).as_posix(),
'file_size': file_size,
'hash': md5(full_dset_path) if compute_hash else None,
'exists': True
'exists': True,
'qc': 'NOT_SET'
}


@@ -140,7 +143,7 @@ def _metadata(origin):
Parameters
----------
origin : str, pathlib.Path
Path to full directory, or computer name / db name
Path to full directory, or computer name / db name.
"""
return {
'date_created': datetime.datetime.now().isoformat(sep=' ', timespec='minutes'),
@@ -150,17 +153,17 @@ def _metadata(origin):

def _make_sessions_df(root_dir) -> pd.DataFrame:
"""
Given a root directory, recursively finds all sessions and returns a sessions DataFrame
Given a root directory, recursively finds all sessions and returns a sessions DataFrame.
Parameters
----------
root_dir : str, pathlib.Path
The folder to look for sessions
The folder to look for sessions.
Returns
-------
pandas.DataFrame
A pandas DataFrame of session info
A pandas DataFrame of session info.
"""
rows = []
for full_path in iter_sessions(root_dir):
@@ -176,21 +179,21 @@ def _make_sessions_df(root_dir) -> pd.DataFrame:

def _make_datasets_df(root_dir, hash_files=False) -> pd.DataFrame:
"""
Given a root directory, recursively finds all datasets and returns a datasets DataFrame
Given a root directory, recursively finds all datasets and returns a datasets DataFrame.
Parameters
----------
root_dir : str, pathlib.Path
The folder to look for sessions
The folder to look for sessions.
hash_files : bool
If True, an MD5 is computed for each file and stored in the 'hash' column
If True, an MD5 is computed for each file and stored in the 'hash' column.
Returns
-------
pandas.DataFrame
A pandas DataFrame of dataset info
A pandas DataFrame of dataset info.
"""
df = pd.DataFrame([], columns=DATASETS_COLUMNS)
df = pd.DataFrame([], columns=DATASETS_COLUMNS).astype({'qc': QC_TYPE})
# Go through sessions and append datasets
for session_path in iter_sessions(root_dir):
rows = []
@@ -200,7 +203,7 @@ def _make_datasets_df(root_dir, hash_files=False) -> pd.DataFrame:
rows.append(file_info)
df = pd.concat((df, pd.DataFrame(rows, columns=DATASETS_COLUMNS)),
ignore_index=True, verify_integrity=True)
return df
return df.astype({'qc': QC_TYPE})


def make_parquet_db(root_dir, out_dir=None, hash_ids=True, hash_files=False, lab=None):
Expand All @@ -216,7 +219,7 @@ def make_parquet_db(root_dir, out_dir=None, hash_ids=True, hash_files=False, lab
root directory.
hash_ids : bool
If True, experiment and dataset IDs will be UUIDs generated from the system and relative
paths (required for use with ONE API)
paths (required for use with ONE API).
hash_files : bool
If True, an MD5 hash is computed for each dataset and stored in the datasets table.
This will substantially increase cache generation time.
@@ -227,9 +230,9 @@
Returns
-------
pathlib.Path
The full path of the saved sessions parquet table
The full path of the saved sessions parquet table.
pathlib.Path
The full path of the saved datasets parquet table
The full path of the saved datasets parquet table.
"""
root_dir = Path(root_dir).resolve()

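
As a sketch of how the new 'qc' column might be consumed downstream (the parquet filename is hypothetical, and this assumes `one.util.QC_TYPE` is an ordered pandas categorical so that comparisons against a level name work):

```python
import pandas as pd

# Load a datasets table produced by make_parquet_db (filename is illustrative).
datasets = pd.read_parquet('datasets.pqt')

# With an ordered categorical 'qc' column, rows at or below a given QC level
# can be selected with a plain comparison.
passing = datasets[datasets['qc'] <= 'WARNING']
print(passing[['rel_path', 'qc']].head())
```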