Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RemoteDandiset.has_data_standard() convenience function #958

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
27 changes: 17 additions & 10 deletions dandi/dandiapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,8 @@

T = TypeVar("T")

DATA_STANDARD_MAP = dict(NWB="RRID:SCR_015242")


class AssetType(Enum):
"""
Expand Down Expand Up @@ -1071,19 +1073,24 @@ def iter_upload_raw_asset(
self, metadata=asset_metadata, jobs=jobs, replacing=replace_asset
)

def is_nwb(self) -> bool:
def has_data_standard(self, data_standard: str) -> bool:
"""
Returns True if the Dandiset contains one or more NWB file assets.

This is determined by checking for "RRID:SCR_015242" in the "dataStandard" field
of the assetsSummary of the dandiset.
Returns True if the Dandiset contains one or more files of the indicated
standard. Otherwise, returns False.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should document the accepted data standards (or, at the very least, the RRID input format).

"""
assetsSummary = self.get_raw_metadata()["assetsSummary"]
if "dataStandard" not in assetsSummary:
if data_standard in DATA_STANDARD_MAP:
rrid = DATA_STANDARD_MAP[data_standard]
elif data_standard.startswith("RRID:"):
rrid = data_standard
else:
raise ValueError(
f"'data_standard' must be an RRID (of form 'RRID:XXX_NNNNNNN`) or one "
bendichter marked this conversation as resolved.
Show resolved Hide resolved
f"of the following values: {DATA_STANDARD_MAP.keys()}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict.keys() doesn't stringify in a human-friendly way. Use ", ".join(DATA_STANDARD_MAP.keys()) instead.

bendichter marked this conversation as resolved.
Show resolved Hide resolved
)
assets_summary = self.get_raw_metadata()["assetsSummary"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, assetsSummary seems to be non optional, so must be present in a valid metadata record, but if not we would get KeyError here... which could be good as to alert user that smth is wrong with metadata in that dandisets. But should it be an exception really or just a warning? I would lean to a be a warning and just return False. WDYT @jwodder and @satra ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwodder and @bendichter WDYT - better be strict or robust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with changing this to throw a warning if assetsSummary is missing

bendichter marked this conversation as resolved.
Show resolved Hide resolved
if "dataStandard" not in assets_summary:
return False
return any(
x["identifier"] == "RRID:SCR_015242" for x in assetsSummary["dataStandard"]
)
return any(x["identifier"] == rrid for x in assets_summary["dataStandard"])


class BaseRemoteAsset(ABC, APIBase):
Expand Down