Allow recompute via _make_file func #1093

Draft · wants to merge 21 commits into master
Conversation

@CBroz1 (Member) commented Sep 6, 2024

Description

This PR adds the ability to recompute a file that no longer exists when it is fetched from v1.SpikeSortingRecording, along with infrastructure to support expanding this tool in the future. To facilitate this, I add:

  • A standard _make_file func that other tables can implement in the future (see the sketch after this list)
  • NWB-file-specific and more general directory hashing tools
  • Tests for a round trip of deleting a file and then calling fetch_nwb
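
A rough sketch of the intended _make_file pattern is below. The class and helper names (RecomputeMixin, _expected_path, fetch_file) are hypothetical and not the API added in this branch; only the _make_file hook itself comes from this PR.

    from pathlib import Path

    class RecomputeMixin:
        """Illustrative only: the surrounding names are assumptions, not the
        exact implementation in this branch."""

        def _make_file(self, key: dict) -> Path:
            # Each table implements its own recompute logic and returns the
            # path of the regenerated file.
            raise NotImplementedError

        def _expected_path(self, key: dict) -> Path:
            # Placeholder: a real table would derive this from the key and the
            # configured analysis directory.
            raise NotImplementedError

        def fetch_file(self, key: dict) -> Path:
            path = self._expected_path(key)
            if not path.exists():
                # File was deleted (or never transferred): recompute it.
                path = self._make_file(key)
            return path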

I have tested these hashers for...

  • All files in v1.SpikeSortingRecording
  • All directories in v0.SpikeSortingRecording
  • A subset of files in Raw (ongoing; about 50% so far)
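
For orientation, a minimal general-purpose directory hasher in this spirit might look like the sketch below. The specifics (md5, sorted relative paths, chunked reads) are assumptions; the branch's actual hashers, especially the NWB-specific one, may differ.

    import hashlib
    from pathlib import Path

    def hash_directory(directory: str, chunk_size: int = 2**20) -> str:
        """Hash a directory by streaming file contents in sorted relative-path
        order, so the result is independent of traversal order."""
        md5 = hashlib.md5()
        root = Path(directory)
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            # Include the relative path so renames change the hash.
            md5.update(str(path.relative_to(root)).encode())
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    md5.update(chunk)
        return md5.hexdigest()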

Checklist:

  • This PR should be accompanied by a release: No
  • If release, I have updated the CITATION.cff: N/A
  • This PR makes edits to table definitions: Yes
  • If table edits, I have included an alter snippet for release notes: Yes
  • If this PR makes changes to position, I ran the relevant tests locally: N/A
  • I have updated the CHANGELOG.md with PR number and description: Yes
  • I have added/edited docs/notebooks to reflect the changes

Question

I found at least one file/directory whose hashes mismatched between

  • (a) the existing version
  • (b) the version regenerated with this branch

What should we do about these? Do we need to restrict our deletions to only items that are replicable with an up-to-date Spyglass? Should I start building the infrastructure to compile this list? What do we want to do with files that don't replicate? Do we need to experiment with dependency versions? That might point us toward adding conda env dumps to some part of this process.
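
On the last point, a minimal sketch of recording the environment alongside a recompute attempt (the command invocation and file layout here are assumptions, not part of this PR):

    import subprocess
    from pathlib import Path

    def dump_conda_env(output_dir: str) -> Path:
        """Record the active conda environment so files that fail to replicate
        can later be traced back to dependency versions."""
        out = Path(output_dir) / "environment.yml"
        result = subprocess.run(
            ["conda", "env", "export", "--no-builds"],
            capture_output=True,
            text=True,
            check=True,
        )
        out.write_text(result.stdout)
        return out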

@CBroz1 (Member, Author) commented Sep 16, 2024

I ran into some issues here: I was unable to replicate the DJ-stored file hash even when the contents seemingly matched. I need to do more testing on whether memory pointers are factored into the hash. If so, we may need to store a hash of the object computed by the specific table and adjust the DJ-stored file hash on recreation.
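
One way to investigate is to hash the HDF5 contents (object names, dtypes, shapes, and data) rather than the raw bytes, since two NWB files containing identical data can still differ at the byte level (timestamps, object IDs, chunk layout) and so fail a whole-file checksum comparison. A sketch of that idea, not the approach taken in this branch:

    import hashlib

    import h5py
    import numpy as np

    def hash_hdf5_contents(path: str) -> str:
        """Hash dataset names, dtypes, shapes, and values, ignoring file-level
        metadata that can change between otherwise identical rewrites."""
        md5 = hashlib.md5()

        def visit(name, obj):
            md5.update(name.encode())
            if isinstance(obj, h5py.Dataset):
                md5.update(f"{obj.dtype}{obj.shape}".encode())
                data = np.asarray(obj[()])
                # Object/string dtypes need a stable text encoding; numeric
                # data can be hashed as raw bytes.
                md5.update(
                    repr(data.tolist()).encode()
                    if data.dtype == object
                    else data.tobytes()
                )

        with h5py.File(path, "r") as f:
            f.visititems(visit)
        return md5.hexdigest()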

@edeno edeno added the enhancement New feature or request label Sep 25, 2024
@edeno edeno self-requested a review February 20, 2025 17:12
@edeno (Collaborator) left a comment
I think the overall structure looks pretty good. For comparing arrays, remind me why something like np.allclose wouldn't work well?
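
For context, the np.allclose comparison being asked about would look roughly like the sketch below; the file names are made up.

    import numpy as np

    # Hypothetical saved and regenerated copies of the same data.
    original = np.load("original_traces.npy")
    recomputed = np.load("recomputed_traces.npy")

    # allclose tolerates small floating-point differences that would change a
    # byte-level hash, at the cost of loading both arrays into memory.
    arrays_match = np.allclose(original, recomputed, rtol=1e-5, atol=1e-8)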

        raise FileNotFoundError(
            f"Found {len(query)} files for: {analysis_nwb_file_name}"
        )
    return f"{analysis_dir}/" + query.fetch1("filepath")
Collaborator:
It would probably be better to make this less file-system dependent by using pathlib.
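
For example, a pathlib-based version of the return statement above (illustrative names only):

    from pathlib import Path

    def analysis_file_path(analysis_dir: str, relative_path: str) -> Path:
        # Pathlib equivalent of joining analysis_dir and the fetched filepath
        # with a "/" string concatenation.
        return Path(analysis_dir) / relative_path

    # Usage with the query in the hunk above (illustrative):
    # file_path = analysis_file_path(analysis_dir, query.fetch1("filepath"))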

)

external_tbl = schema.external["analysis"]
file_path = (
Collaborator:
Another place where pathlib would help.

@@ -317,6 +317,8 @@ def make(self, key):
    recording = self._get_filtered_recording(key)
    recording_name = self._get_recording_name(key)

    # recording_dir = Path("/home/cbroz/wrk/temp_ssr0/")
Collaborator:
Remove before merge

@@ -0,0 +1,257 @@
"""This schema is used to transition manage files for recompute.
Collaborator:
This module could have a more descriptive name. recompute.py perhaps?

        self.file.close()

    def remove_version(self, key: str) -> bool:
        version_pattern = (
Collaborator:
I do worry a bit about these regexes failing. Versioning is hopefully pretty stable.
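
For what it's worth, one failure-tolerant way to handle such a pattern (the actual regex in the diff is not shown in this hunk, so this is purely illustrative):

    import re

    # Illustrative pattern only; the PR's actual regex may differ.
    VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(?:[+\-.][\w.]+)?")

    def strip_version(name: str) -> str:
        """Remove a semantic-version-like substring; if nothing matches, the
        input is returned unchanged so a regex miss fails soft rather than
        raising."""
        return VERSION_PATTERN.sub("", name, count=1)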

@edeno (Collaborator) commented Feb 24, 2025

Just adding a reminder here to warn the lab before this gets merged.
