-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PathLike wrapper/cache for ExternalStorage #186
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #186 +/- ##
==========================================
- Coverage 99.21% 99.15% -0.06%
==========================================
Files 36 38 +2
Lines 1907 2128 +221
==========================================
+ Hits 1892 2110 +218
- Misses 15 18 +3 ☔ View full report in Codecov by Sentry. |
gufe/storage/pseudodirectory.py
Outdated
read_only : bool | ||
write to prevent NEW files from being written within this shared | ||
directory. NOTE: This will not prevent overwrite of existing files, | ||
in scratch space, but it will prevent changed files from uploading | ||
to the external storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This read_only
is mainly to prevent the following mistake:
shared = root.get_other_shared(other_unit_prefix)
with open(shared / "oldfile.txt") as f:
... # good so far!
with open(shared / "newfile.txt", mode='wb') as f:
... # uh-oh -- now we're writing into the wrong unit's space!
However, it won't stop open(shared / "oldfile.txt", mode='wb')
. You can still overwrite that data. If that data is in a separate permanent storage, it will be fine -- only overwrite locally. (If you set shared
up such that the directory it uses is identical to the internal holding cache here, of course, it would overwrite that.)
…o shared-object-v2
gufe/storage/pseudodirectory.py
Outdated
def __del__(self): | ||
# take everything in self.shared_dir and write to it shared; keeping | ||
# our prefix | ||
self.transfer_holding_to_external() | ||
if self.delete_holding: | ||
shutil.rmtree(self.shared_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc __del__
isn't called on del obj
but instead when garbage collection happens. Maybe instead make transfer_holding_to_external
always explicit and don't rely on garbage collection related things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works for me ... it's an executor-level thing anyway, and I don't mind asking the executor writer to think about how files get moved around (don't want to ask the protocol writer to think about that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe if you want this behaviour, use a context manager, so instead of __del__
use __exit__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This started out as a context manager. I switched it away because it makes actual usage kind of weird -- the entirety of the execution block where this is used gets indented by one level, which I don't particularly like. It gets worse if you have one of these for shared
and one for permanent
-- there's a syntax for setting multiple context managers at once, but I've also always found that kind of weird.
But I can switch it back to a context manager if you want 🤷♂️
Still in progress, but here's a quick update to what is included in this PR now. Before I dive any further into writing unit tests, take a quick look, @richardjgowers (at least at the contents of this comment). Main idea: We have "staging" directories that are local. From a protocol author's perspective, I've expanded this PR to include some tools for handling the entirety of the scratch / shared / permanent storage lifecycle. There are two context managers: one for the unit level that provides the Here's an example from a protocol author's viewpoint (kind of faking our real units, but this gives the rough picture): gufe/gufe/tests/storage/test_storagemanager.py Lines 27 to 53 in ed5e83c
For the executor, the basic picture is something like this: manager = StorageManager(scratch, shared_external, permanent_external)
with manager.running_dag(dag_label) as dag_ctx:
for unit in ordered_dag_units:
with dag_ctx.running_unit(unit.key) as (scratch, shared, permanent):
unit_result = unit.run(scratch, shared, permanent) # or however we run the unit
# when this context ends, scratch will be deleted, shared external storage will be
# updated, and shared holding will be deleted
# when this context ends, permanent external storage will be updated, and permanent holding
# will be deleted For much more detail, see this lifecycle test: gufe/gufe/tests/storage/test_storagemanager.py Lines 59 to 131 in ed5e83c
Next tests will be to check the special cases that |
From power hour: with shared.other_shared("unit1") as prev_shared:
with open(prev_shared / "bar.txt", mode='r') as f:
bar = f.read() Instead of having Instead, making |
Makes much more sense as _safe_to_delete_holding
docs/guide/storage.rst
Outdated
protocol authors. In detail, this provides protocol authors with | ||
``PathLike`` objects for ``scratch``, ``shared``, and ``permanent``. All | ||
three of these objects actually point to special subdirectories of the | ||
scratch space for a specific unit, but are managed by context manangers at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scratch space for a specific unit, but are managed by context manangers at | |
local execution space for a specific unit, but are managed by context managers at |
…o shared-object-v2
docs/guide/storage.rst
Outdated
:class:`.DAGContextManager` to create a context to run a unit. That context | ||
creates a :class:`.SharedStaging` and a :class:`.PermanentStaging` | ||
associated with the specific unit. Those staging directories, with the | ||
scratch directory, are provided to the :class:`.ProtocolDAGUnit`, so that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scratch directory, are provided to the :class:`.ProtocolDAGUnit`, so that | |
scratch directory, are provided to the :class:`.ProtocolUnit`, so that |
Hello @dwhswenson! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-01-22 17:35:13 UTC |
…o shared-object-v2
also, some docs cleanup/updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is partial, I'm still going to go through the tests and try breaking this locally myself
network-scale analysis. Anything stored here will be usable after the | ||
calculation has completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"after the calculation" could equally apply to a Unit
. Maybe instead "Anything stored here will be retrievable after the Protocol estimation has completed"
In an abstract sense, as used by protocol developers, these three levels | ||
correspond to three lifetimes of data: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another last chance to rename things, what if we instead just named the storage lifetimes against the thing they're scoped to. So scratch
-> unit_storage
, shared
-> dag_storage
and permanent
-> campaign_storage
. The idea being it's easier to remember their scope.
the end of the :class:`.ProtocolDAG`, but is not guaranteed to exist after | ||
the :class:`.ProtocolDAG` terminates. | ||
* ``permanent``: This is the data that will be needed beyond the scope of a | ||
single rough estimate of the calculation. This could include anything that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I like "single rough estimate" here, a single estimate might be perfectly fine. Maybe instead, "this is the data that will be available for post-simulation analysis beyond the scope of a single DAG"
|
||
The ``scratch`` area is always a local directory. However, ``shared`` and | ||
``permanent`` can be external (remote) resources, using the | ||
:class:`.ExternalResource` API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe mention that whilst these might not be a local directory, they will still act like one?
As a practical matter, the GUFE storage system can be handled with a | ||
:class:`.StorageManager`. This automates some aspects of the transfer | ||
between stages of the GUFE storage system, and simplifies the API for | ||
protocol authors. In detail, this provides protocol authors with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use "developers" instead of "protocol authors"? I'm thinking developer is more common, someone might get confused thinking "protocol author" is the agent which is submitting protocols or something daft
"""Transfer a given file from staging into external storage | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I've followed this, it returns the held_file back if it was transferred, but otherwise None? Can store_path
ever (knowingly) fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I've followed this, it returns the held_file back if it was transferred, but otherwise None?
Yes. The real point here is that transfer_staging_to_external
, which tries to transfer everything, returns the list of files that have been transferred. That was needed because it turned out that the successfully transferred files are needed by the StorageManager
if keep_shared is False
. I might be able to move that responsibility into a delete_transferred
method on StagingRegistry
, though.
Can
store_path
ever (knowingly) fail?
I'm not sure what you mean here by "knowingly." If store_path
does not successfully store the path, it should raise an exception. The only way it can communicate failure is via exception, and if it fails silently, that's a bug.
gufe/storage/stagingregistry.py
Outdated
""" | ||
other = self._get_other_shared(prefix, delete_staging) | ||
yield other | ||
other.cleanup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead put a __enter__
/__exit__
etc on SharedStaging
? This feels strange to have some of the class' intended behaviour defined in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So other_shared
here would just return be return SharedStaging(...)
gufe/storage/storagemanager.py
Outdated
self.shared_staging = SharedStaging( | ||
scratch=self.scratch_root, | ||
external=self.shared_root, | ||
staging=self.staging, | ||
delete_empty_dirs=delete_empty_dirs, | ||
prefix="" # TODO: remove prefix | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is shared scoped to the lifetime of the StorageManager
? Could you not create this inside the scope of the running_dag
method?
from .stagingregistry import StagingPath # typing | ||
|
||
|
||
class StorageManager: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doccing this with some usage examples would be handy
This required a pretty significant rewrite of the code.
Mainly matters for the StorageManager
…o shared-object-v2
Make a function that returns a pathlib.Path be the main thing, instead of the property that returns a string. Almost all usage of `.fspath` was to then wrap it in a pathlib.Path. This is more convenient for users.
This is to avoid likely footguns related to using os.path.join. Includes test of error on os.path.join.
network, are not recommended for ``shared``. | ||
|
||
|
||
Details: Manangement of the Storage Lifetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: management
This is a start to solving #120. We've identified the following goals:
shared
(as well aspermanent
), which we already have viaExternalStorage
, instead of it needing to be a directory on the same host.pathlib.Path
, which will (we hope) minimize frustration with the transition for users.So far, this PR doesn't integrate that into any of the execution process. That can be either part of this PR or a future one.
The basic approach here involves 2 classes:
SharedRoot
: represents the root directory for a given unit. By default, this will exist at$SCRATCH/.holding/$UNIT_LABEL
. This is where most of the actual logic is. It contains references to all the paths that have been registered with it. On deletion (or on demand) this transfers all paths registered with it to theExternalStorage
.SharedPath
: represents any path withinSharedRoot
. Contains information on the actual file path on the system, as well as the label used as the key for the external key-value store. Creation is typically fromroot / "some_path"
orsome_path / "subpath"
. ASharedPath
can be used like any otherPathLike
inopen
, and can be converted viapathlib.Path(shared_path)
to get all the functionality of pathlib.As I'm opening this, code is still in progress. But at least the tests should give a picture of usage; please take a look!
Update for some final to-do items.
prefix
; I believe it is no longer useddag_label
needs to be givendelete_empty_dirs
tokeep_empty_dirs
(consistent with otherkeep_*
)__fspath__
getting called)StagingDirectory
=>StagingRegistry
StagingRegistry
objects non-FileLike (potential footgun onos.path.join(context.shared, "filename.txt")
).