Detect when files are being staged #5905
Comments
This would also help me solve a problem in nf-prov (described here). I would propose just adding this method: void onFileStage(Path destination, Path source)
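A minimal sketch of how such a callback could be consumed, assuming the method were added as proposed. The names FileStageObserver, StageRecorder and originOf are hypothetical and are not part of Nextflow's actual TraceObserver; only the onFileStage(Path destination, Path source) signature comes from the comment above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback interface; NOT Nextflow's real TraceObserver.
interface FileStageObserver {
    // Assumed to be called once `source` has been copied to `destination`.
    void onFileStage(Path destination, Path source);
}

// Remembers where each staged file came from, so provenance code can
// report the original location instead of the staged copy.
class StageRecorder implements FileStageObserver {
    private final Map<Path, Path> sourceByDestination = new LinkedHashMap<>();

    @Override
    public void onFileStage(Path destination, Path source) {
        sourceByDestination.put(destination, source);
    }

    // Returns the original path if the file was staged, else the path itself.
    Path originOf(Path path) {
        return sourceByDestination.getOrDefault(path, path);
    }
}

class StageRecorderDemo {
    public static void main(String[] args) {
        StageRecorder recorder = new StageRecorder();
        Path staged = Paths.get("work", "stage", "samplesheet-2-0.csv");
        Path source = Paths.get("datasets", "samplesheet-2-0.csv");
        recorder.onFileStage(staged, source);
        System.out.println(recorder.originOf(staged));
    }
}
```

With a mapping like this, a provenance plugin could translate a staged task input back to its source before logging it.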
You can determine whether a file is a staged file by checking whether the file path scheme differs from the work dir scheme.
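The scheme comparison suggested here could look roughly like the sketch below. It works on java.net.URI values rather than Nextflow's own path objects, and the class and method names (StageCheck, schemeOf, needsStaging) are assumptions made for illustration. A location without a scheme is treated as a local file.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper; not part of Nextflow.
class StageCheck {
    // A URI with no scheme (e.g. a plain relative or absolute path) is
    // treated as a local file.
    static String schemeOf(URI location) {
        String scheme = location.getScheme();
        return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
    }

    // A file is considered foreign, and therefore subject to staging,
    // when its scheme differs from the scheme of the work directory.
    static boolean needsStaging(URI file, URI workDir) {
        return !schemeOf(file).equals(schemeOf(workDir));
    }
}

class StageCheckDemo {
    public static void main(String[] args) {
        URI workDir = URI.create("/tmp/work");
        URI remote = URI.create(
            "https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv");
        System.out.println(StageCheck.needsStaging(remote, workDir));
    }
}
```

Note that this check only helps while the original location is still available; once only the staged copy under the work directory is known, both schemes match.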
I think the problem is that when a foreign file is added to the task inputs, the FileHolder does not include the original path (nextflow/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy, lines 1941 to 1944 in 47a0cc1).
I think if we changed it to
Precisely ^^
Note that input files != staged files, so I don't understand why you want to capture only the latter for provenance purposes.
I see. If I understand correctly, you think it would make more sense to notify whenever a file is used as input?
For my part, the problem is that right now I can only track the staged file, when I really want to track the original remote file for provenance.
New feature
It'd be great to be able to get notified (using the TraceObserver) when files are being staged.
Use case
I'm currently building a provenance plugin, nf-lamin, for logging workflow executions in LaminDB. It's easy to detect when files are being published (i.e. the outputs of the workflow) using the TraceObserver. However, it's not so easy to detect when files are staged (i.e. the inputs of the workflow).
I notice that the nf-prov plugin uses nextflow.prov.util.ProvHelper.getWorkflowInputs() to detect workflow inputs. However, this only allows me to observe the paths after the files have already been staged. That is, I only get to see work/stage-2478a5a8-2313-49c9-8cfe-92ef6483859b/91/ec7b0d3c79c84f8f7e16e07d823a7e/samplesheet-2-0.csv while I actually need https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv.
Suggested implementation
We could modify the TraceObserver. Note that I would add onFileStage and onFileStaging so we could know whether a file is yet to be staged, or whether it has been staged. Alternatively, an enum could be added to reduce the number of different functions.
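To make the enum alternative concrete, here is a rough sketch with a single callback that receives the phase as an argument. Everything in it (StagePhase, FileStageEventObserver, onFileStageEvent, StageEventLog) is a hypothetical name chosen for illustration and not an existing Nextflow API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical phases: before the copy starts, and after it has finished.
enum StagePhase { STAGING, STAGED }

// Hypothetical single-method alternative to onFileStaging + onFileStage.
interface FileStageEventObserver {
    void onFileStageEvent(StagePhase phase, Path source, Path destination);
}

// Collects one log entry per event, in the order the events arrive.
class StageEventLog implements FileStageEventObserver {
    private final List<String> entries = new ArrayList<>();

    @Override
    public void onFileStageEvent(StagePhase phase, Path source, Path destination) {
        entries.add(phase + " " + source + " -> " + destination);
    }

    // Returns a copy so callers cannot modify the internal log.
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}

class StageEventLogDemo {
    public static void main(String[] args) {
        StageEventLog log = new StageEventLog();
        Path source = Paths.get("samplesheet-2-0.csv");
        Path destination = Paths.get("staged-samplesheet-2-0.csv");
        log.onFileStageEvent(StagePhase.STAGING, source, destination);
        log.onFileStageEvent(StagePhase.STAGED, source, destination);
        log.entries().forEach(System.out::println);
    }
}
```

The trade-off is the usual one: a single method keeps the observer interface small, while two separate methods let implementers override only the phase they care about.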
Happy to send a PR if it helps! :)