job.restart copies OUTPUT of parent job?? #1575
Comments
I cannot immediately say whether it's a feature or not. It does seem somewhat improper to me that the output is also copied, but I presume it's not an easy fix, because inputs and outputs are not unified, and if we base […]. On the other hand, I don't really know why we shouldn't overwrite the data, so I opened a PR with a fix here.
If we are able to remove stuff from hdf5 by modifying 'output' (does that work?), we could do this at any level: GenericJob / GenericAtomisticJob / GenericDFT / Sphinx; at least the atomistic job knows that all output is in 'output'. If modifying hdf5 is hard, maybe we should not copy it at all upon restart for atomistic/dft/sphinx jobs? What's the rationale behind copying it? It might mean moving the hdf5 copying upon restart to those job classes that rely on it.
I seem to recall there's an 'input_only' flag somewhere, but I'd need to hunt for it.
OK, @skatnagallu Shyam ran into the same problem, so I did some further digging into the details. (The depth of the object hierarchy is a pain.) Allow me to call this entire copying procedure entangled and weird, and I am not sure in which of the many classes it should be fixed. The 'output' of a job created from scratch is a job-type-specific, non-empty output class created somewhere in […]. A possible solution would be to create […]. Comments/thoughts are welcome.
Conclusion from the discussion over lunch / in the pyiron Q&A: as a short-term solution, this can be fixed at the level of the sphinx job as follows: […]
Not sure about […]; @skatnagallu will take care of it.
I had a brief look at this just now. A good plan to implement the above is probably to modify and overload the _after_generic_copy_to method, and to also add a parameter […].
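A minimal, self-contained sketch of this suggestion (plain Python, not the actual pyiron classes): a sphinx-level override of the post-copy hook that discards the copied output. The class name, the `delete_output` parameter, and the signature are assumptions for illustration; only the hook name `_after_generic_copy_to` comes from the thread.

```python
# Hypothetical stand-in for the sphinx job class; not the real pyiron code.
class SphinxJobSketch:
    def __init__(self):
        # Pretend the restart already copied the parent's output wholesale.
        self.output = {"generic": {"dft": {"eigenvalues": [1.0, 2.0]}}}

    def _after_generic_copy_to(self, original, delete_output=True):
        # `delete_output` is the hypothetical extra parameter mentioned above:
        # when set, the copied parent output is dropped so the restarted job
        # starts with an empty output, as common sense suggests.
        if delete_output:
            self.output = {}


restarted = SphinxJobSketch()
restarted._after_generic_copy_to(original=None)
print(restarted.output)
```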
@pmrv: I politely disagree. The suggested strategy of today's meeting is to make NO changes to pyiron_base at this stage, because it would interfere with @jan-janssen's plans to clean up. Thus, we fix the sphinx-specific issue inside the sphinx classes as a temporary solution, and leave the generic solution to Jan. Jan said he now has the copy-without-output issue on his radar.

If I understood @jan-janssen's (current) plans correctly, the future strategy would be to convert the parent pyiron object to a plain dictionary, which can be manipulated to implement the copy without output.

The current mess comes from the fact that there are two competing representations in the current copying, namely the hdf file and the object in memory, and for SPHInX, the wrong item was manipulated. pyiron_base cannot tell which of the two representations is relevant (for VASP output, for instance, it is the hdf file). Augmenting the parameter list of the post-copy hook will add more mess, while Jan's solution is probably more straightforward; passing parameters around across function calls is a strong sign of a design flaw.

Shyam was told to make a very BOLD comment about his SPHInX changes being a temporary solution until the "copy issue" is fixed.
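The dictionary-based strategy described above can be sketched in a few lines (the dict shape and the helper name are assumptions, not an actual pyiron API): serialize the parent to a plain dict, strip the output, and build the new job from the rest.

```python
# Hypothetical helper illustrating the copy-without-output idea on a plain
# dict representation of a job; key names are made up for the example.
def copy_without_output(job_dict):
    # Work on a shallow copy so the parent's representation stays untouched.
    return {k: v for k, v in job_dict.items() if k != "output"}


parent = {"input": {"encut": 500}, "output": {"energy": -1.0}, "status": "finished"}
child = copy_without_output(parent)
print(child)
```

The point of the single-representation approach is that there is exactly one place where the output is dropped, instead of coordinating an HDF5 file and an in-memory object.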
Ah, I just missed that another plan was communicated to him in the pyiron meeting already. All good then. |
@pmrv Yes, this was partially discussed in the pyiron FAQ meeting and on the way to lunch between Christoph and me. Still, I agree that this implicit communication is hard to follow. @skatnagallu Could you give a quick summary of your changes and our future strategy in the next pyiron meeting? You could simply add it to the agenda https://github.com/orgs/pyiron/discussions/243 . This way we make sure it is at least communicated to the pyiron developers.
I ran into a nasty problem with "job.restart" for SPHInX, but I suspect the root issue is in pyiron_base.
I used job.restart to create a follow-up job of a successful calculation (changing the charge of the system). Then I used new_job.get_electronic_structure() to get the eigenvalues, and they were identical to the PREVIOUS calculation, even though the raw output of SPHInX shows different numbers.
After quite a bit of debugging, I figured out what happened: apparently, the entire OUTPUT of the parent job is copied to the restart. Then (a bug or a feature?), the sphinx eps.dat parser only puts the newly parsed values into output.generic.dft if there aren't any already (this was introduced by @samwaseda last November). Thus, the eigenvalues never get updated to the new ones.
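The interaction can be reproduced with a plain-dict stand-in (not the actual pyiron classes; `collect_eps` models the described "only set if missing" guard in the eps.dat parser): once the parent's output has been copied wholesale, the guard silently discards the freshly parsed eigenvalues.

```python
# Toy model of the described parser guard: values are only written when
# the key is absent, so pre-copied output shadows the new parse.
def collect_eps(output, new_eigenvalues):
    if "eigenvalues" not in output:
        output["eigenvalues"] = new_eigenvalues
    return output


parent_output = {"eigenvalues": [1.0, 2.0]}   # parent's results
restarted_output = dict(parent_output)        # stand-in for the wholesale copy
collect_eps(restarted_output, [3.0, 4.0])     # new parse is silently ignored
print(restarted_output)
```

Either change alone would be harmless: without the copy, the guard writes the new values; without the guard, the copy gets overwritten. Together they keep the stale eigenvalues.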
You may argue in many ways why I am stupid, or why I shouldn't use restart for this, or that collect_eps_dat should update existing values. But for me the core question is this: is there any good reason to copy the output of the parent in the first place? If the restarted job does not populate the same fields in the output as its parent (for whatever reason, and I could give you reasonable ones), the restarted output will contain junk from its parent.
You can quickly reproduce this behavior: just call restart on any job and inspect the new job's output before running it. Common sense says that a job that has not run cannot have any output. pyiron behaves differently.
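The restart semantics described above can be condensed into a self-contained toy model (plain Python, not the pyiron `GenericJob` hierarchy; all names here are made up for the illustration): the child job has never run, yet already carries its parent's output.

```python
# Toy model of the reported job.restart behavior; not actual pyiron code.
class ToyJob:
    def __init__(self):
        self.status = "initialized"
        self.output = {}

    def run(self):
        self.status = "finished"
        self.output["energy"] = -1.23

    def restart(self):
        # Mirrors the observed behavior: the whole parent output is copied
        # into the not-yet-run child job.
        child = ToyJob()
        child.output = dict(self.output)
        return child


parent = ToyJob()
parent.run()
child = parent.restart()
print(child.status, child.output)  # output is non-empty despite never running
```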