New option to store gbw file instead of fetching it by default #72
base: develop
Codecov Report
@@             Coverage Diff             @@
##           develop      #72      +/-   ##
===========================================
- Coverage    57.33%   57.24%   -0.10%
===========================================
  Files           15       15
  Lines         1589     1595       +6
===========================================
+ Hits           911      913       +2
- Misses         678      682       +4
@yakutovicha @mbercx would you mind taking a look at this PR? I am trying to figure out if the design here makes sense, and whether it is something close to what other AiiDA plugins (aiida-gaussian, aiida-quantumespresso) are doing to handle large wavefunction files. Thanks!
Just off the top of my head: have you looked into stashing? This is what I use for e.g. the charge density in Quantum ESPRESSO (not in the AiiDAlab QEapp, but just my work in general).
@mbercx thanks! I saw stashing in the docs, but it seemed a bit too complicated and I was too lazy to look into it more deeply. :-D
It's fairly simple to use, and comes with every calculation job through the options. There is only one
calcjob_inputs['metadata']['options']['stash'] = {
    'source_list': ['restart_file1', 'restart_file2'],
    'target_base': '/path/to/stashing/folder',
}
You can stash anywhere you want, i.e. sometimes I use it to store charge densities in project storage for later use. If you just need them temporarily for restarts though, you can also just take a default
target_base = Path(calcjob_code.computer.get_workdir(), 'stash').as_posix()
However, this only really makes sense in case you still want to clean the other files that you don't restart from, or these files somehow interfere with the running of ORCA. Else you could just also restart from the
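To make the stash options above concrete, here is a minimal sketch of a helper that assembles the dictionary and falls back to a `stash` subfolder of the computer's work directory. The helper name `build_stash_options` and the example paths are hypothetical, and depending on the AiiDA version the `stash` option may expect further keys, so please check the AiiDA documentation before relying on it.

```python
from pathlib import PurePosixPath


def build_stash_options(source_list, workdir, target_base=None):
    """Assemble a dict for metadata.options.stash (hypothetical helper).

    `workdir` stands in for what `computer.get_workdir()` would return.
    """
    if target_base is None:
        # Default: a 'stash' subfolder inside the remote work directory.
        target_base = PurePosixPath(workdir, 'stash').as_posix()
    return {
        'source_list': list(source_list),
        'target_base': str(target_base),
    }


# Example values are placeholders, not real paths or file names.
stash_options = build_stash_options(['restart_file1'], '/scratch/aiida')
```

The result would then be assigned to `calcjob_inputs['metadata']['options']['stash']`, as in the snippet above.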
Regarding re-using the In this case I'm adding it as a
spec.input('parent_folder_epw', required=False, valid_type=(orm.RemoteData, orm.RemoteStashFolderData),
    help='folder that contains all files required to restart an `EpwCalculation`')
(You could also add the There is a slight difference in syntax to get the remote path (i.e.
if isinstance(parent_folder_epw, orm.RemoteStashFolderData):
    epw_path = Path(parent_folder_epw.target_basepath)
else:
    epw_path = Path(parent_folder_epw.get_remote_path())
You can then add its path + contents to the
❗️Warning: be careful when doing multiple restarts from the same
Hi @mbercx. Thank you for a really detailed writeup ❤️, that clears things up quite a bit. One thing that needs care in the case of ORCA is that we cannot use the exact same filename for the restart file, otherwise the ORCA calculation crashes (I guess ORCA overwrites the wavefunction file at the beginning, before it loads the WF guess from it). So we need to rename it or put it in a subfolder. That needs to be done anyway since, as you point out, we probably want to use symlinks so that we do not copy these GBW files around, and we need to make sure that ORCA does not modify them. I am keeping this open for now.
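The renaming constraint could be sketched roughly as follows: build one entry for a calcjob's symlink list and refuse a link name that collides with the GBW file the new run will write. This assumes the `(computer_uuid, absolute_source_path, relative_destination)` tuple layout used for `CalcInfo.remote_symlink_list`; the helper name, the UUID, the paths, and the file names are all hypothetical, and nothing here is taken from the plugin itself.

```python
from pathlib import PurePosixPath


def gbw_symlink_entry(computer_uuid, parent_path, gbw_name, link_name):
    """Build one (computer_uuid, source, destination) tuple (hypothetical helper)."""
    # The link must get a different name than the GBW file of the new run,
    # otherwise the restart file would clash with the file ORCA writes.
    if PurePosixPath(link_name).name == PurePosixPath(gbw_name).name:
        raise ValueError('restart link must not reuse the GBW file name')
    source = PurePosixPath(parent_path, gbw_name).as_posix()
    return (computer_uuid, source, link_name)


# Placeholder values for illustration only.
entry = gbw_symlink_entry(
    'hypothetical-uuid', '/scratch/aiida/ab/cd/1234', 'aiida.gbw', 'aiida_restart.gbw'
)
```

A symlink avoids copying the file, but it does not by itself stop ORCA from writing through the link, so the source file may still need protecting.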
I had a look at the discussion, and I agree with @mbercx that stashing is probably the right thing here. I avoided supporting pulling the wavefunctions in all plugins I maintain. It escalates memory use. Somewhat related comment.
ORCA stores the molecular wavefunction in a binary GBW file, which can then be used to restart a calculation or as a wavefunction guess. Currently, in OrcaCalculation we always fetch this file as part of the retrieved folder. However, this is not ideal, since these files can get very big and cannot be deleted once the workflow finishes, because they become part of the AiiDA provenance.

In this PR, we instead introduce a new keyword store_gbw as part of the input parameters. It is false by default. If the user sets it to true, the GBW file is attached to the calcjob outputs as a SinglefileData node. This has the advantage that it can then be easily passed into subsequent workflows.

Closes #68
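As an illustration of the opt-in behaviour described above, the following sketch shows one way a boolean flag could gate whether the GBW file is fetched at all. It is not the implementation in this PR; the function name `build_retrieve_list` and the file names are hypothetical.

```python
def build_retrieve_list(output_file, gbw_file, store_gbw=False):
    """Return the files to fetch after the run (hypothetical helper)."""
    retrieve = [output_file]
    if store_gbw:
        # Only fetch the potentially large GBW file when explicitly requested.
        retrieve.append(gbw_file)
    return retrieve


# Placeholder file names for illustration only.
default_list = build_retrieve_list('aiida.out', 'aiida.gbw')
opt_in_list = build_retrieve_list('aiida.out', 'aiida.gbw', store_gbw=True)
```

With the default, the wavefunction file stays on the remote machine and never enters the repository; opting in trades storage for the convenience of passing the file on as a node.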