Implementing EPW file management (renaming/moving) #2597
Replies: 3 comments 13 replies
-
Thanks for your post. I'm out of town through Jan 3rd but will aim to reply shortly after I return. File handling is tricky... but it looks like you have given this quite a bit of thought!
-
@hironori-kondo Just to make sure, is this the Python script in question?
-
@hironori-kondo: Thank you for the detailed writeup. I have a question that may help me provide better feedback. Is there an issue with having the post-processing step be its own job? This job would be very short and simple since it is just calling the post-processing script. I am imagining something like:
-
Happy New Year! I'm returning to the task of implementing EPW in `quacc`, and I was hoping to consult you on how to implement the file management, @Andrew-S-Rosen & @tomdemeyere.
The EPW code interfaces with Espresso outputs, but the files are expected to follow a slightly different organization/naming scheme. Two previous jobs are required: an nscf job and a phonon job. After running the phonon job, we run a post-processing Python script (which ships with EPW). This script performs a couple of checks on how the phonon job was run, then copies/renames the necessary files into a new directory that the EPW job reads.
I've read through some of the `quacc` code, and my thought is to modify `quacc.utils.files.copy_decompress_files()`. The current implementation accepts a single destination directory (fed in as the tmp folder), to which the filenames are appended. I propose the following modifications to `quacc`'s file management behavior:
- Add `output_filenames` to `copy_decompress_files()`. The default behavior would be unchanged, using the existing `filenames` argument for the output. Should `output_filenames` be specified, however, this behavior would be overridden, resulting in changed file paths.
- Add `output_filenames` to `quacc.runners.prep.calc_setup()`
- Add `output_filenames` to `quacc.runners._base.BaseRunner.setup()`
- Add `output_filenames` to `quacc.runners.ase.Runner.__init__()`
- Add `output_filenames` to `quacc.recipes.espresso._base.run_and_summarize()`
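
As a rough illustration of the first modification, here's a minimal standalone sketch (not `quacc`'s actual implementation). The function name `copy_with_optional_rename` is made up, it skips the decompression and globbing that the real `copy_decompress_files()` handles, and it takes `output_filenames` as a list parallel to `filenames` for a single source directory, which is just one assumed shape for the argument.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_with_optional_rename(
    source_directory: str | Path,
    filenames: list[str],
    destination_directory: str | Path,
    output_filenames: list[str] | None = None,
) -> list[Path]:
    """Copy files, optionally under new relative paths (hypothetical helper)."""
    source_directory = Path(source_directory)
    destination_directory = Path(destination_directory)

    # Default: reuse the source-relative names, i.e. the unchanged behavior.
    if output_filenames is None:
        output_filenames = filenames
    if len(output_filenames) != len(filenames):
        raise ValueError("output_filenames must match filenames one-to-one")

    copied = []
    for src_name, dst_name in zip(filenames, output_filenames):
        dst = destination_directory / dst_name
        # Create any new subdirectories implied by the destination path.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_directory / src_name, dst)
        copied.append(dst)
    return copied
```

Leaving `output_filenames` as `None` reproduces the copy-as-is behavior, so existing callers would not need to change under this assumed design.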
If I haven't missed anything, the above should enable `output_filenames` to be optionally specified for any given Espresso job, enabling file movement/renaming during the copying step. The EPW file management would then look like the following:
1. `copy_files` argument.
2. `quacc.calculators.espresso.utils.prepare_copy_files()`. Let's call this output `updated_copy_files`.
3. `dict` of destination filenames for the above files. Let's call this output `output_filenames`.
4. `run_and_summarize()` with `updated_copy_files` and `output_filenames` as arguments.

This approach is a tad convoluted because the destination directory used by `copy_decompress_files()` (i.e., the tmp directory for the job) is not created until `quacc.runners.prep.calc_setup()` is called during runner initialization. As such, the desired behavior has to be inserted more deeply, somewhere between the directory's creation and the calculator call. My breakdown of some pros/cons of this approach:
Pros
- `copy_decompress_files()` has a neat argument structure: source directory, source filenames, destination directory, destination filenames.

Cons
How does the above sound? Do you have any wisdom/suggestions/preferences?
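
To make the EPW steps a bit more concrete, here's a hedged sketch of how the `output_filenames` dict could be built from a `copy_files`-style mapping. Everything here is illustrative: `build_output_filenames`, `flatten_into_subdir`, the `epw_inputs` directory name, and the example filenames are placeholders, and I'm assuming the mapping is keyed by source directory.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Callable


def flatten_into_subdir(name: str, subdir: str = "epw_inputs") -> str:
    """Hypothetical rename rule: drop parent folders and place the file in `subdir`."""
    return str(PurePosixPath(subdir) / PurePosixPath(name).name)


def build_output_filenames(
    copy_files: dict[str, list[str]],
    rename: Callable[[str], str] = flatten_into_subdir,
) -> dict[str, list[str]]:
    """Map each source file to a destination path, keeping order per source directory."""
    return {
        source_dir: [rename(name) for name in names]
        for source_dir, names in copy_files.items()
    }


# Placeholder inputs standing in for the prepared copy-file mapping.
updated_copy_files = {"/path/to/phonon_job": ["_ph0/prefix.dvscf1", "prefix.dyn1"]}
output_filenames = build_output_filenames(updated_copy_files)
```

Both dicts would then be handed to the job together, so each source file has a matching destination path at the same position in its list.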