Ability to save experiment directory (EXPDIR) #2994

Open
KateFriedman-NOAA opened this issue Oct 8, 2024 · 7 comments · May be fixed by #3105
Labels
feature New feature or request


@KateFriedman-NOAA
Member

What new functionality do you need?

A switch or similar setting to allow users to save their EXPDIR (e.g. to HPSS).

What are the requirements for the new functionality?

That the contents of the EXPDIR are saved/archived.

Acceptance Criteria

  • EXPDIR saved/archived to HPSS

Suggest a solution (optional)

No response

@KateFriedman-NOAA KateFriedman-NOAA added the feature New feature or request label Oct 8, 2024
@DavidHuber-NOAA DavidHuber-NOAA self-assigned this Nov 4, 2024
@DavidHuber-NOAA
Contributor

I'm wondering how often this should be saved since it isn't uncommon for the EXPDIR contents to be modified during an experiment. Should this be once per experiment, once per cycle, or perhaps on the first and last cycle?

@KateFriedman-NOAA
Member Author

My thought is at least the first cycle so we can capture the configs at the start. The last cycle would also be good to do, especially to capture the final state of the db/xml and configs. In between that, maybe every 00z so we can capture config changes and save the db/xml at certain points in case they are needed? Definitely first and last though.

@DavidHuber-NOAA
Contributor

Alright, sounds good. They won't be large tarballs (less than 1MB), so I will aim to store them every 00z during gdas or gefs archiving.
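A minimal sketch of the archiving schedule agreed on so far (always the first and last cycles, otherwise once per day at 00z), assuming cycles are represented as Python `datetime` objects. The function name `should_archive_expdir` is hypothetical and is not part of global-workflow.

```python
from datetime import datetime


def should_archive_expdir(cycle: datetime, first_cycle: datetime, last_cycle: datetime) -> bool:
    """Return True if the EXPDIR should be archived at this cycle.

    Hypothetical policy: always archive the first and last cycles so the
    initial and final configs/db/xml are captured; otherwise archive once
    per day at 00z to pick up config changes made mid-experiment.
    """
    if cycle == first_cycle or cycle == last_cycle:
        return True
    return cycle.hour == 0
```

With a 6-hourly experiment running from 2024-10-08 06z to 2024-10-10 18z, this selects 06z on the first day, every intermediate 00z, and 18z on the last day.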

@KateFriedman-NOAA
Member Author

Are you thinking to archive the entire EXPDIR as is or have a list of files to archive? I ask because sometimes users will have more in their EXPDIR than just the configs, db, xml, and other files that get generated by the workflow. For example, they may have code or a clone of something that they are using in the experiment. Do we want to include that too if it's in the EXPDIR?

@DavidHuber-NOAA
Contributor

Hmm, good question. I had considered just the XML, database, configs, and possibly the logs. I could see the desire to add other things in there, but if they have a copy of the global workflow in the EXPDIR, that would be an extremely long-running htar command and I'm not sure how the symlinks would be handled. I think we should limit it to just a limited set of files.
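A rough sketch of limiting the archive to an allow-list of files, assuming the XML, database, crontab, and `config.*` files sit at the top level of EXPDIR. The patterns and function names are hypothetical, and this builds a local tarball with Python's `tarfile` rather than calling `htar`. Only top-level matches are taken, and symlinks are skipped, so a workflow clone or other user code in EXPDIR is not swept in.

```python
import tarfile
from pathlib import Path

# Hypothetical allow-list; the real list would need to match what the
# workflow actually generates in EXPDIR.
DEFAULT_PATTERNS = ("config.*", "*.xml", "*.db", "*.crontab")


def collect_expdir_files(expdir, patterns=DEFAULT_PATTERNS):
    """Return sorted top-level regular files in EXPDIR matching the patterns."""
    expdir = Path(expdir)
    files = set()
    for pattern in patterns:
        for path in expdir.glob(pattern):
            # Skip directories and symlinks so linked clones are not pulled in.
            if path.is_file() and not path.is_symlink():
                files.add(path)
    return sorted(files)


def make_expdir_tarball(expdir, tarball, patterns=DEFAULT_PATTERNS):
    """Write the selected files to an uncompressed tarball and return them."""
    expdir = Path(expdir)
    files = collect_expdir_files(expdir, patterns)
    with tarfile.open(tarball, "w") as tar:
        for path in files:
            # Store paths relative to EXPDIR so the archive extracts cleanly.
            tar.add(path, arcname=str(path.relative_to(expdir)))
    return files
```

The resulting tarball (well under 1 MB for configs, XML, and database) could then be transferred to HPSS by whatever mechanism the archive job already uses.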

That said, we could add a function to get the hashes and diffs of the HOMEgfs global workflow clone and all submodules then add that to a text file to be archived with the EXPDIR.
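A minimal sketch of such a function, assuming `git` is on the PATH and HOMEgfs is a git clone. It records the superproject commit, the recursive submodule status (which lists each submodule's checked-out hash), and uncommitted diffs in the superproject and every submodule. The names `write_git_info` and `_git` are hypothetical; the git subcommands themselves (`rev-parse`, `submodule status --recursive`, `submodule foreach --recursive`, `diff`) are standard.

```python
import subprocess
from pathlib import Path


def _git(repo, *args):
    """Run a git command in `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def write_git_info(homegfs, outfile):
    """Write hashes and diffs of a clone and all its submodules to a text file."""
    sections = [
        ("HEAD commit", _git(homegfs, "rev-parse", "HEAD")),
        # One line per submodule: commit hash, path, and describe output.
        ("Submodule status", _git(homegfs, "submodule", "status", "--recursive")),
        # Uncommitted changes (staged and unstaged) in the superproject.
        ("Top-level diff", _git(homegfs, "diff", "HEAD")),
        # Uncommitted changes inside each submodule, recursively.
        ("Submodule diffs", _git(homegfs, "submodule", "foreach", "--recursive", "git diff HEAD")),
    ]
    with open(Path(outfile), "w") as f:
        for title, body in sections:
            f.write(f"===== {title} =====\n{body}\n")
```

The text file could then be added to the same EXPDIR tarball so each archived snapshot carries the exact code state it ran with.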

@KateFriedman-NOAA
Member Author

> I think we should limit it to just a limited set of files.

Fully agree.

> That said, we could add a function to get the hashes and diffs of the HOMEgfs global workflow clone and all submodules then add that to a text file to be archived with the EXPDIR.

Oooooo, I like that! That would be very handy information to archive.

@AndrewEichmann-NOAA
Contributor

Hashes and diffs would be wonderful, plus the EXPDIR as it exists (to pick up modifications).
