Document the use of temporary files in MET and reduce it as much as reasonably possible #2690
Comments
@rgbullock the My thought is that I'd read the input config file line by line. For each line, recursively interpret any environment variables present, and then append the result to the end of the The obvious downside is that we'd be storing the entire config file in memory at once. But fortunately, these ASCII config files are relatively small. As the original author of this library, do you think this approach is worth pursuing? Any alternative approaches to recommend?
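A minimal C++ sketch of the line-by-line idea described above, for discussion only. It is not MET's code: the helper names `expand_env` and `read_config_expanded` are hypothetical, the `${NAME}` reference syntax is an assumption, and unset variables simply expand to an empty string here.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: replace each ${NAME} in `text` with the value of the
// environment variable NAME. The substituted value is expanded again, so a
// variable whose value contains further ${...} references is resolved too.
// Unset variables expand to an empty string in this sketch.
std::string expand_env(const std::string& text, int depth = 0) {
    if (depth > 16) {
        throw std::runtime_error("environment variable nesting too deep");
    }
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t start = text.find("${", pos);
        std::size_t end = (start == std::string::npos)
                              ? std::string::npos
                              : text.find('}', start + 2);
        if (end == std::string::npos) {
            // No complete reference remains: copy the rest verbatim.
            out.append(text, pos, std::string::npos);
            break;
        }
        out.append(text, pos, start - pos);
        std::string name = text.substr(start + 2, end - start - 2);
        const char* value = std::getenv(name.c_str());
        if (value) out += expand_env(value, depth + 1);
        pos = end + 1;
    }
    return out;
}

// Hypothetical helper: read the config line by line, expand each line, and
// append it to an in-memory buffer instead of writing a temp file.
std::string read_config_expanded(std::istream& in) {
    std::ostringstream buffer;
    std::string line;
    while (std::getline(in, line)) {
        buffer << expand_env(line) << '\n';
    }
    return buffer.str();
}
```

The returned string could then be handed to a parser through a `std::istringstream`. The whole file is held in memory at once, which is the trade-off noted above.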
I've been having a similar issue with job slowness on Jet. Since mid-August, METplus GridStat jobs that should finish in 20-40 minutes have been timing out after four hours. Jet is much busier than usual during hurricane season due to HFIP, which puts a high load on the Lustre file system, but I was still surprised that everything was taking this long. Moving the METplus tmp output to /tmp allows jobs to finish in about an hour. It's not fully back to where it was, but it's a vast improvement.
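For reference, a small C++ sketch of how a tool might choose its temp directory. This is an illustration, not MET's `get_tmp_dir()`: the function name `resolve_tmp_dir` is hypothetical, and the precedence shown (the `MET_TMP_DIR` environment variable first, then the configured `tmp_dir` value, then `/tmp`) is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not MET's get_tmp_dir(). Picks a temp directory from:
//   1. the MET_TMP_DIR environment variable, if set and non-empty
//   2. the tmp_dir value read from the config file, if non-empty
//   3. "/tmp" as a last resort
// This precedence order is assumed for illustration.
std::string resolve_tmp_dir(const std::string& configured_tmp_dir) {
    const char* env = std::getenv("MET_TMP_DIR");
    if (env && *env) return env;
    if (!configured_tmp_dir.empty()) return configured_tmp_dir;
    return "/tmp";
}
```

With a helper like this, pointing a job at node-local storage would only require exporting one variable before the run, without editing each config file.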
…into a stringstream stream rather than reading/writing/deleting temp files each time a config file or config string is read.
The changes for eliminating temp files from the |
As discussed with @jprestop on 9/25, recommend adding a new chapter to the Contributor's Guide between chapters 2 and 3 to describe the logic employed in MET. Specifically, add a sub-section describing our use of temporary files. Update the description of tmp_dir to link to that new sub-section.
This is fixed by PR #2693, and at least one related issue to review temp file use in stat_analysis has been created. This task to document our current use of temp files in the Contributor's Guide is now complete.
Describe the Enhancement
This issue arose via the dtcenter/METplus#2364 discussion. @johnlwagner has encountered severe issues running Stat-Analysis jobs on WCOSS2. @dkokron was able to trace the performance issue back to MET's writing of temporary files. When he reconfigured with tmp_dir = "/tmp";, a large Stat-Analysis job completed in minutes rather than hanging for several hours. He found that MET was writing/reading/deleting thousands of tiny temp ASCII files. Writing many of these tiny temp files to a project space across the Lustre file system used by WCOSS2 is incredibly slow, whereas writing to the local /tmp directory relieves this pinch point.
However, in 2021, NCO directed NOAA staff to explicitly not write to the local /tmp directory on the compute nodes, to keep the output for each job well organized and prevent an accumulation of stale data in the local /tmp directory.
For this issue, do the following:
MET_TMP_DIR and get_tmp_dir().
Pay close attention to the MetConfig::read_string(const char * s) function. As described in this GitHub Discussion comment, avoiding writing a temp file there may go a long way to addressing this issue.
@johnlwagner notes that since this has to do with the processing of threshold strings, it may well be related to the perplexing behavior described in this dtcenter/METplus#1506 discussion.
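To illustrate the kind of change suggested for the config string path, here is a hedged C++ sketch in which the file and string entry points share one stream-based parser, so parsing a string needs no temp file. It is a toy, not MET's parser: `ConfigMap`, `parse_stream`, and `parse_string` are hypothetical names, and the `key = value;` grammar is a simplification assumed for the example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for parsed `key = value;` entries.
using ConfigMap = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Toy parser: reads `key = value;` entries from any input stream.
// A file-based entry point could pass a std::ifstream to this same function.
ConfigMap parse_stream(std::istream& in) {
    ConfigMap entries;
    std::string entry;
    while (std::getline(in, entry, ';')) {
        std::size_t eq = entry.find('=');  // first '=' separates key and value
        if (eq == std::string::npos) continue;
        entries[trim(entry.substr(0, eq))] = trim(entry.substr(eq + 1));
    }
    return entries;
}

// String entry point: wrap the text in an in-memory stream rather than
// writing it to a temp file, reading it back, and deleting it.
ConfigMap parse_string(const char* s) {
    std::istringstream in(s ? s : "");
    return parse_stream(in);
}
```

Because the string case never touches the file system, repeated calls (for example, one per threshold string) would avoid the per-call create/read/delete cycle on a shared file system.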
Time Estimate
2 days
Sub-Issues
Consider breaking the enhancement down into sub-issues.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
Define the source of funding and account keys here or state NONE.
Define the Metadata
Assignee
Labels
Milestone and Projects
Define Related Issue(s)
Consider the impact to the other METplus components.
No direct impacts right now. However, we have been discussing some related functionality in METplus to enhance its logic for cleaning up any temp files written by failed MET jobs.
vx_config library #2691 remove temp files from vx_config library.
stat_analysis.
Enhancement Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Development issue
Select: Milestone as the next official version
Select: MET-X.Y.Z Development project for development toward the next official release