Maximum Runtime? #64
Comments
We do not have any parameter that directly controls how much time has passed. Could it be that some other limit of your system is being reached (CPU time or disk space)? |
Okay - I'm not missing an obvious namelist option then. We're fine for disk space, and the (relative) point at which the simulation stops is exactly the same each time I've tested this (and I've used more CPU time for other simulations). I don't think it is anything external to EMEP, especially as there are no error messages thrown when it stops. This is all I get in the log file: Are there any namelist debug flags you would recommend I switch on, to see if I can get more diagnostics for this? |
It is not easy to test when this is happening so far out in the run. Maybe @avaldebe has some ideas? |
Sure thing - I'll try compiling with traceback flags, and will let you know if this throws up anything (and will work around the limitation for the moment). |
Hi @douglowe Please take into account that the debugging flags will make the code run considerably slower. If your problem is CPU time, the run should then crash earlier in the simulation. |
(That is why I did not include the -O0 flag! That should still leave the traceback.) |
Sure thing - I'll take the change in compute time into account when I'm checking the results. |
Did you find out what went wrong? |
Not yet, sorry - I've been busy with other projects since raising this issue. I should have time in October to investigate, and will let you know if I find anything. |
Did you find the problem? |
I didn't, sorry. I decided in the end to run EMEP in 2-month chunks (this makes more sense operationally anyway, as we can then parallelise the work more). |
I used to do something similar on an older version of the model. The data assimilation modules were not as well tuned as they are in our current development versions, and a whole-year run did not fit on the HPC queue. At the time, I had to take care to run the chunks sequentially and create a restart file for the next chunk. Otherwise PM could be too low at the beginning of each chunk. |
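The chunked approach discussed above could be planned with a small script like the one below. This is only a minimal sketch: `add_months` and `split_into_chunks` are hypothetical helper names, not part of EMEP, and the sketch assumes each chunk starts on the first day of a month. It only computes the date ranges; setting the dates in the namelist and handling restart files is left to the user.

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def split_into_chunks(start: date, end: date, months: int = 2, spinup_days: int = 0):
    """Split [start, end) into consecutive chunks of `months` calendar months.

    Hypothetical helper (not an EMEP tool). Assumes `start` is the first day
    of a month. Each item is (run_start, chunk_start, chunk_end), where
    run_start precedes chunk_start by `spinup_days` days.
    """
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # Never run past the requested end date.
        chunk_end = min(add_months(chunk_start, months), end)
        run_start = chunk_start - timedelta(days=spinup_days)
        chunks.append((run_start, chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


if __name__ == "__main__":
    for run_start, chunk_start, chunk_end in split_into_chunks(
        date(2018, 1, 1), date(2019, 1, 1), months=2, spinup_days=7
    ):
        print(f"run from {run_start}, analyse {chunk_start} to {chunk_end}")
```

With a spin-up period the chunks are independent and can run in parallel. If restart files are used instead, the chunks have to run in order, as noted above, and `spinup_days` can stay at 0.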
I just saw this. Very strange problem that I have never seen or heard of before. @mvieno often runs WRF+EMEP, I think for a year at a time? |
Yes, I routinely run EMEP-WRF for a year. This is for any domain, from global to regional. But I use a single EMEP_OUT for the full year. The only time I had a similar issue, it was related to the NetCDF library having been compiled without the large file feature switched on. |
Ahh - I had not checked the NetCDF library compilation settings. I'll have a look at these, see if that might have been an issue. |
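A related quick check is to see which NetCDF on-disk format an existing file uses, since the classic format has much tighter size limits than the 64-bit offset and netCDF-4 formats. The sketch below reads the file's leading magic bytes. It does not inspect how the NetCDF library itself was compiled, so it is a complementary check only. `netcdf_format` and `netcdf_format_of_file` are hypothetical helper names, and the sketch assumes the signature sits at the very start of the file.

```python
# On-disk signatures of the NetCDF format variants.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files
CDF_VERSIONS = {
    b"\x01": "classic (CDF-1)",
    b"\x02": "64-bit offset (CDF-2)",
    b"\x05": "64-bit data (CDF-5)",
}


def netcdf_format(header: bytes) -> str:
    """Classify a file from its first 8 bytes (hypothetical helper)."""
    if header[:8] == HDF5_SIGNATURE:
        return "netCDF-4 (HDF5)"
    if header[:3] == b"CDF":
        return CDF_VERSIONS.get(header[3:4], "unknown")
    return "unknown"


def netcdf_format_of_file(path: str) -> str:
    """Read the leading bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return netcdf_format(f.read(8))
```

For example, `netcdf_format_of_file("some_output.nc")` (the file name here is a placeholder) returning `"classic (CDF-1)"` for a file that is expected to grow to several gigabytes would be worth investigating.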
Revisiting this question - I'm now finding that my EMEP simulations for one domain fail after ~50 days. Previously I would run this domain for 2 months (+ 7 days spin-up). However, I changed the WRF output files I use to drive EMEP from 3-hourly to hourly, and now I always hit the memory limit (~192 GB) of the HPC node I'm running on. Looking back at my previous runs driven by 3-hourly data, I can see that memory usage only reaches ~90 GB by the end of the 2 months. Polling the memory usage during a simulation also shows that it keeps increasing as the simulation progresses. I'm running EMEP release 3.44, with some local fixes for reading emission sectors (https://github.com/UoMResearchIT/emep-ctm/tree/source_UoM_CSF3_gfortran). We are reading a lot of emission data (using the UK NAEI dataset), so perhaps this contributes to the high memory usage (and may be why you've not encountered this problem before)? Do you have any suggestions on how I might solve this problem? Is the best answer to use restart files and just run 1 month at a time? |
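The memory polling mentioned above can be done in several ways. One minimal sketch, assuming a Linux node where `/proc/<pid>/status` is readable, is shown below. `parse_vmrss_kb` and `poll_rss` are hypothetical helper names, and the process ID has to be supplied by the user. For an MPI run each rank is a separate process, so this reports one rank at a time.

```python
import time


def parse_vmrss_kb(status_text: str):
    """Return the resident set size in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None


def poll_rss(pid: int, interval_s: float = 60.0, samples: int = 10):
    """Sample the resident memory of process `pid` (Linux only, hypothetical helper).

    Returns a list of (unix_time, rss_kb) pairs.
    """
    readings = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as f:
            readings.append((time.time(), parse_vmrss_kb(f.read())))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Writing the readings to a file at regular intervals makes it easy to see whether memory grows with simulated time, as reported above, or jumps at particular points such as when new input files are opened.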
Yes, this is a problem we also noticed. In early releases we opened and closed the NetCDF file for each variable; that was very slow. So we kept the file open until all variables were read. That was fast, but on some systems it caused a huge amount of memory to be used when many variables (~1000) were read. |
Thanks for the suggestion - that sounds like a good solution to me. I'll have a look at the code in the latest release, and will backport this solution to my working copy. |
I'm trying to run EMEP for a whole year. However, I'm finding that my simulations silently fail after exactly 150 days (or 3600 hours), regardless of what my start date is. I can't find any obvious namelist option to control this. Could you tell me if there is any way to change this behaviour, or should I run the model individually for each month instead, to avoid this problem?