Not the proper method to specify required number of nodes for OLCF #710
Comments
Hi Danila, we are aware of that issue. The next SAGA release (which is in preparation) will come with a significant change in that context: instead of using different code paths for certain configurations and systems, we'll begin to support machine-specific configuration files, which should address exactly the issue raised in this ticket.
Hi Andre, that sounds great! I have already started preparing to deploy a Harvester instance for Summit (for ATLAS production reasons), and it will be nice to have a software stack similar to the one for Titan.
Danila, do you have a time frame on when you will need this?
As always, yesterday :-) It would be good to have it ASAP: we need to migrate to Summit, and at the moment this is one of the critical issues.
hehe - why am I asking ;-) The release which adds support for configuration files goes out next weekend. I hope we can release our Summit support a week or two later, specifically the ability to configure CPN per host (Summit should then work out of the box, though).
Hi Andre, OLCF support recently deployed the LSF utilities/clients to DTN38, which is quite exciting. I am going to continue deploying and configuring a production version of Harvester for ATLAS on Summit. So the question of the readiness of the LSF adaptor, and of an example configuration, has become more pressing. I am already playing with version 0.60.0.
Hi Danila, late response, but the LSF adaptor is by now ready for Summit, and we use it there. It will be released in the first week of July.
Great! I will test it as soon as the new release is available. Cheers,
We should allow the number of nodes to be specified directly.
https://github.com/radical-cybertools/saga-python/blob/f460528a10f0e748e6a1d252be20789ff87228ac/src/saga/adaptors/lsfsummit/lsfjobsummit.py#L207
Hi,
OLCF supports cross-platform job scheduling from different facilities (DTN cluster, RHEA, etc.) for Titan and will provide the same support for Summit very soon (in the next few days). So it would be beneficial to determine the number of cores per node based on the hostname (at least for Summit). It would probably be better to have 42 cores by default and to use 20 only for summitdev.
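To make the proposal concrete, here is a minimal sketch of what a hostname-based lookup could look like. It is not taken from the adaptor: the names `cores_per_node`, `nodes_for`, `DEFAULT_CORES_PER_NODE` and `CORES_PER_NODE_BY_HOST` are hypothetical, and matching on a hostname substring is an assumption about how the target would be identified.

```python
# Hypothetical sketch, not part of saga-python: pick cores per node from the
# target hostname, defaulting to 42 and using 20 only for summitdev.

DEFAULT_CORES_PER_NODE = 42

# Hostname fragments that override the default (assumed matching rule).
CORES_PER_NODE_BY_HOST = {
    "summitdev": 20,
}


def cores_per_node(hostname):
    """Return the cores per node to assume for the given hostname."""
    host = hostname.lower()
    for fragment, cpn in CORES_PER_NODE_BY_HOST.items():
        if fragment in host:
            return cpn
    return DEFAULT_CORES_PER_NODE


def nodes_for(total_cores, hostname, nodes=None):
    """Return the node count: the explicit value if given, else derived."""
    if nodes is not None:
        # An explicitly requested node count always wins.
        return nodes
    cpn = cores_per_node(hostname)
    # Ceiling division: a partially used node still has to be requested.
    return -(-total_cores // cpn)
```

With this shape, a caller that knows the node count passes `nodes=` directly, and the hostname lookup is used only as a fallback.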