Support a choice between delta from the first call versus previous call in variorum_get_energy_json
#575
Comments
@tpatki How does |
Hi @rountree That's a great question, and it will vary by the underlying architecture. See details below. (I am hoping these notes will also help @dbo understand the challenges at our end and why supporting this will take some time.)
Let me know if I answered that in enough detail. Happy to have a meeting next week to discuss.
So yes, I'd prefer to have the general case be sampling occasionally unless the vendor documentation we have makes rollover a once-per-decade thing. But I'm not implementing this, so it's just a preference.
Thanks @rountree. Given that we have limited resources for Variorum at the moment, at least for the first cut at this, I am going to lean toward telling the user that we are passing along the data that the vendor libraries provide (ESMI, RSMI, NVML, etc.) and trusting that these vendor APIs take care of rollovers. On some architectures (e.g. all GPUs), the low-level registers are not accessible at all, and we have no choice but to trust the vendor APIs.

Intel CPUs are the only exception to this situation, as we read directly from the registers there. Looking at our port, I realized that we are already taking care of wraparounds for these registers in the Intel port when we calculate deltas, as we need to do this for reporting power on these systems too. Take a look here. My understanding is that we will be reporting the correct values for energy with the current Intel port if we choose to do deltas (no sampling will be needed if I am understanding the code correctly, but I haven't refreshed my memory on this port enough yet). I will have to test this explicitly when I start working on this PR. I believe @slabasan has tested these wraparounds before; she may be able to comment as well.

TLDR: Let's try to get a first cut at this while trusting the vendor APIs (and our Intel port). Let's document this well and explain this decision to the users. And let's leave an issue open to test for rollovers on each architecture, so we can fix these if we or any users run into them.
@tpatki Sounds good.
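For readers following the rollover discussion above, here is a minimal sketch of a wraparound-safe delta between two reads of a fixed-width hardware counter. It is not Variorum's actual Intel port code; the helper name `counter_delta` and its `bits` parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Variorum): difference between two reads
 * of a counter that is `bits` wide and wraps to zero when it overflows.
 * The result is correct only if the counter wrapped at most once between
 * the two reads. */
uint64_t counter_delta(uint64_t prev, uint64_t curr, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    /* Mask selecting the low `bits` bits of a 64-bit value. */
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    /* Unsigned subtraction is modular, so masking the result undoes a
     * single wraparound. */
    return (curr - prev) & mask;
}
```

For example, assuming a 32-bit counter, `counter_delta(0xFFFFFFF0, 0x10, 32)` yields `0x20`. The single-wrap limitation is why occasional sampling matters when reads are far apart: if the counter can wrap more than once between two calls, no delta computed from the two endpoints alone can recover the true value.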
Update the API to support nested calls in general, especially in Caliper-like tools. This might be useful for the Kokkos-update as well.
Merge #559 and #563 first, and then add a new flag to the API.
Current suggestion:
`variorum_get_energy_json(char** s)` will be updated to `variorum_get_energy_json(char** s, bool prev_delta)`. Setting `prev_delta` to `true` will return the accumulated energy since the previous call to the `variorum_get_energy_json` function from the application/tool's context. Setting this to `false` will return the accumulated energy since the first call to `variorum_get_energy_json`.

I will work on an initial WIP PR as soon as I can, hoping to get this merged in by end of August. Happy to take any feedback and suggestions on this.
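To make the proposed `prev_delta` semantics concrete, here is a rough sketch of the bookkeeping it implies. This is only an illustration under stated assumptions, not the planned implementation: the `energy_tracker` struct, the `energy_since` function, and the microjoule units are all hypothetical, and the sketch assumes the "previous" reading is updated on every call regardless of the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-caller state (not a Variorum type). */
typedef struct {
    bool initialized;   /* false until the first call records a baseline */
    uint64_t first_uj;  /* reading taken at the first call */
    uint64_t prev_uj;   /* reading taken at the most recent call */
} energy_tracker;

/* Hypothetical helper: given the current accumulated-energy reading,
 * return the energy since the previous call (prev_delta == true) or
 * since the first call (prev_delta == false). */
uint64_t energy_since(energy_tracker *t, uint64_t now_uj, bool prev_delta)
{
    assert(t != NULL);
    if (!t->initialized) {
        /* The first call only establishes the baseline. */
        t->initialized = true;
        t->first_uj = now_uj;
        t->prev_uj = now_uj;
        return 0;
    }
    uint64_t base = prev_delta ? t->prev_uj : t->first_uj;
    t->prev_uj = now_uj; /* assumption: every call becomes the new "previous" */
    return now_uj - base;
}
```

Keeping the state in a caller-owned struct is one way to get the per-context behavior the issue asks for: a nested tool such as a Caliper-like profiler could hold its own tracker per region without disturbing the baseline of an outer caller.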