[Bug]: incorrect stats #1852
@james-boydell, CPU usage seems to be reported correctly: 92 + 100 + 85 + 100 = 377 (the small difference is likely due to measurement timing). The memory usage reported by … Please note that there can still be discrepancies with htop. htop's reporting may be more accurate, but it's when … See also: …
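To show why a summed figure like 377% is expected on a multi-core machine, here is a minimal sketch. It assumes CPU usage is derived from a cumulative CPU-time counter such as cgroup v1's cpuacct.usage (in nanoseconds); this is not necessarily how dstack computes it, and the default file path below is an assumption. Dividing the CPU time consumed by the wall time elapsed gives a percentage of one core, so four busy cores read as roughly 400%.

```python
import time


def cpu_percent(usage_ns_start: int, usage_ns_end: int,
                wall_ns_start: int, wall_ns_end: int) -> float:
    """CPU usage over a window, as a percentage of ONE core.

    usage_ns_* are readings of a cumulative CPU-time counter in nanoseconds
    (assumed here to be cgroup v1 cpuacct.usage); wall_ns_* are monotonic
    clock readings taken at the same moments. CPU time from every core is
    added up, so the result can exceed 100 (up to 100 * number of cores).
    """
    wall = wall_ns_end - wall_ns_start
    if wall <= 0:
        raise ValueError("sampling window must be positive")
    return (usage_ns_end - usage_ns_start) * 100 / wall


def sample_cpu_percent(path: str = "/sys/fs/cgroup/cpuacct/cpuacct.usage",
                       interval_s: float = 1.0) -> float:
    """Take two counter readings interval_s apart (default path is an assumption)."""
    def read_counter() -> int:
        with open(path) as f:
            return int(f.read().strip())

    u0, t0 = read_counter(), time.monotonic_ns()
    time.sleep(interval_s)
    u1, t1 = read_counter(), time.monotonic_ns()
    return cpu_percent(u0, u1, t0, t1)
```

For example, 3.77 s of CPU time consumed during a 1 s window gives cpu_percent(0, 3_770_000_000, 0, 1_000_000_000) == 377.0, matching the per-core sum above. A tool that samples over a different window (htop refreshes on its own interval) will see slightly different numbers, which is the measurement-timing effect mentioned in the comment.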
To sum up, after the fix
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
It remains to be seen whether using RSS+CACHE(+SWAP) instead of memory.usage_in_bytes has any downsides besides differing from Kubernetes and Docker.
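A minimal sketch of the RSS+CACHE(+SWAP) calculation that the kernel document recommends over memory.usage_in_bytes, assuming a cgroup v1 memory controller. The helper names are hypothetical, not part of dstack. In memory.stat, the total_* keys include descendant cgroups, and the swap counter only appears when swap accounting is enabled.

```python
def parse_memory_stat(text: str) -> dict:
    """Parse cgroup v1 memory.stat content ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def exact_usage_bytes(stats: dict, include_swap: bool = False,
                      hierarchical: bool = True) -> int:
    """RSS + CACHE (+ SWAP) from parsed memory.stat values."""
    # total_* counters include descendant cgroups; plain ones cover this cgroup only.
    prefix = "total_" if hierarchical else ""
    usage = stats[prefix + "rss"] + stats[prefix + "cache"]
    if include_swap:
        # The swap counter is only present when swap accounting is enabled.
        usage += stats.get(prefix + "swap", 0)
    return usage


def percent_of_limit(usage_bytes: int, limit_bytes: int) -> float:
    """Usage as a percentage of a limit such as memory.limit_in_bytes."""
    return usage_bytes * 100 / limit_bytes
```

On a cgroup v1 host, reading memory.stat, memory.usage_in_bytes and memory.limit_in_bytes from the same cgroup directory (commonly under /sys/fs/cgroup/memory, though the mount point varies) and comparing exact_usage_bytes(...) with usage_in_bytes is a quick way to see how far apart the two figures are in practice.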
Hey @r4victor, thanks for looking into this! For memory stats, I think it's important to report usage the same way the OOM killer sees it. Most people will be watching whether memory reaches the limit and the container gets killed. This will matter as you work towards issue #1780, where multiple jobs/runs are on the same node (important for SSH/on-prem fleets). As for CPU, I don't think the sum of all per-core CPU percentages makes sense as a single metric: if I see more than 100%, I assume something is wrong. I'm not sure how you're collecting CPU metrics, and I'm more familiar with Kubernetes, but reporting usage as a percentage of the CPU limit and/or request, or averaging the percentage across all cores, would be more useful.
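The two alternatives suggested above can be sketched as a small helper. This is hypothetical, not part of dstack: it takes per-core percentages (as htop shows them) and reports the raw sum, the average per core, the equivalent number of busy cores, and, if a CPU limit in cores is known, usage as a percentage of that limit.

```python
from __future__ import annotations


def summarize_cpu(per_core_percent: list[float],
                  cpu_limit_cores: float | None = None) -> dict[str, float]:
    """Summarize per-core CPU percentages in several ways."""
    if not per_core_percent:
        raise ValueError("need at least one core reading")
    total = sum(per_core_percent)  # summed view: can reach 100 * cores
    summary = {
        "sum_percent": total,
        # Average across cores: always within 0..100.
        "avg_core_percent": total / len(per_core_percent),
        # Equivalent number of fully busy cores (Kubernetes expresses CPU in cores).
        "cores_used": total / 100,
    }
    if cpu_limit_cores is not None:
        if cpu_limit_cores <= 0:
            raise ValueError("CPU limit must be positive")
        # Percentage of the limit: (total / 100) / limit * 100 == total / limit.
        summary["percent_of_limit"] = total / cpu_limit_cores
    return summary
```

With the readings from this thread and a hypothetical 4-core limit, summarize_cpu([92, 100, 85, 100], cpu_limit_cores=4) gives a sum of 377, an average of 94.25 per core, and 94.25% of the limit; against an 8-core limit the same load would be 47.125%. Averaging keeps the number under 100 but can hide a single saturated core, so reporting the sum or per-core values alongside it may still be worthwhile.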
Steps to reproduce
Compare htop with dstack stats: the values are incorrect.
Actual behaviour
No response
Expected behaviour
No response
dstack version
0.18.18
Server logs
No response
Additional information