[Performance Degradation] Unable to reproduce the example #130
Hi @jackyk02, first, in case it's not clear, the work scales with the number of threads: running with 20 threads does 20 times more work than running with one thread. So the "ideal" case will take the same amount of time with 20 threads as it does with one thread. You are not seeing the ideal case because the i9-13950HX has 8 "performance" cores and 16 "efficiency" cores. The efficiency cores run slower than the performance cores, and the way the mini benchmark is set up, it's only as fast as the slowest thread. Finally, your CPU will run a single core at a higher frequency than when multiple cores are active, because of power and thermal limits. If you want to see nice linear scaling, you'll need to run only on the 8 performance cores and disable Intel's Turbo Boost as described at the bottom of the README.
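A minimal sketch of the kind of weak-scaling measurement described above, assuming a recursive Fibonacci function as the CPU-bound work unit. The names `fib`, `run`, and `pin_to_cpus` and the default `n=25` are hypothetical and not taken from the project's README. Each worker gets the same fixed task, so ideal scaling means the elapsed time stays flat as the thread count grows; on a standard GIL build the threads take turns, so the time instead grows roughly linearly with the thread count.

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def fib(n: int) -> int:
    """Deliberately slow, CPU-bound recursive Fibonacci (the fixed work unit)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run(num_threads: int, n: int = 25) -> float:
    """Submit one fib(n) task per worker and return wall-clock seconds.

    Total work grows linearly with num_threads (weak scaling), so with
    perfect parallelism the elapsed time stays roughly constant.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() consumes every result, so we wait for the slowest task.
        list(pool.map(fib, [n] * num_threads))
    return time.perf_counter() - start


def pin_to_cpus(cpu_ids: set) -> None:
    """Linux-only: restrict this process to the given logical CPU IDs.

    Which IDs belong to performance cores is machine-specific (assumption);
    check `lscpu` or /proc/cpuinfo before choosing them.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpu_ids)


if __name__ == "__main__":
    threads = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    t1 = run(1)
    tn = run(threads)
    # Weak-scaling efficiency: 1.0 is ideal; lower means the run was slowed
    # by slower cores, lower clocks, or serialization.
    print(f"1 thread: {t1:.3f}s, {threads} threads: {tn:.3f}s, "
          f"efficiency {t1 / tn:.2f}")
```

For example, `python bench.py 8` (hypothetical file name) compares one thread against eight; calling `pin_to_cpus` first with the performance-core IDs approximates the "8 performance cores only" setup. Disabling Turbo Boost is covered in the README and is not reproduced here.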
Thanks a lot for providing the detailed explanation :)
Additionally, I've encountered a runtime error involving NumPy while running an OpenAI Gym example with multiple threads in Python. The issue arises specifically when threading is combined with the Gym environment, which leads to a non-reentrant call into NumPy.

Steps to Reproduce

Error Message:

Environment

This issue seems to be related to the re-entrancy of NumPy's float-printing code, as suggested by the error message. It would be great if you could offer some insights into this problem as well. Once again, thank you for your help!
Hi Sam, I was trying to reproduce the example:

However, I got a completely different outcome. On an Intel i9-13950HX (24 cores, 32 threads), a single thread required 0.537 seconds, while 20 threads needed 1.445 seconds.
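Because the benchmark gives every thread the same amount of work, the two timings can be turned into a weak-scaling efficiency, where 1.0 would mean 20 threads finish in the same time as one. This is a small worked calculation using only the numbers reported above:

```python
# Timings reported above (seconds).
t_1_thread = 0.537
t_20_threads = 1.445

# Weak-scaling efficiency: the 20-thread run does 20x the work,
# so the ideal is equal wall-clock time (efficiency 1.0).
efficiency = t_1_thread / t_20_threads
print(f"weak-scaling efficiency: {efficiency:.2f}")  # 0.37

# Equivalent throughput relative to one thread (ideal would be 20x).
print(f"effective speedup: {20 * efficiency:.1f}x")  # 7.4x
```

So the 20-thread run still completed about 7.4 times as much work per second as the single-thread run, well short of the ideal 20x.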
@colesbury, I'm truly grateful for your contribution to no-GIL support in Python. It would be great to hear from you!