-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix grcp hangs and upgrade grpcio to 1.62.2 #909
base: feature/pdeathsigger
Are you sure you want to change the base?
Conversation
2faadf2
to
a267901
Compare
a1806e9
to
a411db4
Compare
cb02d82
to
6588841
Compare
@slicklash please ping @d3dave and me when it's ready for re-review? |
Sure. Introduction of |
c643b91
to
664141c
Compare
664141c
to
2fb9611
Compare
tests/test_perf.py
Outdated
@@ -144,6 +145,7 @@ def test_perf_comm_change( | |||
I'm not sure it can be done, i.e is this info even kept anywhere). | |||
""" | |||
with system_profiler as profiler: | |||
time.sleep(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the sleeps necessary? Is there a way to make the test deterministic instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are very flaky. The conditions for them to pass are:
- build container
- run native app
- app must change
comm
before 1st profiling
To make this deterministic we need signaling from the app that it actually changed comm
. It will add more complexity. Code base has very low commit frequency and is not worth the trouble.
8a1c2e2
to
dd79714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a separate PR for preexec_fn and pdeathsigger change as it's independent of grpcio version we are using. Let's keep the CRI client change along with the grpcio version PR.
b561e25
to
562fa82
Compare
moved to #931 |
@slicklash I'm unsure what this PR solves now. gProfiler already creates only a single Inside the single |
Tests show the problem with implementation of |
I see, so this fixes the |
gprofiler/containers_client.py
Outdated
global _containers_client | ||
try: | ||
self._containers_client: Optional[ContainersClient] = ContainersClient() | ||
logger.info(f"Discovered container runtimes: {self._containers_client.get_runtimes()}") | ||
if _containers_client is None: | ||
_containers_client = ContainersClient() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's extract this to a method get_containers_client
which will be locked (to avoid any concurrency issues) and maintain the cache.
Then, ContainerNamesClient
runs self._containers_client = get_containers_client()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Problem
grcp
hangs (broken threading control) when multiple CRI clients are trying to query newly created container with (thd.cc:160 pthread_create failed: Invalid argument
source )Solution
use single CRI client