Question about dask operator logic #932

Open
CarterFendley opened this issue Feb 7, 2025 · 1 comment
@CarterFendley

I wanted to ask a few questions about this PR, which changed the Dask Kubernetes settings / configuration.

Firstly, I may be confused about the async addition for _is_service_available and what it does. I see that it is used in two functions (get_external_address_for_scheduler_service and get_internal_address_for_scheduler_service), but both of these functions were already async. If they are called from an async context, I would assume they were already non-blocking. Especially since _is_service_available is called with await, I am confused about what this modification changes. I am not a heavy Python async user, so please explain it like I am 5 years old.

Secondly, a friend of mine had some concerns about periodSeconds being set to 1. Specifically, he was under the impression that, with timeoutSeconds=300, K8s would create 300 concurrent requests if the health endpoint in question is not responding. I am also relatively new to liveness probe behavior. Is it your understanding that a new probe request will only be created after the previous probe fails, or that probes may be concurrent?

cc: @jacobtomlinson

@jacobtomlinson
Member

jacobtomlinson commented Feb 7, 2025

When you have an event loop running there may be many different tasks running concurrently. Every time you call await it allows the loop to move on to another task while some background IO is happening. In the previous version of this code we called socket.getaddrinfo(host, port), which is blocking. This means the event loop has to sit and wait for the network stack to respond with the address, and that blocks the whole loop. By switching to an async implementation, the loop can move on and process other tasks while asyncio.get_event_loop().getaddrinfo(host, port) is waiting on IO.
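To make the difference concrete, here is a minimal sketch of the two styles. It is not the actual dask-kubernetes code: the function names is_service_available and is_service_available_blocking are hypothetical, and the sketch assumes that "available" simply means the host name resolves.

```python
import asyncio
import socket


async def is_service_available(host: str, port: int) -> bool:
    """Hypothetical helper: True if the host resolves, without blocking the loop."""
    loop = asyncio.get_running_loop()
    try:
        # loop.getaddrinfo is a coroutine; awaiting it hands control back to
        # the event loop so other tasks can run while the lookup is pending.
        await loop.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True


def is_service_available_blocking(host: str, port: int) -> bool:
    """Blocking variant for comparison. If this is called from inside a
    coroutine, every other task on the loop waits until the lookup returns."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True
```

Both versions return the same answer. The only difference is what the rest of the event loop can do while the lookup is in flight. Wrapping the blocking call in an async def would not help on its own, because nothing inside it ever awaits.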

Generally when writing async code in Python you want to use await as much as possible to give the loop space to breathe. You also never want to make synchronous IO calls such as reading a file, pulling data over a network, making requests to subprocesses or communicating with the Linux kernel. Synchronous IO calls block the whole event loop.

The async functionality in Python is called cooperative multitasking because the tasks need to behave correctly. If I start 10 concurrent tasks on an event loop and one of them calls time.sleep(10) instead of await asyncio.sleep(10), then the whole event loop will lock up for 10 seconds. Tasks must call await to cooperatively hand off control of the loop to the next task.
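The time.sleep versus asyncio.sleep point can be checked with a small experiment. This is only an illustrative sketch: polite_task, rude_task and run_tasks are made-up names, and the delays are shortened to 0.1 seconds so it runs quickly.

```python
import asyncio
import time


async def polite_task(delay: float) -> None:
    # Cooperative: await hands control back to the loop while sleeping.
    await asyncio.sleep(delay)


async def rude_task(delay: float) -> None:
    # Uncooperative: time.sleep blocks the thread, so the loop cannot
    # switch to any other task until it returns.
    time.sleep(delay)


async def run_tasks(task_fn, count: int, delay: float) -> float:
    """Run `count` copies of task_fn concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(task_fn(delay) for _ in range(count)))
    return time.perf_counter() - start


cooperative = asyncio.run(run_tasks(polite_task, 5, 0.1))
blocking = asyncio.run(run_tasks(rude_task, 5, 0.1))
print(f"await asyncio.sleep: {cooperative:.2f}s")
print(f"time.sleep:          {blocking:.2f}s")
```

The cooperative run should take roughly one delay in total, because all five sleeps overlap. The blocking run should take roughly five delays, because each time.sleep holds the loop until it finishes.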

With the probes, my understanding is that only one probe request is made at a time. The period is just the time interval between the last probe and the next one. So if it probes and it takes 30 seconds to respond, it won't probe an additional 30 times during that wait. However, I totally understand your perspective on this, and it would be great to find some confirmation of this in the Kubernetes documentation.
