Question about dask operator logic #932

CarterFendley · 2025-02-07T03:11:54Z

Wanted to ask a few questions about this PR that was changing the Dask Kubernetes settings / configuration.

First, I may be confused about the async addition for _is_service_available and what it does. I see that it is used in two functions (get_external_address_for_scheduler_service and get_internal_address_for_scheduler_service), but both of these were already async. If they are used from an async context, I would assume they are already non-blocking. Since _is_service_available is called with await, I am confused about what this modification changes. I am not a heavy Python async user, so please explain like I am 5 years old.

Second, a friend of mine had some concerns about periodSeconds being set to 1. Specifically, he was under the impression that timeoutSeconds=300 would cause K8s to create 300 concurrent requests if the health endpoint in question is not responding. I am also relatively new to liveness probe behavior. Is it your understanding that a new probe request will only be created after the previous probe fails, or that probes may run concurrently?

cc: @jacobtomlinson

jacobtomlinson · 2025-02-07T09:13:43Z

When you have an event loop running, there may be many different tasks running concurrently. Every time you call await, it allows the loop to move on to another task while some background IO is happening. In the previous version of this code we called socket.getaddrinfo(host, port), which is blocking: the event loop has to sit and wait for the network stack to respond with the address, and that blocks the whole loop. By switching to an async implementation, the loop can move on and process other tasks while asyncio.get_event_loop().getaddrinfo(host, port) is waiting on IO.
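To make the difference concrete, here is a minimal sketch of a non-blocking lookup. The helper name is_service_available, the heartbeat task, and the address 127.0.0.1:80 are illustrative assumptions, not the operator's actual code; only loop.getaddrinfo and socket.gaierror come from the standard library.

```python
import asyncio
import socket


async def is_service_available(host: str, port: int) -> bool:
    """Hypothetical helper: True if host:port resolves, without blocking the loop."""
    loop = asyncio.get_running_loop()
    try:
        # loop.getaddrinfo is the awaitable counterpart of socket.getaddrinfo.
        # Awaiting it lets other tasks run while the lookup is in progress.
        await loop.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False


async def heartbeat(ticks: list) -> None:
    # Stand-in for the "other tasks" that share the event loop.
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0)


async def main():
    ticks = []
    # Both coroutines run on the same loop; neither starves the other.
    available, _ = await asyncio.gather(
        is_service_available("127.0.0.1", 80), heartbeat(ticks)
    )
    return available, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With the blocking socket.getaddrinfo in place of the awaited call, the heartbeat task could not make progress until the lookup returned.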

Generally, when writing async code in Python you want to use await as much as possible to give the loop space to breathe. You also never want to make synchronous IO calls such as reading a file, pulling data over a network, making requests to subprocesses or communicating with the Linux kernel. Synchronous IO calls block the whole event loop.

The async functionality in Python is called cooperative multitasking because the tasks need to behave correctly. If I start 10 concurrent tasks on an event loop and one of them calls time.sleep(10) instead of await asyncio.sleep(10), then the whole event loop will lock up for 10 seconds. Tasks must call await to cooperatively hand off control of the loop to the next task.
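A small timing sketch shows the effect. The function names and the short 0.1 second delay are assumptions chosen to keep the demo quick; the point is only that blocking sleeps add up while cooperative sleeps overlap.

```python
import asyncio
import time


async def blocking_task(delay: float) -> None:
    # time.sleep never yields, so every other task on the loop is stalled.
    time.sleep(delay)


async def cooperative_task(delay: float) -> None:
    # await hands control back to the loop until the delay has elapsed.
    await asyncio.sleep(delay)


async def run_concurrently(task_fn, n: int, delay: float) -> float:
    """Run n copies of task_fn on one loop and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(task_fn(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 5, delay: float = 0.1):
    blocking = asyncio.run(run_concurrently(blocking_task, n, delay))
    cooperative = asyncio.run(run_concurrently(cooperative_task, n, delay))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"blocking: {blocking:.2f}s, cooperative: {cooperative:.2f}s")
```

The blocking variant should take roughly n times the delay, because the sleeps run one after another, while the cooperative variant should finish in roughly one delay.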

With the probes, my understanding is that only one probe request is made at a time. The period is just the time interval between the last probe and the next one. So if it probes and it takes 30 seconds to respond, it won't probe an additional 30 times during that wait. However, I totally understand your perspective on this, and it would be great to find some confirmation of this in the Kubernetes documentation.
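As a toy model of that understanding only (this is not the kubelet's implementation, and it confirms nothing about Kubernetes itself), the sketch below runs probes strictly one after another and records how many are ever in flight at once. The names sequential_prober and slow_probe and all timings are made up for illustration.

```python
import asyncio


async def sequential_prober(probe, period: float, rounds: int) -> None:
    """Toy prober: the next probe starts only after the previous one returns."""
    for _ in range(rounds):
        await probe()                # wait for the probe to finish (or time out)
        await asyncio.sleep(period)  # then wait one period before probing again


async def main() -> int:
    state = {"in_flight": 0, "peak": 0}

    async def slow_probe() -> None:
        # Simulated endpoint that takes far longer than the period to respond.
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.05)
        state["in_flight"] -= 1

    await sequential_prober(slow_probe, period=0.01, rounds=3)
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrent probes:", asyncio.run(main()))
```

Under this model the peak stays at 1 even though each probe takes five times longer than the period. Whether the kubelet actually behaves this way is exactly what still needs confirming in the Kubernetes documentation.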

jacobtomlinson added the question label Feb 7, 2025
