Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature to stop accepting new tasks on existing threads #42709

Closed
wants to merge 9 commits into from
Closed

Feature to stop accepting new tasks on existing threads #42709

wants to merge 9 commits into from

Conversation

janrous-rai
Copy link

This feature allows operators to stop scheduling of new tasks on specific threads which will make those threads high priority and only handle either explicitly scheduled tasks or tasks that are already running on those high priority threads.

This feature can be used to guarantee responsiveness on certain critical low-latency tasks by isolating them from the bulk of the julia workload that will run on other threads. We have observed that this eliminates most of the scheduling issues and slowdowns due to tasks occupying threads for excessive amounts of time (this is particularly problematic with sticky tasks in 1.6 where single long-running task that gets scheduled on the same thread as critical periodic task can cause major delays).

That said, low-latency is still subject to occasional hiccup as a result of stop-the-world behaviors that happen during garbage collection and compilation.

When running in multi-threaded mode, avoid picking up tasks
from shared heaps when you're thread 1 (tid == 0).
When single-threaded, enq_work should place the work in
the thread tid=1 workqueue rather than randomly trying
to avoid first thread.
Threads will need to call jl_stop_accepting_new_tasks_on_this_thread
method in order to stop taking new work from the shared queues.

This should be then used by our own code to make it so on the specific
thread that runs rai-server and schedules all the background tasks.
@tkf
Copy link
Member

tkf commented Oct 19, 2021

It'd be nice to have a more compositional API. If a library function internally uses @spawn, calling it on a "non-accepting" worker thread would just make the worker thread idle.

Also, this API cannot be used in a library since there's no way to know that other libraries are calling this function. So, I think it makes more sense to allocate a dedicated thread pool on startup so that the latency-critical part of the application can use it. It'll remove the cost of this feature when not used.

IMHO, I think API like this needs to address the problem of priority inversion as I mentioned in #41586 (comment)

As an aside, I think it's straightforward to avoid a lock if we don't allow making the primary thread to be non-accepting. Also, it seems to me that the current implementation has a data race. Since the flag is read outside the lock, you'd need to use atomic store/load on the flag, even if it's protected by the lock.

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Oct 20, 2021
@vtjnash
Copy link
Member

vtjnash commented Oct 21, 2021

I like the general direction, though need to review in more detail later. I think you can add this flag to the ptls state struct in julia_threads.h.

@janrous-rai
Copy link
Author

It'd be nice to have a more compositional API. If a library function internally uses @spawn, calling it on a "non-accepting" worker thread would just make the worker thread idle.

Also, this API cannot be used in a library since there's no way to know that other libraries are calling this function. So, I think it makes more sense to allocate a dedicated thread pool on startup so that the latency-critical part of the application can use it. It'll remove the cost of this feature when not used.

Yeah, this is basically intended for setup-once situations where operators of a long-running server will mark certain threads as "high priority" and schedule the latency-sensitive work on them. It should not really be used by libraries because that could cause all sorts of problems.

This is somewhat analogous to having two thread-pools, but less integrated with the built-in scheduler.

We do actually need this kind of behavior and have been running our server with this patch for quite some time now and it did help with a lot of responsiveness issues we've been experiencing prior to this. Ultimately, in a complex codebase where some of the code is user-supplied or generated makes proper cooperation between tasks infeasible (not to mention that there don't seem to be any tools for identifying tasks that are not cooperative, making debugging of these problems intractable).

We might achieve this behavior with threadpools but it seems that the conversation in the PR that implements that feature is hitting some ideological design disagreements so I'm not sure what will come out of it.

I think that the architecturally cleanest solution here would be to have an API that would allow us to spawn a task in a dedicated thread that will only exist while the task is running and will be shut down when the task completes. This way we could spin off lightweight threads running critical work and ensure that they won't be blocked by bulk workloads (those not latency sensitive but potentially large in volume and potentially not yielding). This does imply that we might oversubscribe the physical machine, but this is an acceptable tradeoff and we may still deal with that by scaling down the regular thread count for the bulk workloads.

I have not really thought much about what it would take to implement this thread-dedicated-for-a-task and whether it would integrate cleanly with the rest of the system. In particular, it seems like dynamically adding and removing threads into the system may be quite a challenge.

IMHO, I think API like this needs to address the problem of priority inversion as I mentioned in #41586 (comment)

I'm not sure whether the assumption that all transitive tasks need to stay on the dedicated thread. Our critical tasks are generally pretty simple and they don't do much concurrency so we are probably assuming that this is best-effort and that if something calls @spawn, the low latency is no longer guaranteed.

As a counter-example, we run http server on the high priority thread, ensuring that we can always accept incoming http connections. For long-running user requests, we call @spawn exactly in order to move the bulk processing off of the high priority thread and over to the regular threads.

As an aside, I think it's straightforward to avoid a lock if we don't allow making the primary thread to be non-accepting. Also, it seems to me that the current implementation has a data race. Since the flag is read outside the lock, you'd need to use atomic store/load on the flag, even if it's protected by the lock.

@KristofferC KristofferC force-pushed the backports-release-1.6 branch 2 times, most recently from 1a719fb to 94d6407 Compare November 12, 2021 08:49
@KristofferC KristofferC deleted the branch JuliaLang:release-1.6 November 13, 2021 12:09
@vtjnash
Copy link
Member

vtjnash commented Nov 13, 2021

Why the close?

@KristofferC
Copy link
Member

KristofferC commented Nov 13, 2021

This PR is against the backports-release-1.6 branch which was merged and auto-deleted causing this PR to automatically close.

@vtjnash
Copy link
Member

vtjnash commented Nov 13, 2021

Ah, it needs to be re-created against master then, since it cannot be reopened.

@tkf tkf reopened this Nov 13, 2021
@tkf tkf changed the base branch from backports-release-1.6 to master November 13, 2021 22:16
@tkf
Copy link
Member

tkf commented Nov 13, 2021

I just created a dummy backports-release-1.6, reopened this PR, changed the PR base, and then removed backports-release-1.6.

@NHDaly
Copy link
Member

NHDaly commented Jan 2, 2022

Thanks for managing that, @tkf!

@tkf
Copy link
Member

tkf commented Jan 4, 2022

(I just realized that my hack made it impossible to review the patch. The diff is easy to see at https://github.com/JuliaLang/julia/compare/1b93d53fc4bb59350ada898038ed4de2994cce33..e5cce258ea722884cb14d018213cef60da48b79c)

some ideological design disagreements

This is a pretty good way to summarize my complaints :)

But, from the practical point of view, I think this PR might be better than #42302 as it's very (and surprisingly) minimal. Maybe that's why Jameson liked the approach.

an API that would allow us to spawn a task in a dedicated thread that will only exist while the task is running and will be shut down when the task completes. This way we could spin off lightweight threads running critical work and ensure that they won't be blocked by bulk workloads

Booting up Julia's runtime environment (ptls) for a thread does not seem like a "lightweight" thing to do (at least intuitively). I'm also not entirely sure if it's easy to tear it down in the middle of the process lifetime. I think it would be "easier" to reuse the thread pool as I just sketched in #43649.

I'm not sure whether the assumption that all transitive tasks need to stay on the dedicated thread. Our critical tasks are generally pretty simple and they don't do much concurrency so we are probably assuming that this is best-effort and that if something calls @spawn, the low latency is no longer guaranteed.

It's easy to set up a Channel{Task} where a consumer in a non-dedicated worker thread schedules the tasks on the non-dedicated worker threads. On the other hand, if we want to receive results from the non-dedicated threads and then do the work based on the result as soon as possible, we'd need to spawn a "waiter" task on the dedicated thread. I prefer this approach also because it let us stick with the structured concurrency paradigm.

To put it in another way, I think we better stick with the "black box rule" (i.e., it's an implementation detail if a function spawns tasks or not) to ensure composability, even with the tasks on dedicated worker threads. If every function call is a potential cause of priority inversion, I think it'd be very hard to use library functions.

@NHDaly NHDaly changed the base branch from master to release-1.6 February 3, 2022 16:24
@vtjnash
Copy link
Member

vtjnash commented Feb 7, 2022

This still needs to be rebased against master, and the flag moved into ptls

@NHDaly
Copy link
Member

NHDaly commented Feb 18, 2022

I don't think we have any intention of merging this PR as-is; i think it was opened mainly for discussion purposes, and as an FYI that we are currently applying this patch in our julia builds in production.

As long as the problem is addressed in some way (e.g. by the ongoing thread pools work), i don't think we'd need this PR anymore.

I'm going to mark it DO-NOT-MERGE for clarity.

@NHDaly NHDaly added the DO NOT MERGE Do not merge this PR! label Feb 18, 2022
@vtjnash
Copy link
Member

vtjnash commented Feb 22, 2022

Threadpools is a slight generalization of this idea (allowing multiple threads to be prevented from examining the queue), but I think this is still something we can do in the meantime, and likely will be able to continue to support long-term. Other than finishing some nits, I think we can merge (and backport)

@tkf
Copy link
Member

tkf commented Feb 23, 2022

Threadpools is a slight generalization of this idea (allowing multiple threads to be prevented from examining the queue), but I think this is still something we can do in the meantime, and likely will be able to continue to support long-term.

Hmm... Isn't #42302 close to ready? (I should probably look at the implementation a bit more, though.) I think expanding API surface of the scheduler makes it hard to extend and enable a more flexible approach like #44064. If we are planning to merge #42302, I think it's better to only add #42302 and not in combination with this PR. Or, can we at least put it in Experimental, so that it can be eventually removed once #42302 is in?

@DilumAluthge DilumAluthge removed the DO NOT MERGE Do not merge this PR! label Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants