Major slowdowns with Gtk + Threads #503
Comments
Interestingly, I see this even on Julia 1.2, so it's not a consequence of the overhaul in threading for Julia 1.3. However, Julia 1.0.5 seems immune. |
How is the multithreading in That Gtk.jl is resource hungry on the first CPU is another problem. If I get it right, we try to integrate the Gtk mainloop with the Julia task machinery. And that integration seems to be suboptimal. I once made a post on this in the discourse dev forum. |
Here is that discourse topic, where I wanted to start this discussion: https://discourse.julialang.org/t/repl-and-mainloop/35699 |
Sorry we didn't get any discussion in that discourse post. I haven't looked into this deeply myself. I can answer how the multithreading in imfilter is implemented: it uses "old school threads," see here. There doesn't seem to be a way to exclude a particular thread in |
No problem, I am not in a position to solve this problem either. My feeling is that we are scratching at real-world problems that should actually be discussed and solved in Julia base. I see two somewhat independent problems:
What's your impression: what is the best forum to discuss this? |
Sometimes fortune smiles on you (aka, Jeff's on the case): JuliaLang/julia#35552. |
The following might be the answer to this particular issue: https://tro3.github.io/ThreadPools.jl/build/index.html#ThreadPools.@bthreads |
I wasn't sure whether I should open a new thread or use this one, since I have the same problem and would like to give an answer to @tknopp's suggestion: I tested with the following MWE:
using Gtk
using ThreadPools
main() = loop(Vector{Float64}(undef, 100_000))
function loop(arr)
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
mainthreaded() = loopthreaded(Vector{Float64}(undef, 100_000))
function loopthreaded(arr)
    ThreadPools.@bthreads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
println("===============")
@time main()
@time mainthreaded()
@time main()
@time mainthreaded()
This results in the following output:
So unfortunately
In JuliaLang/julia#35552, it is suggested that the following line in Gtk.jl/src/GLib/signals.jl could be the culprit:
tmout_min::Cint = (uv_pollfd::_GPollFD).fd == -1 ? 10 : 5000
Changing the 5000 to 1 and voilà:
So the problem still persists. Interestingly enough, LoopVectorization.jl is not affected. |
My suggestion was more of a hope that this would solve the problem. We actually see similar problems, and I really wish we could do something about it. My theory right now is the following: if you run the code serially (i.e. on the first thread), the Gtk main loop is stopped completely for the moment and does not affect the serial code. If you run it in parallel, the main thread (doing Gtk work) will probably invoke the GC regularly, which is(?) still a serial operation and will stop the other threads? @tro3, @JeffBezanson, @vtjnash: Does any of you have an idea if
|
The line regarding |
I'm not sure whether this is considered trivial, but I found a somewhat "dirty" solution which works for me at the moment, since I don't need the Gtk main loop when doing intense parallel computing. Here is an MWE:
using Gtk
GLib = Gtk.GLib
# Terminate the Gtk main loop
function gtk_quit_main_loop()
    if ccall((:g_main_depth, GLib.libglib), Cint, ()) > 0
        Gtk.gtk_quit()
    end
end
function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function create_window()
    win = GtkWindow("My First Gtk.jl Program", 400, 200)
    b = GtkButton("Click Me")
    push!(win,b)
    function on_button_clicked(w)
        println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")
    # Now reinstantiate the Gtk main loop
    Gtk.__init__()
    showall(win)
    # Block program execution until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)
    # NOTE
    # This function terminates the Gtk main loop, if one is active.
    # Comment this out to see the massive slowdown due to the Gtk main loop
    gtk_quit_main_loop()
end
# Program execution
# ==============================================================================
create_window() # First GUI call
arr = Vector{Float64}(undef, 100_000)
arrthreaded = Vector{Float64}(undef, 100_000)
# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))
# Time measurement
@time loop!(arr)
@time loopthreaded!(arrthreaded)
println("===============")
create_window() # Second GUI call
I'm fully aware that this is as hacky as it gets, but at least for some use cases this should allow the use of Gtk and multithreading for performance in the same environment. Of course, this does nothing for you if you want to trigger a multithreaded calculation from a Gtk GUI :-( Anything potentially dangerous about this solution? |
I don't see anything dangerous about that. It would be interesting to know whether it is also possible to pause the main loop, in which case one could keep the UI open (it would just freeze during the calculation). |
I did a test where I quit the Gtk main loop while a GUI was open. This leads to a crash. Therefore, unfortunately, your suggestion doesn't work. Is there a way to "pause" the Gtk main loop (w/o quitting it)? |
One thing that left me puzzled was the fact that
using Gtk
using Polyester
using BenchmarkTools
function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function looppolyester!(arr::Vector{Float64})
    @batch per=core for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function create_window()
    win = GtkWindow("My First Gtk.jl Program", 400, 200)
    b = GtkButton("Click Me")
    push!(win,b)
    function on_button_clicked(w)
        println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")
    # Show the window (the Gtk main loop is left running in this example)
    showall(win)
    # Block program execution until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)
end
# Program execution
# ==============================================================================
create_window() # First GUI call
arr = Vector{Float64}(undef, 100_000)
arrthreaded = copy(arr)
polthreaded = copy(arr)
# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))
# Time measurement
benchmark_serial = @benchmark loop!(arr)
benchmark_threads = @benchmark loopthreaded!(arrthreaded)
benchmark_polyester = @benchmark looppolyester!(polthreaded)
println("===============")
create_window() # Second GUI call
Compared to my last example, you will notice that the Gtk main loop isn't deactivated anymore, which leads to the performance regression discussed before for the Julia threads:
julia> benchmark_serial
BenchmarkTools.Trial: 1275 samples with 1 evaluation.
Range (min … max): 3.651 ms … 6.428 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.855 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.916 ms ± 223.838 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▁▁ ▃
▂▄▂▄▃█▃▃▃██▃▃▃█▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▁▂▁▂▁▂▂▂▁▁▂▂▁▂ ▃
3.65 ms Histogram: frequency by time 4.85 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> benchmark_threads
BenchmarkTools.Trial: 319 samples with 1 evaluation.
Range (min … max): 1.118 ms … 39.897 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 15.539 ms ┊ GC (median): 0.00%
Time (mean ± σ): 15.664 ms ± 5.143 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▂ █▃ ▁
▇▅▁▁▁▄█▁▁▁▁▁▁▁▁▁▁▇███▆▄▇▇▄▅▆██▆▁▇▇▅▄▆▄▆▇▅▇█▄▅▅▅▁▄▅▄▄▅▅▁▅▁▁▄ ▆
1.12 ms Histogram: log(frequency) by time 30.8 ms <
Memory estimate: 3.12 KiB, allocs estimate: 31.
However, the loop created with the
julia> benchmark_polyester
BenchmarkTools.Trial: 5136 samples with 1 evaluation.
Range (min … max): 735.900 μs … 1.464 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.003 ms ┊ GC (median): 0.00%
Time (mean ± σ): 968.403 μs ± 120.804 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▂ ▃ ▂ ▂
█▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▅▂▂▃▅▄▃▅█▇██▄▃███▅▄▃▃▄▃▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂ ▃
736 μs Histogram: frequency by time 1.22 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
@chriselrod: Any idea why this is the case? The good news is that this allows the combination of an active GUI and multi-threaded |
Polyester uses a very simple static schedule, so there's a lot less that can go wrong than with As you noted, the same applies to From the sound of things, the specific issue is that GTK tries to run a task on the main thread? |
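To make "static schedule" concrete, here is a minimal sketch in plain Julia. It is not Polyester's actual implementation, and `static_chunks` and `fill_static!` are hypothetical names chosen for illustration. The index range is split up front into one contiguous chunk per worker, and each chunk is processed by exactly one task, so no work is redistributed while the loop runs. Note that the tasks here are still placed on threads by Julia's own scheduler, which is an assumption that differs from a truly pinned schedule.

```julia
using Base.Threads

# Split 1:n into `nchunks` contiguous ranges of near-equal length.
function static_chunks(n::Integer, nchunks::Integer)
    base, extra = divrem(n, nchunks)
    ranges = UnitRange{Int}[]
    start = 1
    for c in 1:nchunks
        len = base + (c <= extra ? 1 : 0)  # the first `extra` chunks get one more element
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

# Fill `arr` with the same dummy calculation used in the benchmarks above,
# one task per precomputed chunk.
function fill_static!(arr::Vector{Float64}; nchunks::Integer = nthreads())
    tasks = map(static_chunks(length(arr), nchunks)) do r
        Threads.@spawn for k in r
            x = float(k)
            arr[k] = sin(x) * cos(x) * tan(x) / sqrt(x)
        end
    end
    foreach(wait, tasks)  # block until every chunk is done
    return arr
end

arr = fill_static!(Vector{Float64}(undef, 100_000))
println("chunks for 10 items on 3 workers: ", static_chunks(10, 3))
```

Because the partition is fixed before any work starts, a slow or busy thread delays only its own chunk, which makes the behaviour easier to reason about than a dynamic scheduler.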
I'm not sure that the Gtk task is actually run on the main thread, otherwise the One thing I'd like to test in the next days is to force the Gtk main loop task to the main thread via |
Great that you are working on this @StefanMathis. And yes, ideally we want a responsive UI without performance degradation caused by the UI thread. It would be great if we could reach that. Besides number crunching, my research group also has a use case where the background task gathers data from a device in real time. There we also see better performance (i.e. lower latency) without the UI, which we would like to resolve. |
I finally found some time to test a threaded loop with ThreadPools.jl while forcing the Gtk main loop to the primary thread. Below you find the code for a benchmark without the Gtk main loop active. You can clearly see that the normal threads perform best, followed by the serial version, and finally the version from ThreadPools.jl. The
using Gtk
using ThreadPools
using BenchmarkTools
GLib = Gtk.GLib
# Terminate the Gtk main loop
function gtk_quit_main_loop()
    if ccall((:g_main_depth, GLib.libglib), Cint, ()) > 0
        Gtk.gtk_quit()
    end
end
function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function loop_t!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function loop_bt!(arr::Vector{Float64})
    ThreadPools.@bthreads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
# Program execution
# ==============================================================================
# Start the Gtk main loop on the primary thread
gtk_quit_main_loop()
# @tspawnat 1 Gtk.gtk_main() # Comment this in to test with the Gtk main loop active
arr = Vector{Float64}(undef, 100_000)
arr_t = copy(arr)
arr_bt = copy(arr)
# Compilation
loop!(Vector{Float64}(undef, 2))
loop_t!(Vector{Float64}(undef, 2))
loop_bt!(Vector{Float64}(undef, 2))
GC.enable(false)
# Time measurement
benchmark_serial = @benchmark loop!(arr)
benchmark_t = @benchmark loop_t!(arr_t)
benchmark_bt = @benchmark loop_bt!(arr_bt)
GC.enable(true)
julia> benchmark_t
BenchmarkTools.Trial: 2326 samples with 1 evaluation.
Range (min … max): 1.554 ms … 2.610 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.109 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.143 ms ± 143.424 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▂ ▁█▅▃▁ ▃▁ ▄
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▄▃▄▅▆▆▅▅██▇██████▅▄▅▅▄▄▅██▅█▇▆█▆▇▆▄▃ ▄
1.55 ms Histogram: frequency by time 2.41 ms <
Memory estimate: 2.11 KiB, allocs estimate: 21.
julia> benchmark_serial
BenchmarkTools.Trial: 1019 samples with 1 evaluation.
Range (min … max): 4.032 ms … 14.553 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.688 ms ┊ GC (median): 0.00%
Time (mean ± σ): 4.890 ms ± 723.126 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂▁▁▁
▃▃▃▆▄▅▅██████▆▆▅▅▄▄▃▃▃▄▃▃▃▃▃▃▃▂▃▃▂▂▃▃▂▂▃▁▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▁▂ ▃
4.03 ms Histogram: frequency by time 7.56 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> benchmark_bt
BenchmarkTools.Trial: 348 samples with 1 evaluation.
Range (min … max): 6.469 ms … 1.034 s ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.391 ms ┊ GC (median): 0.00%
Time (mean ± σ): 14.362 ms ± 55.042 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█
▂▂▄███▅▄▃▃▁▁▁▂▁▃▁▁▂▂▃▃▃▃▃▄▃▃▃▃▃▃▃▂▃▃▃▃▃▃▂▁▂▂▁▂▂▁▁▁▁▁▁▂▂▁▂▁▂ ▃
6.47 ms Histogram: frequency by time 26.2 ms <
Memory estimate: 9.92 MiB, allocs estimate: 100064.
When spawning the Gtk main loop (i.e. inserting the line
julia> benchmark_t
BenchmarkTools.Trial: 364 samples with 1 evaluation.
Range (min … max): 1.760 ms … 28.420 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 15.177 ms ┊ GC (median): 0.00%
Time (mean ± σ): 13.717 ms ± 5.107 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▁█▆▂ ▂
▅▃▄▅▅▅▅▄▂▁▁▁▁▁▁▁▁▁▂▁▁▂▅▃▆▇▆█▅▇▆█████▆▆█▅▃▅▄▁▁▁▁▂▁▂▂▃▁▁▂▂▁▁▃ ▃
1.76 ms Histogram: frequency by time 25.9 ms <
Memory estimate: 2.11 KiB, allocs estimate: 21.
julia> benchmark_serial
BenchmarkTools.Trial: 1048 samples with 1 evaluation.
Range (min … max): 4.005 ms … 9.243 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.578 ms ┊ GC (median): 0.00%
Time (mean ± σ): 4.757 ms ± 663.124 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▃▃▃▂▄█▇▅▃▁
▄█████████████▆▆▆▅▅▅▆▅▃▄▃▃▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▂▃▃▂▂▂▂▂▂▂▂▁▁▂▁▂▂ ▄
4.01 ms Histogram: frequency by time 7.45 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> benchmark_bt
BenchmarkTools.Trial: 328 samples with 1 evaluation.
Range (min … max): 11.393 ms … 22.695 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 15.633 ms ┊ GC (median): 0.00%
Time (mean ± σ): 15.268 ms ± 1.631 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃
▂▂▃▂▅▄▃▃▂▄▄▄▄▃▄▃▂▃▃▃▄▄▃▃▄▄▃▄▅▅▆▆▄▆▅▅███▃▃▅▅▃▄▃▂▃▂▃▁▃▃▃▃▃▂▂▃ ▃
11.4 ms Histogram: frequency by time 18.8 ms <
Memory estimate: 9.92 MiB, allocs estimate: 100063.
Now the threaded and the ThreadPools versions are on roughly the same footing. In fact, the ThreadPools version is hardly affected by the Gtk main loop at all, but because it performs so much worse than the serial version, it is still not a good idea to use it here. Please keep in mind that I really don't want to bash ThreadPools; I think it is rather likely that I did something stupid and that my results are therefore inaccurate. If you spot something fishy, do not hesitate to point it out :-) |
Proposing a bandaid here #607 |
I just found some time to apply my benchmark from above to the solution of @IanButterworth, and I am very pleased to say that his PR really solves the problem 👍 I found one quirk to be aware of: the Gtk main task takes some time to terminate. When a multithreaded loop is started while the Gtk main task is still active, the performance is still thrashed (as is to be expected). The following example demonstrates this:
using Gtk
using BenchmarkTools
function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end
function create_window()
    win = GtkWindow("My First Gtk.jl Program", 400, 200)
    b = GtkButton("Close me with the x-button in the top right corner.")
    push!(win,b)
    function on_button_clicked(w)
        println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")
    # Show the window
    showall(win)
    # Block program execution until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)
    # NOTE
    # This function terminates the Gtk main loop, if one is active.
    # Comment this out to see the massive slowdown due to the Gtk main loop
    #gtk_quit_main_loop()
end
# Program execution
# ==============================================================================
arr = Vector{Float64}(undef, 100_000)
arrthreaded = Vector{Float64}(undef, 100_000)
# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))
# Comment this in or out to see the effect of the Gtk main loop.
create_window()
# Occasionally, the gtk main task may still run when entering the benchmark loop.
# In this case, the performance is as bad as before. A sleep command of 300 ms
# is usually sufficient to achieve task termination before proceeding to the loops.
# sleep(0.3)
is_running = Gtk.gtk_main_running[]
println("Gtk main task running: $is_running")
# When measuring only a single loop call, the Gtk main task may still be running,
# therefore the performance of the multithreaded call may be far worse than the
# single-threaded call
@time loopthreaded!(arrthreaded)
@time loop!(arr)
# Proper measurement with multiple function calls shows that the multithreaded
# version is indeed faster.
benchmark_threads = @benchmark loopthreaded!(arrthreaded)
benchmark_serial = @benchmark loop!(arr)
println("===============")
When executing this code snippet, you'll see that the single-call benchmarking via |
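As an alternative to the fixed `sleep(0.3)` in the example above, one could poll until the main task has actually stopped. The following is only a sketch in plain Julia: `wait_until` is a hypothetical helper (not part of Gtk.jl), and the `running` flag is a stand-in so the snippet runs without Gtk. With Gtk loaded, the predicate would presumably query `Gtk.gtk_main_running[]`, as the example above does.

```julia
# Poll `predicate` until it returns true or `timeout` seconds have elapsed.
# Returns true if the predicate became true, false on timeout.
function wait_until(predicate; timeout::Real = 2.0, interval::Real = 0.01)
    deadline = time() + timeout
    while time() < deadline
        predicate() && return true
        sleep(interval)  # yields to other tasks, so a stopping event loop can finish
    end
    return predicate()
end

# Stand-in flag for illustration only (assumption: it mimics Gtk.gtk_main_running).
running = Ref(true)
@async (sleep(0.1); running[] = false)  # simulate an event loop that stops after ~100 ms

stopped = wait_until(() -> !running[])
println("event loop stopped: ", stopped)
```

In the Gtk example, the call would be something like `wait_until(() -> !Gtk.gtk_main_running[])` placed right after `create_window()`, so the benchmark only starts once the main task is gone or the timeout has expired.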
Great to hear. #613 added a utility function for avoiding the issue you're seeing
It waits for a stopping eventloop to stop, if needed pauses the event loop, and reinstates its state afterwards.
It would be best if the
Also, just as an API point, I'd use
Your example did make me realize there was a bug though, where
That is fixed in #615.
Also I added
(Grr.. Windows.. it would've been simpler if #610 wasn't happening.)
So with #615 your options are:
If you know the eventloop should be stopping, i.e. after a window is destroyed
Or at any time, even if a window is open
which will pause rendering during execution. Adding either of those to your example, I get
|
This may be the same issue as #325, but observed on Linux with threads. I should say I'm running this on a 6-physical-core machine (12 if you include hyperthreads). Copied from JuliaImages/ImageFiltering.jl#161 and specifically the test in timholy/ComputationalResources.jl#18:
All seems well. Now load Gtk and do it again:
You can see that the multithreaded case specifically gets massively slowed down. Here are some key details: