
Automatic @threaded activation when initializing multi-threaded Julia #2159

Open
afilogo opened this issue Nov 13, 2024 · 12 comments
Labels
discussion parallelization Related to MPI, threading, tasks etc.

Comments

@afilogo

afilogo commented Nov 13, 2024

Hello Trixi team,

As far as I understand, initializing Julia with more than one thread automatically uses all of them inside Trixi.jl loops (correct me if I am wrong). However, I see a benefit in having a keyword stored in the cache to activate threading explicitly, similar to OrdinaryDiffEq's thread = OrdinaryDiffEq.True(), for instance when one prefers to use the threads somewhere else.

Also, when experimenting with examples involving threads, I consistently observe allocations (which are expected) along with a degradation in performance, e.g. #1596.

Should there be interest in having this feature, I could give it a try. Or you could also suggest a simple fix. Thank you.

@DanielDoehring
Contributor

I agree that it is somewhat misleading that if you run an elixir with multiple threads, the Trixi.jl internals are thread-parallelized but the time integration is not. Note, however, that there are some elixirs which use the multithreaded version of the time integrator,

sol = solve(ode, RDPK3SpFSAL49(thread = OrdinaryDiffEq.True()); abstol = 1e-8,
            reltol = 1e-8, ode_default_options()..., callback = callbacks)

and that this behaviour is documented:

https://trixi-framework.github.io/Trixi.jl/stable/time_integration/#time-integration

and

https://trixi-framework.github.io/Trixi.jl/stable/parallelization/#Shared-memory-parallelization-with-threads

What do you mean by performance degradation? It is well known that you practically never get ideal speedup, since synchronizing threads causes some overhead.

@DanielDoehring DanielDoehring added parallelization Related to MPI, threading, tasks etc. discussion labels Nov 13, 2024
@afilogo
Author

afilogo commented Nov 13, 2024

Thank you for your detailed response!

By degradation in performance I mean that the elixir runs slower multi-threaded than single-threaded, just as you showed in the issue I mentioned. At least, that is what I take from the summary_callback() output.

I would like to run multiple equations at the same time (one of them from a Trixi.jl semidiscretization) using multiple threads and, ideally, get performance similar to running each one independently on a single thread (in this regard, I must say I have only started looking into this recently).
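A minimal sketch of the pattern I have in mind (hypothetical setup; ode_a and ode_b stand in for two independently created ODE problems, one of which would come from semidiscretize on a Trixi.jl semidiscretization):

```julia
using OrdinaryDiffEq

# Hypothetical: ode_a comes from a Trixi.jl semidiscretization
# (semidiscretize(semi, tspan)), ode_b from some other model.
# Run both solves concurrently on separate Julia tasks; this only
# helps if the inner loops do not also try to use the same threads.
task_a = Threads.@spawn solve(ode_a, RDPK3SpFSAL49(); abstol = 1e-8, reltol = 1e-8)
task_b = Threads.@spawn solve(ode_b, Tsit5())

sol_a = fetch(task_a)
sol_b = fetch(task_b)
```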

@DanielDoehring
Contributor

So if you crank up your problem size to, say, at least a couple thousand unknowns per thread, you should see performance improvements. In the issue I deliberately used a toy problem to keep things simple.

@afilogo
Author

afilogo commented Nov 13, 2024

I have not observed that in my experiments. The main issue for me, though, is that I seemingly cannot avoid threaded loops when starting Julia with multiple threads, even when I do not want them for the Trixi.jl solve(), e.g. because another piece of code would benefit more from the threads. I just wanted to know whether this behavior is intended and/or can be easily changed, if you find that reasonable.

I appreciate your help so far.

@ranocha
Member

ranocha commented Nov 13, 2024

Did you try Trixi.set_polyester!(false)? See https://trixi-framework.github.io/Trixi.jl/stable/reference-trixi/#Trixi.set_polyester!-Tuple{Bool}
Switching to base threads should enable the nested threading capabilities of Julia.

Alternatively, we (you) could consider making a PR with a similar function disabling threading in Trixi.jl.
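For reference, the call looks like this (set_polyester! stores a package preference, so the Julia session needs to be restarted for it to take effect):

```julia
using Trixi

# Stores a preference that makes Trixi.jl's @threaded macro use
# base Threads instead of Polyester.@batch.
Trixi.set_polyester!(false)
# Restart the Julia session afterwards for the preference to apply.
```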

@DanielDoehring
Contributor

I have not observed that in my experiments.

That might be the case if the runtime is dominated not by rhs! but by callbacks such as analysis or AMR. When turning off save_solution and the analysis_callback for

https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_2d_dgsem/elixir_euler_sedov_blast_wave.jl

I get 33 seconds on one thread and 22 seconds on two threads. When using a uniform mesh without AMR, the single-threaded run takes 61 seconds while the run on two threads takes 32 seconds, which is quite close to ideal speedup.

@afilogo
Author

afilogo commented Dec 2, 2024

Thank you both for your comments and sorry for the late reply.

@DanielDoehring In fact, when I add more degrees of freedom it does give a boost, like in the example you showed.

@ranocha I just tried that and it gave me the following error: "TaskFailedException: nested task error: @threads :static cannot be used concurrently or nested", even when I used Threads.@threads :dynamic.

I can make such a PR. I guess I could do this similarly to what is used in SciML, with an option in the macro and a cache that carries the threading option, or, like you showed, with something analogous to Trixi.set_polyester!(false) but for disabling threading. What do you think?

@vchuravy
Member

vchuravy commented Dec 2, 2024

In #2029, I chose :static as the default, since we can assume that most of the Trixi.jl loops have decent work balance,
but I did not foresee the nested use case. I have been thinking about an additional argument to @threaded that would choose the executor instead of globally switching behaviour.

I think for now a preference to disable threading entirely or to switch to :dynamic would be good.
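A hypothetical form of such a per-loop executor argument (not existing Trixi.jl API, just an illustration of the idea):

```julia
# Hypothetical: select the executor per loop instead of globally.
@threaded :dynamic for element in eachelement(dg, cache)
    # element-local work that may itself run inside another task
end

# Default (no executor argument) could keep the current behaviour:
@threaded for element in eachelement(dg, cache)
    # ...
end
```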

@afilogo
Author

afilogo commented Dec 2, 2024

I also think an additional argument to the macro is cleaner.

@ranocha
Member

ranocha commented Dec 2, 2024

So it could be something like

Trixi.set_threading!(backend = "polyester"; force = true)

with the backend options

  • "polyester" (Polyester.jl)
  • "static" (base Threads.@threads :static)
  • "dynamic" (base Threads.@threads :dynamic)
  • "serial" (no threading at all)

Any suggestions for a better API?
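One possible implementation sketch of such a function, assuming a Preferences.jl-based mechanism like the existing Trixi.set_polyester! (the function name and preference key are hypothetical):

```julia
using Preferences

# Hypothetical sketch; mirrors how Trixi.set_polyester! persists its
# setting via Preferences.jl.
function set_threading!(backend::String; force = true)
    backend in ("polyester", "static", "dynamic", "serial") ||
        throw(ArgumentError("unknown threading backend: $backend"))
    # Store the preference for the Trixi module; takes effect only after
    # restarting Julia, since @threaded expands during precompilation.
    set_preferences!(Trixi, "threading_backend" => backend; force)
end
```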

@DanielDoehring
Contributor

Any suggestions for a better API?

Maybe make "polyester" the default keyword value?

@jlchan
Contributor

jlchan commented Dec 2, 2024

Could we call it Trixi.set_threading_backend!(backend; force=true)? To me, backend is a little more descriptive than kind.
