[docs] add tutorial on parallelism (#3231)

jump-dev · Feb 22, 2023 · 00064c3 · 00064c3
1 parent 6a422d3
commit 00064c3
Show file tree

Hide file tree

Showing 3 changed files with 316 additions and 1 deletion.
diff --git a/docs/.gitignore b/docs/.gitignore
@@ -2,6 +2,7 @@ build/
 latex_build/
 src/tutorials/*/*.md
 !src/tutorials/*/introduction.md
+!src/tutorials/algorithms/parallelism.md
 src/moi/
 src/tutorials/linear/transp.txt
 src/release_notes.md
diff --git a/docs/make.jl b/docs/make.jl
@@ -58,7 +58,11 @@ end
 
 function _literate_directory(dir)
     for filename in _file_list(dir, dir, ".md")
-        if !endswith(filename, "introduction.md")
+        if endswith(filename, "introduction.md")
+            continue
+        elseif endswith(filename, "parallelism.md")
+            continue
+        else
             rm(filename)
         end
     end
@@ -168,6 +172,7 @@ const _PAGES = [
             "tutorials/algorithms/benders_decomposition.md",
             "tutorials/algorithms/cutting_stock_column_generation.md",
             "tutorials/algorithms/tsp_lazy_constraints.md",
+            "tutorials/algorithms/parallelism.md",
         ],
         "Applications" => [
             "tutorials/applications/power_systems.md",

diff --git a/docs/src/tutorials/algorithms/parallelism.md b/docs/src/tutorials/algorithms/parallelism.md
@@ -0,0 +1,309 @@
+# Parallelism
+
+The purpose of this tutorial is to give a brief overview of parallelism in
+Julia as it pertains to JuMP, and to explain some of the things to be aware of
+when writing parallel algorithms involving JuMP models.
+
+## Multi-threading and Distributed computing
+
+There are two main types of parallelism in Julia:
+
+ 1. Multi-threading
+ 2. Distributed computing
+
+In multi-threading, multiple tasks are run in a single Julia process and share
+the same memory. In distributed computing, tasks are run in multiple Julia
+processes with independent memory spaces. This can include processes across
+multiple physical machines, such as in a high-performance computing cluster.
+
+Choosing and understanding the type of parallelism you are using is important
+because the code you write for each type is different, and there are different
+limitations and benefits to each approach. However, the best choice is highly
+problem dependent, so you may want to experiment with both approaches to
+determine what works for your situation.
+
+### Multi-threading
+
+To use multi-threading with Julia, you must either start Julia with the
+command line flag `--threads=N`, or you must set the `JULIA_NUM_THREADS`
+environment variable before launching Julia. For this documentation, we set
+the environment variable to:
+
+````julia
+julia> ENV["JULIA_NUM_THREADS"]
+"4"
+````
+
+You can check how many threads are available using:
+
+````julia
+julia> Threads.nthreads()
+4
+````
+
+The easiest way to use multi-threading in Julia is by placing the
+`Threads.@threads` macro in front of a `for`-loop:
+
+````julia
+julia> @time begin
+           ids = Int[]
+           Threads.@threads for i in 1:Threads.nthreads()
+               global ids
+               push!(ids, Threads.threadid())
+               sleep(1.0)
+           end
+       end
+  1.050600 seconds (47.75 k allocations: 3.093 MiB, 4.40% compilation time)s
+````
+
+This for-loop sleeps for `1` second on each iteration. Thus, if it had
+executed sequentially, it should have taken the same number of seconds as
+there are threads available. Instead, it took only 1 second, showing that the
+iterations were executed simultaneously. We can verify this by checking the
+`Threads.threadid()` of the thread that executed each iteration:
+
+````julia
+julia> ids
+4-element Vector{Int64}:
+ 1
+ 4
+ 2
+ 3
+````
+
+!!! tip
+    For more information, read the Julia documentation
+    [Multi-Threading](https://docs.julialang.org/en/v1/manual/multi-threading/#man-multithreading).
+
+### Distributed computing
+
+To use distributed computing with Julia, use the `Distributed` package:
+
+````julia
+julia> import Distributed
+````
+
+Like multi-threading, we need to tell Julia how many processes to add. We can
+do this either by starting Julia with the `-p N` command line argument, or by
+using `Distributed.addprocs`:
+
+````julia
+julia> import Pkg
+
+julia> project = Pkg.project();
+
+julia> workers = Distributed.addprocs(4; exeflags = "--project=$(project.path)")
+4-element Vector{Int64}:
+ 2
+ 3
+ 4
+ 5
+````
+
+!!! warning
+    Not loading the parent environment with `--project` is a common mistake.
+
+The added processes are "worker" processes that we can use to do computation
+with. They are orchestrated by the process with the id `1`. You can check
+what process the code is currently running on using `Distributed.myid()`
+
+````julia
+julia> Distributed.myid()
+1
+````
+
+As a general rule, to get maximum performance you should add as many processes
+as you have logical cores available.
+
+Unlike the `for`-loop approach of multi-threading, distributed computing
+extends the Julia `map` function to a "parallel-map" function
+`Distributed.pmap`. For each element in the list of arguments to map over,
+Julia will copy the element to an idle worker process and evaluate the
+function passing the element as an input argument.
+
+````julia
+julia> function hard_work(i::Int)
+           sleep(1.0)
+           return Distributed.myid()
+       end
+hard_work (generic function with 1 method)
+
+julia> Distributed.pmap(hard_work, 1:4)
+ERROR: On worker 2:
+UndefVarError: #hard_work not defined
+Stacktrace:
+[...]
+````
+
+Unfortunately, if you try this code directly, you will get an error message
+that says `On worker 2: UndefVarError: hard_work not defined`. The error is
+thrown because, although process `1` knows what the `hard_work` function is,
+the worker processes do not.
+
+To fix the error, we need to use `Distributed.@everywhere`, which evaluates
+the code on every process:
+
+````julia
+julia> Distributed.@everywhere begin
+           function hard_work(i::Int)
+               sleep(1.0)
+               return Distributed.myid()
+           end
+       end
+````
+
+Now if we run `pmap`, we see that it took only 1 second instead of 4, and that
+it executed on each of the worker processes:
+
+````julia
+julia> @time ids = Distributed.pmap(hard_work, 1:4)
+  1.202006 seconds (216.39 k allocations: 13.301 MiB, 4.07% compilation time)
+4-element Vector{Int64}:
+ 2
+ 3
+ 5
+ 4
+````
+
+!!! tip
+    For more information, read the Julia documentation
+    [Distributed Computing](https://docs.julialang.org/en/v1/manual/distributed-computing/).
+
+## Using parallelism the wrong way
+
+### When building a JuMP model
+
+For large problems, building the model in JuMP can be a bottleneck, and you
+may consider trying to write code that builds the model in parallel, for
+example, by wrapping a `for`-loop that adds constraints with
+`Threads.@threads`.
+
+Unfortunately, you cannot build a JuMP model in parallel, and attempting to do
+so may error or produce incorrect results.
+
+In most cases, we find that the reason for the bottleneck is not JuMP, but in
+how you are constructing the problem data, and that with changes, it is
+possible to build a model in a way that is not the bottleneck in the solution
+process.
+
+!!! tip
+    Looking for help to make your code run faster? Ask for help on the
+    [community forum](https://discourse.julialang.org/c/domain/opt/13). Make
+    sure to include a reproducible example of your code.
+
+### With a single JuMP model
+
+A common approach people try is to use parallelism with a single JuMP model.
+For example, to optimize a model over multiple right-hand side vectors, you
+may try:
+
+```julia
+using JuMP
+import HiGHS
+model = Model(HiGHS.Optimizer)
+set_silent(model)
+@variable(model, x)
+@objective(model, Min, x)
+solutions = Float64[]
+Threads.@threads for i in 1:10
+    set_lower_bound(x, i)
+    optimize!(model)
+    push!(solutions, objective_value(model))
+end
+```
+
+This will not work, and attempting to do so may error, crash Julia or produce
+incorrect results.
+
+## Using parallelism the right way
+
+To use parallelism with JuMP, the simplest rule to remember is that each
+worker must have its own instance of a JuMP model.
+
+### With multi-threading
+
+With multi-threading, create a new instance of `model` in each iteration of
+the for-loop:
+
+````julia
+julia> using JuMP
+
+julia> import HiGHS
+
+julia> solutions = Float64[]
+Float64[]
+
+julia> Threads.@threads for i in 1:10
+           model = Model(HiGHS.Optimizer)
+           set_silent(model)
+           @variable(model, x)
+           @objective(model, Min, x)
+           set_lower_bound(x, i)
+           optimize!(model)
+           push!(solutions, objective_value(model))
+       end
+
+julia> solutions
+8-element Vector{Float64}:
+ 9.0
+ 1.0
+ 7.0
+ 4.0
+ 8.0
+ 5.0
+ 3.0
+ 6.0
+````
+
+### With distributed computing
+
+With distributed computing, remember to evaluate all of the code on all of the
+processes using `Distributed.@everywhere`, and then write a function which
+creates a new instance of the model on every evaluation:
+
+````julia
+julia> Distributed.@everywhere begin
+           using JuMP
+           import HiGHS
+       end
+
+julia> Distributed.@everywhere begin
+           function solve_model_with_right_hand_side(i)
+               model = Model(HiGHS.Optimizer)
+               set_silent(model)
+               @variable(model, x)
+               @objective(model, Min, x)
+               set_lower_bound(x, i)
+               optimize!(model)
+               return objective_value(model)
+           end
+       end
+
+julia> solutions = Distributed.pmap(solve_model_with_right_hand_side, 1:10)
+10-element Vector{Float64}:
+  1.0
+  2.0
+  3.0
+  4.0
+  5.0
+  6.0
+  7.0
+  8.0
+  9.0
+ 10.0
+````
+
+## Other types of parallelism
+
+### GPU
+
+JuMP does not support GPU programming, and few solvers support execution on a
+GPU.
+
+### Parallelism within the solver
+
+Many solvers use parallelism internally. For example, commercial solvers like
+[Gurobi](https://github.com/jump-dev/Gurobi.jl) and [CPLEX](https://github.com/jump-dev/CPLEX.jl)
+both parallelize the search in branch-and-bound. Solvers supporting internal
+parallelism will typically support the [`MOI.NumberOfThreads`](@ref) attribute,
+which you can set using [`set_optimizer_attribute`](@ref).