Best approach for moving from sequential back to parallel areas? #764

coatless · 2025-02-13T20:52:34Z

Description

When using the same seed across different RNG approaches (sequential, parallel package, and future package), we're seeing inconsistent results between methods, even though each method is internally consistent. This makes it challenging to achieve reproducible results when switching between different execution strategies.

Reproducible Example

# Function to generate random numbers and return mean
generate_random_numbers <- function(n) {
  numbers <- rnorm(n, mean = 2, sd = 1)
  return(mean(numbers))
}

# 1. Sequential approach
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
seq_result1 <- replicate(5, generate_random_numbers(1000000))
set.seed(123)
seq_result2 <- replicate(5, generate_random_numbers(1000000))

# 2. Parallel package approach
library(parallel)
cl <- makeCluster(2)
clusterExport(cl, "generate_random_numbers")
clusterSetRNGStream(cl, iseed = 123)
parallel_result1 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))


clusterSetRNGStream(cl, iseed = 123)
parallel_result2 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))
stopCluster(cl)

# 3. Future package approach
library(future)
library(future.apply)
plan(sequential)
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
future_seq <- future_replicate(5, generate_random_numbers(1000000), future.seed = 123)

plan(multisession, workers = 2)
future_par <- future_replicate(5, generate_random_numbers(1000000), future.seed = 123)

# Results comparison
results_df <- data.frame(
  Sequential_First = seq_result1,
  Sequential_Second = seq_result2,
  Parallel_First = parallel_result1,
  Parallel_Second = parallel_result2,
  Future_Sequential = future_seq,
  Future_Parallel = future_par
)
knitr::kable(results_df)

Sequential_First	Sequential_Second	Parallel_First	Parallel_Second	Future_Sequential	Future_Parallel
2.000622	2.000622	2.000622	2.000622	2.000063	2.000063
2.000961	2.000961	2.000961	2.000961	1.999626	1.999626
1.999776	1.999776	2.001337	2.001337	1.999178	1.999178
2.000493	2.000493	2.000612	2.000612	2.000155	2.000155
2.000756	2.000756	2.000923	2.000923	1.999620	1.999620

Current Behavior

Each approach (sequential, parallel, future) produces internally consistent results (i.e., repeatable when using the same seed within the same approach), but the actual values differ between approaches.

Expected Behavior

Ideally, when using the same seed and L'Ecuyer-CMRG RNG:

All approaches should produce identical results, OR
There should be a clear way to ensure reproducibility across different execution strategies

Questions

Is it possible to achieve consistent results across all these approaches?
If not, what is the recommended way to ensure reproducibility when code might need to switch between sequential and parallel execution?
Are there specific settings or approaches we should use to mitigate this issue?

HenrikBengtsson · 2025-02-13T22:20:39Z

Maintainer

Hi, thanks for this. This is a really important and complicated topic. I think there are two main ways to think about this:

- numerical reproducibility
- statistical reproducibility

and here we're talking about the first: numerical reproducibility.

I also want to add the disclaimer that I'm by no means an expert in RNG theory and algorithms, but I understand some of it based on my background and training. FWIW, I coincidentally had a great chat with @rstub about RNGs earlier today, and we touched on this and related questions, but we never dove into the details.

Is it possible to achieve consistent results across all these approaches?

I don't think it can be done out of the box without quite a bit of manual work. This is because we need to make sure the same RNGkind() is used everywhere, and that each call to generate_random_numbers() is initiated with the exact same RNG state. Futureverse does this for us automatically, which is why it can guarantee numerical reproducibility regardless of parallel backend, using RNGkind("L'Ecuyer-CMRG") (parallel RNG). This is done by initiating each call with the exact same RNG state using RNG substreams. In order to achieve the same for classical sequential processing, or in other parallel frameworks, we'd need to do the same there. That would require (i) an agreed-upon standard, and (ii) manual orchestration of RNG state (for now), which is what Futureverse does for us internally. It might be that we could come up with helper functions to simplify this, but ideally base::replicate() would have an argument for doing this for us. That's not something that is straightforward to implement.
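As a rough illustration of what such manual orchestration could look like outside Futureverse, here is a minimal sketch. The helpers make_streams() and with_stream() are hypothetical names, not part of any package, and the sketch uses parallel::nextRNGStream() to derive one L'Ecuyer-CMRG stream per iteration. It is not claimed to reproduce the numbers that future.apply generates; it only shows that sequential and parallel runs agree once every call starts from a pre-assigned RNG state.

```r
library(parallel)

generate_random_numbers <- function(n) mean(rnorm(n, mean = 2, sd = 1))

# Hypothetical helper: derive one L'Ecuyer-CMRG stream per iteration.
# Note: this switches the RNG kind of the current session and resets its seed.
make_streams <- function(n_iter, seed) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  streams <- vector("list", n_iter)
  s <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n_iter)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)
  }
  streams
}

# Hypothetical helper: install the given RNG state, then run the task.
with_stream <- function(stream, task, ...) {
  assign(".Random.seed", stream, envir = globalenv())
  task(...)
}

streams <- make_streams(5, seed = 123)

# Sequential run: each iteration starts from its own stream
seq_res <- sapply(streams, with_stream, task = generate_random_numbers, n = 1000)

# Parallel run: the same streams travel with the iterations, not with the workers
cl <- makeCluster(2)
par_res <- parSapply(cl, streams, with_stream, task = generate_random_numbers, n = 1000)
stopCluster(cl)

print(identical(seq_res, par_res))
```

Because the RNG state is attached to the iteration rather than to the process, the number of workers and the assignment of iterations to workers should no longer affect the values.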

If not, what is the recommended way to ensure reproducibility when code might need to switch between sequential and parallel execution?

I don't have any myself. One reason is that I still don't consider myself in a position to propose such a standard. Working with RNGs is complicated, and being able to claim that an approach is correct from a statistical point of view is even harder.

I'd like to add another important case too, which helps to understand the problem. The parallel package approach, which is used by most parallel frameworks that I know of, is to initiate the RNG state per worker. Because of this, your set of random numbers will also depend on the number of parallel workers:

# 2. Parallel package approach
for (nworkers in c(1, 2, 3, 4)) {
  cl <- makeCluster(nworkers)
  clusterExport(cl, "generate_random_numbers")
  clusterSetRNGStream(cl, iseed = 123)
  parallel_result1 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))
  print(parallel_result1)

  clusterSetRNGStream(cl, iseed = 123)
  parallel_result2 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))
  print(parallel_result2)

  stopCluster(cl)
}

outputs:

[1] 2.000622 2.000961 1.999776 2.000493 2.000756
[1] 2.000622 2.000961 1.999776 2.000493 2.000756
[1] 2.000622 2.000961 2.001337 2.000612 2.000923
[1] 2.000622 2.000961 2.001337 2.000612 2.000923
[1] 2.000622 2.000961 2.001337 1.999897 2.000754
[1] 2.000622 2.000961 2.001337 1.999897 2.000754
[1] 2.000622 2.001337 1.999897 2.000754 1.999677
[1] 2.000622 2.001337 1.999897 2.000754 1.999677

Understanding this helps clarify what the problem is and what our options are.
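One way to see where the dependence on worker count comes from is to look at how the five iterations are divided among the workers. The sketch below assumes that parSapply() splits its input statically with parallel::splitIndices(), which is my understanding of the current implementation; the chunk_map variable is only there for illustration.

```r
library(parallel)

# For each worker count, show which iterations land on which worker.
# Each worker draws from its own RNG stream, so moving an iteration to
# another worker (or to another position on the same worker) changes its value.
chunk_map <- list()
for (nworkers in c(1, 2, 3, 4)) {
  chunks <- splitIndices(5, nworkers)
  chunk_map[[nworkers]] <- chunks
  labels <- sapply(chunks, function(idx) paste0("{", paste(idx, collapse = ","), "}"))
  cat(nworkers, "worker(s):", paste(labels, collapse = " "), "\n")
}
```

The first iteration on the first worker always starts from the same stream, which fits with the first value being the same in every row of the output above.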

When I designed Futureverse, I took an extremely conservative approach (anticipating questions like this) and designed the default RNG to be invariant to the number of parallel workers. So, if you try the above with different numbers of workers, you'll see it doesn't matter. That design avoids some problems, but it also introduces others. It's my plan to provide an option to set RNG states per worker as an alternative to per iteration ("per function call"). Doing it per worker can only guarantee "statistically reproducible" random numbers; you can never achieve "numerically reproducible" random numbers that way. (This is actually one of several things @rstub and I discussed; I expect to support this, and more, rather soon).
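For anyone who wants to check the worker-count invariance themselves, a minimal sketch along the lines of the parallel example could look as follows. It assumes the future and future.apply packages are installed and uses a smaller sample size than the original post to keep it quick; the results list is just an illustrative name.

```r
library(future)
library(future.apply)

generate_random_numbers <- function(n) mean(rnorm(n, mean = 2, sd = 1))

# Run the same seeded computation under different numbers of workers
results <- list()
for (nworkers in c(1, 2, 3, 4)) {
  plan(multisession, workers = nworkers)
  results[[nworkers]] <- future_sapply(
    1:5,
    function(x) generate_random_numbers(1000),
    future.seed = 123
  )
  print(results[[nworkers]])
}
plan(sequential)  # shut down the parallel workers
```

If the per-iteration design works as described, all four printed vectors should be the same.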

I'd love for others to pitch in on this too - both from a theoretical point of view and a practical point of view.

3 replies

coatless Feb 14, 2025
Author

Thanks for the detailed and thoughtful response! This really helps clarify the RNG reproducibility landscape. (@rstub is the bee's knees for RNGs.)

Your explanation of the per-worker RNG state initialization in parallel approaches particularly helped me understand why we're seeing different results with different numbers of workers. The example output showing how the results change with 1-4 workers really drove that point home.

A few follow-up questions about the Future package's approach:

You mentioned that Future takes a conservative approach making results invariant to worker count. Could you elaborate on what design tradeoffs this introduces?
When you implement the planned per-worker RNG state option as an alternative to per-iteration, will this be opt-in or will the default be updated?
For my specific use case, we sometimes need to switch between sequential and parallel execution based on input size. Would you recommend:
- Always using Future's approach for consistency even when running sequentially
- Using native sequential/parallel approaches for better performance and accepting the numerical differences
- Some other strategy?

Regarding additional thoughts on this approach, maybe we should ping @pierrelecuyer? I'm not sure how active he is on community sites; but, about ~8 years back he provided guidance on MT, c.f. https://cs.stackexchange.com/a/82490.

Thanks again for taking the time to explain this complex topic. The background really helps and I'll try to write up a blog post summarizing the above plus putting a request for comments out.

rstub Feb 15, 2025

From my understanding, the Future package is generating specific seeds for every iteration, which is not exactly cheap.¹ This might be the reason for the better performance of the native methods compared with Future's approach that you alluded to. The upside is that this makes it independent of the number of workers.

I don't think that one approach is inherently better than the other: per-worker seeds are fast but give only statistical reproducibility, while per-iteration seeds are more costly but also give numerical reproducibility. I think it is good to have a choice between the two approaches, since it requires subject-matter expertise to tell which of the two is needed.

¹ I plan to benchmark dqrng in that regard: daqana/dqrng#95

HenrikBengtsson Feb 15, 2025
Maintainer

BTW, more clarifications of our RNG options (ignore the RNG kind per se):

I think there are actually two types of per-worker RNG setup. If workers are persistent (e.g. multisession; parallel PSOCK workers), then we can initiate the RNG state once when the parallel worker is first launched. If workers are transient (e.g. multicore, future.callr::callr, ...), then a new worker is launched for each future, meaning the RNG state has to be initiated once per future.

In contrast, the once-per-iteration approach (the current approach) initiates the RNG state once per iteration. Each future may process many iterations.

For instance, consider future_lapply(X, FUN = myfcn, future.seed = TRUE) where length(X) is 100. By default, future_lapply() will partition X into 4 (= number of workers) uniformly sized chunks (Xchunk[[1]], Xchunk[[2]], ...), run lapply(Xchunk[[kk]], FUN = myfcn) on each chunk kk = 1, 2, 3, 4, and finally collect and rearrange the results. The per-iteration RNG approach pre-generates initial RNG states for each of the length(X) elements (aka "iterations"). These pre-generated RNG states are then applied one by one just as each new myfcn(Xchunk[[kk]][[jj]]) call is made. The expensive part of this per-iteration approach is that we have to generate length(X) RNG states ("streams") each time future_lapply() is called.
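The chunk-then-apply flow described above can be mimicked in base R to see why the chunking does not matter once every element carries its own RNG state. This is only a simplified sketch, not the future.apply implementation: run_chunked() is a hypothetical helper, the chunks are processed one after another in the same session, and the states are derived with parallel::nextRNGStream() alone.

```r
library(parallel)  # for nextRNGStream() and splitIndices()

myfcn <- function(x) x + rnorm(1)
X <- as.list(1:100)

# Pre-generate one RNG state per element of X (the expensive step)
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
states <- vector("list", length(X))
s <- get(".Random.seed", envir = globalenv())
for (i in seq_along(X)) {
  states[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical helper: split X into chunks, process each chunk with lapply(),
# and install the element's own RNG state right before each myfcn() call.
run_chunked <- function(X, states, n_chunks) {
  chunks <- splitIndices(length(X), n_chunks)
  out <- vector("list", length(X))
  for (idx in chunks) {
    out[idx] <- lapply(idx, function(i) {
      assign(".Random.seed", states[[i]], envir = globalenv())
      myfcn(X[[i]])
    })
  }
  unlist(out)
}

y4 <- run_chunked(X, states, n_chunks = 4)
y7 <- run_chunked(X, states, n_chunks = 7)
print(identical(y4, y7))
```

The loop that fills states is the part whose cost grows with length(X), which is the overhead discussed in this thread.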

Next, consider the case where we have 4 persistent workers. If we take the per-worker approach, we could just initiate the RNG state once, when we launch the workers. Think, parallel::clusterSetRNGStream(workers, ...). This approach is much cheaper than the per-iteration approach, because we only have to generate nbrOfWorkers() RNG streams once per R session. The difference in setup time becomes significant when n * length(X) >> nbrOfWorkers(), where n is the number of calls to future_lapply().

Then, consider the case where we have 4 transient workers. If we take the per-worker approach, we can only initiate their RNG states when these workers are launched, and a worker is only launched when the future is launched. So in our future_lapply() approach, that would mean we have to pre-generate one RNG state per chunk. The default is nbrOfWorkers() chunks, so that means nbrOfWorkers() RNG streams per future_lapply() call. So, compared to the per-worker approach with persistent workers, this requires generation of n * nbrOfWorkers() RNG streams if we call future_lapply() n times. So, with transient workers we will have to produce n times more RNG streams than with persistent workers.

All three of these strategies will produce different random number sets.
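To put rough numbers on the three strategies, the stream counts given above can be written out for a concrete case. The value n_calls = 10 is an assumption chosen only for illustration; length(X) = 100 and 4 workers come from the example.

```r
n_calls   <- 10   # assumed number of future_lapply() calls (n in the text)
n_elems   <- 100  # length(X)
n_workers <- 4    # nbrOfWorkers()

streams_needed <- c(
  per_iteration         = n_calls * n_elems,    # one stream per element, per call
  per_worker_persistent = n_workers,            # one stream per worker, once per session
  per_chunk_transient   = n_calls * n_workers   # one stream per chunk, per call
)
print(streams_needed)
# per_iteration: 1000, per_worker_persistent: 4, per_chunk_transient: 40
```

With these numbers the per-iteration approach generates 250 times as many streams as persistent per-worker seeding, and the per-chunk variant sits in between.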

PS. The per-worker approach with one RNG stream per chunk is easier to implement, because it can be used everywhere without knowing if the future backend runs on persistent or transient workers. With that argument, that might be what will be added first.
