Hi, thanks for this. This is a really important and complicated topic. I think there are two main ways to think about this: numerical reproducibility and statistical reproducibility. Here we're talking about the first, numerical reproducibility. I also want to add the disclaimer that I'm by no means an expert in RNG theory and algorithms, but I understand some of it based on my background and training. FWIW, I coincidentally had a great chat with @rstub about RNGs earlier today, and we touched on this and related questions, but we never dove into the details.
I don't think it can be done out of the box without quite a bit of manual work. This is because we need to make sure the same
I don't have any myself. One reason is that I still don't consider myself in a position to be the person proposing such a standard. Working with RNGs is complicated, and being able to claim that an approach is correct from a statistical point of view is even harder.

I'd like to add another important case too, which helps to understand the problem. The parallel approach, which is used by most parallel frameworks that I know of, is to initialize the RNG state per worker. Because of this, your set of random numbers will also depend on the number of parallel workers:

    # 2. Parallel package approach
    library(parallel)
    # generate_random_numbers() is assumed to be defined as in the
    # reproducible example of the original post
    for (nworkers in c(1, 2, 3, 4)) {
      cl <- makeCluster(nworkers)
      clusterExport(cl, "generate_random_numbers")
      # First run: seed one L'Ecuyer-CMRG stream per worker
      clusterSetRNGStream(cl, iseed = 123)
      parallel_result1 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))
      print(parallel_result1)
      # Second run: same seed, same number of workers
      clusterSetRNGStream(cl, iseed = 123)
      parallel_result2 <- parSapply(cl, 1:5, function(x) generate_random_numbers(1000000))
      print(parallel_result2)
      stopCluster(cl)
    }

outputs:
Understanding this helps clarify what the problem is and what our options are. When I designed Futureverse, I took an extremely conservative approach (anticipating questions like this) and designed the default RNG to be invariant to the number of parallel workers. So, if you try the above with a different number of workers, you'll see it doesn't matter.

That design avoids some problems, but it also introduces others. It's my plan to provide an option to set RNG states per worker as an alternative to per iteration ("per function call"). Doing it per worker can only guarantee "statistically reproducible" random numbers, but you can never achieve "numerically reproducible" random numbers. (This is actually one of several things @rstub and I discussed; I expect to support this, and more, rather soon.)

I'd love for others to pitch in on this too, both from a theoretical point of view and a practical point of view.
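To make the per-iteration idea concrete, here is a minimal base-R sketch that assigns one L'Ecuyer-CMRG stream to each iteration instead of each worker. It is only an illustration of the principle, not necessarily how Futureverse implements it; `make_iteration_seeds()` and `run_iteration()` are hypothetical helper names, and the evaluation is done sequentially in two different orders to stand in for different schedulings across workers.

```r
library(parallel)  # provides nextRNGStream()

# Hypothetical helper: pre-generate one L'Ecuyer-CMRG stream seed per iteration.
make_iteration_seeds <- function(n_iter, seed) {
  RNGkind("L'Ecuyer-CMRG")  # note: this changes the RNG kind of the session
  set.seed(seed)
  seeds <- vector("list", n_iter)
  s <- .Random.seed
  for (i in seq_len(n_iter)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)   # jump ahead to the next independent stream
  }
  seeds
}

# Hypothetical helper: evaluate one iteration using its own stream.
run_iteration <- function(stream_seed, n_draws) {
  assign(".Random.seed", stream_seed, envir = globalenv())
  mean(rnorm(n_draws))
}

seeds <- make_iteration_seeds(5, seed = 123)

# Evaluate the iterations in order, and again in reverse order.
in_order <- vapply(seeds, run_iteration, numeric(1), n_draws = 1000)
reversed <- rev(vapply(rev(seeds), run_iteration, numeric(1), n_draws = 1000))

# Each iteration's value depends only on its own stream, not on scheduling.
print(identical(in_order, reversed))
```

Because every iteration carries its own seed, it should not matter which worker evaluates it or in what order, which is the property that makes the result independent of the number of workers. The cost is that one stream has to be generated and shipped per iteration rather than per worker.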
Description
When using the same seed across different RNG approaches (sequential, parallel package, and future package), we're seeing inconsistent results between methods, even though each method is internally consistent. This makes it challenging to achieve reproducible results when switching between different execution strategies.
Reproducible Example
Current Behavior
Each approach (sequential, parallel, future) produces internally consistent results (i.e., repeatable when using the same seed within the same approach), but the actual values differ between approaches.
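One way to see how this can happen, without starting a cluster, is to simulate per-worker seeding in a single R session. The sketch below only mimics the idea of giving each worker its own L'Ecuyer-CMRG stream and a contiguous chunk of iterations; `simulate_per_worker()` is a hypothetical helper, not part of any package, and the real `parallel` functions may differ in details.

```r
library(parallel)  # provides nextRNGStream() and splitIndices()

# Hypothetical helper: each simulated worker gets its own stream and
# processes a contiguous chunk of the iterations from that stream.
simulate_per_worker <- function(n_iter, n_workers, seed, n_draws = 3) {
  RNGkind("L'Ecuyer-CMRG")  # note: this changes the RNG kind of the session
  set.seed(seed)
  stream <- .Random.seed
  chunks <- splitIndices(n_iter, n_workers)
  out <- vector("list", n_iter)
  for (chunk in chunks) {
    # Start this worker at the beginning of its stream
    assign(".Random.seed", stream, envir = globalenv())
    for (i in chunk) out[[i]] <- runif(n_draws)
    # The next worker uses the next independent stream
    stream <- nextRNGStream(stream)
  }
  out
}

one_worker  <- simulate_per_worker(5, n_workers = 1, seed = 123)
two_workers <- simulate_per_worker(5, n_workers = 2, seed = 123)

# Same seed and same number of workers: repeatable
print(identical(one_worker, simulate_per_worker(5, n_workers = 1, seed = 123)))
# Same seed but a different number of workers: different values
print(identical(one_worker, two_workers))
```

The first comparison is repeatable because nothing changes between the two runs. The second differs because, with two workers, the later iterations draw from a second stream instead of continuing the first one.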
Expected Behavior
Ideally, when using the same seed and L'Ecuyer-CMRG RNG:
Questions