RKMilCommute constant cache perform step #383
Conversation
RKMil constant cache perform step and tests
Looks like the amount of time is just over 2 hours.
WeakConvergence just needs CPU, right? So we can put it in a different queue. All of the tests are already multithreaded?
Yes, I think all tests are already multithreaded, since they should all use
Yes.
Looks like a random failure. FWIW, it's great that we're finding out how to make these stable and not rely on just a seed. On another note, damn, these take a lot of compute. I think we will want the semi-SIMD form @chriselrod mentioned before. I'll look at what's going on with that GPU test. I think that's a regression.
Yeah, indeed, the tests get really expensive.
Yes. I also wanted to ask you about that already. I thought it must be something almost trivial, as the tests were passing before we added the weak convergence tests to the buildkite queue.
@vchuravy @jpsamaroo could you take a look at https://buildkite.com/julialang/stochasticdiffeq-dot-jl/builds/40#de077c9c-e46e-4a31-b7ac-1fdd10b45bf4/359-733 and see if you know what that is? I am confused, since I can't recreate it. I locally grabbed the test and it ran fine:

```julia
using StochasticDiffEq, Test, Random
using DiffEqGPU
using CUDA

function weak_error(prob, alg, numtraj, batchsize, f_true, trange;
                    abstol = 1, reltol = 0, ensemblealg = EnsembleCPUArray())
    sol = @time solve(prob, alg, ensemblealg,
                      dt = 0.01f0, adaptive = true, abstol = abstol, reltol = reltol,
                      trajectories = numtraj, batch_size = batchsize,
                      saveat = trange)
    computed_exp = (sol.u / numtraj)[1, :]
    true_exp = f_true.(trange)
    return sum((computed_exp - true_exp) .^ 2) / length(trange)
end

function prob_func(prob, i, repeat)
    #(i % 50000 == 0) && @show i
    remake(prob, seed = seeds[i])
end

function reduction(u, batch, I)
    u .+ sum(batch), false
end

function output_func(sol, i)
    #h1(asinh(sol[end][1])), false
    h1.(asinh.(sol)), false
end

# prob 1
u₀ = [0.0f0]
tspan = (0.0f0, 2.0f0)
h1(z) = z^3 - 6 * z^2 + 8 * z
function f1!(du, u, p, t)
    @inbounds begin
        du[1] = 1 // 2 * u[1] + sqrt(u[1]^2 + 1)
    end
    nothing
end
function g1!(du, u, p, t)
    @inbounds begin
        du[1] = sqrt(u[1]^2 + 1)
    end
    nothing
end
f_true1(t) = t^3 - 3 * t^2 + 2 * t
prob1 = SDEProblem(f1!, g1!, u₀, tspan)
ensemble_prob1 = EnsembleProblem(prob1;
                                 output_func = output_func,
                                 prob_func = prob_func,
                                 reduction = reduction,
                                 u_init = Vector{eltype(prob1.u0)}([0.0]))

# prob 2
u₀ = [0.1f0, 0.1f0]
function f2!(du, u, p, t)
    @inbounds begin
        du[1] = 3 // 2 * u[1]
        du[2] = 3 // 2 * u[2]
    end
    nothing
end
function g2!(du, u, p, t)
    @inbounds begin
        du[1] = 1 // 10 * u[1]
        du[2] = 1 // 10 * u[2]
    end
    nothing
end
f_true2(t) = 1 // 10 * exp(3 // 2 * t) #1//100*exp(301//100*t)
h2(z) = z #z^2
prob2 = SDEProblem(f2!, g2!, u₀, tspan)
ensemble_prob2 = EnsembleProblem(prob2;
                                 output_func = (sol, i) -> (h2.(sol), false),
                                 prob_func = prob_func,
                                 reduction = reduction,
                                 u_init = Vector{eltype(prob2.u0)}([0.0, 0.0]))

tsave = 0.0f0:0.05f0:2f0
probs = Vector{EnsembleProblem}(undef, 2)
probs[1] = ensemble_prob1
probs[2] = ensemble_prob2
ftrue = Vector{Any}(undef, 2)
ftrue[1] = f_true1
ftrue[2] = f_true2

numtraj = Int(1e5)
seed = 100
Random.seed!(seed)
seeds = rand(UInt, numtraj)

for i in 1:2
    @show i
    err1 = weak_error(probs[i], DRI1NM(), numtraj, Int(1e4), ftrue[i], tsave,
                      abstol = 1f0, reltol = 1f0, ensemblealg = EnsembleGPUArray())
    @show err1
    # err2 = weak_error(probs[i], DRI1NM(), numtraj, Int(1e4), ftrue[i], tsave,
    #                   abstol = 0.1f0, reltol = 0.1f0, ensemblealg = EnsembleGPUArray())
    # @show err2
    err3 = weak_error(probs[i], DRI1NM(), numtraj, Int(1e4), ftrue[i], tsave,
                      abstol = 0.01f0, reltol = 0.01f0, ensemblealg = EnsembleGPUArray())
    @show err3
    @test err1 > err3
    println("")
end
```
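For readers unfamiliar with the pattern in the loop above: the `err1 > err3` assertion checks that tightening the tolerances shrinks the weak (expectation) error. The same idea can be illustrated CPU-only with a hand-rolled fixed-step Euler–Maruyama on prob 1, without DiffEqGPU or CUDA. This is a hypothetical standalone sketch, not the package's test code; the function name `weak_error_em` and the step sizes are made up for illustration. It uses the fact that prob 1 has the exact solution asinh(X_t) = t + W_t, which is what makes `f_true1` the exact expectation of `h1.(asinh.(sol))`.

```julia
# Minimal CPU sketch of the weak-error check: a coarser step size plays the
# role of a looser tolerance, and should give a larger weak error.
using Random, Test

f(u) = u / 2 + sqrt(u^2 + 1)   # drift of prob 1
g(u) = sqrt(u^2 + 1)           # diffusion of prob 1
h1(z) = z^3 - 6z^2 + 8z
f_true(t) = t^3 - 3t^2 + 2t    # E[h1(asinh(X_t))], since asinh(X_t) = t + W_t

function weak_error_em(dt; T = 2.0, numtraj = 100_000, rng = MersenneTwister(100))
    nsteps = round(Int, T / dt)
    acc = 0.0
    for _ in 1:numtraj
        u = 0.0
        for _ in 1:nsteps
            # Euler–Maruyama step: u += f dt + g dW
            u += f(u) * dt + g(u) * sqrt(dt) * randn(rng)
        end
        acc += h1(asinh(u))
    end
    abs(acc / numtraj - f_true(T))  # weak error at the final time
end

err_coarse = weak_error_em(0.2)    # crude steps → larger bias
err_fine   = weak_error_em(0.01)   # finer steps → smaller bias
@test err_coarse > err_fine        # statistical check; mirrors `err1 > err3`
```

Note the final comparison is statistical: for a small number of trajectories the Monte Carlo noise can swamp the discretization bias, which is exactly the kind of seed sensitivity discussed above.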
What do you want to SIMD? Also, I could revive the PR on using VectorizedRNG:

```julia
julia> x = Vector{Float64}(undef, 1024);

julia> using VectorizedRNG, Random

julia> @benchmark randn!(local_rng(), $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.298 μs (0.00% GC)
  median time:      1.344 μs (0.00% GC)
  mean time:        1.347 μs (0.00% GC)
  maximum time:     2.637 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark randn!($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.062 μs (0.00% GC)
  median time:      2.176 μs (0.00% GC)
  mean time:        2.182 μs (0.00% GC)
  maximum time:     3.321 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark randn(local_rng())
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.695 ns (0.00% GC)
  median time:      3.050 ns (0.00% GC)
  mean time:        3.029 ns (0.00% GC)
  maximum time:     5.762 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark randn()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.353 ns (0.00% GC)
  median time:      6.712 ns (0.00% GC)
  mean time:        6.736 ns (0.00% GC)
  maximum time:     23.688 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

But this computer has AVX512.

```julia
julia> @benchmark rand!(local_rng(), $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     168.105 ns (0.00% GC)
  median time:      168.276 ns (0.00% GC)
  mean time:        168.450 ns (0.00% GC)
  maximum time:     192.335 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     750

julia> @benchmark rand!($x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     594.022 ns (0.00% GC)
  median time:      597.654 ns (0.00% GC)
  mean time:        598.793 ns (0.00% GC)
  maximum time:     701.682 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     179

julia> @benchmark rand(local_rng())
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.855 ns (0.00% GC)
  median time:      1.927 ns (0.00% GC)
  mean time:        1.931 ns (0.00% GC)
  maximum time:     5.347 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark rand()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.581 ns (0.00% GC)
  median time:      5.086 ns (0.00% GC)
  mean time:        5.099 ns (0.00% GC)
  maximum time:     17.237 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
```

So it is probably faster on everything with AVX2 as well. But I wouldn't be surprised if RandomNumbers.jl wins if all you need is one scalar at a time.
It's two pieces. One is the RNG. The other is that this is solving a ton of trajectories multithreaded, so instead of doing every one separately, we should do that semi-clumping we discussed before.
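To make the "semi-clumping" idea concrete: instead of stepping each trajectory on its own, a batch of trajectories can be stepped together so the per-step arithmetic runs over contiguous arrays and can SIMD-vectorize, and the RNG is called once per step in bulk. This is a hypothetical sketch only (the name `em_batch!`, batch size, and step count are made up), using the drift and diffusion of one component of prob 2 from the test above.

```julia
# Sketch: fixed-step Euler–Maruyama over a batch of scalar trajectories,
# stored as one contiguous vector so the inner loop can vectorize.
using Random, Statistics

f(u) = 1.5f0 * u               # drift of one component of prob 2
g(u) = 0.1f0 * u               # diffusion of one component of prob 2

function em_batch!(u::Vector{Float32}, dt::Float32, nsteps::Int, rng)
    sqdt = sqrt(dt)
    dW = similar(u)
    for _ in 1:nsteps
        randn!(rng, dW)        # one bulk RNG call per step, not per trajectory
        @inbounds @simd for j in eachindex(u)
            u[j] += f(u[j]) * dt + g(u[j]) * sqdt * dW[j]
        end
    end
    u
end

rng = MersenneTwister(100)
batch = fill(0.1f0, 1024)           # 1024 trajectories stepped together
em_batch!(batch, 0.01f0, 200, rng)  # integrate to T = 2
mean(batch)                         # ≈ 0.1 * exp(3) ≈ 2.0
```

The bulk `randn!` call is exactly where VectorizedRNG's `randn!(local_rng(), dW)` from the benchmarks above would slot in; with one `local_rng()` per thread, the whole thing also stays multithreading-friendly.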
We almost never need one scalar at a time, so if you wanted to revive that effort for VectorizedRNG I would be super happy.
What's blocking this?
Mainly, the CUDA issue in buildkite/stochasticdiffeq-dot-jl/weakadaptive.
I will take a look at the failure. It seems a change to adapt has broken it.
I'm going to merge this to separate the concerns. But let's definitely continue this thread. I'll open two issues on the two points.
Solves #372. Also fixes two small bugs for RKMilGeneral. Additionally, the iip_weak test file previously used a pre-defined SDE problem from DiffEqProblemLibrary (which, however, is defined as oop). This is now also corrected.