Investigation of problem with multiple models #18

Open
klowrey opened this issue Jan 18, 2020 · 6 comments

@klowrey
Contributor

klowrey commented Jan 18, 2020

Instead of using tconstruct, this uses one MuJoCo model per mjData.

Reproduce by adding the following to hopper:

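# Build n independent HopperV2 environments, each backed by its own MJSim
# (i.e. its own MuJoCo model and data) instead of going through tconstruct.
function ensembleconstruct(::Type{HopperV2}, n::Integer)
    modelpath = joinpath(@__DIR__, "hopper.xml")
    return Tuple(HopperV2(MJSim(modelpath, skip = 4)) for _ in 1:n)
end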

Then, in whatever NPG code you use for hopper:

npg = NaturalPolicyGradient((n) -> LyceumMuJoCo.ensembleconstruct(etype, n),
    #npg = NaturalPolicyGradient((i)->tconstruct(etype, i),

The above with ensembleconstruct produces:
[image: training curves with ensembleconstruct]

while normal tconstruct produces:
[image: training curves with tconstruct]

Note the units are very different; tconstruct performs much better.

@colinxs
Contributor

colinxs commented Jan 18, 2020

That's disconcerting. I'll take a look.

@colinxs
Contributor

colinxs commented Jan 18, 2020

I can't reproduce this. Using runhopper.jl in SharedExperiments#master gets me curves like the ones below, with no difference between tconstruct and ensembleconstruct.

[image: training curves]

@klowrey
Contributor Author

klowrey commented Jan 19, 2020

I get the same perf with fixed seed and single thread in both cases.

The above curves were produced with multiple threads, which is really when the problem is evident. For a single thread, tconstruct and ensembleconstruct would be exactly the same.

Not sure if it's related, but seed_threadrng doesn't seem to be fixing my thread seeds. Regardless, it seems like the perf with ensembleconstruct is consistently lower than with tconstruct. I took a look at the envsampler and nothing popped out at me.

Edit: I can try to put together a reproducible case for this, but I wonder if it's what's causing problems for penhand.

@colinxs
Contributor

colinxs commented Jan 19, 2020

I still can't reproduce this. To make sure we're on the same page, I added a MWE under SharedExperiments/lyceumdev/ghissue_lyceummujoco_18/ or attached as a tarfile below. Running with 16 threads on voltron yields qualitatively equivalent training curves.

> Not sure if it's related but seed_threadrng doesn't seem to be fixing my thread seeds

What do you mean by "fixing"? seed_threadrngs!(rngs, seed[, jump = big(10)^20]) sets rngs[1] = MersenneTwister(seed) and then creates the remaining RNGs as rngs[tid] = Future.randjump(rngs[1], (tid - 1) * jump). Each call creates a new RNG with the same seed, but with a state that is "jumped" forward by (tid - 1) * jump steps (conceptually the same as advancing the generator by that many draws). Point being, they all have the same seed but different states, which guarantees uncorrelated noise across threads, so the RNGs having the same seed is normal.
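For reference, here is a minimal sketch of the seeding scheme described above. It is not the library's actual implementation: the name sketch_seed_threadrngs! is hypothetical, and it chains jumps of size jump from one RNG to the next, which should land rngs[tid] at the same offset of (tid - 1) * jump from rngs[1].

```julia
using Random, Future

# Hypothetical stand-in for seed_threadrngs!; an assumption for illustration only.
function sketch_seed_threadrngs!(rngs::Vector{MersenneTwister}, seed::Integer;
                                 jump::Integer = big(10)^20)
    # Thread 1 gets the RNG seeded directly from `seed`.
    rngs[1] = MersenneTwister(seed)
    # Every other thread gets a copy advanced by one more `jump` than the previous one.
    for tid in 2:length(rngs)
        rngs[tid] = Future.randjump(rngs[tid - 1], jump)
    end
    return rngs
end

# Seeding twice with the same seed gives identical per-thread streams,
# while different threads draw different numbers.
a = sketch_seed_threadrngs!(Vector{MersenneTwister}(undef, 3), 100)
b = sketch_seed_threadrngs!(Vector{MersenneTwister}(undef, 3), 100)
same_stream = rand(a[2]) == rand(b[2])
different_threads = rand(a[1]) != rand(a[3])
```

Under this sketch, re-seeding makes each thread's stream reproducible, but it says nothing about how many draws each thread ends up making during a run.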

I'm guessing you've tried this on different machines already, so I'm not sure why you're seeing this issue and I'm not.

ghissue_lyceummujoco_18.tar.gz

@klowrey
Contributor Author

klowrey commented Jan 19, 2020

When I give seed_threadrngs a fixed number, like 100, and run NPG twice (calling seed_threadrngs(100) again before the second run), it produces two different curves.

I’ll do some more digging when I get to a machine.

@colinxs
Contributor

colinxs commented Jan 19, 2020

Copying over from Slack for reference:

Colin Summers 3:10 PM
Let's say I need 100 samples and Hmax = 50, running with two threads.
There are three ways to get that: thread 1 samples 2 trajectories, threads 1 and 2 sample 1 trajectory each, or thread 2 samples 2 trajectories.
Each one of those will be unique because each thread has a unique random stream.
So each thread has the same random sequence from run to run, but may be at a different state at the end of each NPG iteration, depending on thread scheduling.
Butterfly effect.
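
The scheduling argument above can be illustrated with a small deterministic sketch. It assumes two per-thread RNGs built as described earlier in the thread, and it stands in for a "trajectory" with a single random draw; sample_with_schedule is a hypothetical helper, not part of any Lyceum package.

```julia
using Random, Future

# `schedule` lists which thread samples each trajectory, e.g. [1, 2] means
# thread 1 samples the first trajectory and thread 2 samples the second.
# Each "trajectory" is reduced to one rand() call for illustration.
function sample_with_schedule(seed::Integer, schedule::Vector{Int})
    r1 = MersenneTwister(seed)
    rngs = [r1, Future.randjump(r1, big(10)^20)]  # one stream per thread
    return [rand(rngs[tid]) for tid in schedule]
end

s11 = sample_with_schedule(100, [1, 1])  # thread 1 samples both trajectories
s12 = sample_with_schedule(100, [1, 2])  # one trajectory per thread
s22 = sample_with_schedule(100, [2, 2])  # thread 2 samples both trajectories
```

With the same seed, each fixed schedule is reproducible, but the three schedules yield different batches. With real threads the schedule is decided by the scheduler, so two runs seeded identically can still diverge.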
