Switched from RLInterface to CommonRLInterface #53
Looks good to me except for some small issues I have pointed out.
src/evaluation_policy.jl
Outdated
@@ -7,25 +7,28 @@ Interface for defining an evaluation policy
 returns the average reward of the current policy, the user can specify its own function
 f to carry the evaluation, we provide a default basic_evaluation that is just a rollout.
 """
-function evaluation(f::Function, policy::AbstractNNPolicy, env::AbstractEnvironment, n_eval::Int64, max_episode_length::Int64, verbose::Bool = false)
+function evaluation(f::Function, policy::AbstractNNPolicy, env::AbstractEnv, n_eval::Int64, max_episode_length::Int64, verbose::Bool = false)
How did ^M come in? Likely a vim issue?
Yeah, I am not sure how the ^M is happening. I will correct it. Thanks
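For reference, ^M is how vim displays a carriage return, which usually means the file was saved with CRLF line endings. A minimal sketch of normalizing them, assuming a hypothetical helper `strip_cr` that is not part of this package:

```julia
# Hypothetical helper (not part of the package): convert CRLF line endings to LF.
strip_cr(text::AbstractString) = replace(text, "\r\n" => "\n")

# A sample string with CRLF endings, standing in for the contents of a source file.
sample = "function f()\r\n    return 1\r\nend\r\n"
cleaned = strip_cr(sample)

# No carriage returns remain, and the line structure is unchanged.
@assert !occursin('\r', cleaned)
@assert cleaned == "function f()\n    return 1\nend\n"
```

The same function could be applied to the result of `read(path, String)` and written back with `write(path, cleaned)`; whether that or an editor or git setting is the right fix depends on where the carriage returns come from.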
src/solver.jl
Outdated
     return solve(solver, env)
 end

 function POMDPs.solve(solver::DeepQLearningSolver, problem::POMDP)
-    env = POMDPEnvironment(problem, rng=solver.rng)
+    env = POMDPCommonRLEnv{AbstractArray{Float32}}(problem) # ignores solver.rng because CommonRLEnv doesn't have rng support yet
Can't we get any more concrete type information than AbstractArray{Float32}?
I think AbstractArray{Float32} is what we want here. It just means that convert_o(AbstractArray{Float32}, o, pomdp) will be called on every observation. That way problem-writers can use static arrays or built-in arrays. If the problem implementation is type-stable, the compiler should still be able to infer a nice concrete return type.
@MaximeBouton do you have any comments on or concerns about this?
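To illustrate the type-stability point, here is a minimal sketch using only Base and the Test standard library. `GridObs` and `to_obs` are hypothetical stand-ins (the latter for a problem's convert_o method, without the pomdp argument); they are not taken from this PR or from POMDPs.jl.

```julia
using Test  # standard library; provides @inferred

# Hypothetical observation type defined by a problem-writer (assumption for illustration).
struct GridObs
    x::Int
    y::Int
end

# Hypothetical stand-in for a convert_o method: the requested type is abstract,
# but the method itself returns a concrete Vector{Float32}.
to_obs(::Type{AbstractArray{Float32}}, o::GridObs) = Float32[o.x, o.y]

v = to_obs(AbstractArray{Float32}, GridObs(1, 2))

# The result satisfies the abstract type requested by the caller...
@assert v isa AbstractArray{Float32}
# ...while its actual type is concrete.
@assert typeof(v) == Vector{Float32}

# @inferred throws if the compiler cannot infer the concrete return type.
@inferred to_obs(AbstractArray{Float32}, GridObs(1, 2))
```

A problem-writer could return a static array from the same method instead; the abstract parameter only constrains what is requested, not the concrete type that comes back.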
LGTM!
Sorry I took a lot of time off :)
That looks good! We can merge it. Should we tag a version 0.6 and register?
Yes, I think we can register! Go ahead, or I can do it in the next few days.
I switched from RLInterface to CommonRLInterface, so we should be able to register this. It is also almost possible to use the package with a CommonRLInterface.AbstractEnv, but not quite (see #51 and #52). I am not sure if there are any performance regressions.