Switched from RLInterface to CommonRLInterface #53

Merged: 2 commits into master from common-rl, Jan 6, 2021

Conversation

zsunberg (Member):

I switched from RLInterface to CommonRLInterface, so we should be able to register this. It is also almost possible to use the package with a CommonRLInterface.AbstractEnv, but not quite (see #51 and #52). I am not sure if there are any performance regressions.
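
For context, the CommonRLInterface contract that the package now targets looks roughly like this (a minimal sketch; the rollout helper is hypothetical, not part of this PR):

    using CommonRLInterface

    # Minimal rollout against any CommonRLInterface.AbstractEnv. Assumes the
    # env implements the required functions: reset!, actions, act!, and
    # terminated (observe is also available for observation-based policies).
    function rollout(env::AbstractEnv)
        reset!(env)
        r_total = 0.0
        while !terminated(env)
            a = rand(actions(env))   # random policy, just for illustration
            r_total += act!(env, a)  # act! applies the action and returns the reward
        end
        return r_total
    end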

rejuvyesh (Member) left a comment:

Looks good to me except for some small issues I have pointed out.

@@ -7,25 +7,28 @@ Interface for defining an evaluation policy
     returns the average reward of the current policy, the user can specify its own function
     f to carry the evaluation, we provide a default basic_evaluation that is just a rollout.
 """
-function evaluation(f::Function, policy::AbstractNNPolicy, env::AbstractEnvironment, n_eval::Int64, max_episode_length::Int64, verbose::Bool = false)
+function evaluation(f::Function, policy::AbstractNNPolicy, env::AbstractEnv, n_eval::Int64, max_episode_length::Int64, verbose::Bool = false)
rejuvyesh (Member):

How did ^M come in? Likely a vim issue?

zsunberg (Member Author):

Yeah, I am not sure how the ^M is happening. I will correct it. Thanks
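
One minimal way to strip the stray carriage returns, sketched in Julia (the file path is illustrative):

    # Normalize Windows-style CRLF line endings to plain LF, in place.
    # "src/solver.jl" is just an example path.
    path = "src/solver.jl"
    write(path, replace(read(path, String), "\r\n" => "\n"))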

src/solver.jl (Outdated)

     return solve(solver, env)
 end

 function POMDPs.solve(solver::DeepQLearningSolver, problem::POMDP)
-    env = POMDPEnvironment(problem, rng=solver.rng)
+    env = POMDPCommonRLEnv{AbstractArray{Float32}}(problem) # ignores solver.rng because CommonRLEnv doesn't have rng support yet
rejuvyesh (Member):

Can't we get any more concrete than AbstractArray{Float32}?

zsunberg (Member Author):

I think AbstractArray{Float32} is what we want here. It just means that convert_o(AbstractArray{Float32}, o, pomdp) will be called on every observation, so problem writers can use static arrays or built-in arrays. If the problem implementation is type-stable, the compiler should still be able to infer a nice concrete return type.
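
As an illustration of that point (the OneDPOMDP type below is hypothetical, not from this repository), a problem writer can return a concrete static array even though the solver only requests AbstractArray{Float32}:

    using POMDPs, StaticArrays

    struct OneDPOMDP <: POMDP{Float64, Int, Float64} end  # hypothetical problem type

    # Type-stable conversion: this method always returns an SVector{1,Float32},
    # so the compiler can infer a concrete return type at the call site even
    # though the requested type, AbstractArray{Float32}, is abstract.
    POMDPs.convert_o(::Type{AbstractArray{Float32}}, o::Float64, ::OneDPOMDP) =
        SVector{1, Float32}(o)

A call like convert_o(AbstractArray{Float32}, o, pomdp) dispatches to the method above, and inference recovers the concrete SVector{1,Float32}.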

zsunberg (Member Author):

@MaximeBouton do you have any comments on or concerns about this?

zsunberg requested a review from rejuvyesh on December 24, 2020.
rejuvyesh (Member) left a comment:

LGTM! :shipit:

MaximeBouton (Contributor):

Sorry, I took a lot of time off :) Reviewing this right now.

MaximeBouton (Contributor):

That looks good! We can merge it. Should we tag version 0.6 and register?

MaximeBouton merged commit f7f4f73 into master on Jan 6, 2021.
zsunberg (Member Author) commented on Jan 7, 2021:

Yes, I think we can register! Go ahead, or I can do it in the next few days.
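
For reference, assuming the JuliaRegistrator bot is installed on the repository, registration is typically triggered by commenting on the target commit:

    @JuliaRegistrator register

TagBot can then tag the release once the registry PR merges.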

dylan-asmar deleted the common-rl branch on December 19, 2023.