Make train on lunar lander #14
This might be controversial, but I really think we should kill the ActionValue and GaussianMLPPolicy files and everything in them, and just do the network construction explicitly in this file instead. Otherwise we're hiding one of the things I'd like to make obvious: which network you're using. I have already done this in the branch I was working on, and I think it's much better and more obvious what's happening. So, at the beginning just pop in a small
This goes with the philosophy of making the test and experiment code itself the configuration, which I really think will make our iterations faster and easier in the future.
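For illustration only, here is a rough sketch of what building the policy network explicitly in an experiment file could look like. It is not the code from the branch mentioned above. The framework (plain NumPy), the dimensions (8 observations, 2 continuous actions), the hidden width, and every name in it are assumptions made for the example.

```python
import numpy as np

# Assumed sizes for a continuous-control Lunar Lander setup (not taken from the PR).
OBS_DIM = 8
ACT_DIM = 2
HIDDEN = 64

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """Return (weights, bias) for one dense layer with small random weights."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)


# The whole architecture is visible right here: 8 -> 64 -> 64 -> 2.
policy_layers = [
    init_layer(OBS_DIM, HIDDEN),
    init_layer(HIDDEN, HIDDEN),
    init_layer(HIDDEN, ACT_DIM),
]
log_std = np.zeros(ACT_DIM)  # state-independent log standard deviation


def policy_mean(obs):
    """Forward pass: tanh on the hidden layers, linear output for the action mean."""
    h = np.asarray(obs, dtype=float)
    for weights, bias in policy_layers[:-1]:
        h = np.tanh(h @ weights + bias)
    weights, bias = policy_layers[-1]
    return h @ weights + bias


def sample_action(obs):
    """Sample from the diagonal Gaussian defined by policy_mean and log_std."""
    mean = policy_mean(obs)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)
```

With this layout, changing the architecture for one experiment means editing the `policy_layers` list in that experiment's file, so the network in use is always readable next to the training call.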
I see your point, but I'm not sure I agree. As you said, we did have an issue where the network architecture wasn't always obvious, and perhaps that should have been clearer, but I'm not convinced this is the right answer. For lunar lander and hit the middle, sure, it makes sense to add a few lines of code for clarity's sake. But I'm imagining a bit down the line, when we might use bigger or more complicated networks, plus the other networks we need for CURL, GAIL, world models etc. We might end up writing so many lines to define the networks that I'm not sure it makes sense. There is also the risk of losing performance because we forget an implementation detail of the network in a specific test or experiment.
Maybe the solution is to keep ActionValue, GaussianMLPPolicy and the other network types, and offer both options so that users can pick whichever suits them.
I changed this to what you suggested for now, as I think the broader question is a bit outside the scope of this PR. I do think this is a conversation we should have, and we can make changes in a separate PR if needed.
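One way the "both options" idea could be sketched: a policy wrapper that accepts an explicitly built network and otherwise falls back to a packaged default. `GaussianPolicySketch`, `default_network`, and the `Network` alias are hypothetical names for this example, not the repository's ActionValue or GaussianMLPPolicy API.

```python
from typing import Callable, List, Optional, Sequence

# A "network" here is any callable mapping an observation to action means.
Network = Callable[[Sequence[float]], Sequence[float]]


def default_network(obs: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a packaged default architecture."""
    return [0.0, 0.0]


class GaussianPolicySketch:
    """Uses an explicitly supplied network if given, else the default."""

    def __init__(self, network: Optional[Network] = None):
        # Experiments that want the architecture visible pass their own network;
        # everything else falls back to the shared default.
        self.network = network if network is not None else default_network

    def mean(self, obs: Sequence[float]) -> List[float]:
        return list(self.network(obs))


# Option 1: rely on the packaged default.
packaged = GaussianPolicySketch()

# Option 2: define the network explicitly in the experiment file.
explicit = GaussianPolicySketch(network=lambda obs: [sum(obs), 1.0])
```

The trade-off is the one discussed above: the default keeps shared implementation details in one place, while the explicit argument lets an experiment state its architecture directly.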