Add checkpoint monitoring class #75
Closes: #75. Disable gravity to match thesis results. Fix bug where RNG seed was ignored.
Will test this tomorrow and see if I can slowly add my own landing and takeoff scripts.
@xabierolaz this has been merged into master; I was just referencing the changes. If you are looking to merge navigation support into GymFC, it would be a good idea to open a feature request issue so others know you are working on it and the approach can be discussed. Recently #76 was added; I have no idea if anything will come of it, but if multiple people were working on it, it would be completed much faster.
Hi Wil, I have been going through the two scripts that invoke the step function of the GymFC environment. It is this data that is getting logged. Should we not be retrieving the data from the checkpoints folder for storing it in the .csv files?
Hi @varunag18,
I think I did not understand it clearly then. Your response makes it clear to me now. The training script invokes the MlpPolicy class for training the NN, while the evaluation script invokes the PpoBaselinesPolicy class to get the trained data from the checkpoints folder. Am I correct now?
Where exactly are we setting the number of agents?
Will do this for sure.
Yes, that is correct. Checkpoints are just data files containing the neural network and other functions used during training. Training is specific to the RL algorithm. However, as long as the algorithm produces a TensorFlow graph, we can do the evaluation independently of the training library, directly on the TensorFlow checkpoint, which is ideal because it is more scalable. All you need to know is the input and output tensor names to extract the NN subgraph.
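For reference, a minimal sketch of what that evaluation step could look like, assuming a TensorFlow 1.x checkpoint (as produced by the baselines PPO trainer). The tensor names below are hypothetical placeholders; the real input/output tensor names have to be looked up in the checkpoint's graph:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Hypothetical tensor names -- inspect the restored graph to find the
# actual input/output tensors of the policy network.
INPUT_TENSOR = "pi/ob:0"
OUTPUT_TENSOR = "pi/pol/final/BiasAdd:0"

def evaluate_checkpoint(checkpoint_prefix, observation):
    """Restore a checkpoint and run one forward pass through the NN subgraph."""
    with tf.Session() as sess:
        # The .meta file stores the graph definition alongside the weights.
        saver = tf.train.import_meta_graph(checkpoint_prefix + ".meta")
        saver.restore(sess, checkpoint_prefix)
        graph = tf.get_default_graph()
        ob = graph.get_tensor_by_name(INPUT_TENSOR)
        action = graph.get_tensor_by_name(OUTPUT_TENSOR)
        return sess.run(action, feed_dict={ob: observation[np.newaxis, :]})
```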
You don't. When you execute the PPO trainer it trains a single agent. To train more than one, just execute that script N times; the number N depends on your research goals. Wrap the Python call in a bash loop script if you want to automate it (a sketch of that idea is below).
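The suggestion above is a bash loop; an equivalent Python sketch is shown here. The script name and CLI flags are placeholders, not the actual trainer's interface:

```python
import subprocess

# Hypothetical script name and flags -- substitute the actual PPO training
# entry point and the arguments it accepts.
NUM_AGENTS = 10
for seed in range(NUM_AGENTS):
    # Launch one independent training run per agent, each with its own
    # seed and checkpoint directory.
    subprocess.run(
        ["python", "train_ppo.py",
         "--seed", str(seed),
         "--checkpoint-dir", "checkpoints/agent_{}".format(seed)],
        check=True,
    )
```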
Hi Wil,
Another question: how exactly do we zero in on the best checkpoint? What are the criteria for doing so? In your thesis, you write, "Once training was complete, we select the checkpoint that provided the most stable step responses, which occurred after 2,500,000 steps to use as our flight controller policy." Please elaborate on this.
@varunag18
Hi @varunag18, this issue is currently closed. In the future please open a new issue if you have a new question. Which chapter are you referring to? ESC voltage and current are baked into the framework (see the message type here), but a model does not currently exist; this is not something I explored for my thesis work. If this is something you are looking to support, you can fork the aircraft-plugin repo, add the model, and pass the value back here. As @xabierolaz points out, if you plot the MAE or other error metrics over training validation, they begin to converge after a couple million steps. If the reward function were perfect, we would probably select the longest-trained agent with the highest reward. Unfortunately it isn't perfect, so after convergence it usually takes looking at a bunch of step-response plots and selecting the agent that produces the best step responses in terms of minimizing error and oscillations. Once you have a good one, try it out on the drone and confirm there are no visible oscillations.
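As a rough illustration of that selection step, here is a minimal sketch that ranks evaluated checkpoints by mean absolute error over logged step responses. The CSV layout and column names are assumptions for the example, not the actual evaluation output format:

```python
import glob
import numpy as np
import pandas as pd

# Assumed layout: one CSV of step-response logs per evaluated checkpoint,
# with setpoint ("*_sp") and measured ("*_actual") angular velocity columns.
def mae_for_checkpoint(csv_path):
    df = pd.read_csv(csv_path)
    per_axis = [np.abs(df[axis + "_sp"] - df[axis + "_actual"]).mean()
                for axis in ("roll", "pitch", "yaw")]
    return float(np.mean(per_axis))

# Low MAE is necessary but not sufficient -- still inspect the step-response
# plots for oscillations before picking a checkpoint.
scores = {path: mae_for_checkpoint(path) for path in glob.glob("eval/*.csv")}
for path, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(path, score)
```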
Is your feature request related to a problem? Please describe.
RL training produces checkpoints; however, the examples do not include the evaluation of those checkpoints.
Describe the solution you'd like
The thesis work used a checkpoint monitor to evaluate new checkpoints as soon as they were created.
Describe alternatives you've considered
We can alternatively do this on demand, but we'll leave that to a new issue/PR.
Additional context
This would be one of several new features to support RL training and evaluation.
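For discussion, a minimal sketch of what such a monitor could look like, assuming new checkpoints simply appear as files in a directory. The class and method names are placeholders, not the thesis implementation:

```python
import os
import time

class CheckpointMonitor:
    """Poll a checkpoint directory and invoke a callback for each new checkpoint.

    Sketch only: it assumes a completed TF1 checkpoint can be detected by the
    presence of its .index file; the actual monitor may use a different
    mechanism (e.g. TensorFlow's checkpoint state file).
    """

    def __init__(self, checkpoint_dir, callback, poll_interval=10):
        self.checkpoint_dir = checkpoint_dir
        self.callback = callback        # e.g. a function that evaluates the checkpoint
        self.poll_interval = poll_interval
        self.seen = set()

    def run(self):
        while True:
            for name in sorted(os.listdir(self.checkpoint_dir)):
                if name.endswith(".index"):
                    prefix = os.path.join(self.checkpoint_dir, name[:-len(".index")])
                    if prefix not in self.seen:
                        self.seen.add(prefix)
                        self.callback(prefix)
            time.sleep(self.poll_interval)
```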