Is it necessary to apply extra critic networks for evaluating the 'safety Q value'? #1
Comments
Thank you so much for your reply.
Hello ZhihanLee,

Thank you for the questions. I am also a safe-RL learner. Here are my comments about your questions.

1-) There are two constrained-RL methods in general. The first is peak-constraint RL, which deals with constraints on the reward function itself; the other is average-constraint RL, which tries to minimize the cost with an extra value function while trying to maximize the reward. So for the average-constraint formulation, yes, an extra critic is required. I did not quite get what you mean by "actor loss by the cost from off-policy data".
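For illustration only, here is a minimal sketch of what the extra safety critic can look like in a PyTorch SAC-Lagrangian setup (the class, dimensions, and variable names are assumptions for the example, not code from this repository): the cost critic Q_c has the same architecture as the reward critic Q_r, and the actor loss trades the two off through the Lagrange multiplier.

```python
import torch
import torch.nn as nn

# All names and dimensions below are assumptions for illustration,
# not taken from this repository.

class QCritic(nn.Module):
    """Generic Q-network, instantiated once for the reward Q_r and once for the cost (safety) Q_c."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

obs_dim, act_dim, batch = 8, 2, 32
reward_critic = QCritic(obs_dim, act_dim)   # Q_r(s, a): trained on rewards
cost_critic = QCritic(obs_dim, act_dim)     # Q_c(s, a): the extra critic, trained on costs

# Actor update: maximize reward Q and entropy, penalize the predicted cost Q via lambda.
obs = torch.randn(batch, obs_dim)
act = torch.tanh(torch.randn(batch, act_dim))   # stand-in for actions sampled from the policy
log_prob = torch.randn(batch)                   # stand-in for the policy log-probabilities
alpha, lam = 0.2, 1.0                           # entropy temperature and Lagrange multiplier

actor_loss = (alpha * log_prob
              - reward_critic(obs, act)
              + lam * cost_critic(obs, act)).mean()
print(actor_loss.item())
```

The point of the separate critic is that Q_c gives a bootstrapped estimate of the future discounted cost, which raw off-policy cost samples alone do not provide.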
2-) I have not inspected how lambda changes, to be honest, but with a little modification of my code you can inspect the lambda value as well. The reason for taking max(0, lambda) in the Lagrangian optimization is to keep lambda non-negative, but I would have to work on it again to give you a proper answer. These days I am busy with other stuff.
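For the max(0, lambda) part, here is a minimal sketch of one common way to handle the multiplier (cost_limit, the learning rate, and the episode costs below are made-up numbers, not values from this repository): either parameterize lambda through softplus so it stays non-negative, or keep a raw parameter and clamp it back to zero after each gradient step, which is the max(0, lambda) projection you mentioned.

```python
import torch
import torch.nn.functional as F

# Made-up numbers for illustration; not values from this repository.
cost_limit = 25.0                               # constraint threshold d
log_lam = torch.zeros(1, requires_grad=True)    # unconstrained parameter behind lambda
lam_opt = torch.optim.Adam([log_lam], lr=3e-4)

def lambda_update(mean_episode_cost: float) -> float:
    """One dual-ascent step: lambda grows when the constraint is violated, shrinks otherwise."""
    lam = F.softplus(log_lam)                   # softplus keeps lambda >= 0 smoothly
    # The dual objective is lambda * (J_c - d); minimizing its negative performs ascent on lambda.
    lam_loss = -(lam * (mean_episode_cost - cost_limit)).sum()
    lam_opt.zero_grad()
    lam_loss.backward()
    lam_opt.step()
    return F.softplus(log_lam.detach()).item()  # post-update multiplier

# The alternative seen in some implementations: optimize a raw lambda tensor and
# project it with lam.data.clamp_(min=0.0) after each step, i.e. max(0, lambda).
print(lambda_update(mean_episode_cost=30.0))    # cost above the limit -> lambda increases
print(lambda_update(mean_episode_cost=10.0))    # cost below the limit -> lambda decreases
```

With either variant, lambda keeps rising while the average cost stays above the limit and decays once the constraint is satisfied, so a persistently monotonic curve often just means the constraint stays on one side of the limit for most of training.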
I hope this helps you.
Ammar
--
Ammar Haydari
PhD Student
UC Davis
Hello, Dr. Haydari. I am an undergraduate student working on safe RL, and I have also tried to implement CSAC/SAC-Lagrangian in PyTorch.
I was wondering:
① Is it necessary to apply extra critic networks for the 'safety Q value'? Does it perform better than constructing the actor loss from the cost in the off-policy data?
② Have you plotted the lambda training curve? I experienced a monotonic curve that only rises (positive loss) or descends (negative loss). I have noticed that some papers adjust the gradient ascent with max(0, lambda).
I would appreciate it if you could help me.