Is gradient clipping $g_r = g_q \cdot \mathbf{1}_{|r| \le 1}$ still used in the code?
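(For context, that rule is the clipped straight-through estimator from the BNN paper. A minimal sketch of how it is typically written as a custom autograd function — my own illustration, not this repo's code:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with the clipped straight-through estimator."""

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return r.sign()

    @staticmethod
    def backward(ctx, g_q):
        r, = ctx.saved_tensors
        # g_r = g_q * 1_{|r| <= 1}: pass the gradient through only where
        # the real-valued weight lies in [-1, 1]
        return g_q * (r.abs() <= 1).float()
```
)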
The only clipping I can find is `p.org.copy_(p.data.clamp_(-1,1))` in `def train():`
```python
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)  # restore the real-valued weights before the update
optimizer.step()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))  # save the updated weights, clamped to [-1, 1]
```
If this is gradient clipping, shouldn't it be applied before `optimizer.step()`?
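For comparison, conventional gradient clipping in PyTorch sits between `loss.backward()` and `optimizer.step()`, e.g. (reusing the names from the snippet above):

```python
optimizer.zero_grad()
loss.backward()
# conventional clipping happens here, before the update is applied
torch.nn.utils.clip_grad_value_(model.parameters(), 1.0)
optimizer.step()
```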
I also don't understand the purpose of `p.org.copy_(p.data.clamp_(-1,1))`, since `p.org` is binarized later anyway (the result is the same whether or not `p.data` is clamped).
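For example, this illustrative check (my own, not from the repo) shows the immediate binarized values are identical with or without the clamp:

```python
import torch

w = torch.tensor([-2.5, -0.3, 0.7, 4.0])
# sign() ignores magnitude, so binarization gives the same result
# whether or not the real-valued weights were clamped to [-1, 1]
assert torch.equal(w.sign(), w.clamp(-1, 1).sign())
```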
Thank you