Is gradient clipping $g_r = g_q \cdot \mathbf{1}_{|r| \le 1}$ still used in the code?
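(For context, that rule is the clipped straight-through estimator from the BNN paper. A minimal sketch of how it is typically written as a custom autograd function — my own illustration, not this repo's code:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with the clipped straight-through estimator."""

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return r.sign()

    @staticmethod
    def backward(ctx, g_q):
        r, = ctx.saved_tensors
        # g_r = g_q * 1_{|r| <= 1}: pass the gradient through only where
        # the real-valued weight lies in [-1, 1]
        return g_q * (r.abs() <= 1).float()
```
)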
The only clipping I can find is `p.org.copy_(p.data.clamp_(-1,1))` in `def train():`
```python
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)  # restore the real-valued weights before the update
optimizer.step()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))  # save the updated weights, clamped to [-1, 1]
```
If this is gradient clipping, shouldn't it be applied before `optimizer.step()`?
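For comparison, conventional gradient clipping in PyTorch sits between `loss.backward()` and `optimizer.step()`, e.g. (reusing the names from the snippet above):

```python
optimizer.zero_grad()
loss.backward()
# conventional clipping happens here, before the update is applied
torch.nn.utils.clip_grad_value_(model.parameters(), 1.0)
optimizer.step()
```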
I also don't understand the purpose of `p.org.copy_(p.data.clamp_(-1,1))`, since `p.org` is binarized later anyway (the result is the same whether or not `p.data` is clamped).
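For example, this illustrative check (my own, not from the repo) shows the immediate binarized values are identical with or without the clamp:

```python
import torch

w = torch.tensor([-2.5, -0.3, 0.7, 4.0])
# sign() ignores magnitude, so binarization gives the same result
# whether or not the real-valued weights were clamped to [-1, 1]
assert torch.equal(w.sign(), w.clamp(-1, 1).sign())
```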
Thank you