@lessw2020 Thanks for this awesome optimizer. I'm very excited about it!
I have one particular workload that trains with a batch size of 1.
Theoretically, does it make sense to use RAdam (Rectified Adam), LookAhead, and GC in this context?
I've been thinking about it and read the papers, but I still couldn't reach a conclusion. Since you (or anyone else here) are much more experienced than me, do you have an opinion on this?
Hi @bratao - they would still make sense to use, but my recommendation is to also run with MABN (moving average batch norm).
It maintains a moving average of the normalization statistics across batches, and the authors show, for example, that a batch size of 2 can reach the same accuracy as a batch size of 32, whereas normally there is a large drop.
I am planning to test it out this week, so I don't have proof it works yet, but the paper looks strong and the idea is solid: https://arxiv.org/abs/2001.06838
Their code is linked there, though as I recall it will likely need to be extracted out of their framework.
Anyway, it's on my todo list, and maybe I can pull it out and make it a pluggable item.
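To give a sense of the mechanism, here is a rough PyTorch sketch of the core idea only - this is NOT the MABN authors' implementation (their method also deals with the gradient statistics and other details), and the class name and momentum value are just illustrative assumptions:

```python
import torch
import torch.nn as nn

# Rough sketch of the moving-average-statistics idea, for illustration only.
# The real MABN code linked in the paper is the reference implementation.
class MovingAvgBatchNorm2d(nn.Module):
    def __init__(self, num_features, momentum=0.02, eps=1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            # statistics from the current (possibly tiny) batch, over N, H, W
            batch_mean = x.mean(dim=(0, 2, 3))
            batch_var = x.var(dim=(0, 2, 3), unbiased=False)
            # fold them into exponential moving averages
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * batch_var)
        # normalize with the smoothed statistics rather than the raw
        # per-batch ones, so batch size 1 doesn't give degenerate estimates
        mean = self.running_mean[None, :, None, None]
        var = self.running_var[None, :, None, None]
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
```

In practice you would swap something like this in for nn.BatchNorm2d in the model, but again, treat the authors' released code as the real thing.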
Regardless, that is the best way, imo, to address the batch size 1 issue.
Hope that helps!
I'll leave this open to track my testing results on MABN, and please post if you use it before I get to it :)