Combining the idea of gradient flow with an implicit ODE solver can yield approximate profiles. This could in turn be used with mini-batching, possibly combined with some kind of memory mechanism like the ones implemented in adaptive SGD optimizers.
Something like this should be tested and implemented soon, as it might make approximate profiles feasible for very large models with very large amounts of data.
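To make the idea concrete, here is a minimal sketch, not an implementation proposal. It assumes that "profiles" means profile likelihoods (fix one parameter on a grid, minimize the objective over the rest) and uses a toy quadratic objective f(x) = 0.5 xᵀAx - bᵀx so that the implicit (backward) Euler step of the gradient flow dx/dt = -∇f(x) reduces to a linear solve. The names `implicit_euler_flow` and `approximate_profile`, the objective, and the step settings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def implicit_euler_flow(A, b, x0, step, n_steps):
    """Integrate dx/dt = -(A x - b) with backward Euler.

    Each step solves (I + step * A) x_new = x_old + step * b,
    which is the implicit update x_new = x_old - step * grad f(x_new).
    """
    x = np.asarray(x0, dtype=float)
    M = np.eye(len(x)) + step * A
    for _ in range(n_steps):
        x = np.linalg.solve(M, x + step * b)
    return x


def approximate_profile(A, b, index, grid, step=10.0, n_steps=20):
    """Approximate profile of f(x) = 0.5 x^T A x - b^T x in parameter `index`.

    For each grid value c, parameter `index` is fixed at c and the
    remaining parameters follow the gradient flow for a finite time.
    """
    free = [i for i in range(len(b)) if i != index]
    A_ff = A[np.ix_(free, free)]   # curvature block of the free parameters
    A_fi = A[free, index]          # coupling to the fixed parameter
    x_free = np.zeros(len(free))
    values = []
    for c in grid:
        b_eff = b[free] - A_fi * c
        # Warm start from the previous grid point.
        x_free = implicit_euler_flow(A_ff, b_eff, x_free, step, n_steps)
        x = np.empty(len(b))
        x[free] = x_free
        x[index] = c
        values.append(0.5 * x @ A @ x - b @ x)
    return np.array(values)


A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
grid = np.linspace(-1.0, 2.0, 7)
profile = approximate_profile(A, b, index=0, grid=grid)
```

For a general objective, the linear solve would be replaced by a nonlinear solve of the implicit step, and a mini-batch variant would use a stochastic gradient estimate in place of the exact gradient; neither is shown here.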