Releases: kozistr/pytorch_optimizer
pytorch-optimizer v3.4.1
Change Log
Feature
- Support `GCSAM` optimizer. (#343, #344)
  - Gradient Centralized Sharpness Aware Minimization
  - you can use it from the `SAM` optimizer by setting `use_gc=True` (see the sketch after this list).
- Support `LookSAM` optimizer. (#343, #344)
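A minimal sketch of the `GCSAM` path described above, assuming the usual two-step `first_step`/`second_step` interface of the library's `SAM` optimizer; only `use_gc=True` is taken from this release note, the other arguments are illustrative defaults:

```python
import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))

# GCSAM = SAM + gradient centralization, enabled via `use_gc=True` (this release).
# The two-step usage below assumes the common SAM interface.
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=1e-1, use_gc=True)

criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)   # perturb weights toward the nearby "sharp" point

criterion(model(x), y).backward()      # second forward/backward at the perturbed weights
optimizer.second_step(zero_grad=True)  # restore the original weights, then update
```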
Update
- Support alternative precision training for the `Shampoo` optimizer. (#339)
- Add more features to and tune the `Ranger25` optimizer. (#340)
  - `AGC` + `Lookahead` variants
  - change the default `beta1`, `beta2` to 0.95 and 0.98 respectively
- Skip adding the `Lookahead` wrapper in `create_optimizer()` for `Ranger*` optimizers, which already include it. (#340)
- Improve the optimizer visualization. (#345)
- Rename `pytorch_optimizer.optimizer.gc` to `pytorch_optimizer.optimizer.gradient_centralization` to avoid a possible conflict with the Python built-in module `gc` (see the import sketch after this list). (#349)
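Since `gc` is also a Python standard-library module, the rename only changes the import path; a sketch of the change, using exactly the module paths from the bullet above:

```python
# Before (pre-v3.4.1): the submodule name shadowed the standard-library `gc` module.
# from pytorch_optimizer.optimizer import gc as gradient_centralization

# After (v3.4.1+): same contents under the new, non-conflicting name.
from pytorch_optimizer.optimizer import gradient_centralization
```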
Bug
Docs
- Update the visualizations. (#340)
Contributions
thanks to @AidinHamedi
pytorch-optimizer v3.4.0
Change Log
Feature
- Implement `FOCUS` optimizer. (#330, #331)
- Implement `PSGD Kron` optimizer. (#336, #337)
- Implement `EXAdam` optimizer. (#338, #339)
Update
- Support the `OrthoGrad` variant for `Ranger25`. (#332)
  - `Ranger25` is my experimentally-crafted optimizer, which mixes lots of optimizer variants such as `ADOPT` + `AdEMAMix` + `Cautious` + `StableAdamW` + `Adam-Atan2` + `OrthoGrad` (see the sketch after this list).
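A minimal usage sketch of `Ranger25`; the keyword that toggles the `OrthoGrad` variant is not named in these notes, so only defaults are shown:

```python
import torch
from pytorch_optimizer import Ranger25

model = torch.nn.Linear(10, 2)

# Experimental optimizer mixing ADOPT / AdEMAMix / Cautious / StableAdamW /
# Adam-Atan2 / OrthoGrad behaviours; constructed with defaults only.
optimizer = Ranger25(model.parameters(), lr=1e-3)

model(torch.randn(4, 10)).sum().backward()
optimizer.step()
optimizer.zero_grad()
```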
Fix
- Add the missing `state` property to the `OrthoGrad` optimizer. (#326, #327)
- Add the missing `state_dict` and `load_state_dict` methods to the `TRAC` and `OrthoGrad` optimizers. (#332)
- Skip sparse gradients in the `OrthoGrad` optimizer. (#332)
- Support alternative precision training in the `SOAP` optimizer. (#333)
- Store the `SOAP` condition matrices in the dtype of their parameters. (#335)
Contributions
thanks to @Vectorrent, @kylevedder
pytorch-optimizer v3.3.4
Change Log
Feature
- Support the `OrthoGrad` feature for `create_optimizer()`. (#324)
- Enhanced flexibility for the `optimizer` parameter in the `Lookahead`, `TRAC`, and `OrthoGrad` optimizers. (#324)
  - Now supports both `torch.optim.Optimizer` instances and classes.
  - You can now use the `Lookahead` optimizer in two ways (see the sketch after this list):
    - `Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)`
    - `Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())`
- Implement `SPAM` optimizer. (#324)
- Implement `TAM` and `AdaTAM` optimizers. (#325)
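The two construction styles from the bullet above as a runnable sketch; the wrapped optimizer (`torch.optim.AdamW`) is chosen purely for illustration:

```python
import torch
from torch.optim import AdamW
from pytorch_optimizer import Lookahead

model = torch.nn.Linear(10, 2)

# 1) Wrap an already-constructed torch.optim.Optimizer instance.
opt_a = Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)

# 2) Pass the optimizer class and let Lookahead construct it (new in this release);
#    the model parameters are forwarded via the `params` keyword.
opt_b = Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())
```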
pytorch-optimizer v3.3.3
Change Log
Feature
- Implement `Grams` optimizer. (#317, #318)
- Support the `stable_adamw` variant for the `ADOPT` and `AdEMAMix` optimizers. (#321)
  - `optimizer = ADOPT(model.parameters(), ..., stable_adamw=True)`
- Implement an experimental optimizer `Ranger25` (not tested). (#321)
  - mixing the `ADOPT + AdEMAMix + StableAdamW + Cautious + RAdam` optimizers.
- Implement `OrthoGrad` optimizer. (#321)
- Support the `Adam-Atan2` feature for the `Prodigy` optimizer when `eps` is None (see the sketch after this list). (#321)
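A sketch of the `Adam-Atan2` path for `Prodigy`, based only on the statement above that it is used when `eps` is None; the remaining arguments are defaults:

```python
import torch
from pytorch_optimizer import Prodigy

model = torch.nn.Linear(10, 2)

# eps=None switches the denominator to the atan2 formulation (no epsilon
# hyper-parameter), per this release note.
optimizer = Prodigy(model.parameters(), eps=None)
```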
pytorch-optimizer v3.3.2
pytorch-optimizer v3.3.1
Change Log
Feature
- Support the `Cautious` variant for the `AdaShift` optimizer. (#310)
- Save the state of the `Lookahead` optimizer too. (#310)
- Implement `APOLLO` optimizer. (#311, #312)
- Rename the `Apollo` optimizer (An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization) to `ApolloDQN` so its name does not overlap with the new `APOLLO` optimizer. (#312)
- Implement `MARS` optimizer. (#313, #314)
- Support the `Cautious` variant for the `MARS` optimizer (see the sketch after this list). (#314)
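A sketch of the `Cautious` variant of `MARS`; the `cautious=True` keyword follows the convention stated in the v3.3.0 notes below, everything else is a default:

```python
import torch
from pytorch_optimizer import MARS

model = torch.nn.Linear(10, 2)

# Cautious variant: update components whose sign disagrees with the gradient
# are masked out. The keyword name follows the other Cautious-enabled optimizers.
optimizer = MARS(model.parameters(), cautious=True)
```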
Bug
- Fix `bias_correction` in the `AdamG` optimizer. (#305, #308)
- Fix a potential bug when loading the state of the `Lookahead` optimizer. (#306, #310)
Docs
Contributions
thanks to @Vectorrent
pytorch-optimizer v3.3.0
Change Log
Feature
- Support the `PaLM` variant for the `ScheduleFreeAdamW` optimizer. (#286, #288)
  - you can use this feature by setting `use_palm` to `True` (see the sketch after this list).
- Implement `ADOPT` optimizer. (#289, #290)
- Implement `FTRL` optimizer. (#291)
- Implement the `Cautious optimizer` feature. (#294)
  - Improving Training with One Line of Code
  - you can use it by setting `cautious=True` for the `Lion`, `AdaFactor`, and `AdEMAMix` optimizers.
- Improve the stability of the `ADOPT` optimizer. (#294)
- Support a new projection type `random` for `GaLoreProjector`. (#294)
- Implement `DeMo` optimizer. (#300, #301)
- Implement `Muon` optimizer. (#302)
- Implement `ScheduleFreeRAdam` optimizer. (#304)
- Implement `LaProp` optimizer. (#304)
- Support the `Cautious` variant for the `LaProp`, `AdamP`, and `Adopt` optimizers. (#304)
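A sketch combining the two toggles called out above: `use_palm=True` on `ScheduleFreeAdamW` and `cautious=True` on `Lion`; the keyword names come from the bullets, the rest are illustrative defaults:

```python
import torch
from pytorch_optimizer import Lion, ScheduleFreeAdamW

model = torch.nn.Linear(10, 2)

# PaLM-style beta2 schedule for the schedule-free AdamW variant.
sf_adamw = ScheduleFreeAdamW(model.parameters(), lr=1e-3, use_palm=True)

# Cautious variant ("Improving Training with One Line of Code") applied to Lion.
lion = Lion(model.parameters(), lr=1e-4, cautious=True)
```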
Refactor
- Big refactoring, removing direct imports from `pytorch_optimizer.*`.
  - I removed some utilities from the top-level `pytorch_optimizer.*` namespace because they're probably not used frequently and are not optimizers themselves, but rather utils used only by specific optimizers (see the import sketch after this list).
  - `pytorch_optimizer.[Shampoo stuff]` -> `pytorch_optimizer.optimizers.shampoo_utils.[Shampoo stuff]`.
    - `shampoo_utils` like `Graft`, `BlockPartitioner`, `PreConditioner`, etc. You can check the details here.
  - `pytorch_optimizer.GaLoreProjector` -> `pytorch_optimizer.optimizers.galore.GaLoreProjector`.
  - `pytorch_optimizer.gradfilter_ema` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ema`.
  - `pytorch_optimizer.gradfilter_ma` -> `pytorch_optimizer.optimizers.grokfast.gradfilter_ma`.
  - `pytorch_optimizer.l2_projection` -> `pytorch_optimizer.optimizers.alig.l2_projection`.
  - `pytorch_optimizer.flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.flatten_grad`.
  - `pytorch_optimizer.un_flatten_grad` -> `pytorch_optimizer.optimizers.pcgrad.un_flatten_grad`.
  - `pytorch_optimizer.reduce_max_except_dim` -> `pytorch_optimizer.optimizers.sm3.reduce_max_except_dim`.
  - `pytorch_optimizer.neuron_norm` -> `pytorch_optimizer.optimizers.nero.neuron_norm`.
  - `pytorch_optimizer.neuron_mean` -> `pytorch_optimizer.optimizers.nero.neuron_mean`.
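A before/after import sketch for one of the moved symbols, using the path exactly as listed above; check the module path against your installed version, since only the top-level re-export was removed:

```python
# Before (<= v3.2.x): importable straight from the package root.
# from pytorch_optimizer import GaLoreProjector

# After (v3.3.0+): import from the optimizer-specific module instead,
# path as given in the mapping above.
from pytorch_optimizer.optimizers.galore import GaLoreProjector
```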
Docs
- Add more visualizations. (#297)
Bug
- Add the `optimizer` parameter to the `PolyScheduler` constructor. (#295)
Contributions
thanks to @tanganke
pytorch-optimizer v3.2.0
Change Log
Feature
- Implement `SOAP` optimizer. (#275)
- Support `AdEMAMix` variants. (#276)
  - `bnb_ademamix8bit`, `bnb_ademamix32bit`, `bnb_paged_ademamix8bit`, `bnb_paged_ademamix32bit`
- Support 8/4bit and fp8 optimizers. (#208, #281)
  - `torchao_adamw8bit`, `torchao_adamw4bit`, `torchao_adamwfp8`
- Support a module-name-level (e.g. `LayerNorm`) weight decay exclusion for `get_optimizer_parameters` (see the sketch after this list). (#282, #283)
- Implement `CPUOffloadOptimizer`, which offloads the optimizer to the CPU for single-GPU training. (#284)
- Support a regex-based filter for searching the names of optimizers, lr schedulers, and loss functions.
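A sketch of the weight-decay exclusion and the name search; `get_optimizer_parameters` and `get_supported_optimizers` are the library's helpers, but the exact keyword names (`wd_ban_list`, the filter argument) are assumptions to verify against the installed version:

```python
import torch
from pytorch_optimizer import get_optimizer_parameters, get_supported_optimizers

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.LayerNorm(10))

# Exclude parameters by module name (e.g. LayerNorm) from weight decay;
# the `wd_ban_list` keyword name is an assumption.
param_groups = get_optimizer_parameters(model, weight_decay=1e-2, wd_ban_list=['LayerNorm'])

# Wildcard/regex-style search over the supported optimizer names;
# the filter-argument form is an assumption.
print(get_supported_optimizers('ada*'))
```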
Bug
Contributions
thanks to @Vectorrent