
Releases: kozistr/pytorch_optimizer

pytorch-optimizer v3.4.1

14 Feb 11:57
00fbae0

Change Log

Feature

Update

  • Support alternative precision training for the Shampoo optimizer. (#339)
  • Add more features to the Ranger25 optimizer and tune it (a usage sketch follows this list). (#340)
    • AGC + Lookahead variants
    • Change the default beta1 and beta2 to 0.95 and 0.98, respectively
  • Skip adding the Lookahead wrapper for Ranger* optimizers in create_optimizer(), since they already include it. (#340)
  • Improve the optimizer visualizations. (#345)
  • Rename pytorch_optimizer.optimizer.gc to pytorch_optimizer.optimizer.gradient_centralization to avoid a possible conflict with the Python built-in module gc. (#349)
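A minimal usage sketch for the tuned Ranger25 (the model and learning rate here are placeholders, not taken from the release notes):

```python
import torch
from torch import nn

from pytorch_optimizer import Ranger25

model = nn.Linear(10, 2)

# beta1 / beta2 now default to 0.95 / 0.98, so nothing extra needs to be passed.
optimizer = Ranger25(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```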

Bug

  • Fix the ADOPT optimizer to update exp_avg_sq only after the denominator has been calculated. (#346, #347)
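For reference, a simplified sketch of the corrected update order (illustrative only, not the library's actual implementation): ADOPT normalizes the gradient by the previous step's second-moment estimate, so exp_avg_sq must be updated only after the denominator is formed.

```python
import torch

def adopt_style_update(param, grad, exp_avg, exp_avg_sq, lr, beta1=0.9, beta2=0.9999, eps=1e-6):
    # 1. build the denominator from the *previous* second-moment estimate
    denom = exp_avg_sq.sqrt().clamp_(min=eps)

    # 2. update the first moment with the normalized gradient
    exp_avg.mul_(beta1).add_(grad / denom, alpha=1.0 - beta1)

    # 3. only now update exp_avg_sq with the current gradient
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)

    # 4. take the parameter step
    param.add_(exp_avg, alpha=-lr)
```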

Docs

  • Update the visualizations. (#340)

Contributions

thanks to @AidinHamedi

pytorch-optimizer v3.4.0

02 Feb 05:07
8da7b49

Change Log

Feature

Update

  • Support the OrthoGrad variant for Ranger25. (#332)
    • The Ranger25 optimizer is my experimental, hand-crafted optimizer, which mixes many optimizer variants such as ADOPT + AdEMAMix + Cautious + StableAdamW + Adam-Atan2 + OrthoGrad.

Fix

  • Add the missing state property to the OrthoGrad optimizer. (#326, #327)
  • Add the missing state_dict and load_state_dict methods to the TRAC and OrthoGrad optimizers (see the sketch after this list). (#332)
  • Skip sparse gradients in the OrthoGrad optimizer. (#332)
  • Support alternative precision training in the SOAP optimizer. (#333)
  • Store the SOAP condition matrices in the dtype of their parameters. (#335)
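A quick, hedged sketch of checkpointing through the OrthoGrad wrapper, which relies on the state_dict / load_state_dict methods added in this release (wrapping a plain torch.optim.AdamW here is just for illustration):

```python
import torch
from torch import nn

from pytorch_optimizer import OrthoGrad

model = nn.Linear(10, 2)
optimizer = OrthoGrad(torch.optim.AdamW(model.parameters(), lr=1e-3))

model(torch.randn(4, 10)).sum().backward()
optimizer.step()

checkpoint = optimizer.state_dict()    # available on the wrapper as of this release
optimizer.load_state_dict(checkpoint)
```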

Contributions

thanks to @Vectorrent, @kylevedder

pytorch-optimizer v3.3.4

19 Jan 06:31
55c3553

Change Log

Feature

  • Support the OrthoGrad feature for create_optimizer(). (#324)
  • Enhance flexibility for the optimizer parameter in the Lookahead, TRAC, and OrthoGrad optimizers. (#324)
    • Now supports both torch.optim.Optimizer instances and classes
    • You can now use the Lookahead optimizer in two ways (a runnable example follows this list).
      • Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)
      • Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())
  • Implement the SPAM optimizer. (#324)
  • Implement the TAM and AdaTAM optimizers. (#325)
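A runnable version of the two construction styles above (the model and hyper-parameters are placeholders):

```python
from torch import nn
from torch.optim import AdamW

from pytorch_optimizer import Lookahead

model = nn.Linear(10, 2)

# 1. wrap an already-constructed optimizer instance
opt_from_instance = Lookahead(AdamW(model.parameters(), lr=1e-3), k=5, alpha=0.5)

# 2. pass the optimizer class and let Lookahead build it from `params`
opt_from_class = Lookahead(AdamW, k=5, alpha=0.5, params=model.parameters())
```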

pytorch-optimizer v3.3.3

13 Jan 16:07
5baa713

Change Log

Feature

pytorch-optimizer v3.3.2

21 Dec 10:38
8f538d4

Change Log

Feature

Bug

  • Clone exp_avg before calling apply_cautious so that exp_avg itself is not masked. (#316)
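An illustrative sketch (not the library's actual code) of why the clone matters: the cautious mask should only affect the current update, not the stored exp_avg state.

```python
import torch

def cautious_step(param, grad, exp_avg, lr):
    update = exp_avg.clone()                   # work on a copy so exp_avg stays intact
    mask = (update * grad > 0).to(grad.dtype)  # keep only sign-aligned entries
    mask.div_(mask.mean().clamp_(min=1e-3))    # re-scale the surviving entries
    update.mul_(mask)
    param.add_(update, alpha=-lr)
```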

pytorch-optimizer v3.3.1

21 Dec 07:20
d16a368

Change Log

Feature

Bug

  • Fix bias_correction in the AdamG optimizer. (#305, #308)
  • Fix a potential bug when loading the state for the Lookahead optimizer. (#306, #310)

Docs

Contributions

thanks to @Vectorrent

pytorch-optimizer v3.3.0

06 Dec 14:44
5def5d7

Change Log

Feature

Refactor

  • Big refactoring: remove direct imports from pytorch_optimizer.* (an import sketch follows this list).
    • Some helpers can no longer be imported directly from pytorch_optimizer.* because they are probably not used frequently and are not optimizers themselves, but utilities used only by specific optimizers.
    • pytorch_optimizer.[Shampoo stuff] -> pytorch_optimizer.optimizer.shampoo_utils.[Shampoo stuff].
      • shampoo_utils includes Graft, BlockPartitioner, PreConditioner, etc. You can check the details in the documentation.
    • pytorch_optimizer.GaLoreProjector -> pytorch_optimizer.optimizer.galore.GaLoreProjector.
    • pytorch_optimizer.gradfilter_ema -> pytorch_optimizer.optimizer.grokfast.gradfilter_ema.
    • pytorch_optimizer.gradfilter_ma -> pytorch_optimizer.optimizer.grokfast.gradfilter_ma.
    • pytorch_optimizer.l2_projection -> pytorch_optimizer.optimizer.alig.l2_projection.
    • pytorch_optimizer.flatten_grad -> pytorch_optimizer.optimizer.pcgrad.flatten_grad.
    • pytorch_optimizer.un_flatten_grad -> pytorch_optimizer.optimizer.pcgrad.un_flatten_grad.
    • pytorch_optimizer.reduce_max_except_dim -> pytorch_optimizer.optimizer.sm3.reduce_max_except_dim.
    • pytorch_optimizer.neuron_norm -> pytorch_optimizer.optimizer.nero.neuron_norm.
    • pytorch_optimizer.neuron_mean -> pytorch_optimizer.optimizer.nero.neuron_mean.
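A sketch of what the imports look like under the new layout, assuming the symbols are exported from the modules exactly as mapped above:

```python
from pytorch_optimizer.optimizer.shampoo_utils import BlockPartitioner, Graft, PreConditioner
from pytorch_optimizer.optimizer.galore import GaLoreProjector
from pytorch_optimizer.optimizer.grokfast import gradfilter_ema, gradfilter_ma
from pytorch_optimizer.optimizer.alig import l2_projection
from pytorch_optimizer.optimizer.pcgrad import flatten_grad, un_flatten_grad
from pytorch_optimizer.optimizer.sm3 import reduce_max_except_dim
from pytorch_optimizer.optimizer.nero import neuron_mean, neuron_norm
```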

Docs

  • Add more visualizations. (#297)

Bug

  • Add the optimizer parameter to the PolyScheduler constructor. (#295)

Contributions

thanks to @tanganke

pytorch-optimizer v3.2.0

28 Oct 23:30
a59f2e1

Change Log

Feature

  • Implement SOAP optimizer. (#275)
  • Support AdEMAMix variants. (#276)
    • bnb_ademamix8bit, bnb_ademamix32bit, bnb_paged_ademamix8bit, bnb_paged_ademamix32bit
  • Support 8/4-bit and fp8 optimizers. (#208, #281)
    • torchao_adamw8bit, torchao_adamw4bit, torchao_adamwfp8.
  • Support module-name-level (e.g., LayerNorm) weight decay exclusion for get_optimizer_parameters (see the sketch after this list). (#282, #283)
  • Implement CPUOffloadOptimizer, which offloads the optimizer to the CPU for single-GPU training. (#284)
  • Support a regex-based filter for searching the names of optimizers, lr schedulers, and loss functions.
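A hedged sketch of the module-name-level weight decay exclusion via get_optimizer_parameters; the exact wd_ban_list entries here are illustrative assumptions, not taken from the release notes:

```python
from torch import nn
from torch.optim import AdamW

from pytorch_optimizer import get_optimizer_parameters

model = nn.Sequential(nn.Linear(10, 10), nn.LayerNorm(10), nn.Linear(10, 2))

parameters = get_optimizer_parameters(
    model,
    weight_decay=1e-2,
    wd_ban_list=['bias', 'LayerNorm.bias', 'LayerNorm.weight'],  # name-level exclusion (illustrative)
)
optimizer = AdamW(parameters, lr=1e-3)
```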

Bug

  • Fix the should_grokfast condition at initialization. (#279, #280)

Contributions

thanks to @Vectorrent

pytorch-optimizer v3.1.2

10 Sep 10:58
9d5e181

Change Log

Feature

Bug

  • Add **kwargs to the parameters as a dummy placeholder. (#270, #271)

pytorch-optimizer v3.1.1

14 Aug 09:47
a8eb19c

Change Log

Feature

Bug

  • Handle optimizers that take the model instead of the parameters in create_optimizer() (see the sketch after this list). (#263)
  • Move the variable to the same device as the parameter. (#266, #267)
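A hedged sketch of create_optimizer usage; the argument names other than the model and the optimizer name are assumptions rather than something confirmed by these notes:

```python
from torch import nn

from pytorch_optimizer import create_optimizer

model = nn.Linear(10, 2)

# create_optimizer() receives the model itself (not model.parameters())
# and resolves the optimizer by its name.
optimizer = create_optimizer(model, 'adamw', lr=1e-3, weight_decay=1e-2)
```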