GeoTorch

A library for constrained optimization and manifold optimization for deep learning in PyTorch

Overview

GeoTorch provides a simple way to perform constrained optimization and optimization on manifolds in PyTorch. It is compatible out of the box with any optimizer, layer, and model implemented in PyTorch without any boilerplate in the training code. Just state the constraints when you construct the model and you are ready to go!

import torch
import torch.nn as nn
import geotorch

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # One line suffices: Instantiate a linear layer with orthonormal columns
        self.linear = nn.Linear(64, 128)
        geotorch.orthogonal(self.linear, "weight")

        # Works with tensors: Instantiate a CNN with kernels of rank 1
        self.cnn = nn.Conv2d(16, 32, 3)
        geotorch.low_rank(self.cnn, "weight", rank=1)

        # Weights are initialized to a random value when you apply the constraints, but
        # you may re-initialize them to a different value by assigning to them
        self.linear.weight = torch.eye(128, 64)
        # And that's all you need to do. The rest is regular PyTorch code

    def forward(self, x):
        # self.linear is orthogonal and every 3x3 kernel in self.cnn is of rank 1
        ...  # placeholder: the model-specific computation goes here

# Use the model as you would normally do. Everything just works
model = Model().cuda()

# Use your optimizer of choice. Any optimizer works out of the box with any parametrization
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # example learning rate
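
As a quick sanity check of the claim that any optimizer works, the sketch below takes one Adam step on a constrained layer and verifies that the weight still has orthonormal columns. It is an illustration under assumptions: it uses PyTorch's native `torch.nn.utils.parametrizations.orthogonal` (assumed available, PyTorch 1.10 or higher) as a stand-in for `geotorch.orthogonal`, and the batch size, learning rate, and loss are arbitrary choices.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations

torch.manual_seed(0)

# Assumption: PyTorch's built-in orthogonal parametrization stands in for
# geotorch.orthogonal. The weight of nn.Linear(64, 128) has shape (128, 64).
layer = parametrizations.orthogonal(nn.Linear(64, 128))
optim = torch.optim.Adam(layer.parameters(), lr=1e-2)

# One ordinary training step on random data with an arbitrary regression loss
x = torch.randn(32, 64)
y = torch.randn(32, 128)
loss = (layer(x) - y).pow(2).mean()
optim.zero_grad()
loss.backward()
optim.step()

# The constraint still holds after the update: W^T W is the 64 x 64 identity
W = layer.weight
gram_error = (W.T @ W - torch.eye(64)).abs().max().item()
```

The optimizer only ever sees the unconstrained tensor behind the parametrization, which is why no special training code is needed.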

Constraints

The following constraints are implemented and may be used as in the example above:

geotorch.symmetric. Symmetric matrices
geotorch.skew. Skew-symmetric matrices
geotorch.sphere. Vectors of norm 1
geotorch.orthogonal. Matrices with orthonormal columns
geotorch.grassmannian. k-dimensional subspaces of Rⁿ
geotorch.almost_orthogonal(λ). Matrices with singular values in the interval [1-λ, 1+λ]
geotorch.invertible. Invertible matrices with positive determinant
geotorch.sln. Matrices of determinant equal to 1
geotorch.low_rank(r). Matrices of rank at most r
geotorch.fixed_rank(r). Matrices of rank r
geotorch.positive_definite. Positive definite matrices
geotorch.positive_semidefinite. Positive semidefinite matrices
geotorch.positive_semidefinite_low_rank(r). Positive semidefinite matrices of rank at most r
geotorch.positive_semidefinite_fixed_rank(r). Positive semidefinite matrices of rank r

Each of these constraints has some extra parameters that can be used to tailor its behavior to the problem at hand. For more on this, see the documentation.
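
To make one of these constraints concrete, the following sketch builds a matrix whose singular values lie in [1-λ, 1+λ], the set described by `geotorch.almost_orthogonal`. It only illustrates the constraint and is not GeoTorch's implementation: the orthonormal factors come from QR decompositions, and the map `1 + λ·sin(x)` is one assumed way of squashing unconstrained values into the interval.

```python
import torch

torch.manual_seed(0)
n, k, lam = 6, 4, 0.2

# Orthonormal factors U (n x k) and V (k x k) from reduced QR decompositions
U, _ = torch.linalg.qr(torch.randn(n, k, dtype=torch.float64))
V, _ = torch.linalg.qr(torch.randn(k, k, dtype=torch.float64))

# Unconstrained values are squashed into [1 - lam, 1 + lam]
x = torch.randn(k, dtype=torch.float64)
s = 1.0 + lam * torch.sin(x)

# W = U diag(s) V^T has exactly the entries of s as its singular values
W = U @ torch.diag(s) @ V.T
singular_values = torch.linalg.svdvals(W)
```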

These constraints are a frontend for the families of spaces listed below.

Supported Spaces

Each constraint in GeoTorch is implemented as a manifold. This gives the user more flexibility over the options chosen for each parametrization. All of these manifolds support Riemannian Gradient Descent, and they also support optimization via any other PyTorch optimizer.

GeoTorch currently supports the following spaces:

Rn(n): Rⁿ. Unrestricted optimization
Sym(n): Vector space of symmetric matrices
Skew(n): Vector space of skew-symmetric matrices
Sphere(n): Sphere in Rⁿ. { x ∈ Rⁿ | ||x|| = 1 } ⊂ Rⁿ
SO(n): Manifold of n×n orthogonal matrices
St(n,k): Manifold of n×k matrices with orthonormal columns
AlmostOrthogonal(n,k,λ): Manifold of n×k matrices with singular values in the interval [1-λ, 1+λ]
Gr(n,k): Manifold of k-dimensional subspaces in Rⁿ
GLp(n): Manifold of invertible n×n matrices with positive determinant
SL(n): Manifold of n×n matrices with determinant equal to 1
LowRank(n,k,r): Variety of n×k matrices of rank r or less
FixedRank(n,k,r): Manifold of n×k matrices of rank r
PSD(n): Cone of n×n symmetric positive definite matrices
PSSD(n): Cone of n×n symmetric positive semi-definite matrices
PSSDLowRank(n,r): Variety of n×n symmetric positive semi-definite matrices of rank r or less
PSSDFixedRank(n,r): Manifold of n×n symmetric positive semi-definite matrices of rank r
ProductManifold(M₁, ..., Mₖ): Product of manifolds M₁ × ... × Mₖ

Every space of dimension (n, k) can be applied to tensors of shape (*, n, k), so we also get efficient parallel implementations of product spaces such as

ObliqueManifold(n,k): Matrices with unit-length columns, Sⁿ⁻¹ × ⋯ × Sⁿ⁻¹ (k times)
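
As an illustration of the batched convention, the sketch below registers a parametrization that normalizes each column of a tensor of shape (*, n, k), so that every matrix in the batch has unit-length columns, as in the oblique manifold. The class names `UnitColumns` and `Holder` are hypothetical, and the code uses PyTorch's native `torch.nn.utils.parametrize` rather than GeoTorch's own spaces.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

class UnitColumns(nn.Module):
    """Hypothetical parametrization: rescale every column to norm 1."""
    def forward(self, X):
        # dim=-2 runs over the n entries of each column, for any batch shape
        return X / X.norm(dim=-2, keepdim=True)

class Holder(nn.Module):
    """Hypothetical module that only stores a batch of matrices."""
    def __init__(self):
        super().__init__()
        # A batch of 5 matrices of size 4 x 3
        self.weight = nn.Parameter(torch.randn(5, 4, 3))

torch.manual_seed(0)
holder = Holder()
parametrize.register_parametrization(holder, "weight", UnitColumns())

# Norm of each of the 3 columns in each of the 5 matrices, shape (5, 3)
column_norms = holder.weight.norm(dim=-2)
```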

Using GeoTorch in your Code

The files examples/copying_problem.py and examples/sequential_mnist.py serve as tutorials on how to handle the initialization and usage of GeoTorch in real code. They also show how to implement Riemannian Gradient Descent and some other tricks. For an introduction to how the library is actually implemented, see the Jupyter Notebook examples/parametrisations.ipynb.

You may try GeoTorch by installing it with

pip install git+https://github.com/Lezcano/geotorch/

GeoTorch is tested on Linux, Mac, and Windows for Python >= 3.6 and supports PyTorch >= 1.9.

Sharing Weights, Parametrizations, and Normalizing Flows

If you use a parametrized tensor in several places in your model, or use one parametrized layer many times, for example in an RNN, it is recommended to wrap the forward pass as follows to avoid computing each parametrization many times:

with geotorch.parametrize.cached():
    logits = model(input_)

Of course, this with statement may also be used directly inside the forward function where the parametrized layer is used several times.
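
The effect of caching can be seen with a small counter. The sketch below assumes PyTorch 1.10 or higher and uses the native `torch.nn.utils.parametrize.cached()` context manager as the counterpart of the call above; `CountingSkew` is a hypothetical parametrization written only for this demonstration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

class CountingSkew(nn.Module):
    """Hypothetical skew-symmetric parametrization that counts its evaluations."""
    def __init__(self):
        super().__init__()
        self.calls = 0

    def forward(self, X):
        self.calls += 1
        A = X.triu(1)
        return A - A.transpose(-1, -2)

torch.manual_seed(0)
layer = nn.Linear(4, 4)
counter = CountingSkew()
parametrize.register_parametrization(layer, "weight", counter)

x = torch.randn(2, 4)

# Without caching, every use of the layer recomputes the parametrization
counter.calls = 0  # registration itself may evaluate the parametrization
for _ in range(3):
    x = layer(x)
calls_without_cache = counter.calls

# With caching, the parametrized weight is computed once and then reused
counter.calls = 0
with parametrize.cached():
    for _ in range(3):
        x = layer(x)
calls_with_cache = counter.calls
```

The cache is discarded when the with block exits, so the next forward pass sees the updated weights.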

These ideas fall within the context of parametrized optimization, where one wraps a tensor X with a function f and, rather than using X, uses f(X). Particular examples of this idea are pruning, weight normalization, and spectral normalization, among others. This repository implements a framework to approach these kinds of problems. This framework was accepted to core PyTorch 1.8. It can be found under torch.nn.utils.parametrize and torch.nn.utils.parametrizations. When using PyTorch 1.10 or higher, the native PyTorch functions are used within GeoTorch. In this case, the user can interact with the parametrizations in GeoTorch using the PyTorch functions.
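
A minimal sketch of this idea with the native PyTorch API is shown below. It wraps the weight of a linear layer with a function f that symmetrizes it, so that the layer always uses f(X). The `Symmetric` class is written for illustration and is not GeoTorch's implementation of `geotorch.symmetric`.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

class Symmetric(nn.Module):
    """Illustrative parametrization f: square matrix -> symmetric matrix."""
    def forward(self, X):
        # Copy the upper triangle of X onto the lower triangle
        return X.triu() + X.triu(1).transpose(-1, -2)

torch.manual_seed(0)
layer = nn.Linear(3, 3)
parametrize.register_parametrization(layer, "weight", Symmetric())

W = layer.weight                             # f(X), recomputed on access
X = layer.parametrizations.weight.original   # the unconstrained tensor X
is_symmetric = torch.equal(W, W.T)
```

The optimizer updates `X`, while the layer's forward pass only ever sees `W = f(X)`.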

As every space in GeoTorch is, at its core, a map from a flat space into a manifold, the tools implemented here also serve as a building block in normalizing flows. Using a factorized space such as LowRank(n,k,r), it is straightforward to compute the determinant of the transformation it defines, as we have direct access to the singular values of the layer.
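
The following sketch illustrates why access to the singular values makes the determinant cheap. It assumes a square, full-rank matrix stored in factorized form W = U diag(s) Vᵀ with orthogonal U and V, so that log|det W| is the sum of log(sᵢ). The factors are generated with QR decompositions for the demonstration and are not obtained from GeoTorch.

```python
import torch

torch.manual_seed(0)
n = 5

# Orthogonal factors and strictly positive singular values
U, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64))
V, _ = torch.linalg.qr(torch.randn(n, n, dtype=torch.float64))
s = torch.rand(n, dtype=torch.float64) + 0.5

W = U @ torch.diag(s) @ V.T

# O(n) from the factors versus a dense factorization of W
log_abs_det_from_factors = s.log().sum()
log_abs_det_dense = torch.linalg.slogdet(W).logabsdet
```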

Bibliography

Please cite the following work if you found GeoTorch useful. This paper presents a simplified mathematical explanation of part of the inner workings of GeoTorch.

@inproceedings{lezcano2019trivializations,
    title = {Trivializations for gradient-based optimization on manifolds},
    author = {Lezcano-Casado, Mario},
    booktitle={Advances in Neural Information Processing Systems, NeurIPS},
    pages = {9154--9164},
    year = {2019},
}
