From 301d9632274241a321afbcc37984562a1f32a1f5 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Wed, 18 Oct 2023 18:06:21 +0000 Subject: [PATCH] build based on 2204479 --- dev/advanced/index.html | 2 +- dev/basics/index.html | 2 +- dev/contributing/index.html | 2 +- dev/examples/docs_0_fw_visualized/index.html | 402 ++++++++-------- .../docs_10_alternating_methods/index.html | 218 ++++----- dev/examples/docs_1_mathopt_lmo/index.html | 332 ++++++------- .../docs_2_polynomial_regression/index.html | 454 +++++++++--------- .../docs_3_matrix_completion/index.html | 426 ++++++++-------- dev/examples/docs_4_rational_opt/index.html | 46 +- dev/examples/docs_5_blended_cg/index.html | 180 +++---- dev/examples/docs_6_spectrahedron/index.html | 164 +++---- .../docs_7_shifted_norm_polytopes/index.html | 162 +++---- .../docs_8_callback_and_tracking/index.html | 2 +- .../docs_9_extra_vertex_storage/index.html | 4 +- dev/index.html | 2 +- dev/reference/0_reference/index.html | 2 +- dev/reference/1_algorithms/index.html | 2 +- dev/reference/2_lmo/index.html | 2 +- dev/reference/3_backend/index.html | 2 +- dev/reference/4_linesearch/index.html | 2 +- dev/search/index.html | 2 +- 21 files changed, 1205 insertions(+), 1205 deletions(-) diff --git a/dev/advanced/index.html b/dev/advanced/index.html index 957cc8a92..ea7944121 100644 --- a/dev/advanced/index.html +++ b/dev/advanced/index.html @@ -73,4 +73,4 @@ Base.:*(scalar::Real, x::IT) Base.:-(x1::IT, x2::IT) LinearAlgebra.dot(x1::IT, x2::IT)
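
As a minimal sketch (the type name is hypothetical), a custom mutable iterate wrapping a vector would extend the operations listed above as follows:

import LinearAlgebra

mutable struct MyIterate
    data::Vector{Float64}
end

# scalar scaling, difference, and inner product with a gradient-like object
Base.:*(scalar::Real, x::MyIterate) = MyIterate(scalar * x.data)
Base.:-(x1::MyIterate, x2::MyIterate) = MyIterate(x1.data - x2.data)
LinearAlgebra.dot(x1::MyIterate, x2::MyIterate) = LinearAlgebra.dot(x1.data, x2.data)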

For methods using a FrankWolfe.ActiveSet, the atoms (the individual extreme points of the feasible region) are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.

The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:

FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R}

which recomputes the iterate from the current convex decomposition, as well as the following two methods, FrankWolfe.active_set_update_scale! and FrankWolfe.active_set_update_iterate_pairwise!:

FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)
-FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)
+FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom) diff --git a/dev/basics/index.html b/dev/basics/index.html index 0e0d18a57..944afda72 100644 --- a/dev/basics/index.html +++ b/dev/basics/index.html @@ -1,2 +1,2 @@ -How does it work? · FrankWolfe.jl

How does it work?

FrankWolfe.jl contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]

Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:

compute_extreme_point(lmo::LMO, direction; kwargs...) -> v

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.
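
As a hedged sketch of this interface (the type name is hypothetical; the package already ships oracles for norm balls), an LMO over the ℓ∞ ball of radius r could be written as:

import FrankWolfe

struct LInfBallLMO <: FrankWolfe.LinearMinimizationOracle
    r::Float64
end

# minimize ⟨direction, v⟩ over ‖v‖∞ ≤ r by taking -r * sign(dᵢ) in each coordinate
function FrankWolfe.compute_extreme_point(lmo::LInfBallLMO, direction; kwargs...)
    return -lmo.r .* sign.(direction)
end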

Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

  • Simplices: unit simplex, probability simplex
  • Balls in various norms
  • Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.

Finally, we provide wrappers to combine oracles easily, for example in a product.
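
For instance, two oracles can be combined into one over the Cartesian product of their feasible sets (a hedged sketch, assuming the tuple constructor of FrankWolfe.ProductLMO):

import FrankWolfe

lmo1 = FrankWolfe.ProbabilitySimplexOracle(1.0)
lmo2 = FrankWolfe.UnitSimplexOracle(1.0)
product_lmo = FrankWolfe.ProductLMO((lmo1, lmo2))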

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

Standard Frank-Wolfe (FW)

It is implemented in the frank_wolfe function.

See Jaggi (2013) for an overview.

This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).
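
A minimal end-to-end call could look as follows (a sketch, not taken from the package documentation; the quadratic objective and target point xp are placeholders):

using FrankWolfe
using LinearAlgebra

n = 1000
xp = rand(n)                # placeholder target point
f(x) = norm(x - xp)^2
function grad!(storage, x)
    @. storage = 2 * (x - xp)
end

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n))

x, v, primal, dual_gap, traj_data = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    max_iteration=1000,
    line_search=FrankWolfe.Agnostic(),
)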

Away-step Frank-Wolfe (AFW)

It is implemented in the away_frank_wolfe function.

See Lacoste-Julien, Jaggi (2015) for an overview.

Stochastic Frank-Wolfe (SFW)

It is implemented in the FrankWolfe.stochastic_frank_wolfe function.

Blended Conditional Gradients (BCG)

It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021).

Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm   Progress/Iteration   Time/Iteration   Sparsity   Numerical Stability   Active Set   Lazifiable
FW          Low                  Low              Low        High                  No           Yes
AFW         Medium               Medium-High      Medium     Medium-High           Yes          Yes
B(P)CG      High                 Medium-High      High       Medium                Yes          By design
SFW         Low                  Low              Low        High                  No           No

While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:

FW vs AFW.

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

Alternating Linear Minimization (ALM)

It is implemented in the FrankWolfe.alternating_linear_minimization function.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 17cdee785..39c0e27e7 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -4,4 +4,4 @@ """ function f(x) # ... -end

Provide a new example or test

If you fix a bug, you would typically also add a test validating that the bug is gone. Tests are added in a file in the test/ folder, for which the entry point is runtests.jl.
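
A hedged sketch of what such a test could look like (the testset name and oracle are chosen for illustration):

using Test
using FrankWolfe

@testset "LMO returns a simplex vertex" begin
    lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
    v = FrankWolfe.compute_extreme_point(lmo, ones(5))
    # a vertex of the probability simplex has coordinates summing to one
    @test sum(v) ≈ 1.0
end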

The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder which generates pages for the documentation based on Literate.jl.

Provide a new feature

Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assess the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.
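
A hedged sketch of such a micro-benchmark (the oracle and problem size are chosen for illustration):

using BenchmarkTools
using FrankWolfe

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
direction = randn(10_000)

# interpolate with $ so that setup cost is not measured
@benchmark FrankWolfe.compute_extreme_point($lmo, $direction)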

Some typical features to implement are:

  1. A new Linear Minimization Oracle (LMO)
  2. A new step size
  3. A new algorithm (less frequent) following the same API.

Code style

We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repo in the way set in the .JuliaFormatter.toml file, which enforces a number of conventions.
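
To apply the formatter locally before opening a pull request (a sketch; format picks up the settings from .JuliaFormatter.toml):

using JuliaFormatter

# run from the repository root
format(".")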

This contribution guide was inspired by ColPrac and the one in Manopt.jl.

diff --git a/dev/examples/docs_0_fw_visualized/index.html b/dev/examples/docs_0_fw_visualized/index.html index 3d30762e0..84b79823d 100644 --- a/dev/examples/docs_0_fw_visualized/index.html +++ b/dev/examples/docs_0_fw_visualized/index.html @@ -122,119 +122,119 @@ ) [SVG plot markup omitted]

plot chosen vertices

scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label="v_1")
 scatter!(
     [vertices[2][1]],
@@ -248,121 +248,121 @@
 )
[SVG plot markup omitted]

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_10_alternating_methods/index.html b/dev/examples/docs_10_alternating_methods/index.html index ec7edb55d..214185412 100644 --- a/dev/examples/docs_10_alternating_methods/index.html +++ b/dev/examples/docs_10_alternating_methods/index.html @@ -41,17 +41,17 @@ Type Iteration Primal Dual Dual Gap Infeas Time It/sec ---------------------------------------------------------------------------------------------------------------- I 1 2.010582e+00 -4.198942e+01 4.400000e+01 2.010582e+00 0.000000e+00 Inf - FW 1000 5.100158e-02 4.913021e-02 1.871364e-03 5.100158e-02 3.203934e+00 3.121162e+02 - FW 2000 5.052592e-02 4.954907e-02 9.768468e-04 5.052592e-02 3.237912e+00 6.176821e+02 - FW 3000 5.035759e-02 4.969300e-02 6.645894e-04 5.035759e-02 3.271389e+00 9.170417e+02 - FW 4000 5.027073e-02 4.976697e-02 5.037594e-04 5.027073e-02 3.304896e+00 1.210326e+03 - FW 5000 5.021791e-02 4.980140e-02 4.165104e-04 5.021791e-02 3.337764e+00 1.498009e+03 - FW 6000 5.018239e-02 4.983318e-02 3.492063e-04 5.018239e-02 3.372031e+00 1.779343e+03 - FW 7000 5.015674e-02 4.985610e-02 3.006417e-04 5.015674e-02 3.404090e+00 2.056350e+03 - FW 8000 5.013751e-02 4.987326e-02 2.642458e-04 5.013751e-02 3.435477e+00 2.328643e+03 - FW 9000 5.012245e-02 4.988670e-02 2.357462e-04 5.012245e-02 3.542657e+00 2.540466e+03 - FW 10000 5.011036e-02 4.989745e-02 2.129161e-04 5.011036e-02 3.572997e+00 2.798771e+03 - Last 10001 5.011035e-02 4.989379e-02 2.165630e-04 5.011035e-02 3.737794e+00 2.675643e+03 + FW 1000 5.100158e-02 4.913021e-02 1.871364e-03 5.100158e-02 2.365112e+00 4.228129e+02 + FW 2000 5.052592e-02 4.954907e-02 9.768468e-04 5.052592e-02 2.379783e+00 8.404127e+02 + FW 3000 5.035759e-02 4.969300e-02 6.645894e-04 5.035759e-02 2.394951e+00 1.252635e+03 + FW 4000 5.027073e-02 4.976697e-02 5.037594e-04 5.027073e-02 2.410855e+00 1.659163e+03 + FW 5000 5.021791e-02 4.980140e-02 4.165104e-04 5.021791e-02 2.426395e+00 2.060670e+03 + FW 6000 5.018239e-02 4.983318e-02 3.492063e-04 5.018239e-02 2.443613e+00 2.455380e+03 + FW 7000 5.015674e-02 4.985610e-02 3.006417e-04 5.015674e-02 2.461153e+00 2.844196e+03 + FW 8000 5.013751e-02 4.987326e-02 2.642458e-04 5.013751e-02 2.478581e+00 3.227653e+03 + FW 9000 5.012245e-02 4.988670e-02 2.357462e-04 5.012245e-02 2.496249e+00 3.605410e+03 + FW 10000 5.011036e-02 4.989745e-02 2.129161e-04 5.011036e-02 2.513818e+00 3.978012e+03 + Last 10001 5.011035e-02 4.989379e-02 2.165630e-04 5.011035e-02 2.635528e+00 3.794686e+03 ---------------------------------------------------------------------------------------------------------------- Block coordinate Frank-Wolfe (BCFW). @@ -63,8 +63,8 @@ Type Iteration Primal Dual Dual Gap Infeas Time It/sec ---------------------------------------------------------------------------------------------------------------- I 1 8.287728e-01 -4.317123e+01 4.400000e+01 8.287728e-01 0.000000e+00 Inf - FW 1000 5.000000e-02 4.999153e-02 8.474350e-06 5.000000e-02 1.502725e-01 6.654576e+03 - Last 1445 5.000000e-02 4.999896e-02 1.036019e-06 5.000000e-02 1.670777e-01 8.648673e+03 + FW 1000 5.000000e-02 4.999153e-02 8.474350e-06 5.000000e-02 1.662781e-02 6.014020e+04 + Last 1445 5.000000e-02 4.999896e-02 1.036019e-06 5.000000e-02 2.475182e-02 5.837955e+04 ---------------------------------------------------------------------------------------------------------------- Block coordinate Frank-Wolfe (BCFW). 
@@ -75,9 +75,9 @@ ---------------------------------------------------------------------------------------------------------------- Type Iteration Primal Dual Dual Gap Infeas Time It/sec ---------------------------------------------------------------------------------------------------------------- - I 1 1.470947e+00 -4.252905e+01 4.400000e+01 1.470947e+00 0.000000e+00 Inf - FW 1000 5.000000e-02 4.998663e-02 1.337041e-05 5.000000e-02 4.080015e-02 2.450971e+04 - Last 1531 5.000000e-02 4.999896e-02 1.042885e-06 5.000000e-02 6.641747e-02 2.305117e+04 + I 1 2.234629e+01 -Inf Inf 2.234629e+01 0.000000e+00 Inf + FW 1000 5.000000e-02 4.999099e-02 9.011518e-06 5.000000e-02 2.065141e-02 4.842283e+04 + Last 1455 5.000000e-02 4.999900e-02 1.000546e-06 5.000000e-02 3.039242e-02 4.787378e+04 ----------------------------------------------------------------------------------------------------------------

As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with the standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. In this example we also use FrankWolfe.away_frank_wolfe.

_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(
     FrankWolfe.away_frank_wolfe,
     f,
@@ -98,9 +98,9 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   2.300000e+01           -Inf            Inf   0.000000e+00            Inf              2
-  Last           147   5.000000e-02   4.999914e-02   8.622362e-07   1.302043e+00   1.128995e+02             74
+  Last           147   5.000000e-02   4.999914e-02   8.622362e-07   9.637755e-01   1.525252e+02             74
 ----------------------------------------------------------------------------------------------------------------
-    PP           147   5.000000e-02   4.999914e-02   8.622362e-07   1.380386e+00   1.064920e+02             74
+    PP           147   5.000000e-02   4.999914e-02   8.622362e-07   1.026757e+00   1.431692e+02             74
 ----------------------------------------------------------------------------------------------------------------

Running Alternating Projections

Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems. One omits the objective and gradient as parameters.

_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(
     lmos,
     x0,
@@ -118,123 +118,123 @@
 ----------------------------------------------------------------------------------
   Type     Iteration       Dual Gap         Infeas           Time         It/sec
 ----------------------------------------------------------------------------------
-     I             1   7.387930e-01   3.693965e-01   0.000000e+00            Inf
-    FW           100   1.045964e-04   5.000029e-02   1.807312e+00   5.533080e+01
-    FW           200   2.549000e-05   5.000002e-02   2.532547e+00   7.897189e+01
-    FW           300   1.123044e-05   5.000000e-02   3.433586e+00   8.737222e+01
-    FW           400   6.488644e-06   5.000000e-02   4.478935e+00   8.930694e+01
-    FW           500   4.160782e-06   5.000000e-02   5.619675e+00   8.897311e+01
-    FW           600   2.869222e-06   5.000000e-02   6.809893e+00   8.810711e+01
-    FW           700   2.123105e-06   5.000000e-02   8.106202e+00   8.635364e+01
-    FW           800   1.581551e-06   5.000000e-02   9.406900e+00   8.504395e+01
-    FW           900   1.264159e-06   5.000000e-02   1.081679e+01   8.320399e+01
-    FW          1000   1.012869e-06   5.000000e-02   1.221100e+01   8.189339e+01
-  Last          1015   9.893090e-07   5.000000e-02   1.244735e+01   8.154345e+01
+     I             1   7.349846e-01   3.674923e-01   0.000000e+00            Inf
+    FW           100   1.045964e-04   5.000029e-02   1.341553e+00   7.454050e+01
+    FW           200   2.549000e-05   5.000002e-02   1.788896e+00   1.118008e+02
+    FW           300   1.123044e-05   5.000000e-02   2.366663e+00   1.267608e+02
+    FW           400   6.488644e-06   5.000000e-02   3.036871e+00   1.317145e+02
+    FW           500   4.160782e-06   5.000000e-02   3.747880e+00   1.334088e+02
+    FW           600   2.869222e-06   5.000000e-02   4.524747e+00   1.326041e+02
+    FW           700   2.123105e-06   5.000000e-02   5.329606e+00   1.313418e+02
+    FW           800   1.581551e-06   5.000000e-02   6.194734e+00   1.291419e+02
+    FW           900   1.264159e-06   5.000000e-02   7.056032e+00   1.275504e+02
+    FW          1000   1.012869e-06   5.000000e-02   7.961558e+00   1.256036e+02
+  Last          1015   9.893090e-07   5.000000e-02   8.109660e+00   1.251594e+02
 ----------------------------------------------------------------------------------

Plotting the resulting trajectories

labels = ["BCFW - Full", "BCFW - Cyclic", "BCFW - Stochastic", "AFW", "AP"]
 
 plot_trajectories(trajectories, labels, xscalelog=true)
[SVG plot markup omitted]

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_1_mathopt_lmo/index.html b/dev/examples/docs_1_mathopt_lmo/index.html index c6a410d2e..17c75eee8 100644 --- a/dev/examples/docs_1_mathopt_lmo/index.html +++ b/dev/examples/docs_1_mathopt_lmo/index.html @@ -130,191 +130,195 @@ ) [SVG plot markup omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_2_polynomial_regression/index.html b/dev/examples/docs_2_polynomial_regression/index.html index 253028f68..192417c88 100644 --- a/dev/examples/docs_2_polynomial_regression/index.html +++ b/dev/examples/docs_2_polynomial_regression/index.html @@ -246,256 +246,252 @@ ) [SVG plot markup omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_3_matrix_completion/index.html b/dev/examples/docs_3_matrix_completion/index.html index b49157d0c..3aad59ba9 100644 --- a/dev/examples/docs_3_matrix_completion/index.html +++ b/dev/examples/docs_3_matrix_completion/index.html @@ -265,240 +265,240 @@ ) [SVG plot markup omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_4_rational_opt/index.html b/dev/examples/docs_4_rational_opt/index.html index 14a6332b0..5998353a4 100644 --- a/dev/examples/docs_4_rational_opt/index.html +++ b/dev/examples/docs_4_rational_opt/index.html @@ -34,17 +34,17 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 1.000000e+00 -1.000000e+00 2.000000e+00 0.000000e+00 Inf - FW 10 1.407407e-01 -1.407407e-01 2.814815e-01 7.357970e-01 1.359071e+01 - FW 20 6.842105e-02 -6.842105e-02 1.368421e-01 7.380141e-01 2.709975e+01 - FW 30 4.521073e-02 -4.521073e-02 9.042146e-02 7.401740e-01 4.053101e+01 - FW 40 3.376068e-02 -3.376068e-02 6.752137e-02 7.429810e-01 5.383718e+01 - FW 50 2.693878e-02 -2.693878e-02 5.387755e-02 7.461145e-01 6.701385e+01 - FW 60 2.241055e-02 -2.241055e-02 4.482109e-02 7.490019e-01 8.010660e+01 - FW 70 1.918565e-02 -1.918565e-02 3.837129e-02 7.525025e-01 9.302295e+01 - FW 80 1.677215e-02 -1.677215e-02 3.354430e-02 7.559547e-01 1.058265e+02 - FW 90 1.489804e-02 -1.489804e-02 2.979609e-02 7.597608e-01 1.184583e+02 - FW 100 1.340067e-02 -1.340067e-02 2.680135e-02 7.637572e-01 1.309317e+02 - Last 101 1.314422e-02 -1.236767e-02 2.551189e-02 7.650420e-01 1.320189e+02 + FW 10 1.407407e-01 -1.407407e-01 2.814815e-01 5.520201e-01 1.811528e+01 + FW 20 6.842105e-02 -6.842105e-02 1.368421e-01 5.534615e-01 3.613621e+01 + FW 30 4.521073e-02 -4.521073e-02 9.042146e-02 5.547215e-01 5.408119e+01 + FW 40 3.376068e-02 -3.376068e-02 6.752137e-02 5.565055e-01 7.187710e+01 + FW 50 2.693878e-02 -2.693878e-02 5.387755e-02 5.580693e-01 8.959461e+01 + FW 60 2.241055e-02 -2.241055e-02 4.482109e-02 5.602716e-01 1.070909e+02 + FW 70 1.918565e-02 -1.918565e-02 3.837129e-02 5.624246e-01 1.244611e+02 + FW 80 1.677215e-02 -1.677215e-02 3.354430e-02 5.651114e-01 1.415650e+02 + FW 90 1.489804e-02 -1.489804e-02 2.979609e-02 5.679758e-01 1.584575e+02 + FW 100 1.340067e-02 -1.340067e-02 2.680135e-02 5.710852e-01 1.751052e+02 + Last 101 1.314422e-02 -1.236767e-02 2.551189e-02 5.720567e-01 1.765559e+02 ------------------------------------------------------------------------------------------------- Output type of solution: BigFloat

Another possible step-size rule is rationalshortstep, which computes the step size by minimizing the smoothness inequality, giving $\gamma_t=\frac{\langle \nabla f(x_t),x_t-v_t\rangle}{L\|x_t-v_t\|^2}$. However, as this step size depends on an upper bound on the Lipschitz constant $L$ as well as the inner product with the gradient $\nabla f(x_t)$, both have to be of a rational type.
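
As a small worked sketch in plain Julia (outside the package; the gradient and Lipschitz bound below are placeholders), this step size can be computed exactly with rational values:

x = Rational{BigInt}[1, 0]
v = Rational{BigInt}[0, 1]
g = Rational{BigInt}[2, -2]   # placeholder gradient ∇f(x)
L = 2 // 1                    # assumed rational Lipschitz bound

d = x - v
gamma = min(1 // 1, sum(g .* d) / (L * sum(d .* d)))   # exact result 1//1, no rounding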

@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(
@@ -67,16 +67,16 @@
   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
-    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   6.103741e-01   1.638340e+01
-    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   6.127742e-01   3.263845e+01
-    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   6.152931e-01   4.875725e+01
-    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   6.183010e-01   6.469341e+01
-    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   6.218787e-01   8.040153e+01
-    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   6.268345e-01   9.571904e+01
-    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   6.310514e-01   1.109260e+02
-    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   6.357340e-01   1.258388e+02
-    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   6.400374e-01   1.406168e+02
-    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   6.450212e-01   1.550337e+02
-  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   6.460607e-01   1.547842e+02
+    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   4.611304e-01   2.168584e+01
+    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   4.631026e-01   4.318698e+01
+    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   4.649619e-01   6.452142e+01
+    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   4.666648e-01   8.571464e+01
+    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   4.689863e-01   1.066129e+02
+    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   4.715965e-01   1.272274e+02
+    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   4.744272e-01   1.475464e+02
+    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   4.774989e-01   1.675397e+02
+    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   4.808394e-01   1.871727e+02
+    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   4.845339e-01   2.063839e+02
+  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   4.852965e-01   2.060596e+02
 -------------------------------------------------------------------------------------------------
-  1.239368 seconds (1.57 M allocations: 87.975 MiB, 16.15% gc time, 1.38% compilation time)

+ 0.769494 seconds (1.57 M allocations: 88.104 MiB, 1.81% compilation time)

Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n)


This page was generated using Literate.jl.

diff --git a/dev/examples/docs_5_blended_cg/index.html b/dev/examples/docs_5_blended_cg/index.html index 1d3c13704..5580afb3a 100644 --- a/dev/examples/docs_5_blended_cg/index.html +++ b/dev/examples/docs_5_blended_cg/index.html @@ -16,7 +16,7 @@ function grad!(storage, x) return storage .= linear + hessian * x end -L = eigmax(hessian)
250643.69867666412

+L = eigmax(hessian)
250643.6986766641

We run over the probability simplex and call the LMO to get an initial feasible point:

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0);
 x00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))
 
 target_tolerance = 1e-5
@@ -154,116 +154,116 @@
 plot_trajectories(data, label, xscalelog=true)
[SVG plot markup omitted]

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_6_spectrahedron/index.html b/dev/examples/docs_6_spectrahedron/index.html index 4fdd5a2ab..8b3323a06 100644 --- a/dev/examples/docs_6_spectrahedron/index.html +++ b/dev/examples/docs_6_spectrahedron/index.html @@ -67,7 +67,7 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 1.018651e+00 1.014119e+00 4.531396e-03 0.000000e+00 Inf - Last 26 1.014314e+00 1.014314e+00 8.598814e-09 2.467651e+00 1.053634e+01 + Last 26 1.014314e+00 1.014314e+00 8.598814e-09 1.884509e+00 1.379670e+01 ------------------------------------------------------------------------------------------------- Lazified Conditional Gradient (Frank-Wolfe + Lazification). @@ -80,113 +80,113 @@ Type Iteration Primal Dual Dual Gap Time It/sec Cache Size ---------------------------------------------------------------------------------------------------------------- I 1 1.018651e+00 1.014119e+00 4.531396e-03 0.000000e+00 Inf 1 - LD 2 1.014317e+00 1.014314e+00 3.679257e-06 2.132371e-01 9.379229e+00 2 - LD 3 1.014315e+00 1.014314e+00 1.036964e-06 3.733917e-01 8.034458e+00 3 - LD 4 1.014315e+00 1.014314e+00 5.090329e-07 4.623221e-01 8.651977e+00 4 - LD 6 1.014314e+00 1.014314e+00 2.019539e-07 5.849577e-01 1.025715e+01 5 - LD 9 1.014314e+00 1.014314e+00 8.396068e-08 7.244230e-01 1.242368e+01 6 - LD 13 1.014314e+00 1.014314e+00 3.872634e-08 8.944469e-01 1.453412e+01 7 - LD 19 1.014314e+00 1.014314e+00 1.766051e-08 1.122667e+00 1.692399e+01 8 - LD 27 1.014314e+00 1.014314e+00 8.603148e-09 1.425066e+00 1.894649e+01 9 - Last 27 1.014314e+00 1.014314e+00 7.988600e-09 1.605047e+00 1.682194e+01 10 + LD 2 1.014317e+00 1.014314e+00 3.679257e-06 1.668336e-01 1.198799e+01 2 + LD 3 1.014315e+00 1.014314e+00 1.036964e-06 2.310038e-01 1.298680e+01 3 + LD 4 1.014315e+00 1.014314e+00 5.090329e-07 2.959035e-01 1.351792e+01 4 + LD 6 1.014314e+00 1.014314e+00 2.019539e-07 3.812349e-01 1.573833e+01 5 + LD 9 1.014314e+00 1.014314e+00 8.396068e-08 4.908067e-01 1.833716e+01 6 + LD 13 1.014314e+00 1.014314e+00 3.872634e-08 6.248155e-01 2.080614e+01 7 + LD 19 1.014314e+00 1.014314e+00 1.766051e-08 8.059228e-01 2.357546e+01 8 + LD 27 1.014314e+00 1.014314e+00 8.603148e-09 1.037223e+00 2.603105e+01 9 + Last 27 1.014314e+00 1.014314e+00 7.988600e-09 1.165440e+00 2.316722e+01 10 ----------------------------------------------------------------------------------------------------------------

Plotting the resulting trajectories

data = [trajectory, trajectory_lazy]
 label = ["FW", "LCG"]
 plot_trajectories(data, label, xscalelog=true)
[SVG plot markup omitted]

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_7_shifted_norm_polytopes/index.html b/dev/examples/docs_7_shifted_norm_polytopes/index.html index 8db7c8afe..31457c266 100644 --- a/dev/examples/docs_7_shifted_norm_polytopes/index.html +++ b/dev/examples/docs_7_shifted_norm_polytopes/index.html @@ -67,27 +67,27 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 2.000000e+00 -6.000000e+00 8.000000e+00 0.000000e+00 Inf - FW 50 2.198243e-01 1.859119e-01 3.391239e-02 1.514062e-01 3.302375e+02 - FW 100 2.104540e-01 1.927834e-01 1.767061e-02 1.517922e-01 6.587955e+02 - FW 150 2.071345e-01 1.951277e-01 1.200679e-02 1.521765e-01 9.856978e+02 - FW 200 2.054240e-01 1.963167e-01 9.107240e-03 1.526727e-01 1.309992e+03 - FW 250 2.043783e-01 1.970372e-01 7.341168e-03 1.530116e-01 1.633863e+03 - FW 300 2.036722e-01 1.975209e-01 6.151268e-03 1.535510e-01 1.953748e+03 - FW 350 2.031630e-01 1.978684e-01 5.294582e-03 1.540157e-01 2.272495e+03 - FW 400 2.027782e-01 1.981301e-01 4.648079e-03 1.543895e-01 2.590849e+03 - FW 450 2.024772e-01 1.983344e-01 4.142727e-03 1.547487e-01 2.907940e+03 - FW 500 2.022352e-01 1.984984e-01 3.736776e-03 1.551661e-01 3.222353e+03 - FW 550 2.020364e-01 1.986329e-01 3.403479e-03 1.555332e-01 3.536222e+03 - FW 600 2.018701e-01 1.987452e-01 3.124906e-03 1.558780e-01 3.849164e+03 - FW 650 2.017290e-01 1.988404e-01 2.888583e-03 1.562157e-01 4.160913e+03 - FW 700 2.016078e-01 1.989222e-01 2.685564e-03 1.565686e-01 4.470883e+03 - FW 750 2.015024e-01 1.989932e-01 2.509264e-03 1.569411e-01 4.778863e+03 - FW 800 2.014101e-01 1.990554e-01 2.354727e-03 1.573002e-01 5.085817e+03 - FW 850 2.013284e-01 1.991103e-01 2.218154e-03 1.577667e-01 5.387702e+03 - FW 900 2.012558e-01 1.991592e-01 2.096580e-03 1.581439e-01 5.691019e+03 - FW 950 2.011906e-01 1.992030e-01 1.987662e-03 1.587014e-01 5.986085e+03 - FW 1000 2.011319e-01 1.992424e-01 1.889519e-03 1.591426e-01 6.283673e+03 - Last 1001 2.011297e-01 1.992439e-01 1.885794e-03 1.594129e-01 6.279292e+03 + FW 50 2.198243e-01 1.859119e-01 3.391239e-02 1.146231e-01 4.362124e+02 + FW 100 2.104540e-01 1.927834e-01 1.767061e-02 1.149393e-01 8.700247e+02 + FW 150 2.071345e-01 1.951277e-01 1.200679e-02 1.152404e-01 1.301627e+03 + FW 200 2.054240e-01 1.963167e-01 9.107240e-03 1.155396e-01 1.731009e+03 + FW 250 2.043783e-01 1.970372e-01 7.341168e-03 1.158310e-01 2.158318e+03 + FW 300 2.036722e-01 1.975209e-01 6.151268e-03 1.161235e-01 2.583457e+03 + FW 350 2.031630e-01 1.978684e-01 5.294582e-03 1.164697e-01 3.005074e+03 + FW 400 2.027782e-01 1.981301e-01 4.648079e-03 1.167609e-01 3.425805e+03 + FW 450 2.024772e-01 1.983344e-01 4.142727e-03 1.170527e-01 3.844423e+03 + FW 500 2.022352e-01 1.984984e-01 3.736776e-03 1.173658e-01 4.260186e+03 + FW 550 2.020364e-01 1.986329e-01 3.403479e-03 1.176712e-01 4.674042e+03 + FW 600 2.018701e-01 1.987452e-01 3.124906e-03 1.179660e-01 5.086213e+03 + FW 650 2.017290e-01 1.988404e-01 2.888583e-03 1.182575e-01 5.496482e+03 + FW 700 2.016078e-01 1.989222e-01 2.685564e-03 1.185571e-01 5.904330e+03 + FW 750 2.015024e-01 1.989932e-01 2.509264e-03 1.188486e-01 6.310551e+03 + FW 800 2.014101e-01 1.990554e-01 2.354727e-03 1.191373e-01 6.714943e+03 + FW 850 2.013284e-01 1.991103e-01 2.218154e-03 1.194360e-01 7.116784e+03 + FW 900 2.012558e-01 1.991592e-01 2.096580e-03 1.197284e-01 7.517016e+03 + FW 950 2.011906e-01 1.992030e-01 1.987662e-03 1.200223e-01 7.915198e+03 + FW 1000 2.011319e-01 1.992424e-01 1.889519e-03 1.203118e-01 8.311739e+03 + Last 1001 
2.011297e-01 1.992439e-01 1.885794e-03 1.204673e-01 8.309311e+03 ------------------------------------------------------------------------------------------------- Final solution: [1.799813188674937, 0.5986834801090863] @@ -102,27 +102,27 @@ Type Iteration Primal Dual Dual Gap Time It/sec ------------------------------------------------------------------------------------------------- I 1 1.300000e+01 -1.900000e+01 3.200000e+01 0.000000e+00 Inf - FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 7.803746e-02 6.407179e+02 - FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 7.844696e-02 1.274747e+03 - FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 7.888705e-02 1.901453e+03 - FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 7.926655e-02 2.523132e+03 - FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 7.968504e-02 3.137352e+03 - FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 8.009654e-02 3.745480e+03 - FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 8.047874e-02 4.348975e+03 - FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 8.098403e-02 4.939245e+03 - FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 8.135933e-02 5.531019e+03 - FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 8.174573e-02 6.116528e+03 - FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 8.212472e-02 6.697131e+03 - FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 8.250702e-02 7.272109e+03 - FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 8.289961e-02 7.840809e+03 - FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 8.326361e-02 8.407034e+03 - FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 8.372361e-02 8.958047e+03 - FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 8.415470e-02 9.506302e+03 - FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 8.453280e-02 1.005527e+04 - FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 8.494140e-02 1.059554e+04 - FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 8.531759e-02 1.113487e+04 - FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 8.568859e-02 1.167017e+04 - Last 1001 5.598088e-04 -3.918661e-03 4.478470e-03 8.589359e-02 1.165396e+04 + FW 50 1.084340e-02 -7.590380e-02 8.674720e-02 5.747023e-02 8.700156e+02 + FW 100 5.509857e-03 -3.856900e-02 4.407886e-02 5.777803e-02 1.730762e+03 + FW 150 3.695414e-03 -2.586790e-02 2.956331e-02 5.891743e-02 2.545936e+03 + FW 200 2.780453e-03 -1.946317e-02 2.224362e-02 5.925093e-02 3.375474e+03 + FW 250 2.228830e-03 -1.560181e-02 1.783064e-02 5.955033e-02 4.198129e+03 + FW 300 1.859926e-03 -1.301948e-02 1.487941e-02 5.984563e-02 5.012897e+03 + FW 350 1.595838e-03 -1.117087e-02 1.276670e-02 6.014463e-02 5.819306e+03 + FW 400 1.397443e-03 -9.782098e-03 1.117954e-02 6.043953e-02 6.618185e+03 + FW 450 1.242935e-03 -8.700548e-03 9.943483e-03 6.073573e-02 7.409147e+03 + FW 500 1.119201e-03 -7.834409e-03 8.953610e-03 6.102823e-02 8.192929e+03 + FW 550 1.017878e-03 -7.125146e-03 8.143024e-03 6.132483e-02 8.968634e+03 + FW 600 9.333816e-04 -6.533671e-03 7.467053e-03 6.161993e-02 9.737109e+03 + FW 650 8.618413e-04 -6.032889e-03 6.894730e-03 6.191303e-02 1.049860e+04 + FW 700 8.004890e-04 -5.603423e-03 6.403912e-03 6.220723e-02 1.125271e+04 + FW 750 7.472928e-04 -5.231050e-03 5.978342e-03 6.251664e-02 1.199681e+04 + FW 800 7.007275e-04 -4.905093e-03 5.605820e-03 6.281984e-02 1.273483e+04 + FW 850 6.596259e-04 -4.617381e-03 5.277007e-03 6.311654e-02 1.346715e+04 + FW 900 6.230796e-04 -4.361557e-03 4.984637e-03 6.341113e-02 1.419309e+04 + FW 950 5.903710e-04 -4.132597e-03 4.722968e-03 6.370374e-02 1.491278e+04 + FW 1000 5.609256e-04 -3.926479e-03 4.487405e-03 6.399744e-02 1.562563e+04 + Last 1001 5.598088e-04 
-3.918661e-03 4.478470e-03 6.415464e-02 1.560293e+04 ------------------------------------------------------------------------------------------------- Final solution: [2.0005598087769556, 0.9763463450796975]

We plot the polytopes alongside the solutions from above:

xcoord1 = [1, 3, 1, -1, 1]
@@ -158,53 +158,53 @@
 )
[SVG plot markup omitted]

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_8_callback_and_tracking/index.html b/dev/examples/docs_8_callback_and_tracking/index.html index 865f6f57e..fe9a8871c 100644 --- a/dev/examples/docs_8_callback_and_tracking/index.html +++ b/dev/examples/docs_8_callback_and_tracking/index.html @@ -79,4 +79,4 @@ total_iterations = 500 tf.counter = 501 tgrad!.counter = 501 -tlmo_prob.counter = 13

+tlmo_prob.counter = 13

This page was generated using Literate.jl.

diff --git a/dev/examples/docs_9_extra_vertex_storage/index.html b/dev/examples/docs_9_extra_vertex_storage/index.html index e4630a1e4..7fa64c81e 100644 --- a/dev/examples/docs_9_extra_vertex_storage/index.html +++ b/dev/examples/docs_9_extra_vertex_storage/index.html @@ -22,7 +22,7 @@ epsilon=1e-5, add_dropped_vertices=true, extra_vertex_storage=vertex_storage, -)
([0.40888376803682813, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.2466788176214311, 0.0, 0.0, 0.0, 0.38566923031085226, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2114.0128437669537, 4.745775129322283e-6, Any[], Tuple{Float64, FrankWolfe.ScaledHotVector{Float64}}[(0.0034752189937032707, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.11153326596120185, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.10971876316086877, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.10675074668229552, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.1007720972005921, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.09508924838065771, [4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.08969051867694239, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.3, 0.0, 0.0, 0.0]), (0.08567891652650678, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.07574141924241908, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.06090158490898265, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.057367166888704905, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.05495942795962819, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.048321625417496884, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])])

+)
([0.40888376803682847, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.2466788176214298, 0.0, 0.0, 0.0, 0.3856692303108522, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2114.0128437669537, 4.74577513642771e-6, Any[], Tuple{Float64, FrankWolfe.ScaledHotVector{Float64}}[(0.0034752189937033804, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.11153326596120189, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.1097187631608688, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.1067507466822955, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.10077209720059209, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.09508924838065778, [4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.08969051867694237, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.3, 0.0, 0.0, 0.0]), (0.08567891652650678, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.07574141924241867, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.060901584908982745, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.05736716688870461, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 4.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.05495942795962834, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), (0.04832162541749693, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])])

The counter indicates the number of initial calls to the LMO. We will now construct different objective functions based on new centers, call the BPCG algorithm while accumulating vertices in the storage, in addition to warm-starting with the active set of the previous iteration. This allows for a "double-warmstarted" algorithm, reducing the number of LMO calls from one problem to the next.

active_set = results[end]
 tlmo.counter
 
 for iter in 1:10
@@ -66,4 +66,4 @@
 [ Info: Number of LMO calls in iter 9: 16
 [ Info: Vertex storage size: 77
 [ Info: Number of LMO calls in iter 10: 15
-[ Info: Vertex storage size: 82

+[ Info: Vertex storage size: 82

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index 48cfbc9c2..720e92a17 100644 --- a/dev/index.html +++ b/dev/index.html @@ -40,4 +40,4 @@ ...

If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:

using Plots
 using FrankWolfe
 
-include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl"))
+include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl")) diff --git a/dev/reference/0_reference/index.html b/dev/reference/0_reference/index.html index e2c3de52a..b99260cb5 100644 --- a/dev/reference/0_reference/index.html +++ b/dev/reference/0_reference/index.html @@ -1,2 +1,2 @@ -API Reference · FrankWolfe.jl +API Reference · FrankWolfe.jl diff --git a/dev/reference/1_algorithms/index.html b/dev/reference/1_algorithms/index.html index 1be26780b..1d445dc7e 100644 --- a/dev/reference/1_algorithms/index.html +++ b/dev/reference/1_algorithms/index.html @@ -1,2 +1,2 @@ -Algorithms · FrankWolfe.jl

Algorithms

This section contains all main algorithms of the package. These are the ones typical users will call.

The typical signature for these algorithms is:

my_algorithm(f, grad!, lmo, x0)

Standard algorithms

FrankWolfe.frank_wolfeMethod
frank_wolfe(f, grad!, lmo, x0; ...)

Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x final iterate
  • v last vertex from the LMO
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.
source
FrankWolfe.stochastic_frank_wolfeMethod
stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

Stochastic version of Frank-Wolfe, which evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

source
FrankWolfe.block_coordinate_frank_wolfeFunction
block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated.

The method returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.

See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

source

Active-set based methods

The following algorithms maintain the representation of the iterates as a convex combination of vertices.

Away-step

Blended Conditional Gradient

FrankWolfe.blended_conditional_gradientMethod
blended_conditional_gradient(f, grad!, lmo, x0)

Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditional gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

source
FrankWolfe.build_reduced_problemMethod
build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

Given an active set formed by vectors, a (constant) Hessian, and a gradient, this constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ denotes the barycentric coordinates, of dimension equal to the cardinality of the active set, the objective function is:

f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ
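
As a hedged illustration of where these terms come from: if the atoms are collected as the columns of a matrix $A$ so that $x = A\lambda$, a quadratic $f(x) = \langle b, x \rangle + \frac{1}{2} x^\top H x$ becomes

\[f(\lambda) = (A^\top b)^\top \lambda + \frac{1}{2} \lambda^\top (A^\top H A) \lambda,\]

so that reduced_linear and reduced_hessian play the roles of $A^\top b$ and $A^\top H A$.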

In the case where we find that the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance we return nothing (as there is nothing to do).

source
FrankWolfe.lp_separation_oracleMethod

Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

source
FrankWolfe.minimize_over_convex_hull!Method
minimize_over_convex_hull!

Given a function f with gradient grad! and an active set active_set, this function will minimize the function over the convex hull of the active set until the strong-Wolfe gap over the active set is below tolerance.

It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

source
FrankWolfe.simplex_gradient_descent_over_convex_hullFunction
simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the strong-Wolfe gap is below tolerance, using simplex gradient descent.

source

Blended Pairwise Conditional Gradient

Alternating Methods

Problems over intersections of convex sets, i.e.

\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

pose a challenge as one has to combine the information of two or more LMOs.

FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.

FrankWolfe.alternating_linear_minimizationMethod
alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source
FrankWolfe.alternating_projectionsMethod
alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
source
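A feasibility-style sketch following the signatures above; f is set to zero so only the distance terms drive the iterates (the penalty keyword lambda is an assumption):

using FrankWolfe

n = 20
f(x) = 0.0
grad!(storage, x) = storage .= 0

lmos = (
    FrankWolfe.ProbabilitySimplexOracle(1.0),
    FrankWolfe.LpNormLMO{Float64,2}(1.0),
)

x, v, primal, dual_gap, infeas, traj_data = FrankWolfe.alternating_linear_minimization(
    FrankWolfe.block_coordinate_frank_wolfe,
    f, grad!, lmos, ones(n);
    lambda=1.0,  # assumed keyword weighting the distance penalty
)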

Index

Linear Minimization Oracles · FrankWolfe.jl

      Linear Minimization Oracles

      The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. Given $d\in \mathcal{X}$, it returns a vertex of the feasible set:

\[v \in \arg\min_{x\in \mathcal{C}} \langle d,x \rangle.\]

      See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.

      Interface and wrappers

      FrankWolfe.LinearMinimizationOracleType

      Supertype for linear minimization oracles.

      All LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.

      source

      All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:

      FrankWolfe.compute_extreme_pointFunction
      compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)

      Computes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.

      source
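As an illustration, a hypothetical LMO for the Euclidean ball of a given radius, implementing only this interface:

using FrankWolfe, LinearAlgebra

struct L2BallLMO <: FrankWolfe.LinearMinimizationOracle
    radius::Float64
end

function FrankWolfe.compute_extreme_point(lmo::L2BallLMO, direction; kwargs...)
    # the minimizer of ⟨direction, v⟩ over the ball is -radius * direction / norm(direction)
    return -lmo.radius / norm(direction) * direction
end

v = FrankWolfe.compute_extreme_point(L2BallLMO(1.0), [3.0, -4.0])  # ≈ [-0.6, 0.8]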

      We also provide some meta-LMOs wrapping another one with extended behavior:

      FrankWolfe.CachedLinearMinimizationOracleType
      CachedLinearMinimizationOracle{LMO}

      Oracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.

      By convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! and Base.length.

      source
      FrankWolfe.SingleLastCachedLMOType
      SingleLastCachedLMO{LMO, VT}

      Caches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.

      source
      FrankWolfe.MultiCacheLMOType
      MultiCacheLMO{N, LMO, A}

Cache for an LMO storing up to N vertices, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the next one to replace. VT, if provided, must be the type of vertices returned by LMO.

      source
      FrankWolfe.VectorCacheLMOType
      VectorCacheLMO{LMO, VT}

Cache for an LMO storing an unbounded number of vertices of type VT. VT, if provided, must be the type of vertices returned by LMO.

      source
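A sketch of the caching contract; the convenience constructor shown is an assumption, not a verified API:

using FrankWolfe

inner = FrankWolfe.ProbabilitySimplexOracle(1.0)
cached = FrankWolfe.SingleLastCachedLMO(inner)  # assumed convenience constructor

v = FrankWolfe.compute_extreme_point(cached, randn(10))
length(cached)  # number of cached vertices, here at most 1
empty!(cached)  # reset the cache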

      Norm balls

      FrankWolfe.EllipsoidLMOType
      EllipsoidLMO(A, c, r)

      Linear minimization over an ellipsoid centered at c of radius r:

{x : (x - c)^T A (x - c) ≤ r}

The LMO stores the factorization F of A, used to solve linear systems A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, so A is positive definite.

      source
      FrankWolfe.KNormBallLMOType
      KNormBallLMO{T}(K::Int, right_hand_side::T)

LMO with feasible set being the K-norm ball in the sense of arXiv:2010.07243, i.e., the convex hull of the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:

      C_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }

      with τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.

      source
      FrankWolfe.LpNormLMOType
      LpNormLMO{T, p}(right_hand_side)

      LMO with feasible set being an L-p norm ball:

C = {x ∈ R^n, norm(x, p) ≤ right_hand_side}

source
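For example, over the L2 ball of radius 1, the extreme point is the normalized negated direction:

using FrankWolfe

lmo = FrankWolfe.LpNormLMO{Float64,2}(1.0)
v = FrankWolfe.compute_extreme_point(lmo, [3.0, -4.0])  # ≈ [-0.6, 0.8], i.e. -d / norm(d)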
      FrankWolfe.NuclearNormLMOType
      NuclearNormLMO{T}(radius)

      LMO over matrices that have a nuclear norm less than radius. The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.

      source
      FrankWolfe.SpectraplexLMOType
      SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)

      Feasible set

      {X ∈ 𝕊_n^+, trace(X) == radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source
      FrankWolfe.UnitSpectrahedronLMOType
      UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)

      Feasible set of PSD matrices with bounded trace:

      {X ∈ 𝕊_n^+, trace(X) ≤ radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source

      Simplex

      FrankWolfe.compute_dual_solutionMethod

Dual costs for a given primal solution to form a primal-dual pair for the scaled probability simplex. Returns two vectors: the first contains the dual costs associated with the constraints, and the second the reduced costs for the variables.

      source
      FrankWolfe.compute_dual_solutionMethod

Dual costs for a given primal solution to form a primal-dual pair for the scaled unit simplex. Returns two vectors: the first contains the dual costs associated with the constraints, and the second the reduced costs for the variables.

      source
      FrankWolfe.compute_extreme_pointMethod

      LMO for scaled probability simplex. Returns a vector with one active value equal to RHS in the most improving (or least degrading) direction.

      source
      FrankWolfe.compute_extreme_pointMethod

LMO for the scaled unit simplex ∑ x_i ≤ τ. Returns either a vector of zeros or a vector with one active value equal to the RHS if there exists an improving direction.

      source

      Polytope

      FrankWolfe.BirkhoffPolytopeLMOType
      BirkhoffPolytopeLMO

The Birkhoff polytope encodes doubly stochastic matrices. Its extreme points are all permutation matrices of side dimension dimension.

      source
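A small sketch: minimizing ⟨d, X⟩ over 2×2 doubly stochastic matrices returns the cheaper of the two permutation matrices:

using FrankWolfe

lmo = FrankWolfe.BirkhoffPolytopeLMO()
d = [1.0 2.0; 2.0 1.0]
v = FrankWolfe.compute_extreme_point(lmo, d)  # identity permutation: cost 1 + 1 < 2 + 2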
      FrankWolfe.KSparseLMOType
      KSparseLMO{T}(K::Int, right_hand_side::T)

      LMO for the K-sparse polytope:

      C = B_1(τK) ∩ B_∞(τ)

      with τ the right_hand_side parameter. The LMO results in a vector with the K largest absolute values of direction, taking values -τ sign(x_i).

      source
      FrankWolfe.ScaledBoundL1NormBallType
      ScaledBoundL1NormBall(lower_bounds, upper_bounds)

Polytope similar to an L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.

      source
      FrankWolfe.ScaledBoundLInfNormBallType
      ScaledBoundLInfNormBall(lower_bounds, upper_bounds)

Polytope similar to an L-inf ball with shifted bounds, i.e., general box constraints. Lower and upper bounds are passed as abstract vectors, possibly of different types. For the standard L-inf ball, all lower and upper bounds would be -1 and 1.

      source

      MathOptInterface

      FrankWolfe.MathOptLMOType
      MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle

      Linear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.

      The direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.

The Boolean use_modify determines whether the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.

      source
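A sketch building an LMO over the unit box through MathOptInterface; HiGHS is chosen arbitrarily as the underlying optimizer:

using FrankWolfe, MathOptInterface, HiGHS
const MOI = MathOptInterface

o = HiGHS.Optimizer()
MOI.set(o, MOI.Silent(), true)
x = MOI.add_variables(o, 5)
for xi in x
    MOI.add_constraint(o, xi, MOI.GreaterThan(0.0))
    MOI.add_constraint(o, xi, MOI.LessThan(1.0))
end

lmo = FrankWolfe.MathOptLMO(o)
v = FrankWolfe.compute_extreme_point(lmo, [1.0, -1.0, 2.0, -2.0, 0.5])  # -> [0, 1, 0, 1, 0]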
      FrankWolfe.convert_mathoptFunction
      convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}

      Converts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.

      source

      Index

Utilities and data structures · FrankWolfe.jl

          Utilities and data structures

          Active set

          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

Copies an active set, including the weight and atom vectors and the iterate. Individual atoms themselves are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::ActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::ActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Function
          active_set_update!(active_set::ActiveSet, lambda, atom)

Adds the atom to the active set with weight lambda, or adds lambda to the weight of the atom if it is already present.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::ActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source
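A small sketch of the active set API described above; the exact renormalization behavior of active_set_update! follows its docstring:

using FrankWolfe

a1 = [1.0, 0.0]
a2 = [0.0, 1.0]
active_set = FrankWolfe.ActiveSet([(0.5, a1), (0.5, a2)])  # iterate x = 0.5 a1 + 0.5 a2

λ, a, i = FrankWolfe.active_set_argmin(active_set, [1.0, -1.0])  # selects a2

FrankWolfe.active_set_update!(active_set, 0.25, a1)  # update with weight 0.25 on atom a1
x = FrankWolfe.compute_active_set_iterate!(active_set)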

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

An objective function built from separate primal objective f(x) and in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size, and full_evaluation, the last one controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
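A sketch of a least-squares stochastic objective over hypothetical data points packed as vectors [features; target]:

using FrankWolfe, LinearAlgebra

n, m = 10, 1_000
xs = [randn(n + 1) for _ in 1:m]  # last entry plays the role of the target y_i

f(θ, data) = 0.5 * (dot(θ, data[1:n]) - data[end])^2
function grad!(storage, θ, data)
    r = dot(θ, data[1:n]) - data[end]
    storage .+= r .* data[1:n]  # adds to storage, does not overwrite (see the note above)
end

obj = FrankWolfe.StochasticObjective(f, grad!, xs, zeros(n))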
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

For some feasible sets, the extreme points returned by the LMO possess a specific structure that can be represented efficiently, both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

1. Base.push!(storage, atom) to add an atom to the storage.

Note that it is the storage type's responsibility to ensure uniqueness of the atoms present.

2. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

returning whether a vertex with sufficient progress was found, along with the vertex. It is up to the storage to remove vertices (or not) once they have been picked up.

          source
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

Batch size starting at starting_batch_size and incrementing by increment at every iteration, up to max_batch_size.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
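A sketch of a custom schedule doubling the batch size every 100 calls, assuming BatchSizeIterator is the abstract supertype referenced above:

using FrankWolfe

mutable struct DoublingBatchIterator <: FrankWolfe.BatchSizeIterator
    batch_size::Int
    count::Int
end

function FrankWolfe.batchsize_iterate(it::DoublingBatchIterator)
    it.count += 1
    if it.count % 100 == 0
        it.batch_size *= 2
    end
    return it.batch_size
end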
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
          FrankWolfe.muladd_memory_modeMethod
muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

Performs storage = x - gamma * d, in-place or not depending on the MemoryEmphasis.

          source
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

Callback pushing the state at each iteration to the passed storage. The stored data are only the first 5 fields of the state, usually: (t, primal, dual, dual_gap, time).

          source
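Usage sketch, with f, grad!, lmo, and x0 as in the algorithm docstrings:

using FrankWolfe

traj_storage = []
callback = FrankWolfe.trajectory_callback(traj_storage)
FrankWolfe.frank_wolfe(f, grad!, lmo, x0; callback=callback)
# each stored state is usually (t, primal, dual, dual_gap, time)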

          Oracle counting trackers

The following structures wrap given oracles to behave similarly while additionally tracking the number of calls.

          Also see the example "Tracking number of calls to different oracles".

          Update order for block-coordinate methods

Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices, which selects which blocks to update and in which order.

          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, l)

Returns a list of lists of indices, where l is the largest index, i.e., the number of blocks. Each sublist represents one round of updates in an iteration. The indices in a sublist indicate which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l].

          source
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source

          Index

Line search and step size settings · FrankWolfe.jl

          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

$\gamma_t = 1$ implies that the next iterate is exactly the vertex, while $\gamma_t = 0$ implies that the iterate does not move.

The following are step size selection rules for Frank-Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce step size sequences $(\gamma_t)$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) adapt to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.
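For instance, switching the step-size rule amounts to passing another LineSearchMethod instance; f, grad!, lmo, x0 are as in the algorithm docstrings and L denotes a known Lipschitz constant:

using FrankWolfe

x, v, primal, dual_gap = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Adaptive(),  # or FrankWolfe.Agnostic(), FrankWolfe.Shortstep(L), ...
)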

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
          FrankWolfe.AdaptiveType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

          \[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

          \[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether the computation should temporarily upgrade to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

          Using l ≥ 4 is advised only for strongly convex sets, see:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, 2023.

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

Represents a monotonic open-loop non-convex step size. Contains a halving factor N, increased at each iteration until there is primal progress: gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

Represents a monotonic open-loop step size. Contains a halving factor N, increased at each iteration until there is primal progress: gamma = 2 / (t + 2) * 2^(-N).

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.

          source

See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index
