diff --git a/dev/advanced/index.html b/dev/advanced/index.html

Base.:*(scalar::Real, x::IT)
Base.:-(x1::IT, x2::IT)
LinearAlgebra.dot(x1::IT, x2::IT)

For methods using a FrankWolfe.ActiveSet, the atoms or individual extreme points of the feasible region are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.

The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:

FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R, IT}

which recomputes the iterate from the current convex decomposition, as well as the following two methods:

FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)
FrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)
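
As an illustration, here is a minimal sketch of the arithmetic interface above for a hypothetical ScaledVector iterate type (not part of the package):

using LinearAlgebra

# Hypothetical iterate type, for illustration only.
struct ScaledVector
    data::Vector{Float64}
end

# The arithmetic interface required for iterates.
Base.:*(scalar::Real, x::ScaledVector) = ScaledVector(scalar * x.data)
Base.:-(x1::ScaledVector, x2::ScaledVector) = ScaledVector(x1.data - x2.data)
LinearAlgebra.dot(x1::ScaledVector, x2::ScaledVector) = dot(x1.data, x2.data)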
diff --git a/dev/basics/index.html b/dev/basics/index.html

How does it work?

FrankWolfe.jl contains generic routines to solve optimization problems of the form

\[\min_{x \in \mathcal{C}} f(x)\]

where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:

\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]

Linear Minimization Oracles

The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:

\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]

Custom LMOs

To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:

compute_extreme_point(lmo::LMO, direction; kwargs...) -> v

This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.
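
As an illustration, here is a minimal sketch of a custom LMO for the probability simplex (the type name MySimplexLMO is hypothetical; the package already ships simplex LMOs out of the box):

using FrankWolfe

struct MySimplexLMO <: FrankWolfe.LinearMinimizationOracle end

function FrankWolfe.compute_extreme_point(::MySimplexLMO, direction; kwargs...)
    # A linear function ⟨d, x⟩ over the probability simplex is minimized
    # at the standard basis vector e_i with i = argmin_i d_i.
    i = argmin(direction)
    v = zeros(length(direction))
    v[i] = 1.0
    return v
end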

Pre-defined LMOs

If you don't want to define your LMO manually, several common implementations are available out-of-the-box:

  • Simplices: unit simplex, probability simplex
  • Balls in various norms
  • Polytopes: K-sparse, Birkhoff

You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.
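
As a hedged sketch (assuming HiGHS is available; the unit-box feasible set here is just an example), such an LMO can be built as follows:

using FrankWolfe
import MathOptInterface as MOI
import HiGHS

n = 5
optimizer = HiGHS.Optimizer()
MOI.set(optimizer, MOI.Silent(), true)
x = MOI.add_variables(optimizer, n)
for xi in x
    # Box constraints 0 ≤ x_i ≤ 1 defining the feasible set.
    MOI.add_constraint(optimizer, xi, MOI.GreaterThan(0.0))
    MOI.add_constraint(optimizer, xi, MOI.LessThan(1.0))
end
lmo = FrankWolfe.MathOptLMO(optimizer)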

Finally, we provide wrappers to combine oracles easily, for example in a product.

See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.

Optimization algorithms

The package features several variants of Frank-Wolfe that share the same basic API.

Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).

Standard Frank-Wolfe (FW)

It is implemented in the frank_wolfe function.

See Jaggi (2013) for an overview.

This algorithm works for both convex and non-convex functions (use the step size rule FrankWolfe.Nonconvex() in the second case).
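
A minimal usage sketch (the quadratic objective and simplex feasible set here are just an example):

using FrankWolfe
using LinearAlgebra

n = 100
p = rand(n)
f(x) = norm(x - p)^2
grad!(storage, x) = storage .= 2 .* (x .- p)

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, zeros(n)))
x, v, primal, dual_gap, traj_data = FrankWolfe.frank_wolfe(f, grad!, lmo, x0)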

Away-step Frank-Wolfe (AFW)

It is implemented in the away_frank_wolfe function.

See Lacoste-Julien, Jaggi (2015) for an overview.

Stochastic Frank-Wolfe (SFW)

It is implemented in the FrankWolfe.stochastic_frank_wolfe function.

Blended Conditional Gradients (BCG)

It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.

See Braun, Pokutta, Tu, Wright (2018).

Blended Pairwise Conditional Gradients (BPCG)

It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.

See Tsuji, Tanaka, Pokutta (2021).

Comparison

The following table compares the characteristics of the algorithms presented in the package:

Algorithm   Progress/Iteration   Time/Iteration   Sparsity   Numerical Stability   Active Set   Lazifiable
FW          Low                  Low              Low        High                  No           Yes
AFW         Medium               Medium-High      Medium     Medium-High           Yes          Yes
B(P)CG      High                 Medium-High      High       Medium                Yes          By design
SFW         Low                  Low              Low        High                  No           No

While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:

[Figure: FW vs AFW]

Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.

Block-Coordinate Frank-Wolfe (BCFW)

It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.

See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.

Alternating Linear Minimization (ALM)

It is implemented in the FrankWolfe.alternating_linear_minimization function.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html

"""
function f(x)
    # ...
end

Provide a new example or test

If you fix a bug, you should typically also add a test validating that the bug is gone. Tests are added to files in the test/ folder, whose entry point is runtests.jl.

The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests, using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder, from which documentation pages are generated with Literate.jl.

Provide a new feature

Contributions bringing new features are also welcome. If the feature is likely to impact performance, benchmarks should be run with BenchmarkTools on several of the examples to assess the effect at different problem sizes, as sketched below. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.
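
For instance, a minimal benchmark sketch (assuming f, grad!, lmo, and x0 are defined as in one of the examples):

using FrankWolfe
using BenchmarkTools

# Benchmark a full solver run; interpolate variables with $ so that
# global-variable access is not part of the measurement.
@benchmark FrankWolfe.frank_wolfe($f, $grad!, $lmo, $x0)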

Some typical features to implement are:

  1. A new Linear Minimization Oracle (LMO)
  2. A new step size
  3. A new algorithm (less frequent) following the same API.

Code style

We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repository with the settings in the .JuliaFormatter.toml file, which enforces a number of conventions.

This contribution guide was inspired by ColPrac and the one in Manopt.jl.


diff --git a/dev/examples/docs_0_fw_visualized/index.html b/dev/examples/docs_0_fw_visualized/index.html

[plot output omitted]

plot chosen vertices

scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label="v_1")
scatter!(
    [vertices[2][1]],
    # ... (remaining arguments elided in diff)
)

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_10_alternating_methods/index.html b/dev/examples/docs_10_alternating_methods/index.html

----------------------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap         Infeas           Time         It/sec
----------------------------------------------------------------------------------------------------------------
     I             1   2.010582e+00  -4.198942e+01   4.400000e+01   2.010582e+00   0.000000e+00            Inf
    FW          1000   5.100158e-02   4.913021e-02   1.871364e-03   5.100158e-02   3.203934e+00   3.121162e+02
    FW          2000   5.052592e-02   4.954907e-02   9.768468e-04   5.052592e-02   3.237912e+00   6.176821e+02
    FW          3000   5.035759e-02   4.969300e-02   6.645894e-04   5.035759e-02   3.271389e+00   9.170417e+02
    FW          4000   5.027073e-02   4.976697e-02   5.037594e-04   5.027073e-02   3.304896e+00   1.210326e+03
    FW          5000   5.021791e-02   4.980140e-02   4.165104e-04   5.021791e-02   3.337764e+00   1.498009e+03
    FW          6000   5.018239e-02   4.983318e-02   3.492063e-04   5.018239e-02   3.372031e+00   1.779343e+03
    FW          7000   5.015674e-02   4.985610e-02   3.006417e-04   5.015674e-02   3.404090e+00   2.056350e+03
    FW          8000   5.013751e-02   4.987326e-02   2.642458e-04   5.013751e-02   3.435477e+00   2.328643e+03
    FW          9000   5.012245e-02   4.988670e-02   2.357462e-04   5.012245e-02   3.542657e+00   2.540466e+03
    FW         10000   5.011036e-02   4.989745e-02   2.129161e-04   5.011036e-02   3.572997e+00   2.798771e+03
  Last         10001   5.011035e-02   4.989379e-02   2.165630e-04   5.011035e-02   3.737794e+00   2.675643e+03
----------------------------------------------------------------------------------------------------------------

Block coordinate Frank-Wolfe (BCFW).

----------------------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap         Infeas           Time         It/sec
----------------------------------------------------------------------------------------------------------------
     I             1   8.287728e-01  -4.317123e+01   4.400000e+01   8.287728e-01   0.000000e+00            Inf
    FW          1000   5.000000e-02   4.999153e-02   8.474350e-06   5.000000e-02   1.502725e-01   6.654576e+03
  Last          1445   5.000000e-02   4.999896e-02   1.036019e-06   5.000000e-02   1.670777e-01   8.648673e+03
----------------------------------------------------------------------------------------------------------------

Block coordinate Frank-Wolfe (BCFW).
----------------------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap         Infeas           Time         It/sec
----------------------------------------------------------------------------------------------------------------
     I             1   1.470947e+00  -4.252905e+01   4.400000e+01   1.470947e+00   0.000000e+00            Inf
    FW          1000   5.000000e-02   4.998663e-02   1.337041e-05   5.000000e-02   4.080015e-02   2.450971e+04
  Last          1531   5.000000e-02   4.999896e-02   1.042885e-06   5.000000e-02   6.641747e-02   2.305117e+04
----------------------------------------------------------------------------------------------------------------

As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with the standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. In this example we also use FrankWolfe.away_frank_wolfe.

_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(
     FrankWolfe.away_frank_wolfe,
     f,
    # ... (remaining arguments elided in diff)
)

   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     #ActiveSet
 ----------------------------------------------------------------------------------------------------------------
      I             1   2.300000e+01           -Inf            Inf   0.000000e+00            Inf              2
  Last           147   5.000000e-02   4.999914e-02   8.622362e-07   1.302043e+00   1.128995e+02             74
 ----------------------------------------------------------------------------------------------------------------
    PP           147   5.000000e-02   4.999914e-02   8.622362e-07   1.380386e+00   1.064920e+02             74
 ----------------------------------------------------------------------------------------------------------------

Running Alternating Projections

Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems. One omits the objective and gradient as parameters.

_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(
     lmos,
     x0,
    # ... (remaining arguments elided in diff)
)

 ----------------------------------------------------------------------------------
   Type     Iteration       Dual Gap         Infeas           Time         It/sec
 ----------------------------------------------------------------------------------
     I             1   7.387930e-01   3.693965e-01   0.000000e+00            Inf
    FW           100   1.045964e-04   5.000029e-02   1.807312e+00   5.533080e+01
    FW           200   2.549000e-05   5.000002e-02   2.532547e+00   7.897189e+01
    FW           300   1.123044e-05   5.000000e-02   3.433586e+00   8.737222e+01
    FW           400   6.488644e-06   5.000000e-02   4.478935e+00   8.930694e+01
    FW           500   4.160782e-06   5.000000e-02   5.619675e+00   8.897311e+01
    FW           600   2.869222e-06   5.000000e-02   6.809893e+00   8.810711e+01
    FW           700   2.123105e-06   5.000000e-02   8.106202e+00   8.635364e+01
    FW           800   1.581551e-06   5.000000e-02   9.406900e+00   8.504395e+01
    FW           900   1.264159e-06   5.000000e-02   1.081679e+01   8.320399e+01
    FW          1000   1.012869e-06   5.000000e-02   1.221100e+01   8.189339e+01
  Last          1015   9.893090e-07   5.000000e-02   1.244735e+01   8.154345e+01
 ----------------------------------------------------------------------------------

Plotting the resulting trajectories

labels = ["BCFW - Full", "BCFW - Cyclic", "BCFW - Stochastic", "AFW", "AP"]
 
 plot_trajectories(trajectories, labels, xscalelog=true)
[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_1_mathopt_lmo/index.html b/dev/examples/docs_1_mathopt_lmo/index.html

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_2_polynomial_regression/index.html b/dev/examples/docs_2_polynomial_regression/index.html

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_3_matrix_completion/index.html b/dev/examples/docs_3_matrix_completion/index.html

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_4_rational_opt/index.html b/dev/examples/docs_4_rational_opt/index.html

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
    FW            10   1.407407e-01  -1.407407e-01   2.814815e-01   7.357970e-01   1.359071e+01
    FW            20   6.842105e-02  -6.842105e-02   1.368421e-01   7.380141e-01   2.709975e+01
    FW            30   4.521073e-02  -4.521073e-02   9.042146e-02   7.401740e-01   4.053101e+01
    FW            40   3.376068e-02  -3.376068e-02   6.752137e-02   7.429810e-01   5.383718e+01
    FW            50   2.693878e-02  -2.693878e-02   5.387755e-02   7.461145e-01   6.701385e+01
    FW            60   2.241055e-02  -2.241055e-02   4.482109e-02   7.490019e-01   8.010660e+01
    FW            70   1.918565e-02  -1.918565e-02   3.837129e-02   7.525025e-01   9.302295e+01
    FW            80   1.677215e-02  -1.677215e-02   3.354430e-02   7.559547e-01   1.058265e+02
    FW            90   1.489804e-02  -1.489804e-02   2.979609e-02   7.597608e-01   1.184583e+02
    FW           100   1.340067e-02  -1.340067e-02   2.680135e-02   7.637572e-01   1.309317e+02
  Last           101   1.314422e-02  -1.236767e-02   2.551189e-02   7.650420e-01   1.320189e+02
-------------------------------------------------------------------------------------------------

Output type of solution: BigFloat

Another possible step-size rule is rationalshortstep which computes the step size by minimizing the smoothness inequality as $\gamma_t=\frac{\langle \nabla f(x_t),x_t-v_t\rangle}{2L||x_t-v_t||^2}$. However, as this step size depends on an upper bound on the Lipschitz constant $L$ as well as the inner product with the gradient $\nabla f(x_t)$, both have to be of a rational type.
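
A minimal sketch of this step-size computation, written independently of the package API and assuming the gradient, iterate, vertex, and Lipschitz estimate L are all Rational:

using LinearAlgebra

function rational_short_step(grad_x, x, v, L)
    num = dot(grad_x, x - v)
    den = 2 * L * sum(abs2, x - v)
    return num / den  # exact arithmetic when all inputs are Rational
end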

@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(
    # ... (remaining arguments elided in diff)
)

   Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
 -------------------------------------------------------------------------------------------------
      I             1   1.000000e+00  -1.000000e+00   2.000000e+00   0.000000e+00            Inf
    FW            10   1.000000e-01  -1.000000e-01   2.000000e-01   6.103741e-01   1.638340e+01
    FW            20   5.000000e-02  -5.000000e-02   1.000000e-01   6.127742e-01   3.263845e+01
    FW            30   3.333333e-02  -3.333333e-02   6.666667e-02   6.152931e-01   4.875725e+01
    FW            40   2.500000e-02  -2.500000e-02   5.000000e-02   6.183010e-01   6.469341e+01
    FW            50   2.000000e-02  -2.000000e-02   4.000000e-02   6.218787e-01   8.040153e+01
    FW            60   1.666667e-02  -1.666667e-02   3.333333e-02   6.268345e-01   9.571904e+01
    FW            70   1.428571e-02  -1.428571e-02   2.857143e-02   6.310514e-01   1.109260e+02
    FW            80   1.250000e-02  -1.250000e-02   2.500000e-02   6.357340e-01   1.258388e+02
    FW            90   1.111111e-02  -1.111111e-02   2.222222e-02   6.400374e-01   1.406168e+02
    FW           100   1.000000e-02   1.000000e-02   1.889162e-78   6.450212e-01   1.550337e+02
  Last           100   1.000000e-02   1.000000e-02   2.159042e-78   6.460607e-01   1.547842e+02
 -------------------------------------------------------------------------------------------------
  1.239368 seconds (1.57 M allocations: 87.975 MiB, 16.15% gc time, 1.38% compilation time)

Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n)


This page was generated using Literate.jl.


diff --git a/dev/examples/docs_5_blended_cg/index.html b/dev/examples/docs_5_blended_cg/index.html

plot_trajectories(data, label, xscalelog=true)

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_6_spectrahedron/index.html b/dev/examples/docs_6_spectrahedron/index.html

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             1   1.018651e+00   1.014119e+00   4.531396e-03   0.000000e+00            Inf
  Last            26   1.014314e+00   1.014314e+00   8.598814e-09   2.467651e+00   1.053634e+01
-------------------------------------------------------------------------------------------------

Lazified Conditional Gradient (Frank-Wolfe + Lazification).

----------------------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec     Cache Size
----------------------------------------------------------------------------------------------------------------
     I             1   1.018651e+00   1.014119e+00   4.531396e-03   0.000000e+00            Inf              1
    LD             2   1.014317e+00   1.014314e+00   3.679257e-06   2.132371e-01   9.379229e+00              2
    LD             3   1.014315e+00   1.014314e+00   1.036964e-06   3.733917e-01   8.034458e+00              3
    LD             4   1.014315e+00   1.014314e+00   5.090329e-07   4.623221e-01   8.651977e+00              4
    LD             6   1.014314e+00   1.014314e+00   2.019539e-07   5.849577e-01   1.025715e+01              5
    LD             9   1.014314e+00   1.014314e+00   8.396068e-08   7.244230e-01   1.242368e+01              6
    LD            13   1.014314e+00   1.014314e+00   3.872634e-08   8.944469e-01   1.453412e+01              7
    LD            19   1.014314e+00   1.014314e+00   1.766051e-08   1.122667e+00   1.692399e+01              8
    LD            27   1.014314e+00   1.014314e+00   8.603148e-09   1.425066e+00   1.894649e+01              9
  Last            27   1.014314e+00   1.014314e+00   7.988600e-09   1.605047e+00   1.682194e+01             10
----------------------------------------------------------------------------------------------------------------

Plotting the resulting trajectories

data = [trajectory, trajectory_lazy]
 label = ["FW", "LCG"]
 plot_trajectories(data, label, xscalelog=true)
[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_7_shifted_norm_polytopes/index.html b/dev/examples/docs_7_shifted_norm_polytopes/index.html

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             1   2.000000e+00  -6.000000e+00   8.000000e+00   0.000000e+00            Inf
    FW            50   2.198243e-01   1.859119e-01   3.391239e-02   1.514062e-01   3.302375e+02
    FW           100   2.104540e-01   1.927834e-01   1.767061e-02   1.517922e-01   6.587955e+02
    FW           150   2.071345e-01   1.951277e-01   1.200679e-02   1.521765e-01   9.856978e+02
    FW           200   2.054240e-01   1.963167e-01   9.107240e-03   1.526727e-01   1.309992e+03
    FW           250   2.043783e-01   1.970372e-01   7.341168e-03   1.530116e-01   1.633863e+03
    FW           300   2.036722e-01   1.975209e-01   6.151268e-03   1.535510e-01   1.953748e+03
    FW           350   2.031630e-01   1.978684e-01   5.294582e-03   1.540157e-01   2.272495e+03
    FW           400   2.027782e-01   1.981301e-01   4.648079e-03   1.543895e-01   2.590849e+03
    FW           450   2.024772e-01   1.983344e-01   4.142727e-03   1.547487e-01   2.907940e+03
    FW           500   2.022352e-01   1.984984e-01   3.736776e-03   1.551661e-01   3.222353e+03
    FW           550   2.020364e-01   1.986329e-01   3.403479e-03   1.555332e-01   3.536222e+03
    FW           600   2.018701e-01   1.987452e-01   3.124906e-03   1.558780e-01   3.849164e+03
    FW           650   2.017290e-01   1.988404e-01   2.888583e-03   1.562157e-01   4.160913e+03
    FW           700   2.016078e-01   1.989222e-01   2.685564e-03   1.565686e-01   4.470883e+03
    FW           750   2.015024e-01   1.989932e-01   2.509264e-03   1.569411e-01   4.778863e+03
    FW           800   2.014101e-01   1.990554e-01   2.354727e-03   1.573002e-01   5.085817e+03
    FW           850   2.013284e-01   1.991103e-01   2.218154e-03   1.577667e-01   5.387702e+03
    FW           900   2.012558e-01   1.991592e-01   2.096580e-03   1.581439e-01   5.691019e+03
    FW           950   2.011906e-01   1.992030e-01   1.987662e-03   1.587014e-01   5.986085e+03
    FW          1000   2.011319e-01   1.992424e-01   1.889519e-03   1.591426e-01   6.283673e+03
  Last          1001   2.011297e-01   1.992439e-01   1.885794e-03   1.594129e-01   6.279292e+03
-------------------------------------------------------------------------------------------------

Final solution: [1.799813188674937, 0.5986834801090863]

-------------------------------------------------------------------------------------------------
  Type     Iteration         Primal           Dual       Dual Gap           Time         It/sec
-------------------------------------------------------------------------------------------------
     I             1   1.300000e+01  -1.900000e+01   3.200000e+01   0.000000e+00            Inf
    FW            50   1.084340e-02  -7.590380e-02   8.674720e-02   7.803746e-02   6.407179e+02
    FW           100   5.509857e-03  -3.856900e-02   4.407886e-02   7.844696e-02   1.274747e+03
    FW           150   3.695414e-03  -2.586790e-02   2.956331e-02   7.888705e-02   1.901453e+03
    FW           200   2.780453e-03  -1.946317e-02   2.224362e-02   7.926655e-02   2.523132e+03
    FW           250   2.228830e-03  -1.560181e-02   1.783064e-02   7.968504e-02   3.137352e+03
    FW           300   1.859926e-03  -1.301948e-02   1.487941e-02   8.009654e-02   3.745480e+03
    FW           350   1.595838e-03  -1.117087e-02   1.276670e-02   8.047874e-02   4.348975e+03
    FW           400   1.397443e-03  -9.782098e-03   1.117954e-02   8.098403e-02   4.939245e+03
    FW           450   1.242935e-03  -8.700548e-03   9.943483e-03   8.135933e-02   5.531019e+03
    FW           500   1.119201e-03  -7.834409e-03   8.953610e-03   8.174573e-02   6.116528e+03
    FW           550   1.017878e-03  -7.125146e-03   8.143024e-03   8.212472e-02   6.697131e+03
    FW           600   9.333816e-04  -6.533671e-03   7.467053e-03   8.250702e-02   7.272109e+03
    FW           650   8.618413e-04  -6.032889e-03   6.894730e-03   8.289961e-02   7.840809e+03
    FW           700   8.004890e-04  -5.603423e-03   6.403912e-03   8.326361e-02   8.407034e+03
    FW           750   7.472928e-04  -5.231050e-03   5.978342e-03   8.372361e-02   8.958047e+03
    FW           800   7.007275e-04  -4.905093e-03   5.605820e-03   8.415470e-02   9.506302e+03
    FW           850   6.596259e-04  -4.617381e-03   5.277007e-03   8.453280e-02   1.005527e+04
    FW           900   6.230796e-04  -4.361557e-03   4.984637e-03   8.494140e-02   1.059554e+04
    FW           950   5.903710e-04  -4.132597e-03   4.722968e-03   8.531759e-02   1.113487e+04
    FW          1000   5.609256e-04  -3.926479e-03   4.487405e-03   8.568859e-02   1.167017e+04
  Last          1001   5.598088e-04  -3.918661e-03   4.478470e-03   8.589359e-02   1.165396e+04
-------------------------------------------------------------------------------------------------

Final solution: [2.0005598087769556, 0.9763463450796975]

We plot the polytopes alongside the solutions from above:

xcoord1 = [1, 3, 1, -1, 1]
    # ... (lines elided in diff)
)

[plot output omitted]

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_8_callback_and_tracking/index.html b/dev/examples/docs_8_callback_and_tracking/index.html

total_iterations = 500
tf.counter = 501
tgrad!.counter = 501
tlmo_prob.counter = 13

This page was generated using Literate.jl.


diff --git a/dev/examples/docs_9_extra_vertex_storage/index.html b/dev/examples/docs_9_extra_vertex_storage/index.html

[ Info: Number of LMO calls in iter 9: 16
[ Info: Vertex storage size: 77
[ Info: Number of LMO calls in iter 10: 15
[ Info: Vertex storage size: 82

This page was generated using Literate.jl.


diff --git a/dev/index.html b/dev/index.html

...

If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:

using Plots
 using FrankWolfe
 
include(joinpath(dirname(pathof(FrankWolfe)), "../examples/plot_utils.jl"))

diff --git a/dev/reference/0_reference/index.html b/dev/reference/0_reference/index.html

API Reference · FrankWolfe.jl

diff --git a/dev/reference/1_algorithms/index.html b/dev/reference/1_algorithms/index.html

Algorithms

This section contains all main algorithms of the package. These are the ones typical users will call.

The typical signature for these algorithms is:

my_algorithm(f, grad!, lmo, x0)

Standard algorithms

FrankWolfe.frank_wolfe - Method
frank_wolfe(f, grad!, lmo, x0; ...)

Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

  • x final iterate
  • v last vertex from the LMO
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • traj_data vector of trajectory information.
FrankWolfe.stochastic_frank_wolfe - Method
stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

Stochastic version of Frank-Wolfe, which evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).
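
As an illustration of this iterator interface, a hypothetical batch-size iterator that doubles the batch size at each call might look like:

using FrankWolfe

# Hypothetical iterator type, shown only to illustrate the interface above.
mutable struct DoublingBatchIterator
    batch_size::Int
end

function FrankWolfe.batchsize_iterate(it::DoublingBatchIterator)
    bs = it.batch_size
    it.batch_size *= 2  # grow the batch for the next iteration
    return bs
end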

FrankWolfe.block_coordinate_frank_wolfe - Function
block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

Block-coordinate version of the Frank-Wolfe algorithm. Minimizes the objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated.

The method returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.

See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.


Active-set based methods

The following algorithms maintain the representation of the iterates as a convex combination of vertices.

Away-step

Blended Conditional Gradient

FrankWolfe.blended_conditional_gradient - Method
blended_conditional_gradient(f, grad!, lmo, x0)

Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditional gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

FrankWolfe.build_reduced_problem - Method
build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

Given an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates (of dimension equal to the cardinality of the active set), the objective function is:

f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

In the case where the current iterate has a strong-Wolfe gap over the convex hull of the active set below the tolerance, we return nothing (as there is nothing to do).
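
In code, the reduced objective reads as follows (a sketch; the names reduced_linear and reduced_hessian mirror the formula above):

using LinearAlgebra

# Quadratic objective in barycentric coordinates λ.
reduced_objective(λ, reduced_linear, reduced_hessian) =
    dot(reduced_linear, λ) + 0.5 * dot(λ, reduced_hessian * λ)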

FrankWolfe.lp_separation_oracle - Method

Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

FrankWolfe.minimize_over_convex_hull! - Method
minimize_over_convex_hull!

Given a function f with gradient grad! and an active set active_set, this function minimizes the function over the convex hull of the active set until the strong-Wolfe gap over the active set is below tolerance.

It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

FrankWolfe.simplex_gradient_descent_over_convex_hull - Function
simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the strong-Wolfe gap is below tolerance, using simplex gradient descent.


Blended Pairwise Conditional Gradient

Alternating Methods

Problems over intersections of convex sets, i.e.

\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

pose a challenge as one has to combine the information of two or more LMOs.

FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.
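
A usage sketch mirroring the signature below, with hypothetical LMOs lmo_a and lmo_b, an objective f with gradient grad!, and a starting point x0 (all assumed defined):

# ALM with standard Frank-Wolfe as the inner block-coordinate algorithm.
x, v, primal, dual_gap, infeas, traj_data = FrankWolfe.alternating_linear_minimization(
    FrankWolfe.frank_wolfe,
    f,
    grad!,
    (lmo_a, lmo_b),
    x0,
)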

FrankWolfe.alternating_linear_minimization - Method
alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • primal primal value f(x)
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.
FrankWolfe.alternating_projections - Method
alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, infeas, traj_data) with:

  • x cartesian product of final iterates
  • v cartesian product of last vertices of the LMOs
  • dual_gap final Frank-Wolfe gap
  • infeas sum of squared, pairwise distances between iterates
  • traj_data vector of trajectory information.


    +Algorithms · FrankWolfe.jl

    Algorithms

    This section contains all main algorithms of the package. These are the ones typical users will call.

    The typical signature for these algorithms is:

    my_algorithm(f, grad!, lmo, x0)

    Standard algorithms

    FrankWolfe.frank_wolfeMethod
    frank_wolfe(f, grad!, lmo, x0; ...)

    Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:

    • x final iterate
    • v last vertex from the LMO
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • traj_data vector of trajectory information.
    source
    FrankWolfe.stochastic_frank_wolfeMethod
    stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)

    Stochastic version of Frank-Wolfe, evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.

    Keyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.

    Similarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).

    source
    FrankWolfe.block_coordinate_frank_wolfeFunction
    block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}

    Block-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument the update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated.

    The method returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • traj_data vector of trajectory information.

    See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.

    source

    Active-set based methods

    The following algorithms maintain the representation of the iterates as a convex combination of vertices.

    Away-step

    Blended Conditional Gradient

    FrankWolfe.blended_conditional_gradientMethod
    blended_conditional_gradient(f, grad!, lmo, x0)

    Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. "Blended conditonal gradients" ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.

    source
    FrankWolfe.build_reduced_problemMethod
    build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)

    Given an active set formed by vectors , a (constant) Hessian and a gradient constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates of dimension equal to the cardinality of the active set, the objective function is:

    f(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ

    In the case where we find that the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance we return nothing (as there is nothing to do).

    source
    FrankWolfe.lp_separation_oracleMethod

    Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction) or the same tuple with y from the LMO.

    inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.

    source
    FrankWolfe.minimize_over_convex_hull!Method
    minimize_over_convex_hull!

    Given a function f with gradient grad! and an active set active_set this function will minimize the function over the convex hull of the active set until the strong-wolfe gap over the active set is below tolerance.

    It will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.

    source
    FrankWolfe.simplex_gradient_descent_over_convex_hullFunction
    simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)

Minimizes an objective function over the convex hull of the active set until the strong-Wolfe gap is below tolerance, using simplex gradient descent.

    source

    Blended Pairwise Conditional Gradient

    Alternating Methods

    Problems over intersections of convex sets, i.e.

    \[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]

    pose a challenge as one has to combine the information of two or more LMOs.

    FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.

    FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.

    FrankWolfe.alternating_linear_minimizationMethod
    alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

Alternating Linear Minimization minimizes the objective f over the intersection of the feasible domains specified by lmos. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • primal primal value f(x)
    • dual_gap final Frank-Wolfe gap
    • infeas sum of squared, pairwise distances between iterates
    • traj_data vector of trajectory information.
    source
    FrankWolfe.alternating_projectionsMethod
    alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}

    Computes a point in the intersection of feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, infeas, traj_data) with:

    • x cartesian product of final iterates
    • v cartesian product of last vertices of the LMOs
    • dual_gap final Frank-Wolfe gap
    • infeas sum of squared, pairwise distances between iterates
    • traj_data vector of trajectory information.
    source
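A sketch of a feasibility problem over the intersection of the unit Euclidean ball and the probability simplex; only the documented signature is used, and the max_iteration keyword is assumed to be accepted as in the other algorithms.

using FrankWolfe

n = 10
lmos = (
    FrankWolfe.LpNormLMO{Float64,2}(1.0),
    FrankWolfe.ProbabilitySimplexOracle(1.0),
)

x, v, dual_gap, infeas, traj_data = FrankWolfe.alternating_projections(
    lmos, ones(n);
    max_iteration=1_000,
)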

    Index


      Linear Minimization Oracles

      The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. Given $d\in \mathcal{X}$, it returns a vertex of the feasible set:

      \[v\in \argmin_{x\in \mathcal{C}} \langle d,x \rangle.\]

      See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.

      Interface and wrappers

      FrankWolfe.LinearMinimizationOracleType

      Supertype for linear minimization oracles.

      All LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.

      source

      All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:

      FrankWolfe.compute_extreme_pointFunction
      compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)

      Computes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.

      source
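For example, over the unit Euclidean ball the oracle returns the negated, normalized direction:

using FrankWolfe, LinearAlgebra

lmo = FrankWolfe.LpNormLMO{Float64,2}(1.0)   # L2 ball of radius 1
d = [3.0, -4.0]
v = FrankWolfe.compute_extreme_point(lmo, d)

v ≈ -d / norm(d)                             # true: minimizer of ⟨d, v⟩ over the ball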

      We also provide some meta-LMOs wrapping another one with extended behavior:

      FrankWolfe.CachedLinearMinimizationOracleType
      CachedLinearMinimizationOracle{LMO}

      Oracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.

      By convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! and Base.length.

      source
      FrankWolfe.SingleLastCachedLMOType
      SingleLastCachedLMO{LMO, VT}

      Caches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.

      source
      FrankWolfe.MultiCacheLMOType
      MultiCacheLMO{N, LMO, A}

Cache for an LMO storing up to N vertices, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the next one to be replaced. VT, if provided, must be the type of vertices returned by LMO.

      source
      FrankWolfe.VectorCacheLMOType
      VectorCacheLMO{LMO, VT}

Cache for an LMO storing an unbounded number of vertices of type VT. VT, if provided, must be the type of vertices returned by LMO.

      source

      Norm balls

      FrankWolfe.EllipsoidLMOType
      EllipsoidLMO(A, c, r)

      Linear minimization over an ellipsoid centered at c of radius r:

      x: (x - c)^T A (x - c) ≤ r

The LMO stores a factorization F of A, used to compute the linear system solutions A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, i.e., A is positive definite.

      source
      FrankWolfe.KNormBallLMOType
      KNormBallLMO{T}(K::Int, right_hand_side::T)

      LMO with feasible set being the K-norm ball in the sense of 2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:

      C_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }

      with τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.

      source
      FrankWolfe.LpNormLMOType
      LpNormLMO{T, p}(right_hand_side)

      LMO with feasible set being an L-p norm ball:

      C = {x ∈ R^n, norm(x, p) ≤ right_hand_side}
      source
      FrankWolfe.NuclearNormLMOType
      NuclearNormLMO{T}(radius)

LMO over matrices with nuclear norm at most radius. The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.

      source
      FrankWolfe.SpectraplexLMOType
      SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)

Feasible set:

{X ∈ 𝕊_n^+, trace(X) = radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source
      FrankWolfe.UnitSpectrahedronLMOType
      UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)

      Feasible set of PSD matrices with bounded trace:

      {X ∈ 𝕊_n^+, trace(X) ≤ radius}

      gradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.

      source

      Simplex

      FrankWolfe.compute_dual_solutionMethod

Dual costs for a given primal solution, forming a primal-dual pair for the scaled probability simplex. Returns two vectors: the first contains the dual costs associated with the constraints, and the second the reduced costs for the variables.

      source
      FrankWolfe.compute_dual_solutionMethod

Dual costs for a given primal solution, forming a primal-dual pair for the scaled unit simplex. Returns two vectors: the first contains the dual costs associated with the constraints, and the second the reduced costs for the variables.

      source
      FrankWolfe.compute_extreme_pointMethod

LMO for the scaled probability simplex. Returns a vector with one active value equal to the RHS in the most improving (or least degrading) direction.

      source
      FrankWolfe.compute_extreme_pointMethod

LMO for the scaled unit simplex: ∑ x_i ≤ τ. Returns either a vector of zeros or a vector with one active value equal to the RHS if there exists an improving direction.

      source
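A quick comparison of the two simplex oracles on the same direction:

using FrankWolfe

d = [0.3, -1.2, 0.7]

# Probability simplex {x ≥ 0, ∑ x_i = τ}: places mass τ on argmin(d), here index 2.
v_prob = FrankWolfe.compute_extreme_point(FrankWolfe.ProbabilitySimplexOracle(1.0), d)

# Unit simplex {x ≥ 0, ∑ x_i ≤ τ}: same vertex here since d has a negative entry;
# it would return the zero vertex if no entry of d were negative.
v_unit = FrankWolfe.compute_extreme_point(FrankWolfe.UnitSimplexOracle(1.0), d)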

      Polytope

      FrankWolfe.BirkhoffPolytopeLMOType
      BirkhoffPolytopeLMO

The Birkhoff polytope encodes doubly stochastic matrices. Its extreme points are the permutation matrices of the given side dimension.

      source
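For example, minimizing a linear function over 3×3 doubly stochastic matrices returns a permutation matrix:

using FrankWolfe

lmo = FrankWolfe.BirkhoffPolytopeLMO()
D = [0.5 0.1 0.9;
     0.2 0.8 0.3;
     0.7 0.4 0.6]
P = FrankWolfe.compute_extreme_point(lmo, D)   # a 3×3 permutation matrix minimizing ⟨D, P⟩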
      FrankWolfe.KSparseLMOType
      KSparseLMO{T}(K::Int, right_hand_side::T)

      LMO for the K-sparse polytope:

      C = B_1(τK) ∩ B_∞(τ)

with τ the right_hand_side parameter. The LMO returns a vector whose K entries of largest absolute value in direction are set to -τ sign(d_i), all other entries being zero.

      source
      FrankWolfe.ScaledBoundL1NormBallType
      ScaledBoundL1NormBall(lower_bounds, upper_bounds)

Polytope similar to an L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.

      source
      FrankWolfe.ScaledBoundLInfNormBallType
      ScaledBoundLInfNormBall(lower_bounds, upper_bounds)

Polytope similar to an L-inf ball with shifted bounds, i.e., general box constraints. Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L-inf ball, all lower and upper bounds would be -1 and 1.

      source

      MathOptInterface

      FrankWolfe.MathOptLMOType
      MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle

      Linear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.

      The direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.

The Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.

      source
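A sketch of building a MathOptLMO over a box using HiGHS; the solver choice and constraint setup are illustrative.

using FrankWolfe, HiGHS
import MathOptInterface as MOI

o = HiGHS.Optimizer()
MOI.set(o, MOI.Silent(), true)
x = MOI.add_variables(o, 5)
for xi in x
    MOI.add_constraint(o, xi, MOI.Interval(0.0, 1.0))   # box constraints [0, 1]^5
end

lmo = FrankWolfe.MathOptLMO(o)
v = FrankWolfe.compute_extreme_point(lmo, [1.0, -2.0, 0.5, -0.1, 3.0])
# v takes value 0 where the direction is positive and 1 where it is negative.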
      FrankWolfe.convert_mathoptFunction
      convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}

      Converts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.

      source

      Index


          Utilities and data structures

          Active set

          FrankWolfe.ActiveSetType
          ActiveSet{AT, R, IT}

          Represents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.

          source
          Base.copyMethod

          Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.

          source
          FrankWolfe.active_set_argminMethod
          active_set_argmin(active_set::ActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)

          source
          FrankWolfe.active_set_argminmaxMethod
          active_set_argminmax(active_set::ActiveSet, direction)

          Computes the linear minimizer in the direction on the active set. Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)

          source
          FrankWolfe.active_set_update!Function
          active_set_update!(active_set::ActiveSet, lambda, atom)

          Adds the atom to the active set with weight lambda or adds lambda to existing atom.

          source
          FrankWolfe.compute_active_set_iterate!Method
          compute_active_set_iterate!(active_set::ActiveSet) -> x

          Recomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.

          source
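A small sketch tying these operations together; the vector-of-(weight, atom)-pairs constructor is an assumption here.

using FrankWolfe

as = FrankWolfe.ActiveSet([(0.4, [1.0, 0.0]), (0.6, [0.0, 1.0])])

direction = [0.2, -0.5]
λ, a, i = FrankWolfe.active_set_argmin(as, direction)   # atom minimizing ⟨direction, a⟩

FrankWolfe.active_set_update!(as, 0.25, [1.0, 0.0])     # shift weight 0.25 towards this atom
x = FrankWolfe.compute_active_set_iterate!(as)          # recompute x = ∑ λ_i a_i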

          Functions and gradients

          FrankWolfe.ObjectiveFunctionType
          ObjectiveFunction

          Represents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least

          • compute_value(::ObjectiveFunction, x) for primal value evaluation
          • compute_gradient(::ObjectiveFunction, x) for gradient evaluation.

          and optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.

          source
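A minimal sketch of a custom subtype implementing only the documented interface; the type name and fields are illustrative.

using FrankWolfe

struct MyQuadratic <: FrankWolfe.ObjectiveFunction
    b::Vector{Float64}
    storage::Vector{Float64}
end

FrankWolfe.compute_value(f::MyQuadratic, x) = 0.5 * sum(abs2, x - f.b)

function FrankWolfe.compute_gradient(f::MyQuadratic, x)
    f.storage .= x .- f.b    # reuse the internal storage;
    return f.storage         # returning a reference is explicitly allowed
end

obj = MyQuadratic([1.0, 2.0], zeros(2))
FrankWolfe.compute_value_gradient(obj, [0.0, 0.0])   # falls back to the two calls above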
          FrankWolfe.SimpleFunctionObjectiveType
          SimpleFunctionObjective{F,G,S}

An objective function built from separate primal objective f(x) and in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.

          source
          FrankWolfe.StochasticObjectiveType
          StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)

          Represents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.

          Note: grad! must not reset the storage to 0 before adding to it.

          source
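A sketch of a scalar least-squares stochastic objective; the full_evaluation call follows the keyword list above.

using FrankWolfe

loss(θ, x) = 0.5 * (θ[1] - x[1])^2
function grad_point!(storage, θ, x)
    storage[1] += θ[1] - x[1]   # adds to storage, never resets it (see note above)
    return storage
end

xs = [[randn()] for _ in 1:100]
obj = FrankWolfe.StochasticObjective(loss, grad_point!, xs, zeros(1))

val, g = FrankWolfe.compute_value_gradient(obj, [0.3]; full_evaluation=true)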
          FrankWolfe.compute_gradientFunction
          compute_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes the gradient of f at x. May return a reference to an internal storage.

          source
          FrankWolfe.compute_value_gradientMethod
          compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])

          Computes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.

          source

          Callbacks

          Custom vertex storage

          Custom extreme point types

          For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:

          Utils

          FrankWolfe.DeletedVertexStorageType

          Vertex storage to store dropped vertices or find a suitable direction in lazy settings. The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.

          A vertex storage can be any type that implements two operations:

          1. Base.push!(storage, atom) to add an atom to the storage.

          Note that it is the storage type responsibility to ensure uniqueness of the atoms present.

2. storage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)

          returning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.

          source
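A sketch of a custom storage implementing this two-call protocol with a plain vector and a naive uniqueness check:

using FrankWolfe, LinearAlgebra

struct NaiveVertexStorage{AT}
    atoms::Vector{AT}
end

function Base.push!(s::NaiveVertexStorage, atom)
    atom in s.atoms || push!(s.atoms, atom)   # naive uniqueness handling
    return s
end

function FrankWolfe.storage_find_argmin_vertex(s::NaiveVertexStorage, direction, lazy_threshold)
    isempty(s.atoms) && return (false, nothing)
    vals = [dot(direction, a) for a in s.atoms]
    i = argmin(vals)
    return (vals[i] <= lazy_threshold, s.atoms[i])
end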
          FrankWolfe.ExpMomentumIteratorType
          ExpMomentumIterator{T}

          Iterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp

          The state corresponds to the iteration count.

          Source: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.

          source
          FrankWolfe.IncrementBatchIteratorType
          IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])

Batch size starting at starting_batch_size and incrementing by increment at every iteration, up to max_batch_size.

          source
          FrankWolfe.batchsize_iterateFunction
          batchsize_iterate(iter::BatchSizeIterator) -> b

Method to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.

          source
          FrankWolfe.momentum_iterateFunction
          momentum_iterate(iter::MomentumIterator) -> ρ

          Method to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator internal state.

          source
          FrankWolfe.muladd_memory_modeMethod
muladd_memory_mode(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)

          Performs storage = x - gamma * d in-place or not depending on MemoryEmphasis

          source
          FrankWolfe.trajectory_callbackMethod
          trajectory_callback(storage)

Callback pushing the state at each iteration to the passed storage. The state data consists only of the first 5 fields, usually (t, primal, dual, dual_gap, time).

          source

          Oracle counting trackers

The following structures wrap given oracles so that they behave identically, while additionally tracking the number of calls.

          Also see the example "Tracking number of calls to different oracles".

          Update order for block-coordinate methods

Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices, which selects which blocks to update and in what order.

          FrankWolfe.select_update_indicesFunction
          select_update_indices(::BlockCoordinateUpdateOrder, l)

Returns a list of lists of indices, where l is the largest index, i.e., the number of blocks. Each sublist represents one round of updates in an iteration. The indices in a sublist indicate which blocks are updated in parallel within that round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l].

          source
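A sketch of a custom update order using only this documented method; the reverse-cyclic behavior is purely illustrative.

using FrankWolfe

struct ReverseCyclicUpdate <: FrankWolfe.BlockCoordinateUpdateOrder end

# One block per round, visited in reverse order.
FrankWolfe.select_update_indices(::ReverseCyclicUpdate, l) = [[i] for i in l:-1:1]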
          FrankWolfe.CyclicUpdateType

          The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.

          source
          FrankWolfe.StochasticUpdateType

The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.

          source

          Index


          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

$\gamma_t = 1$ implies that the next iterate is exactly the vertex, while $\gamma_t = 0$ implies that the iterate does not move.

The following are step size selection rules for Frank-Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce step size sequences $(\gamma_t)$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) adapt to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
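A sketch of a custom line search implementing the documented signature; the struct name and field are illustrative.

using FrankWolfe

struct ClippedFixedStep <: FrankWolfe.LineSearchMethod
    gamma::Float64
end

function FrankWolfe.perform_line_search(
    ls::ClippedFixedStep, t, f, grad!, gradient, x, d, gamma_max, workspace,
)
    return min(ls.gamma, gamma_max)   # fixed step, clipped to the feasible maximum
end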
          FrankWolfe.AdaptiveType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

          \[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

          \[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether the computation should temporarily upgrade to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

          Using l ≥ 4 is advised only for strongly convex sets, see:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, 2023.

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

Represents a monotonic open-loop non-convex step size. Contains a halving factor N, increased at each iteration until there is primal progress, yielding the step size gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

Represents a monotonic open-loop step size. Contains a halving factor N, increased at each iteration until there is primal progress, yielding the step size gamma = 2 / (t + 2) * 2^(-N).

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.

          source
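A sketch comparing two of the step size rules on a simplex-constrained quadratic:

using FrankWolfe, LinearAlgebra

n = 100
lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
f(x) = norm(x)^2
grad!(storage, x) = (storage .= 2x)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, ones(n)))

# Open-loop rule γ_t = l / (l + t):
x_ag, = FrankWolfe.frank_wolfe(f, grad!, lmo, copy(x0); line_search=FrankWolfe.Agnostic())

# Short step using the gradient Lipschitz constant (L = 2 for this f):
x_ss, = FrankWolfe.frank_wolfe(f, grad!, lmo, copy(x0); line_search=FrankWolfe.Shortstep(2.0))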

          See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index

          +Line search and step size settings · FrankWolfe.jl

          Line search and step size settings

          The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

          \[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

          $\gamma_t = 1$ implies that the next iterate is exactly the vertex, a zero $\gamma_t$ implies that the iterate is not moving.

          The following are step size selection rules for Frank Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce series $\gamma_t$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.

          All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.

          FrankWolfe.LineSearchMethodType

          Line search method to apply once the direction is computed. A LineSearchMethod must implement

          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          with d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.

          source
          FrankWolfe.perform_line_searchFunction
          perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

          Returns the step size gamma for step size strategy ls.

          source
          FrankWolfe.AdaptiveType

          Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

          \[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

          The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with nummerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

          \[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

          This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

          It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument on whether there should be a temporary upgrade to BigFloat for extended precision.

          source
          FrankWolfe.AgnosticType

          Computes step size: l/(l + t) at iteration t, given l > 0.

          Using l ≥ 4 is advised only for strongly convex sets, see:

          Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, 2023.

          source
          FrankWolfe.MonotonicNonConvexStepSizeType
          MonotonicNonConvexStepSize{F}

          Represents a monotonic open-loop non-convex step size. Contains a halving factor N increased at each iteration until there is primal progress gamma = 1 / sqrt(t + 1) * 2^(-N).

          source
          FrankWolfe.MonotonicStepSizeType
          MonotonicStepSize{F}

          Represents a monotonic open-loop step size. Contains a halving factor N increased at each iteration until there is primal progress gamma = 2 / (t + 2) * 2^(-N).

          source
          FrankWolfe.ShortstepType

          Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.

          source

          See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, Carderera, Besançon, Pokutta (2021) for the monotonic step size.

          Index

          diff --git a/dev/search/index.html b/dev/search/index.html index 3b1c96b9a..dc1751fad 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · FrankWolfe.jl +Search · FrankWolfe.jl diff --git a/dev/search_index.js b/dev/search_index.js index 739bf98a9..cb44b4546 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"reference/3_backend/#Utilities-and-data-structures","page":"Utilities and data structures","title":"Utilities and data structures","text":"","category":"section"},{"location":"reference/3_backend/#Active-set","page":"Utilities and data structures","title":"Active set","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"active_set.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ActiveSet","page":"Utilities and data structures","title":"FrankWolfe.ActiveSet","text":"ActiveSet{AT, R, IT}\n\nRepresents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#Base.copy-Union{Tuple{FrankWolfe.ActiveSet{AT, R, IT}}, Tuple{IT}, Tuple{R}, Tuple{AT}} where {AT, R, IT}","page":"Utilities and data structures","title":"Base.copy","text":"Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_argmin-Tuple{FrankWolfe.ActiveSet, Any}","page":"Utilities and data structures","title":"FrankWolfe.active_set_argmin","text":"active_set_argmin(active_set::ActiveSet, direction)\n\nComputes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_argminmax-Tuple{FrankWolfe.ActiveSet, Any}","page":"Utilities and data structures","title":"FrankWolfe.active_set_argminmax","text":"active_set_argminmax(active_set::ActiveSet, direction)\n\nComputes the linear minimizer in the direction on the active set. 
Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_initialize!-Union{Tuple{R}, Tuple{AT}, Tuple{FrankWolfe.ActiveSet{AT, R, IT} where IT, Any}} where {AT, R}","page":"Utilities and data structures","title":"FrankWolfe.active_set_initialize!","text":"active_set_initialize!(as, v)\n\nResets the active set structure to a single vertex v with unit weight.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_update!","page":"Utilities and data structures","title":"FrankWolfe.active_set_update!","text":"active_set_update!(active_set::ActiveSet, lambda, atom)\n\nAdds the atom to the active set with weight lambda or adds lambda to existing atom.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.active_set_update_iterate_pairwise!-Union{Tuple{A}, Tuple{IT}, Tuple{IT, Real, A, A}} where {IT, A}","page":"Utilities and data structures","title":"FrankWolfe.active_set_update_iterate_pairwise!","text":"active_set_update_iterate_pairwise!(x, lambda, fw_atom, away_atom)\n\nOperates x ← x + λ a_fw - λ a_aw.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_update_scale!-Union{Tuple{IT}, Tuple{IT, Any, Any}} where IT","page":"Utilities and data structures","title":"FrankWolfe.active_set_update_scale!","text":"active_set_update_scale!(x, lambda, atom)\n\nOperates x ← (1-λ) x + λ a.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.compute_active_set_iterate!-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.compute_active_set_iterate!","text":"compute_active_set_iterate!(active_set::ActiveSet) -> x\n\nRecomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.get_active_set_iterate-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.get_active_set_iterate","text":"get_active_set_iterate(active_set)\n\nReturn the current iterate corresponding. Does not recompute it.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#Functions-and-gradients","page":"Utilities and data structures","title":"Functions and gradients","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"function_gradient.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ObjectiveFunction","page":"Utilities and data structures","title":"FrankWolfe.ObjectiveFunction","text":"ObjectiveFunction\n\nRepresents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least\n\ncompute_value(::ObjectiveFunction, x) for primal value evaluation\ncompute_gradient(::ObjectiveFunction, x) for gradient evaluation.\n\nand optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. 
compute_gradient may always use the same storage and return a reference to it.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.SimpleFunctionObjective","page":"Utilities and data structures","title":"FrankWolfe.SimpleFunctionObjective","text":"SimpleFunctionObjective{F,G,S}\n\nAn objective function built from separate primal objective f(x) and in-place gradient function grad!(storage, x). It keeps an internal storage of type s used to evaluate the gradient in-place.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.StochasticObjective","page":"Utilities and data structures","title":"FrankWolfe.StochasticObjective","text":"StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)\n\nRepresents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size and full_evaluation controlling whether the function should be evaluated over all data points.\n\nNote: grad! must not reset the storage to 0 before adding to it.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.compute_gradient","page":"Utilities and data structures","title":"FrankWolfe.compute_gradient","text":"compute_gradient(f::ObjectiveFunction, x; [kwargs...])\n\nComputes the gradient of f at x. May return a reference to an internal storage.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.compute_value","page":"Utilities and data structures","title":"FrankWolfe.compute_value","text":"compute_value(f::ObjectiveFunction, x; [kwargs...])\n\nComputes the objective f at x.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.compute_value_gradient-Tuple{FrankWolfe.ObjectiveFunction, Any}","page":"Utilities and data structures","title":"FrankWolfe.compute_value_gradient","text":"compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])\n\nComputes in one call the pair (value, gradient) evaluated at x. 
By default, calls compute_value and compute_gradient with keywords kwargs passed down to both.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#Callbacks","page":"Utilities and data structures","title":"Callbacks","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.CallbackState","category":"page"},{"location":"reference/3_backend/#FrankWolfe.CallbackState","page":"Utilities and data structures","title":"FrankWolfe.CallbackState","text":"Main structure created before and passed to the callback in first position.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#Custom-vertex-storage","page":"Utilities and data structures","title":"Custom vertex storage","text":"","category":"section"},{"location":"reference/3_backend/#Custom-extreme-point-types","page":"Utilities and data structures","title":"Custom extreme point types","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.ScaledHotVector\nFrankWolfe.RankOneMatrix","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ScaledHotVector","page":"Utilities and data structures","title":"FrankWolfe.ScaledHotVector","text":"ScaledHotVector{T}\n\nRepresents a vector of at most one value different from 0.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.RankOneMatrix","page":"Utilities and data structures","title":"FrankWolfe.RankOneMatrix","text":"RankOneMatrix{T, UT, VT}\n\nRepresents a rank-one matrix R = u * vt'. Composes like a charm.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"types.jl\"]","category":"page"},{"location":"reference/3_backend/#Utils","page":"Utilities and data structures","title":"Utils","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"utils.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ConstantBatchIterator","page":"Utilities and data structures","title":"FrankWolfe.ConstantBatchIterator","text":"ConstantBatchIterator(batch_size)\n\nBatch iterator always returning a constant batch size.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.ConstantMomentumIterator","page":"Utilities and data structures","title":"FrankWolfe.ConstantMomentumIterator","text":"ConstantMomentumIterator{T}\n\nIterator for momentum with a fixed damping value, always return the value and a dummy state.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.DeletedVertexStorage","page":"Utilities and data structures","title":"FrankWolfe.DeletedVertexStorage","text":"Vertex storage to store dropped vertices or find a suitable direction in lazy settings. 
The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.\n\nA vertex storage can be any type that implements two operations:\n\nBase.push!(storage, atom) to add an atom to the storage.\n\nNote that it is the storage type responsibility to ensure uniqueness of the atoms present.\n\nstorage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)\n\nreturning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.ExpMomentumIterator","page":"Utilities and data structures","title":"FrankWolfe.ExpMomentumIterator","text":"ExpMomentumIterator{T}\n\nIterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp\n\nThe state corresponds to the iteration count.\n\nSource: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.IncrementBatchIterator","page":"Utilities and data structures","title":"FrankWolfe.IncrementBatchIterator","text":"IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])\n\nBatch size starting at startingbatchsize and incrementing by increment at every iteration.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe._unsafe_equal-Tuple{Array, Array}","page":"Utilities and data structures","title":"FrankWolfe._unsafe_equal","text":"_unsafe_equal(a, b)\n\nLike isequal on arrays but without the checks. Assumes a and b have the same axes.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.batchsize_iterate","page":"Utilities and data structures","title":"FrankWolfe.batchsize_iterate","text":"batchsize_iterate(iter::BatchSizeIterator) -> b\n\nMethod to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically update the internal state of iter.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.momentum_iterate","page":"Utilities and data structures","title":"FrankWolfe.momentum_iterate","text":"momentum_iterate(iter::MomentumIterator) -> ρ\n\nMethod to implement for a type MomentumIterator. 
Returns the next momentum value ρ and updates the iterator's internal state.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Any, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"muladd_memory_mode(memory_mode::MemoryEmphasis, d, x, v)\n\nPerforms d = x - v in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Any, Real, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)\n\nPerforms storage = x - gamma * d in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Real, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"(memory_mode::MemoryEmphasis, x, gamma::Real, d)\n\nPerforms x = x - gamma * d in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.storage_find_argmin_vertex-Tuple{FrankWolfe.DeletedVertexStorage, Any, Any}","page":"Utilities and data structures","title":"FrankWolfe.storage_find_argmin_vertex","text":"Gives the vertex v in the storage that minimizes s = direction ⋅ v and whether s achieves s ≤ lazy_threshold.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.trajectory_callback-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.trajectory_callback","text":"trajectory_callback(storage)\n\nCallback pushing the state at each iteration to the passed storage. The state data is only the first 5 fields, usually: (t,primal,dual,dual_gap,time)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#Oracle-counting-trackers","page":"Utilities and data structures","title":"Oracle counting trackers","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"The following structures wrap given oracles to behave similarly but additionally track the number of calls.","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.TrackingObjective\nFrankWolfe.TrackingGradient\nFrankWolfe.TrackingLMO","category":"page"},{"location":"reference/3_backend/#FrankWolfe.TrackingObjective","page":"Utilities and data structures","title":"FrankWolfe.TrackingObjective","text":"A function acting like the normal objective f but tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.TrackingGradient","page":"Utilities and data structures","title":"FrankWolfe.TrackingGradient","text":"A function acting like the normal grad! 
but tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.TrackingLMO","page":"Utilities and data structures","title":"FrankWolfe.TrackingLMO","text":"TrackingLMO{LMO}(lmo)\n\nAn LMO wrapping another one and tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Also see the example \"Tracking number of calls to different oracles\".","category":"page"},{"location":"reference/3_backend/#Update-order-for-block-coordinate-methods","page":"Utilities and data structures","title":"Update order for block-coordinate methods","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices which selects which blocks to update and in which order.","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.BlockCoordinateUpdateOrder\nFrankWolfe.select_update_indices\nFrankWolfe.FullUpdate\nFrankWolfe.CyclicUpdate\nFrankWolfe.StochasticUpdate","category":"page"},{"location":"reference/3_backend/#FrankWolfe.BlockCoordinateUpdateOrder","page":"Utilities and data structures","title":"FrankWolfe.BlockCoordinateUpdateOrder","text":"Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement\n\nselect_update_indices(::BlockCoordinateUpdateOrder, l)\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.select_update_indices","page":"Utilities and data structures","title":"FrankWolfe.select_update_indices","text":"select_update_indices(::BlockCoordinateUpdateOrder, l)\n\nReturns a list of lists of indices, where l is the largest index, i.e., the number of blocks. Each sublist represents one round of updates in an iteration. The indices in a list show which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l].\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.FullUpdate","page":"Utilities and data structures","title":"FrankWolfe.FullUpdate","text":"The full update initiates a parallel update of all blocks in one single round.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.CyclicUpdate","page":"Utilities and data structures","title":"FrankWolfe.CyclicUpdate","text":"The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.StochasticUpdate","page":"Utilities and data structures","title":"FrankWolfe.StochasticUpdate","text":"The stochastic update initiates a sequence of update rounds. In each round only one block is updated. 
The order of the blocks is random.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#Index","page":"Utilities and data structures","title":"Index","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Pages = [\"3_backend.md\"]","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"EditURL = \"https://github.com/ZIB-IOL/FrankWolfe.jl/blob/master/CONTRIBUTING.md\"","category":"page"},{"location":"contributing/#Contributing-to-FrankWolfe","page":"Contributing","title":"Contributing to FrankWolfe","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"First, thanks for taking the time to contribute. Contributions in any form, such as documentation, bug fixes, examples, or algorithms, are appreciated and welcome.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"We list below some guidelines to help you contribute to the package.","category":"page"},{"location":"contributing/#Community-Standards","page":"Contributing","title":"Community Standards","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Interactions on this repository must follow the Julia Community Standards, including in Pull Requests and issues.","category":"page"},{"location":"contributing/#Where-can-I-get-an-overview","page":"Contributing","title":"Where can I get an overview","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Check out the paper presenting the package for a high-level overview of its features and algorithms, and the documentation for more details.","category":"page"},{"location":"contributing/#I-just-have-a-question","page":"Contributing","title":"I just have a question","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If your question is related to Julia, its syntax or tooling, the best places to get help will be tied to the Julia community, see the Julia community page for a number of communication channels.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"For now, the best way to ask a question is to reach out to Mathieu Besançon or Sebastian Pokutta. You can also ask your question on discourse.julialang.org in the optimization topic or on the Julia Slack on #mathematical-optimization, see the Julia community page to gain access.","category":"page"},{"location":"contributing/#How-can-I-file-an-issue","page":"Contributing","title":"How can I file an issue","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you found a bug or want to propose a feature, we track our issues within the GitHub repository. 
Once opened, you can edit the issue or add new comments to continue the conversation.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you encounter a bug, send the stack trace (the lines printed after the error occurred, which point to the relevant source files) and ideally a Minimal Working Example (MWE), a small program that reproduces the bug.","category":"page"},{"location":"contributing/#How-can-I-contribute","page":"Contributing","title":"How can I contribute","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Contributions to the repository will likely be made through a Pull Request (PR). You will need to:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Fork the repository\nClone it on your machine to perform the changes\nCreate a branch for your modifications, based on the branch you want to merge into (typically master)\nPush to this branch on your fork\nThe GitHub web interface will then automatically suggest opening a PR onto the original repository.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"See the GitHub guide to creating PRs for more help on workflows using Git and GitHub.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"A PR should do a single thing to reduce the amount of code that must be reviewed. Do not run the formatter on the whole repository except if your PR is specifically about formatting.","category":"page"},{"location":"contributing/#Improve-the-documentation","page":"Contributing","title":"Improve the documentation","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"The documentation can be improved by changing the files in docs/src, for example to add a section in the documentation, expand a paragraph or add a plot. The documentation attached to a given type or function can be modified directly in the source files; it appears above the object you are documenting, delimited by triple double quotation marks, like this:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"\"\"\"\nThis explains what the function `f` does; it supports markdown.\n\"\"\"\nfunction f(x)\n # ...\nend","category":"page"},{"location":"contributing/#Provide-a-new-example-or-test","page":"Contributing","title":"Provide a new example or test","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you fix a bug, one would typically expect to add a test that validates that the bug is gone. A test would be added in a file in the test/ folder, for which the entry point is runtests.jl.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests, using TestEnv. 
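A minimal way to activate this environment in a REPL before running an example (the example file name here is hypothetical):\n\nusing TestEnv\nTestEnv.activate(\"FrankWolfe\")\ninclude(\"examples/my_new_example.jl\")\n\n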
If the example is lightweight enough, it can be added to the docs/src/examples/ folder which generates pages for the documentation based on Literate.jl.","category":"page"},{"location":"contributing/#Provide-a-new-feature","page":"Contributing","title":"Provide a new feature","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assess the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Some typical features to implement are:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"A new Linear Minimization Oracle (LMO)\nA new step size\nA new algorithm (less frequent) following the same API.","category":"page"},{"location":"contributing/#Code-style","page":"Contributing","title":"Code style","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repository as configured in the .JuliaFormatter.toml file, which enforces a number of conventions.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"This contribution guide was inspired by ColPrac and the one in Manopt.jl.","category":"page"},{"location":"basics/#How-does-it-work?","page":"How does it work?","title":"How does it work?","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"FrankWolfe.jl contains generic routines to solve optimization problems of the form","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"min_x in mathcalC f(x)","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"where mathcalC is a compact convex set and f is a differentiable function. These routines work by solving a sequence of linear subproblems:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"min_x in mathcalC langle d_k x rangle quad textwhere quad d_k = nabla f(x_k)","category":"page"},{"location":"basics/#Linear-Minimization-Oracles","page":"How does it work?","title":"Linear Minimization Oracles","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. 
Given a direction d, it returns an optimal vertex of the feasible set:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"v in arg min_xin mathcalC langle dx rangle","category":"page"},{"location":"basics/#Custom-LMOs","page":"How does it work?","title":"Custom LMOs","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"compute_extreme_point(lmo::LMO, direction; kwargs...) -> v","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"This method should minimize v mapsto langle d v rangle over the set mathcalC defined by the LMO. Note that this means the set mathcalC doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.","category":"page"},{"location":"basics/#Pre-defined-LMOs","page":"How does it work?","title":"Pre-defined LMOs","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"If you don't want to define your LMO manually, several common implementations are available out-of-the-box:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Simplices: unit simplex, probability simplex\nBalls in various norms\nPolytopes: K-sparse, Birkhoff","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"You can use an oracle defined via a Linear Programming solver (e.g. 
SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Finally, we provide wrappers to combine oracles easily, for example in a product.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.","category":"page"},{"location":"basics/#Optimization-algorithms","page":"How does it work?","title":"Optimization algorithms","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The package features several variants of Frank-Wolfe that share the same basic API.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).","category":"page"},{"location":"basics/#Standard-Frank-Wolfe-(FW)","page":"How does it work?","title":"Standard Frank-Wolfe (FW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Jaggi (2013) for an overview.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).","category":"page"},{"location":"basics/#Away-step-Frank-Wolfe-(AFW)","page":"How does it work?","title":"Away-step Frank-Wolfe (AFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the away_frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Lacoste-Julien, Jaggi (2015) for an overview.","category":"page"},{"location":"basics/#Stochastic-Frank-Wolfe-(SFW)","page":"How does it work?","title":"Stochastic Frank-Wolfe (SFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.stochastic_frank_wolfe function.","category":"page"},{"location":"basics/#Blended-Conditional-Gradients-(BCG)","page":"How does it work?","title":"Blended Conditional Gradients (BCG)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Braun, Pokutta, Tu, Wright (2018).","category":"page"},{"location":"basics/#Blended-Pairwise-Conditional-Gradients-(BPCG)","page":"How does it work?","title":"Blended Pairwise Conditional Gradients (BPCG)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Tsuji, 
Tanaka, Pokutta (2021)","category":"page"},{"location":"basics/#Comparison","page":"How does it work?","title":"Comparison","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The following table compares the characteristics of the algorithms presented in the package:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Algorithm Progress/Iteration Time/Iteration Sparsity Numerical Stability Active Set Lazifiable\nFW Low Low Low High No Yes\nAFW Medium Medium-High Medium Medium-High Yes Yes\nB(P)CG High Medium-High High Medium Yes By design\nSFW Low Low Low High No No","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set mathcalC, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"(Image: FW vs AFW).","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.","category":"page"},{"location":"basics/#Block-Coordinate-Frank-Wolfe-(BCFW)","page":"How does it work?","title":"Block-Coordinate Frank-Wolfe (BCFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.","category":"page"},{"location":"basics/#Alternating-Linear-Minimization-(ALM)","page":"How does it work?","title":"Alternating Linear Minimization (ALM)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.alternating_linear_minimization function.","category":"page"},{"location":"reference/2_lmo/#Linear-Minimization-Oracles","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. 
Given din mathcalX, it returns a vertex of the feasible set:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"vin argmin_xin mathcalC langle dx rangle","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.","category":"page"},{"location":"reference/2_lmo/#Interface-and-wrappers","page":"Linear Minimization Oracles","title":"Interface and wrappers","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"FrankWolfe.LinearMinimizationOracle","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.LinearMinimizationOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.LinearMinimizationOracle","text":"Supertype for linear minimization oracles.\n\nAll LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"compute_extreme_point","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)\n\nComputes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.\n\n\n\n\n\n","category":"function"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"We also provide some meta-LMOs wrapping another one with extended behavior:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"FrankWolfe.CachedLinearMinimizationOracle\nFrankWolfe.ProductLMO\nFrankWolfe.SingleLastCachedLMO\nFrankWolfe.MultiCacheLMO\nFrankWolfe.VectorCacheLMO","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.CachedLinearMinimizationOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.CachedLinearMinimizationOracle","text":"CachedLinearMinimizationOracle{LMO}\n\nOracle wrapping another one of type LMO. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.\n\nBy convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! 
and Base.length.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ProductLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.ProductLMO","text":"ProductLMO(lmos...)\n\nLinear minimization oracle over the Cartesian product of multiple LMOs.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.SingleLastCachedLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.SingleLastCachedLMO","text":"SingleLastCachedLMO{LMO, VT}\n\nCaches only the last result from an LMO and stores it in last_vertex. Vertices of the LMO have to be of type VT, if provided.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.MultiCacheLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.MultiCacheLMO","text":"MultiCacheLMO{N, LMO, A}\n\nCache for an LMO storing up to N vertices, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e., the next one to replace. VT, if provided, must be the type of vertices returned by the LMO\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.VectorCacheLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.VectorCacheLMO","text":"VectorCacheLMO{LMO, VT}\n\nCache for an LMO storing an unbounded number of vertices of type VT in the cache. VT, if provided, must be the type of vertices returned by the LMO\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#Norm-balls","page":"Linear Minimization Oracles","title":"Norm balls","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"norm_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.EllipsoidLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.EllipsoidLMO","text":"EllipsoidLMO(A, c, r)\n\nLinear minimization over an ellipsoid centered at c of radius r:\n\nx: (x - c)^T A (x - c) ≤ r\n\nThe LMO stores the factorization F of A that is used to compute A⁻¹ x through linear solves. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional, i.e., A is positive definite.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.KNormBallLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.KNormBallLMO","text":"KNormBallLMO{T}(K::Int, right_hand_side::T)\n\nLMO with feasible set being the K-norm ball in the sense of 2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:\n\nC_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }\n\nwith τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.LpNormLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.LpNormLMO","text":"LpNormLMO{T, p}(right_hand_side)\n\nLMO with feasible set being an L-p norm ball:\n\nC = {x ∈ R^n, norm(x, p) ≤ right_hand_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.NuclearNormLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.NuclearNormLMO","text":"NuclearNormLMO{T}(radius)\n\nLMO over matrices that have a nuclear norm less than radius. 
The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.SpectraplexLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.SpectraplexLMO","text":"SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)\n\nFeasible set\n\n{X ∈ 𝕊_n^+, trace(X) == radius}\n\ngradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.UnitSpectrahedronLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.UnitSpectrahedronLMO","text":"UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)\n\nFeasible set of PSD matrices with bounded trace:\n\n{X ∈ 𝕊_n^+, trace(X) ≤ radius}\n\ngradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#Simplex","page":"Linear Minimization Oracles","title":"Simplex","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"simplex_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.ProbabilitySimplexOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.ProbabilitySimplexOracle","text":"ProbabilitySimplexOracle(right_side)\n\nRepresents the scaled probability simplex:\n\nC = {x ∈ R^n_+, ∑x = right_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.UnitSimplexOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.UnitSimplexOracle","text":"UnitSimplexOracle(right_side)\n\nRepresents the scaled unit simplex:\n\nC = {x ∈ R^n_+, ∑x ≤ right_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.compute_dual_solution-Union{Tuple{T}, Tuple{FrankWolfe.ProbabilitySimplexOracle{T}, Any, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_dual_solution","text":"Dual costs for a given primal solution to form a primal dual pair for scaled probability simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_dual_solution-Union{Tuple{T}, Tuple{FrankWolfe.UnitSimplexOracle{T}, Any, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_dual_solution","text":"Dual costs for a given primal solution to form a primal dual pair for scaled unit simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point-Union{Tuple{T}, Tuple{FrankWolfe.ProbabilitySimplexOracle{T}, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"LMO for scaled probability simplex. 
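As a quick usage sketch (the radius 1.0 and the direction are made up for illustration):\n\nusing FrankWolfe\nlmo = FrankWolfe.ProbabilitySimplexOracle(1.0)\n# the vertex puts all mass on the coordinate with the smallest direction entry\nv = FrankWolfe.compute_extreme_point(lmo, [0.5, -0.2, 0.1]) # v[2] == 1.0\n\n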
It returns a vector with one active value equal to the RHS in the most improving (or least degrading) direction.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point-Union{Tuple{T}, Tuple{FrankWolfe.UnitSimplexOracle{T}, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"LMO for the scaled unit simplex: ∑ x_i ≤ τ. Returns either a vector of zeros or a vector with one active value equal to the RHS if there exists an improving direction.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#Polytope","page":"Linear Minimization Oracles","title":"Polytope","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"polytope_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.BirkhoffPolytopeLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.BirkhoffPolytopeLMO","text":"BirkhoffPolytopeLMO\n\nThe Birkhoff polytope encodes doubly stochastic matrices. Its extreme vertices are all permutation matrices of side-dimension dimension.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ConvexHullOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.ConvexHullOracle","text":"ConvexHullOracle{AT,VT}\n\nConvex hull of a finite number of vertices of type AT, stored in a vector of type VT.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.KSparseLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.KSparseLMO","text":"KSparseLMO{T}(K::Int, right_hand_side::T)\n\nLMO for the K-sparse polytope:\n\nC = B_1(τK) ∩ B_∞(τ)\n\nwith τ the right_hand_side parameter. The LMO results in a vector with the K largest absolute values of direction, taking values -τ sign(x_i).\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ScaledBoundL1NormBall","page":"Linear Minimization Oracles","title":"FrankWolfe.ScaledBoundL1NormBall","text":"ScaledBoundL1NormBall(lower_bounds, upper_bounds)\n\nPolytope similar to an L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ScaledBoundLInfNormBall","page":"Linear Minimization Oracles","title":"FrankWolfe.ScaledBoundLInfNormBall","text":"ScaledBoundLInfNormBall(lower_bounds, upper_bounds)\n\nPolytope similar to an L-inf ball with shifted bounds or general box constraints. Lower and upper bounds are passed on as abstract vectors, possibly of different types. 
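For example, box constraints 0 ≤ x_i ≤ 2 in dimension 3 could be encoded as follows (a small sketch; the numbers are arbitrary):\n\nusing FrankWolfe\nbox = FrankWolfe.ScaledBoundLInfNormBall(zeros(3), fill(2.0, 3))\n# each coordinate of the vertex goes to the bound minimizing the inner product\nv = FrankWolfe.compute_extreme_point(box, [1.0, -1.0, 0.5]) # v == [0.0, 2.0, 0.0]\n\n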
For the standard L-inf ball, all lower and upper bounds would be -1 and 1.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#MathOptInterface","page":"Linear Minimization Oracles","title":"MathOptInterface","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"moi_oracle.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.MathOptLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.MathOptLMO","text":"MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle\n\nLinear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.\n\nThe direction vector has to be set in the same variable order as the MOI.ListOfVariableIndices() getter.\n\nThe Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.convert_mathopt","page":"Linear Minimization Oracles","title":"FrankWolfe.convert_mathopt","text":"convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}\n\nConverts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.\n\n\n\n\n\n","category":"function"},{"location":"reference/2_lmo/#Index","page":"Linear Minimization Oracles","title":"Index","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Pages = [\"1_lmo.md\"]","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"EditURL = \"../../../examples/docs_1_mathopt_lmo.jl\"","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/#Comparison-with-MathOptInterface-on-a-Probability-Simplex","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"","category":"section"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"In this example, we project a random point onto a probability simplex with the Frank-Wolfe algorithm using either the specialized LMO defined in the package or a generic LP formulation using MathOptInterface.jl (MOI) and GLPK as the underlying LP solver. 
It can be found as Example 4.4 in the paper.","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"using FrankWolfe\n\nusing LinearAlgebra\nusing LaTeXStrings\n\nusing Plots\n\nusing JuMP\nconst MOI = JuMP.MOI\n\nimport GLPK\n\nn = Int(1e3)\nk = 10000\n\nxpi = rand(n);\ntotal = sum(xpi);\nconst xp = xpi ./ total;\n\nf(x) = norm(x - xp)^2\nfunction grad!(storage, x)\n @. storage = 2 * (x - xp)\n return nothing\nend\n\nlmo_radius = 2.5\nlmo = FrankWolfe.ProbabilitySimplexOracle(lmo_radius)\n\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\ngradient = collect(x00)\n\nx_lmo, v, primal, dual_gap, trajectory_lmo = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"Create a MathOptInterface Optimizer and build the same linear constraints:","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"o = GLPK.Optimizer()\nx = MOI.add_variables(o, n)\n\nfor xi in x\n MOI.add_constraint(o, xi, MOI.GreaterThan(0.0))\nend\n\nMOI.add_constraint(\n o,\n MOI.ScalarAffineFunction(MOI.ScalarAffineTerm.(1.0, x), 0.0),\n MOI.EqualTo(lmo_radius),\n)\n\nlmo_moi = FrankWolfe.MathOptLMO(o)\n\nx, v, primal, dual_gap, trajectory_moi = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_moi,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"Alternatively, we can use one of the modelling interfaces based on MOI to formulate the LP. 
The following example builds the same set of constraints using JuMP:","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"m = JuMP.Model(GLPK.Optimizer)\n@variable(m, y[1:n] ≥ 0)\n\n@constraint(m, sum(y) == lmo_radius)\n\nlmo_jump = FrankWolfe.MathOptLMO(m.moi_backend)\n\nx, v, primal, dual_gap, trajectory_jump = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_jump,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\n\nx_lmo, v, primal, dual_gap, trajectory_lmo_blas = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x00,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\n\nx, v, primal, dual_gap, trajectory_jump_blas = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_jump,\n x00,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"We can now plot the results","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"iteration_list = [[x[1] + 1 for x in trajectory_lmo], [x[1] + 1 for x in trajectory_moi]]\ntime_list = [[x[5] for x in trajectory_lmo], [x[5] for x in trajectory_moi]]\nprimal_gap_list = [[x[2] for x in trajectory_lmo], [x[2] for x in trajectory_moi]]\ndual_gap_list = [[x[4] for x in trajectory_lmo], [x[4] for x in trajectory_moi]]\n\nlabel = [L\"\\textrm{Closed-form LMO}\", L\"\\textrm{MOI LMO}\"]\n\nplot_results(\n [primal_gap_list, primal_gap_list, dual_gap_list, dual_gap_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [\"\", \"\", L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [L\"\\textrm{Primal Gap}\", \"\", L\"\\textrm{Dual Gap}\", \"\"],\n xscalelog=[:log, :identity, :log, :identity],\n yscalelog=[:log, :log, :log, :log],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"EditURL = \"../../../examples/docs_6_spectrahedron.jl\"","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # 
hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Spectrahedron","page":"Spectrahedron","title":"Spectrahedron","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"This example shows an optimization problem over the spectraplex:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"S = X in mathbbS_+^n Tr(X) = 1","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"with mathbbS_+^n the set of positive semidefinite matrices. Linear optimization with symmetric objective D over the spetraplex consists in computing the leading eigenvector of D.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"The package also exposes UnitSpectrahedronLMO which corresponds to the feasible set:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"S_u = X in mathbbS_+^n Tr(X) leq 1","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"using FrankWolfe\nusing LinearAlgebra\nusing Random\nusing SparseArrays","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"The objective function will be the symmetric squared distance to a set of known or observed entries Y_ij of the matrix.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"f(X) = sum_(ij) in L 12 (X_ij - Y_ij)^2","category":"page"},{"location":"examples/docs_6_spectrahedron/#Setting-up-the-input-data,-objective,-and-gradient","page":"Spectrahedron","title":"Setting up the input data, objective, and gradient","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Dimension, number of iterations and number of known entries:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"n = 1500\nk = 5000\nn_entries = 1000\n\nRandom.seed!(41)\n\nconst entry_indices = unique!([minmax(rand(1:n, 2)...) for _ in 1:n_entries])\nconst entry_values = randn(length(entry_indices))\n\nfunction f(X)\n r = zero(eltype(X))\n for (idx, (i, j)) in enumerate(entry_indices)\n r += 1 / 2 * (X[i, j] - entry_values[idx])^2\n r += 1 / 2 * (X[j, i] - entry_values[idx])^2\n end\n return r / length(entry_values)\nend\n\nfunction grad!(storage, X)\n storage .= 0\n for (idx, (i, j)) in enumerate(entry_indices)\n storage[i, j] += (X[i, j] - entry_values[idx])\n storage[j, i] += (X[j, i] - entry_values[idx])\n end\n return storage ./= length(entry_values)\nend","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Note that the ensure_symmetry = false argument to SpectraplexLMO. It skips an additional step making the used direction symmetric. 
It is not necessary when the gradient is a LinearAlgebra.Symmetric (or more rarely a LinearAlgebra.Diagonal or LinearAlgebra.UniformScaling).","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"const lmo = FrankWolfe.SpectraplexLMO(1.0, n, false)\nconst x0 = FrankWolfe.compute_extreme_point(lmo, spzeros(n, n))\n\ntarget_tolerance = 1e-8;\nnothing #hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Running-standard-and-lazified-Frank-Wolfe","page":"Spectrahedron","title":"Running standard and lazified Frank-Wolfe","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Xfinal, Vfinal, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.MonotonicStepSize(),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n)\n\nXfinal, Vfinal, primal, dual_gap, trajectory_lazy = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.MonotonicStepSize(),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Plotting-the-resulting-trajectories","page":"Spectrahedron","title":"Plotting the resulting trajectories","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"data = [trajectory, trajectory_lazy]\nlabel = [\"FW\", \"LCG\"]\nplot_trajectories(data, label, xscalelog=true)","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"EditURL = \"../../../examples/docs_9_extra_vertex_storage.jl\"","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/#Extra-lazification","page":"Extra-lazification","title":"Extra-lazification","text":"","category":"section"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"Sometimes the Frank-Wolfe algorithm will be run multiple times with slightly different settings under which vertices collected in a previous run are still valid.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The extra-lazification feature can be used for this purpose. It consists of a storage that can collect dropped vertices during a run, and the ability to use these vertices in another run, when they are not part of the current active set. The vertices that are part of the active set do not need to be duplicated in the extra-lazification storage. 
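In code, the same storage object is simply passed to successive runs (a schematic sketch; the keywords mirror the full example below, and f2/grad2! stand for a second, modified objective):\n\nstorage = FrankWolfe.DeletedVertexStorage(typeof(x0)[], 5)\n# first run collects dropped vertices into the storage\nFrankWolfe.blended_pairwise_conditional_gradient(f, grad!, lmo, x0, lazy=true, add_dropped_vertices=true, extra_vertex_storage=storage)\n# later runs query the storage before resorting to the LMO\nFrankWolfe.blended_pairwise_conditional_gradient(f2, grad2!, lmo, x0, lazy=true, use_extra_vertex_storage=true, add_dropped_vertices=true, extra_vertex_storage=storage)\n\n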
The extra-vertices can be used instead of calling the LMO when it is a relatively expensive operation.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"using FrankWolfe\nusing Test\nusing LinearAlgebra","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"We will use a parameterized objective function 12 x - c^2 over the unit simplex.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"const n = 100\nconst center0 = 5.0 .+ 3 * rand(n)\nf(x) = 0.5 * norm(x .- center0)^2\nfunction grad!(storage, x)\n return storage .= x .- center0\nend","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The TrackingLMO will let us count how many real calls to the LMO are performed by a single run of the algorithm.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"lmo = FrankWolfe.UnitSimplexOracle(4.3)\ntlmo = FrankWolfe.TrackingLMO(lmo)\nx0 = FrankWolfe.compute_extreme_point(lmo, randn(n));\nnothing #hide","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/#Adding-a-vertex-storage","page":"Extra-lazification","title":"Adding a vertex storage","text":"","category":"section"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"FrankWolfe offers a simple FrankWolfe.DeletedVertexStorage storage type which has a parameter return_kth, the number of good directions to find before returning the best. return_kth larger than the number of vertices means that the best-aligned vertex will be found. return_kth = 1 means the first acceptable vertex (with the specified threshold) is returned.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"See FrankWolfe.DeletedVertexStorage","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"vertex_storage = FrankWolfe.DeletedVertexStorage(typeof(x0)[], 5)\ntlmo.counter = 0\n\nresults = FrankWolfe.blended_pairwise_conditional_gradient(\n f,\n grad!,\n tlmo,\n x0,\n max_iteration=4000,\n verbose=true,\n lazy=true,\n epsilon=1e-5,\n add_dropped_vertices=true,\n extra_vertex_storage=vertex_storage,\n)","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The counter indicates the number of initial calls to the LMO. We will now construct different objective functions based on new centers, call the BPCG algorithm while accumulating vertices in the storage, in addition to warm-starting with the active set of the previous iteration. 
This allows for a \"double-warmstarted\" algorithm, reducing the number of LMO calls from one problem to the next.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"active_set = results[end]\ntlmo.counter\n\nfor iter in 1:10\n center = 5.0 .+ 3 * rand(n)\n f_i(x) = 0.5 * norm(x .- center)^2\n function grad_i!(storage, x)\n return storage .= x .- center\n end\n tlmo.counter = 0\n FrankWolfe.blended_pairwise_conditional_gradient(\n f_i,\n grad_i!,\n tlmo,\n active_set,\n max_iteration=4000,\n lazy=true,\n epsilon=1e-5,\n add_dropped_vertices=true,\n use_extra_vertex_storage=true,\n extra_vertex_storage=vertex_storage,\n verbose=false,\n )\n @info \"Number of LMO calls in iter $iter: $(tlmo.counter)\"\n @info \"Vertex storage size: $(length(vertex_storage.storage))\"\nend","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"EditURL = \"../../../examples/docs_7_shifted_norm_polytopes.jl\"","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide\nusing FrankWolfe\nusing LinearAlgebra\nusing LaTeXStrings\nusing Plots","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/#FrankWolfe-for-scaled,-shifted-\\ell1-and-\\ell{\\infty}-norm-balls","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"","category":"section"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"In this example, we run the vanilla FrankWolfe algorithm on a scaled and shifted ell^1 and ell^infty norm ball, using the ScaledBoundL1NormBall and ScaledBoundLInfNormBall LMOs. We shift both onto the point (10) and then scale them by a factor of 2 along the x-axis. We project the point (21) onto the polytopes.","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"n = 2\n\nk = 1000\n\nxp = [2.0, 1.0]\n\nf(x) = norm(x - xp)^2\n\nfunction grad!(storage, x)\n @. 
storage = 2 * (x - xp)\n return nothing\nend\n\nlower = [-1.0, -1.0]\nupper = [3.0, 1.0]\n\nl1 = FrankWolfe.ScaledBoundL1NormBall(lower, upper)\n\nlinf = FrankWolfe.ScaledBoundLInfNormBall(lower, upper)\n\nx1 = FrankWolfe.compute_extreme_point(l1, zeros(n))\ngradient = collect(x1)\n\nx_l1, v_1, primal_1, dual_gap_1, trajectory_1 = FrankWolfe.frank_wolfe(\n f,\n grad!,\n l1,\n collect(copy(x1)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=50,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n);\n\nprintln(\"\\nFinal solution: \", x_l1)\n\nx2 = FrankWolfe.compute_extreme_point(linf, zeros(n))\ngradient = collect(x2)\n\nx_linf, v_2, primal_2, dual_gap_2, trajectory_2 = FrankWolfe.frank_wolfe(\n f,\n grad!,\n linf,\n collect(copy(x2)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=50,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n);\n\nprintln(\"\\nFinal solution: \", x_linf)","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"We plot the polytopes alongside the solutions from above:","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"xcoord1 = [1, 3, 1, -1, 1]\nycoord1 = [-1, 0, 1, 0, -1]\n\nxcoord2 = [3, 3, -1, -1, 3]\nycoord2 = [-1, 1, 1, -1, -1]\n\nplot(\n xcoord1,\n ycoord1,\n title=\"Visualization of scaled shifted norm balls\",\n lw=2,\n label=L\"\\ell^1 \\textrm{ norm}\",\n)\nplot!(xcoord2, ycoord2, lw=2, label=L\"\\ell^{\\infty} \\textrm{ norm}\")\nplot!(\n [x_l1[1]],\n [x_l1[2]],\n seriestype=:scatter,\n lw=5,\n color=\"blue\",\n label=L\"\\ell^1 \\textrm{ solution}\",\n)\nplot!(\n [x_linf[1]],\n [x_linf[2]],\n seriestype=:scatter,\n lw=5,\n color=\"orange\",\n label=L\"\\ell^{\\infty} \\textrm{ solution}\",\n legend=:bottomleft,\n)","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"This page was generated using Literate.jl.","category":"page"},{"location":"reference/4_linesearch/#Line-search-and-step-size-settings","page":"Line search and step size settings","title":"Line search and step size settings","text":"","category":"section"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"The step size dictates how far one traverses along a local descent direction. 
More specifically, the step size gamma_t is used at each iteration to determine how much the next iterate moves towards the new vertex: ","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"x_{t+1} = x_t - gamma_t (x_t - v_t)","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"gamma_t = 1 implies that the next iterate is exactly the vertex, while gamma_t = 0 implies that the iterate does not move. ","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"The following are step-size selection rules for Frank-Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce step-size sequences gamma_t that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive. ","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search which is called at every iteration to compute the step size gamma.","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"FrankWolfe.LineSearchMethod\nFrankWolfe.perform_line_search","category":"page"},{"location":"reference/4_linesearch/#FrankWolfe.LineSearchMethod","page":"Line search and step size settings","title":"FrankWolfe.LineSearchMethod","text":"Line search method to apply once the direction is computed. A LineSearchMethod must implement\n\nperform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)\n\nwith d = x - v. It may also implement build_linesearch_workspace(x, gradient) which creates a workspace structure that is passed as last argument to perform_line_search.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.perform_line_search","page":"Line search and step size settings","title":"FrankWolfe.perform_line_search","text":"perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)\n\nReturns the step size gamma for step size strategy ls.\n\n\n\n\n\n","category":"function"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"Modules = [FrankWolfe]\nPages = [\"linesearch.jl\"]","category":"page"},{"location":"reference/4_linesearch/#FrankWolfe.Adaptive","page":"Line search and step size settings","title":"FrankWolfe.Adaptive","text":"Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)\n\n f(x_t + gamma_t (x_t - v_t)) - f(x_t) ≤ - alpha gamma_t ⟨nabla f(x_t), x_t - v_t⟩ + alpha^2 gamma_t^2 ‖x_t - v_t‖^2 M / 2 \n\nThe parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. 
The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition, \n\n ⟨nabla f(x_t + gamma_t (x_t - v_t)) - nabla f(x_t), x_t - v_t⟩ ≤ gamma_t M ‖x_t - v_t‖^2 \n\nThis condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.\n\nIt is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether a temporary upgrade to BigFloat should be performed for extended precision.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Agnostic","page":"Line search and step size settings","title":"FrankWolfe.Agnostic","text":"Computes step size: l/(l + t) at iteration t, given l > 0.\n\nUsing l ≥ 4 is advised only for strongly convex sets, see:\n\nAcceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, 2023.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Backtracking","page":"Line search and step size settings","title":"FrankWolfe.Backtracking","text":"Backtracking(limit_num_steps, tol, tau)\n\nBacktracking line search strategy, see Pedregosa, Negiar, Askari, Jaggi (2018).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.FixedStep","page":"Line search and step size settings","title":"FrankWolfe.FixedStep","text":"Fixed step size strategy. The step size can still be truncated by the gamma_max argument.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Goldenratio","page":"Line search and step size settings","title":"FrankWolfe.Goldenratio","text":"Goldenratio\n\nSimple golden-ratio based line search (golden-section search), based on and adapted from the Combettes, Pokutta (2020) code.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.MonotonicNonConvexStepSize","page":"Line search and step size settings","title":"FrankWolfe.MonotonicNonConvexStepSize","text":"MonotonicNonConvexStepSize{F}\n\nRepresents a monotonic open-loop non-convex step size. Contains a halving factor N increased at each iteration until there is primal progress, the step size being gamma = 1 / sqrt(t + 1) * 2^(-N).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.MonotonicStepSize","page":"Line search and step size settings","title":"FrankWolfe.MonotonicStepSize","text":"MonotonicStepSize{F}\n\nRepresents a monotonic open-loop step size. 
Contains a halving factor N increased at each iteration until there is primal progress, the step size being gamma = 2 / (t + 2) * 2^(-N).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Nonconvex","page":"Line search and step size settings","title":"FrankWolfe.Nonconvex","text":"Computes a step size for nonconvex functions: 1/sqrt(t + 1).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Shortstep","page":"Line search and step size settings","title":"FrankWolfe.Shortstep","text":"Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.","category":"page"},{"location":"reference/4_linesearch/#Index","page":"Line search and step size settings","title":"Index","text":"","category":"section"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"Pages = [\"4_linesearch.md\"]","category":"page"},{"location":"advanced/#Advanced-features","page":"Advanced features","title":"Advanced features","text":"","category":"section"},{"location":"advanced/#Multi-precision","page":"Advanced features","title":"Multi-precision","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"All algorithms can run in various precision modes: Float16, Float32, Float64, BigFloat, and also with rationals based on various integer types Int32, Int64, BigInt (see e.g. the approximate Carathéodory example).","category":"page"},{"location":"advanced/#Step-size-computation","page":"Advanced features","title":"Step size computation","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For all Frank-Wolfe algorithms, a step size must be determined to move from the current iterate to the next one. This step size can be determined by exact line search or any other rule represented by a subtype of FrankWolfe.LineSearchMethod, which must implement FrankWolfe.perform_line_search.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Multiple line search and step size determination rules are already available. See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.","category":"page"},{"location":"advanced/#Callbacks","page":"Advanced features","title":"Callbacks","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"All top-level algorithms can take an optional callback argument, which must be a function taking a FrankWolfe.CallbackState struct and additional arguments:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"callback(state::FrankWolfe.CallbackState, args...)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The callback can be used to log additional information or store some values of interest in an external array. 
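A minimal sketch of such a callback (the names run_data and my_callback are illustrative only, and we assume here that the state carries the current primal value in its primal field):\n\nrun_data = Float64[]\nfunction my_callback(state::FrankWolfe.CallbackState, args...)\n    # record the primal value at every iteration (a sketch, not a fixed API contract)\n    push!(run_data, state.primal)\n    return nothing\nend\n\n# hypothetical usage with any top-level algorithm:\n# FrankWolfe.frank_wolfe(f, grad!, lmo, x0, callback=my_callback)\n\n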
If a callback is passed, the trajectory keyword is ignored, since trajectory logging is a special case of a callback that pushes the first 5 elements of the state to an array returned from the algorithm.","category":"page"},{"location":"advanced/#Custom-extreme-point-types","page":"Advanced features","title":"Custom extreme point types","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For some feasible sets, the extreme points returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. See for example FrankWolfe.ScaledHotVector and FrankWolfe.RankOneMatrix.","category":"page"},{"location":"advanced/#Active-set","page":"Advanced features","title":"Active set","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The active set represents an iterate as a convex combination of atoms (also referred to as extreme points or vertices). It maintains a vector of atoms, the corresponding weights, and the current iterate.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Note: the weights in the active set are currently defined as Float64 in the algorithm. This means that even with vertices using a lower precision, the iterate sum_i(lambda_i * v_i) will be upcast to Float64. One reason for keeping this as-is for now is the higher precision required by the computation of iterates from their barycentric decomposition.","category":"page"},{"location":"advanced/#Extra-lazification-with-a-vertex-storage","page":"Advanced features","title":"Extra-lazification with a vertex storage","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"One can pass the following keyword arguments to some active set-based Frank-Wolfe algorithms:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"add_dropped_vertices=true,\nuse_extra_vertex_storage=true,\nextra_vertex_storage=vertex_storage,","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"add_dropped_vertices activates feeding discarded vertices to the storage while use_extra_vertex_storage determines whether vertices from the storage are used in the algorithm. See Extra-lazification for a complete example.","category":"page"},{"location":"advanced/#Miscellaneous","page":"Advanced features","title":"Miscellaneous","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Emphasis: All solvers support emphasis (parameter Emphasis) to either exploit vectorized linear algebra or be memory efficient, e.g., for large-scale instances\nVarious caching strategies for the lazy implementations. Unbounded cache sizes (can get slow), bounded cache sizes as well as early returns once any sufficient vertex is found in the cache.\nOptionally all algorithms can be endowed with gradient momentum. 
This might help convergence especially in the stochastic context.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Coming soon: when the LMO can compute dual prices, the Frank-Wolfe algorithms will return dual prices for the (approximately) optimal solutions (see Braun, Pokutta (2021)).","category":"page"},{"location":"advanced/#Rational-arithmetic","page":"Advanced features","title":"Rational arithmetic","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Example: examples/approximateCaratheodory.jl","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"We can solve the approximate Carathéodory problem with rational arithmetic to obtain rational approximations; see Combettes, Pokutta 2019 for some background about approximate Carathéodory and Conditional Gradients. We consider the simple instance of approximating the origin over the probability simplex here:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"min_{x ∈ Delta(n)} ‖x‖^2","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"with n = 100.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Vanilla Frank-Wolfe Algorithm.\nEMPHASIS: blas STEPSIZE: rationalshortstep EPSILON: 1.0e-7 max_iteration: 100 TYPE: Rational{BigInt}\n\n───────────────────────────────────────────────────────────────────────────────────\n Type Iteration Primal Dual Dual Gap Time\n───────────────────────────────────────────────────────────────────────────────────\n I 0 1.000000e+00 -1.000000e+00 2.000000e+00 1.540385e-01\n FW 10 9.090909e-02 -9.090909e-02 1.818182e-01 2.821186e-01\n FW 20 4.761905e-02 -4.761905e-02 9.523810e-02 3.027964e-01\n FW 30 3.225806e-02 -3.225806e-02 6.451613e-02 3.100331e-01\n FW 40 2.439024e-02 -2.439024e-02 4.878049e-02 3.171654e-01\n FW 50 1.960784e-02 -1.960784e-02 3.921569e-02 3.244207e-01\n FW 60 1.639344e-02 -1.639344e-02 3.278689e-02 3.326185e-01\n FW 70 1.408451e-02 -1.408451e-02 2.816901e-02 3.418239e-01\n FW 80 1.234568e-02 -1.234568e-02 2.469136e-02 3.518750e-01\n FW 90 1.098901e-02 -1.098901e-02 2.197802e-02 3.620287e-01\n Last 1.000000e-02 1.000000e-02 0.000000e+00 4.392171e-01\n───────────────────────────────────────────────────────────────────────────────────\n\n 0.600608 seconds (3.83 M allocations: 111.274 MiB, 12.97% gc time)\n\nOutput type of solution: Rational{BigInt}","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"As we can see, the solution returned is rational, and it is in fact the exactly optimal solution:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"x = Rational{BigInt}[1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100]","category":"page"},{"location":"advanced/#Large-scale-problems","page":"Advanced features","title":"Large-scale problems","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Example: examples/large_scale.jl","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The package is built to scale well for those conditional gradient variants that can scale well. For example, Away-Step Frank-Wolfe and Pairwise Conditional Gradients do not scale well in most cases, because they need to maintain active sets and maintaining them can be very expensive. Similarly, line search methods might become prohibitive at large sizes. However, if we consider scale-friendly variants, e.g., the vanilla Frank-Wolfe algorithm with the agnostic step size rule or the short step rule, then these algorithms can scale well to extreme sizes, essentially limited only by the amount of memory available. However, even for these methods that tend to scale well, allocation of memory itself can be very slow when you need to allocate gigabytes of memory for a single gradient computation.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The package is built to support extreme sizes with a special memory-efficient emphasis emphasis=FrankWolfe.memory, which minimizes expensive memory allocations and performs as many operations in-place as possible.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Here is an example of a run with 1e9 variables, where each gradient is around 7.5 GB in size. The output of the run is broken down into pieces below:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Size of single vector (Float64): 7629.39453125 MB\nTesting f... 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:23\nTesting grad... 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:23\nTesting lmo... 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:29\nTesting dual gap... 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:46\nTesting update... (Emphasis: blas) 100%|███████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:35\nTesting update... 
(Emphasis: memory) 100%|█████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:58\n ──────────────────────────────────────────────────────────────────────────\n Time Allocations\n ────────────────────── ───────────────────────\n Tot / % measured: 278s / 31.4% 969GiB / 30.8%\n\n Section ncalls time %tot avg alloc %tot avg\n ──────────────────────────────────────────────────────────────────────────\n update (blas) 10 36.1s 41.3% 3.61s 149GiB 50.0% 14.9GiB\n lmo 10 18.4s 21.1% 1.84s 0.00B 0.00% 0.00B\n grad 10 12.8s 14.6% 1.28s 74.5GiB 25.0% 7.45GiB\n f 10 12.7s 14.5% 1.27s 74.5GiB 25.0% 7.45GiB\n update (memory) 10 5.00s 5.72% 500ms 0.00B 0.00% 0.00B\n dual gap 10 2.40s 2.75% 240ms 0.00B 0.00% 0.00B\n ──────────────────────────────────────────────────────────────────────────","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The above is the optional benchmarking of the oracles that we provide to understand how fast crucial parts of the algorithms are, most notably the oracle evaluations, the update of the iterate, and the computation of the dual gap. As you can see if you compare update (blas) vs. update (memory), the normal update when we use BLAS requires an additional 14.9GiB of memory on top of the gradient itself, whereas the update (memory) (the memory emphasis mode) does not consume any extra memory. This is also reflected in the computational times: the BLAS version requires 3.61 seconds on average to update the iterate, while the memory emphasis version requires only 500ms. In fact, none of the crucial components in the algorithm allocate any memory when run in the memory-efficient mode. Now let us look at the actual footprint of the whole algorithm:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Vanilla Frank-Wolfe Algorithm.\nEMPHASIS: memory STEPSIZE: agnostic EPSILON: 1.0e-7 MAXITERATION: 1000 TYPE: Float64\nMOMENTUM: nothing GRADIENTTYPE: Nothing\nWARNING: In memory emphasis mode iterates are written back into x0!\n\n─────────────────────────────────────────────────────────────────────────────────────────────────\n Type Iteration Primal Dual Dual Gap Time It/sec\n─────────────────────────────────────────────────────────────────────────────────────────────────\n I 0 1.000000e+00 -1.000000e+00 2.000000e+00 8.783523e+00 0.000000e+00\n FW 100 1.326732e-02 -1.326733e-02 2.653465e-02 4.635923e+02 2.157068e-01\n FW 200 6.650080e-03 -6.650086e-03 1.330017e-02 9.181294e+02 2.178342e-01\n FW 300 4.437059e-03 -4.437064e-03 8.874123e-03 1.372615e+03 2.185609e-01\n FW 400 3.329174e-03 -3.329180e-03 6.658354e-03 1.827260e+03 2.189070e-01\n FW 500 2.664003e-03 -2.664008e-03 5.328011e-03 2.281865e+03 2.191190e-01\n FW 600 2.220371e-03 -2.220376e-03 4.440747e-03 2.736387e+03 2.192672e-01\n FW 700 1.903401e-03 -1.903406e-03 3.806807e-03 3.190951e+03 2.193703e-01\n FW 800 1.665624e-03 -1.665629e-03 3.331253e-03 3.645425e+03 2.194532e-01\n FW 900 1.480657e-03 -1.480662e-03 2.961319e-03 4.099931e+03 2.195159e-01\n FW 1000 1.332665e-03 -1.332670e-03 2.665335e-03 4.554703e+03 2.195533e-01\n Last 1000 1.331334e-03 -1.331339e-03 2.662673e-03 4.559822e+03 2.195261e-01\n─────────────────────────────────────────────────────────────────────────────────────────────────\n\n4560.661203 seconds (7.41 M allocations: 112.121 GiB, 0.01% gc time)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"As you can 
see, the algorithm ran for about 4600 seconds (single-thread run), allocating 112.121 GiB of memory throughout. So how does this average out to the per-iteration cost in terms of memory? 112.121 GiB / 1000 iterations ≈ 0.112 GiB, i.e., about 115 MiB per iteration, which is much less than the size of the gradient and in fact only stems from the reporting here.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"NB. This example also highlights one of the great features of first-order methods and conditional gradients in particular: we have dimension-independent convergence rates. In fact, we contract the primal gap as 2LD^2 / (t+2) (for the simple agnostic rule) and, e.g., if the feasible region is the probability simplex with D = sqrt(2) and the function has bounded Lipschitzness, e.g., the function ‖x - xp‖^2 has L = 2, then the convergence rate is completely independent of the input size. The only thing that limits scaling is how much memory you have available and whether you can stomach the (linear) per-iteration cost.","category":"page"},{"location":"advanced/#Iterate-and-atom-expected-interface","page":"Advanced features","title":"Iterate and atom expected interface","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Frank-Wolfe can work with iterates beyond plain vectors, for example with any array-like object. Broadly speaking, the iterate type is assumed to behave as a member of a Hilbert space and optionally be mutable. Assuming the iterate type is IT, some methods must be implemented, with their usual semantics:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Base.similar(::IT)\nBase.similar(::IT, ::Type{T})\nBase.eltype(::IT)\nBase.copy(::IT)\n\nBase.:+(x1::IT, x2::IT)\nBase.:*(scalar::Real, x::IT)\nBase.:-(x1::IT, x2::IT)\nLinearAlgebra.dot(x1::IT, x2::IT)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For methods using a FrankWolfe.ActiveSet, the atoms or individual extreme points of the feasible region are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R}","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"which recomputes the iterate from the current convex decomposition and the following methods FrankWolfe.active_set_update_scale! 
and FrankWolfe.active_set_update_iterate_pairwise!:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)\nFrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)","category":"page"},{"location":"reference/1_algorithms/#Algorithms","page":"Algorithms","title":"Algorithms","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"This section contains all main algorithms of the package. These are the ones typical users will call.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"The typical signature for these algorithms is:","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"my_algorithm(f, grad!, lmo, x0)","category":"page"},{"location":"reference/1_algorithms/#Standard-algorithms","page":"Algorithms","title":"Standard algorithms","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"fw_algorithms.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.frank_wolfe-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.frank_wolfe","text":"frank_wolfe(f, grad!, lmo, x0; ...)\n\nSimplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:\n\nx final iterate\nv last vertex from the LMO\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.lazified_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.lazified_conditional_gradient","text":"lazified_conditional_gradient(f, grad!, lmo_base, x0; ...)\n\nSimilar to FrankWolfe.frank_wolfe but lazifying the LMO: each call is stored in a cache, which is looked up first for a good-enough direction. The cache used is a FrankWolfe.MultiCacheLMO or a FrankWolfe.VectorCacheLMO depending on whether the provided cache_size option is finite.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.stochastic_frank_wolfe-Tuple{FrankWolfe.StochasticObjective, Any, Any}","page":"Algorithms","title":"FrankWolfe.stochastic_frank_wolfe","text":"stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)\n\nStochastic version of Frank-Wolfe, evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.\n\nKeyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.\n\nSimilarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.block_coordinate_frank_wolfe","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.block_coordinate_frank_wolfe","page":"Algorithms","title":"FrankWolfe.block_coordinate_frank_wolfe","text":"block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) 
where {N}\n\nBlock-coordinate version of the Frank-Wolfe algorithm. Minimizes objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated.\n\nThe method returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ntraj_data vector of trajectory information.\n\nSee S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.\n\n\n\n\n\n","category":"function"},{"location":"reference/1_algorithms/#Active-set-based-methods","page":"Algorithms","title":"Active-set based methods","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"The following algorithms maintain the representation of the iterates as a convex combination of vertices.","category":"page"},{"location":"reference/1_algorithms/#Away-step","page":"Algorithms","title":"Away-step","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"afw.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.away_frank_wolfe-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.away_frank_wolfe","text":"away_frank_wolfe(f, grad!, lmo, x0; ...)\n\nFrank-Wolfe with away steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Blended-Conditional-Gradient","page":"Algorithms","title":"Blended Conditional Gradient","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"blended_cg.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex-Tuple{Any, Any, Any, Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex","text":"accelerated_simplex_gradient_descent_over_probability_simplex\n\nMinimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using Nesterov's accelerated gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.blended_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.blended_conditional_gradient","text":"blended_conditional_gradient(f, grad!, lmo, x0)\n\nEntry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. \"Blended conditional gradients\" ICML 2019. 
The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.build_reduced_problem-Tuple{AbstractVector{var\"#s326\"} where var\"#s326\"<:FrankWolfe.ScaledHotVector, Any, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.build_reduced_problem","text":"build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)\n\nGiven an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates of dimension equal to the cardinality of the active set, the objective function is:\n\nf(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ\n\nIn the case where we find that the current iterate has a strong-Wolfe gap over the convex hull of the active set that is below the tolerance, we return nothing (as there is nothing to do).\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.lp_separation_oracle-Tuple{FrankWolfe.LinearMinimizationOracle, FrankWolfe.ActiveSet, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.lp_separation_oracle","text":"Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction), or the same tuple with y from the LMO.\n\ninplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.minimize_over_convex_hull!-Tuple{Any, Any, Any, FrankWolfe.ActiveSet, Any, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.minimize_over_convex_hull!","text":"minimize_over_convex_hull!\n\nGiven a function f with gradient grad! 
and an active set active_set, this function minimizes the function over the convex hull of the active set until the Strong-Wolfe gap over the active set is below tolerance.\n\nIt will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.projection_simplex_sort-Tuple{Any}","page":"Algorithms","title":"FrankWolfe.projection_simplex_sort","text":"projection_simplex_sort(x; s=1.0)\n\nPerform a projection onto the probability simplex of radius s using a sorting algorithm.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.simplex_gradient_descent_over_convex_hull","page":"Algorithms","title":"FrankWolfe.simplex_gradient_descent_over_convex_hull","text":"simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)\n\nMinimizes an objective function over the convex hull of the active set until the Strong-Wolfe gap is below tolerance using simplex gradient descent.\n\n\n\n\n\n","category":"function"},{"location":"reference/1_algorithms/#FrankWolfe.simplex_gradient_descent_over_probability_simplex-Tuple{Any, Any, Any, Any, Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.simplex_gradient_descent_over_probability_simplex","text":"simplex_gradient_descent_over_probability_simplex\n\nMinimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.strong_frankwolfe_gap-Tuple{Any}","page":"Algorithms","title":"FrankWolfe.strong_frankwolfe_gap","text":"Checks the strong Frank-Wolfe gap for the reduced problem.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.strong_frankwolfe_gap_probability_simplex-Tuple{Any, Any}","page":"Algorithms","title":"FrankWolfe.strong_frankwolfe_gap_probability_simplex","text":"strong_frankwolfe_gap_probability_simplex\n\nCompute the Strong-Wolfe gap over the unit probability simplex given a gradient.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Blended-Pairwise-Conditional-Gradient","page":"Algorithms","title":"Blended Pairwise Conditional Gradient","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"pairwise.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.blended_pairwise_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.blended_pairwise_conditional_gradient","text":"blended_pairwise_conditional_gradient(f, grad!, lmo, x0; kwargs...)\n\nImplements the BPCG algorithm from Tsuji, Tanaka, Pokutta (2021). The method uses an active set of current vertices. 
Unlike away-step, it transfers weight from an away vertex to another vertex of the active set.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.blended_pairwise_conditional_gradient-Tuple{Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.blended_pairwise_conditional_gradient","text":"blended_pairwise_conditional_gradient(f, grad!, lmo, active_set::ActiveSet; kwargs...)\n\nWarm-starts BPCG with a pre-defined active_set.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Alternating-Methods","page":"Algorithms","title":"Alternating Methods","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Problems over intersections of convex sets, i.e. ","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"min_{x ∈ ⋂_{i=1}^n P_i} f(x)","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"pose a challenge as one has to combine the information of two or more LMOs.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function. ","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"alternating_methods.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.alternating_linear_minimization-Union{Tuple{N}, Tuple{Any, Any, Any, Tuple{Vararg{FrankWolfe.LinearMinimizationOracle, N}}, Any}} where N","page":"Algorithms","title":"FrankWolfe.alternating_linear_minimization","text":"alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}\n\nAlternating Linear Minimization minimizes the objective f over the intersections of the feasible domains specified by lmos. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ninfeas sum of squared, pairwise distances between iterates \ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.alternating_projections-Union{Tuple{N}, Tuple{Tuple{Vararg{FrankWolfe.LinearMinimizationOracle, N}}, Any}} where N","page":"Algorithms","title":"FrankWolfe.alternating_projections","text":"alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}\n\nComputes a point in the intersection of feasible domains specified by lmos. 
Returns a tuple (x, v, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\ndual_gap final Frank-Wolfe gap\ninfeas sum of squared, pairwise distances between iterates \ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Index","page":"Algorithms","title":"Index","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Pages = [\"2_algorithms.md\"]","category":"page"},{"location":"reference/0_reference/#API-Reference","page":"API Reference","title":"API Reference","text":"","category":"section"},{"location":"reference/0_reference/","page":"API Reference","title":"API Reference","text":"The pages in this section reference the documentation for specific types and functions.","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"EditURL = \"../../../examples/docs_5_blended_cg.jl\"","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_5_blended_cg/#Blended-Conditional-Gradients","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"","category":"section"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"The FW and AFW algorithms, and their lazy variants share one feature: they attempt to make primal progress over a reduced set of vertices. The AFW algorithm does this through away steps (which do not increase the cardinality of the active set), and the lazy variants do this through the use of previously exploited vertices. A third strategy that one can follow is to explicitly blend Frank-Wolfe steps with gradient descent steps over the convex hull of the active set (note that this can be done without requiring a projection oracle over C, thus making the algorithm projection-free). 
This results in the Blended Conditional Gradient (BCG) algorithm, which attempts to make as much progress as possible through the convex hull of the current active set S_t until it automatically detects that in order to make further progress it requires additional calls to the LMO.","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"See also Blended Conditional Gradients: the unconditioning of conditional gradients, Braun et al, 2019, https://arxiv.org/abs/1805.07311","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"using FrankWolfe\nusing LinearAlgebra\nusing Random\nusing SparseArrays\n\nn = 1000\nk = 10000\n\nRandom.seed!(41)\n\nmatrix = rand(n, n)\nhessian = transpose(matrix) * matrix\nlinear = rand(n)\nf(x) = dot(linear, x) + 0.5 * transpose(x) * hessian * x\nfunction grad!(storage, x)\n return storage .= linear + hessian * x\nend\nL = eigmax(hessian)","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"We run over the probability simplex and call the LMO to get an initial feasible point:","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"lmo = FrankWolfe.ProbabilitySimplexOracle(1.0);\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\n\ntarget_tolerance = 1e-5\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_accel_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n hessian=hessian,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n accelerated=true,\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n hessian=hessian,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n accelerated=false,\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_convex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\ndata = [trajectoryBCG_accel_simplex, trajectoryBCG_simplex, trajectoryBCG_convex]\nlabel = [\"BCG (accel simplex)\", \"BCG (simplex)\", \"BCG (convex)\"]\nplot_trajectories(data, label, xscalelog=true)\n\n\n\nmatrix = rand(n, n)\nhessian = transpose(matrix) * matrix\nlinear = rand(n)\nf(x) = dot(linear, x) + 0.5 * transpose(x) * hessian * x + 10\nfunction grad!(storage, x)\n return storage .= linear + hessian * x\nend\nL = eigmax(hessian)\n\nlmo = FrankWolfe.KSparseLMO(100, 100.0)\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_accel_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n 
    epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    hessian=hessian,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    accelerated=true,\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_simplex, _ = FrankWolfe.blended_conditional_gradient(\n    f,\n    grad!,\n    lmo,\n    x0,\n    epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    hessian=hessian,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    accelerated=false,\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_convex, _ = FrankWolfe.blended_conditional_gradient(\n    f,\n    grad!,\n    lmo,\n    x0,\n    epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\ndata = [trajectoryBCG_accel_simplex, trajectoryBCG_simplex, trajectoryBCG_convex]\nlabel = [\"BCG (accel simplex)\", \"BCG (simplex)\", \"BCG (convex)\"]\nplot_trajectories(data, label, xscalelog=true)","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"EditURL = \"../../../examples/docs_10_alternating_methods.jl\"","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Alternating-methods","page":"Alternating methods","title":"Alternating methods","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"In this example we will compare FrankWolfe.alternating_linear_minimization and FrankWolfe.alternating_projections for a very simple feasibility problem.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"We consider the probability simplex","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"P = { x ∈ R^n : sum_{i=1}^n x_i = 1, x_i ≥ 0, i = 1, ..., n } ","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"and a scaled, shifted ell^infty norm ball","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Q = [-1, 0]^n ","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"The goal is to find either a point in the intersection, x ∈ P ∩ Q, or a pair of points, (x_P, x_Q) ∈ P × Q, which attains minimal distance between P 
and Q,","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"‖x_P - x_Q‖_2 = min_{(x,y) ∈ P × Q} ‖x - y‖_2 ","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"using FrankWolfe\ninclude(\"../examples/plot_utils.jl\")","category":"page"},{"location":"examples/docs_10_alternating_methods/#Setting-up-objective,-gradient-and-linear-minimization-oracles","page":"Alternating methods","title":"Setting up objective, gradient and linear minimization oracles","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Since we only consider the feasibility problem, the objective function as well as the gradient are zero.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"n = 20\n\nf(x) = 0\n\nfunction grad!(storage, x)\n    @. storage = zero(x)\nend\n\n\nlmo1 = FrankWolfe.ProbabilitySimplexOracle(1.0)\nlmo2 = FrankWolfe.ScaledBoundLInfNormBall(-ones(n), zeros(n))\nlmos = (lmo1, lmo2)\n\nx0 = rand(n)\n\ntarget_tolerance = 1e-6\n\ntrajectories = [];\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Running-Alternating-Linear-Minimization","page":"Alternating methods","title":"Running Alternating Linear Minimization","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"We run Alternating Linear Minimization (ALM) with FrankWolfe.block_coordinate_frank_wolfe. This method allows three different update orders, FullUpdate, CyclicUpdate and StochasticUpdate. Accordingly, both blocks are updated either simultaneously, sequentially, or in random order.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"for order in [FrankWolfe.FullUpdate(), FrankWolfe.CyclicUpdate(), FrankWolfe.StochasticUpdate()]\n\n    _, _, _, _, _, alm_trajectory = FrankWolfe.alternating_linear_minimization(\n        FrankWolfe.block_coordinate_frank_wolfe,\n        f,\n        grad!,\n        lmos,\n        x0,\n        update_order=order,\n        verbose=true,\n        trajectory=true,\n        epsilon=target_tolerance,\n    )\n    push!(trajectories, alm_trajectory)\nend","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with the standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. 
In this example we also use FrankWolfe.away_frank_wolfe.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(\n FrankWolfe.away_frank_wolfe,\n f,\n grad!,\n lmos,\n x0,\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n)\npush!(trajectories, afw_trajectory);\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Running-Alternating-Projections","page":"Alternating methods","title":"Running Alternating Projections","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems. One omits the objective and gradient as parameters.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(\n lmos,\n x0,\n trajectory=true,\n verbose=true,\n print_iter=100,\n epsilon=target_tolerance,\n)\npush!(trajectories, ap_trajectory);\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Plotting-the-resulting-trajectories","page":"Alternating methods","title":"Plotting the resulting trajectories","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"labels = [\"BCFW - Full\", \"BCFW - Cyclic\", \"BCFW - Stochastic\", \"AFW\", \"AP\"]\n\nplot_trajectories(trajectories, labels, xscalelog=true)","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"EditURL = \"../../../examples/docs_4_rational_opt.jl\"","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_4_rational_opt/#Exact-Optimization-with-Rational-Arithmetic","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"","category":"section"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"This example can be found in section 4.3 in the paper. The package allows for exact optimization with rational arithmetic. For this, it suffices to set up the LMO to be rational and choose an appropriate step-size rule as detailed below. For the LMOs included in the package, this simply means initializing the radius with a rational-compatible element type, e.g., 1, rather than a floating-point number, e.g., 1.0. 
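For instance, the probability simplex LMO with a rational radius of 1 can be set up as follows (a short sketch; the same construction appears in the example code below):\n\n# radius given as an integer, element type Rational{BigInt}\nlmo = FrankWolfe.ProbabilitySimplexOracle{Rational{BigInt}}(1)\n\n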
Given that numerators and denominators can become quite large in rational arithmetic, it is strongly advised to build the rationals on extended-precision integer types such as BigInt, i.e., to use Rational{BigInt}.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"The second requirement ensuring that the computation runs in rational arithmetic is a rational-compatible step-size rule. The most basic step-size rule compatible with rational optimization is the agnostic step-size rule with γ_t = 2/(2 + t). With this step-size rule, the gradient does not even need to be rational as long as the atom computed by the LMO is of a rational type. Assuming these requirements are met, all iterates and the computed solution will then be rational.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"using FrankWolfe\nusing LinearAlgebra\n\nn = 100\nk = n\n\nx = fill(big(1) // 100, n)\n\nf(x) = dot(x, x)\nfunction grad!(storage, x)\n @. storage = 2 * x\nend","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"We pick the feasible region; the radius needs to be an integer or a rational.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"lmo = FrankWolfe.ProbabilitySimplexOracle{Rational{BigInt}}(1)","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"We compute some initial vertex.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n));\n\nx, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.Agnostic(),\n print_iter=k / 10,\n verbose=true,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n);\n\nprintln(\"\\nOutput type of solution: \", eltype(x))","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"Another possible step-size rule is the rational short step, which computes the step size by minimizing the smoothness inequality as γ_t = ⟨∇f(x_t), x_t - v_t⟩ / (2L‖x_t - v_t‖^2). However, as this step size depends on an upper bound on the Lipschitz constant L as well as the inner product with the gradient ∇f(x_t), both have to be of a rational type.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2 // 1),\n print_iter=k / 10,\n verbose=true,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n);\nnothing #hide","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n).","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"EditURL = \"../../../examples/docs_0_fw_visualized.jl\"","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_0_fw_visualized/#Visualization-of-Frank-Wolfe-running-on-a-2-dimensional-polytope","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"This example provides an intuitive view of the Frank-Wolfe algorithm by running it on a polyhedral set with a quadratic function. 
The Linear Minimization Oracle (LMO) corresponds to a call to a generic simplex solver from MathOptInterface.jl (MOI).","category":"page"},{"location":"examples/docs_0_fw_visualized/#Import-and-setup","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Import and setup","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We first import the necessary packages, including Polyhedra to visualize the feasible set.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"using LinearAlgebra\nusing FrankWolfe\n\nimport MathOptInterface\nconst MOI = MathOptInterface\nusing GLPK\n\nusing Polyhedra\nusing Plots","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We can then define the objective function, here the squared distance to a point in the plane, and its in-place gradient.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"n = 2\ny = [3.2, 0.5]\n\nfunction f(x)\n return 1 / 2 * norm(x - y)^2\nend\nfunction grad!(storage, x)\n @. storage = x - y\nend","category":"page"},{"location":"examples/docs_0_fw_visualized/#Custom-callback","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Custom callback","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"FrankWolfe.jl lets users define custom callbacks to record information about each iteration. In that case, the callback will copy the current iterate x, the current vertex v, and the current step size gamma to an array thanks to a closure. We then declare the array and the callback over this array. Each iteration will then push to this array.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"function build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(trajectory_arr, (copy(state.x), copy(state.v), state.gamma))\n end\nend\n\niterates_information_vector = []\ncallback = build_callback(iterates_information_vector)","category":"page"},{"location":"examples/docs_0_fw_visualized/#Creating-the-Linear-Minimization-Oracle","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Creating the Linear Minimization Oracle","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The LMO is defined as a call to a linear optimization solver; each iteration resets the objective and calls the solver. 
The linear constraints must be defined only once at the beginning and remain identical across iterations. Here we use MathOptInterface directly, but the constraints could also be defined with JuMP or Convex.jl.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"o = GLPK.Optimizer()\nx = MOI.add_variables(o, n)\n\n# −x + y ≤ 2\nc1 = MOI.add_constraint(o, -1.0x[1] + x[2], MOI.LessThan(2.0))\n\n# x + 2 y ≤ 4\nc2 = MOI.add_constraint(o, x[1] + 2.0x[2], MOI.LessThan(4.0))\n\n# −2 x − y ≤ 1\nc3 = MOI.add_constraint(o, -2.0x[1] - x[2], MOI.LessThan(1.0))\n\n# x − 2 y ≤ 2\nc4 = MOI.add_constraint(o, x[1] - 2.0x[2], MOI.LessThan(2.0))\n\n# x ≤ 2\nc5 = MOI.add_constraint(o, x[1] + 0.0x[2], MOI.LessThan(2.0))","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The LMO is then built by wrapping the current MOI optimizer:","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"lmo_moi = FrankWolfe.MathOptLMO(o)","category":"page"},{"location":"examples/docs_0_fw_visualized/#Calling-Frank-Wolfe","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Calling Frank-Wolfe","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We can now compute an initial starting point from any direction and call the Frank-Wolfe algorithm. 
Note that we copy x0 before passing it to the algorithm because it is modified in-place by frank_wolfe.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"x0 = FrankWolfe.compute_extreme_point(lmo_moi, zeros(n))\n\nxfinal, vfinal, primal_value, dual_gap, traj_data = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_moi,\n copy(x0),\n line_search=FrankWolfe.Adaptive(),\n max_iteration=10,\n epsilon=1e-8,\n callback=callback,\n verbose=true,\n print_iter=1,\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We now collect the iterates and vertices across iterations.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"iterates = Vector{Vector{Float64}}()\npush!(iterates, x0)\nvertices = Vector{Vector{Float64}}()\nfor s in iterates_information_vector\n push!(iterates, s[1])\n push!(vertices, s[2])\nend","category":"page"},{"location":"examples/docs_0_fw_visualized/#Plotting-the-algorithm-run","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Plotting the algorithm run","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We define another method for f adapted to plot its contours.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"function f(x1, x2)\n x = [x1, x2]\n return f(x)\nend\n\nxlist = collect(range(-1, 3, step=0.2))\nylist = collect(range(-1, 3, step=0.2))\n\nX = repeat(reshape(xlist, 1, :), length(ylist), 1)\nY = repeat(ylist, 1, length(xlist))","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The feasible space is represented using Polyhedra.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"h =\n HalfSpace([-1, 1], 2) ∩ HalfSpace([1, 2], 4) ∩ HalfSpace([-2, -1], 1) ∩ HalfSpace([1, -2], 2) ∩\n HalfSpace([1, 0], 2)\n\np = polyhedron(h)\n\np1 = contour(xlist, ylist, f, fill=true, line_smoothing=0.85)\nplot(p1, opacity=0.5)\nplot!(\n p,\n ratio=:equal,\n opacity=0.5,\n label=\"feasible region\",\n framestyle=:zerolines,\n legend=true,\n color=:blue,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"Finally, we add all iterates and vertices to the plot.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional 
polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"colors = [\"gold\", \"purple\", \"darkorange2\", \"firebrick3\"]\niterates = unique!(iterates)\nfor i in 1:3\n scatter!(\n [iterates[i][1]],\n [iterates[i][2]],\n label=string(\"x_\", i - 1),\n markersize=6,\n color=colors[i],\n )\nend\nscatter!(\n [last(iterates)[1]],\n [last(iterates)[2]],\n label=string(\"x_\", length(iterates) - 1),\n markersize=6,\n color=last(colors),\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"plot chosen vertices","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label=\"v_1\")\nscatter!(\n [vertices[2][1]],\n [vertices[2][2]],\n m=:diamond,\n markersize=6,\n color=colors[2],\n label=\"v_2\",\n legend=:outerleft,\n colorbar=true,\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"EditURL = \"../../../examples/docs_3_matrix_completion.jl\"","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_3_matrix_completion/#Matrix-Completion","page":"Matrix Completion","title":"Matrix Completion","text":"","category":"section"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"We present another example that is about matrix completion. The idea is, given a partially observed matrix YinmathbbR^mtimes n, to find XinmathbbR^mtimes n to minimize the sum of squared errors from the observed entries while 'completing' the matrix Y, i.e. filling the unobserved entries to match Y as good as possible. A detailed explanation can be found in section 4.2 of the paper. We will try to solve","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"min_X_*le tau sum_(ij)inmathcalI (X_ij-Y_ij)^2","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"where tau0, X_* is the nuclear norm, and mathcalI denotes the indices of the observed entries. We will use FrankWolfe.NuclearNormLMO and compare our Frank-Wolfe implementation with a Projected Gradient Descent (PGD) algorithm which, after each gradient descent step, projects the iterates back onto the nuclear norm ball. 
We use a movielens dataset for comparison.","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"using FrankWolfe\nusing ZipFile, DataFrames, CSV\n\nusing Random\nusing Plots\n\nusing Profile\n\nimport Arpack\nusing SparseArrays, LinearAlgebra\n\nusing LaTeXStrings\n\ntemp_zipfile = download(\"http://files.grouplens.org/datasets/movielens/ml-latest-small.zip\")\n\nzarchive = ZipFile.Reader(temp_zipfile)\n\nmovies_file = zarchive.files[findfirst(f -> occursin(\"movies\", f.name), zarchive.files)]\nmovies_frame = CSV.read(movies_file, DataFrame)\n\nratings_file = zarchive.files[findfirst(f -> occursin(\"ratings\", f.name), zarchive.files)]\nratings_frame = CSV.read(ratings_file, DataFrame)\n\nusers = unique(ratings_frame[:, :userId])\nmovies = unique(ratings_frame[:, :movieId])\n\n@assert users == eachindex(users)\nmovies_revert = zeros(Int, maximum(movies))\nfor (idx, m) in enumerate(movies)\n movies_revert[m] = idx\nend\nmovies_indices = [movies_revert[idx] for idx in ratings_frame[:, :movieId]]\n\nconst rating_matrix = sparse(\n ratings_frame[:, :userId],\n movies_indices,\n ratings_frame[:, :rating],\n length(users),\n length(movies),\n)\n\nmissing_rate = 0.05\n\nRandom.seed!(42)\n\nconst missing_ratings = Tuple{Int,Int}[]\nconst present_ratings = Tuple{Int,Int}[]\nlet\n (I, J, V) = SparseArrays.findnz(rating_matrix)\n for idx in eachindex(I)\n if V[idx] > 0\n if rand() <= missing_rate\n push!(missing_ratings, (I[idx], J[idx]))\n else\n push!(present_ratings, (I[idx], J[idx]))\n end\n end\n end\nend\n\nfunction f(X)\n r = 0.0\n for (i, j) in present_ratings\n r += 0.5 * (X[i, j] - rating_matrix[i, j])^2\n end\n return r\nend\n\nfunction grad!(storage, X)\n storage .= 0\n for (i, j) in present_ratings\n storage[i, j] = X[i, j] - rating_matrix[i, j]\n end\n return nothing\nend\n\nfunction test_loss(X)\n r = 0.0\n for (i, j) in missing_ratings\n r += 0.5 * (X[i, j] - rating_matrix[i, j])^2\n end\n return r\nend\n\nfunction project_nuclear_norm_ball(X; radius=1.0)\n U, sing_val, Vt = svd(X)\n if (sum(sing_val) <= radius)\n return X, -norm_estimation * U[:, 1] * Vt[:, 1]'\n end\n sing_val = FrankWolfe.projection_simplex_sort(sing_val, s=radius)\n return U * Diagonal(sing_val) * Vt', -norm_estimation * U[:, 1] * Vt[:, 1]'\nend\n\nnorm_estimation = 10 * Arpack.svds(rating_matrix, nsv=1, ritzvec=false)[1].S[1]\n\nconst lmo = FrankWolfe.NuclearNormLMO(norm_estimation)\nconst x0 = FrankWolfe.compute_extreme_point(lmo, ones(size(rating_matrix)))\nconst k = 10\n\ngradient = spzeros(size(x0)...)\ngradient_aux = spzeros(size(x0)...)\n\nfunction build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(trajectory_arr, (FrankWolfe.callback_state(state)..., test_loss(state.x)))\n end\nend","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"The smoothness constant is estimated:","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"num_pairs = 100\nL_estimate = -Inf\nfor i in 1:num_pairs\n global L_estimate\n u1 = rand(size(x0, 1))\n u1 ./= sum(u1)\n u1 .*= norm_estimation\n v1 = rand(size(x0, 2))\n v1 ./= sum(v1)\n x = FrankWolfe.RankOneMatrix(u1, v1)\n u2 = rand(size(x0, 1))\n u2 ./= sum(u2)\n u2 .*= norm_estimation\n v2 = rand(size(x0, 2))\n v2 ./= sum(v2)\n y = FrankWolfe.RankOneMatrix(u2, v2)\n grad!(gradient, x)\n grad!(gradient_aux, y)\n 
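# the gradient difference over the iterate difference lower-bounds the Lipschitz constant L\n 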
new_L = norm(gradient - gradient_aux) / norm(x - y)\n if new_L > L_estimate\n L_estimate = new_L\n end\nend","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"We can now perform projected gradient descent:","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"xgd = Matrix(x0)\nfunction_values = Float64[]\ntiming_values = Float64[]\nfunction_test_values = Float64[]\n\nls = FrankWolfe.Backtracking()\nls_storage = similar(xgd)\ntime_start = time_ns()\nfor _ in 1:k\n f_val = f(xgd)\n push!(function_values, f_val)\n push!(function_test_values, test_loss(xgd))\n push!(timing_values, (time_ns() - time_start) / 1e9)\n @info f_val\n grad!(gradient, xgd)\n xgd_new, vertex = project_nuclear_norm_ball(xgd - gradient / L_estimate, radius=norm_estimation)\n gamma = FrankWolfe.perform_line_search(\n ls,\n 1,\n f,\n grad!,\n gradient,\n xgd,\n xgd - xgd_new,\n 1.0,\n ls_storage,\n FrankWolfe.InplaceEmphasis(),\n )\n @. xgd -= gamma * (xgd - xgd_new)\nend\n\ntrajectory_arr_fw = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_fw)\nxfin, _, _, _, traj_data = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=10 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\ntrajectory_arr_lazy = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_lazy)\nxlazy, _, _, _, _ = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=10 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\n\ntrajectory_arr_lazy_ref = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_lazy_ref)\nxlazy, _, _, _, _ = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=50 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\nfw_test_values = getindex.(trajectory_arr_fw, 6)\nlazy_test_values = getindex.(trajectory_arr_lazy, 6)\n\nresults = Dict(\n \"svals_gd\" => svdvals(xgd),\n \"svals_fw\" => svdvals(xfin),\n \"svals_lcg\" => svdvals(xlazy),\n \"fw_test_values\" => fw_test_values,\n \"lazy_test_values\" => lazy_test_values,\n \"trajectory_arr_fw\" => trajectory_arr_fw,\n \"trajectory_arr_lazy\" => trajectory_arr_lazy,\n \"function_values_gd\" => function_values,\n \"function_values_test_gd\" => function_test_values,\n \"timing_values_gd\" => timing_values,\n \"trajectory_arr_lazy_ref\" => trajectory_arr_lazy_ref,\n)\n\nref_optimum = results[\"trajectory_arr_lazy_ref\"][end][2]\n\niteration_list = [\n [x[1] + 1 for x in results[\"trajectory_arr_fw\"]],\n [x[1] + 1 for x in results[\"trajectory_arr_lazy\"]],\n collect(1:1:length(results[\"function_values_gd\"])),\n]\ntime_list = [\n [x[5] for x in results[\"trajectory_arr_fw\"]],\n [x[5] for x in results[\"trajectory_arr_lazy\"]],\n results[\"timing_values_gd\"],\n]\nprimal_gap_list = [\n [x[2] - ref_optimum for x in results[\"trajectory_arr_fw\"]],\n [x[2] - ref_optimum for x in 
results[\"trajectory_arr_lazy\"]],\n [x - ref_optimum for x in results[\"function_values_gd\"]],\n]\ntest_list =\n [results[\"fw_test_values\"], results[\"lazy_test_values\"], results[\"function_values_test_gd\"]]\n\nlabel = [L\"\\textrm{FW}\", L\"\\textrm{L-CG}\", L\"\\textrm{GD}\"]\n\nplot_results(\n [primal_gap_list, primal_gap_list, test_list, test_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [L\"\\textrm{Iteration}\", L\"\\textrm{Time}\", L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [\n L\"\\textrm{Primal Gap}\",\n L\"\\textrm{Primal Gap}\",\n L\"\\textrm{Test Error}\",\n L\"\\textrm{Test Error}\",\n ],\n xscalelog=[:log, :identity, :log, :identity],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"This page was generated using Literate.jl.","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/ZIB-IOL/FrankWolfe.jl/blob/master/README.md\"","category":"page"},{"location":"#FrankWolfe.jl","page":"Home","title":"FrankWolfe.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: Build Status) (Image: Dev) (Image: Stable) (Image: Coverage) (Image: Genie Downloads)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is a toolbox for Frank-Wolfe and conditional gradients algorithms.","category":"page"},{"location":"#Overview","page":"Home","title":"Overview","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Frank-Wolfe algorithms were designed to solve optimization problems of the form min_x C f(x), where f is a differentiable convex function and C is a convex and compact set. They are especially useful when we know how to optimize a linear function over C in an efficient way.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A paper presenting the package with mathematical explanations and numerous examples can be found here:","category":"page"},{"location":"","page":"Home","title":"Home","text":"FrankWolfe.jl: A high-performance and flexible toolbox for Frank-Wolfe algorithms and Conditional Gradients.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The most recent release is available via the julia package manager, e.g., with","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.add(\"FrankWolfe\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"or the master branch:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Pkg.add(url=\"https://github.com/ZIB-IOL/FrankWolfe.jl\", rev=\"master\")","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Let's say we want to minimize the Euclidian norm over the probability simplex Δ. 
Using FrankWolfe.jl, this is what the code looks like (in dimension 3):","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using FrankWolfe\n\njulia> f(p) = sum(abs2, p) # objective function\n\njulia> grad!(storage, p) = storage .= 2p # in-place gradient computation\n\n# function d ⟼ argmin ⟨p,d⟩ s.t. p ∈ Δ\njulia> lmo = FrankWolfe.ProbabilitySimplexOracle(1.)\n\njulia> p0 = [1., 0., 0.]\n\njulia> p_opt, _ = frank_wolfe(f, grad!, lmo, p0; verbose=true);\n\nVanilla Frank-Wolfe Algorithm.\nMEMORY_MODE: FrankWolfe.InplaceEmphasis() STEPSIZE: Adaptive EPSILON: 1.0e-7 MAXITERATION: 10000 TYPE: Float64\nMOMENTUM: nothing GRADIENTTYPE: Nothing\n[ Info: In memory_mode memory iterates are written back into x0!\n\n-------------------------------------------------------------------------------------------------\n Type Iteration Primal Dual Dual Gap Time It/sec\n-------------------------------------------------------------------------------------------------\n I 1 1.000000e+00 -1.000000e+00 2.000000e+00 0.000000e+00 Inf\n Last 24 3.333333e-01 3.333332e-01 9.488992e-08 1.533181e+00 1.565373e+01\n-------------------------------------------------------------------------------------------------\n\njulia> p_opt\n3-element Vector{Float64}:\n 0.33333334349923327\n 0.33333332783841896\n 0.3333333286623478","category":"page"},{"location":"#Documentation-and-examples","page":"Home","title":"Documentation and examples","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To explore the content of the package, go to the documentation.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Beyond those presented in the documentation, many more use cases are implemented in the examples folder. To run them, you will need to activate the test environment, which can be done simply with TestEnv.jl (we recommend you install it in your base Julia).","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using TestEnv\n\njulia> TestEnv.activate()\n\"/tmp/jl_Ux8wKE/Project.toml\"\n\n# necessary for plotting\njulia> include(\"examples/plot_utils.jl\")\njulia> include(\"examples/linear_regression.jl\")\n...","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Plots\nusing FrankWolfe\n\ninclude(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\"))","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"EditURL = \"../../../examples/docs_8_callback_and_tracking.jl\"","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#Tracking,-counters-and-custom-callbacks-for-Frank-Wolfe","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and 
custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"In this example we will run the standard Frank-Wolfe algorithm while tracking the number of calls to the different oracles, namely function, gradient evaluations, and LMO calls. In order to track each of these metrics, a \"Tracking\" version of the Gradient, LMO and Function methods have to be supplied to the frank_wolfe algorithm, which are wrapping a standard one.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"using FrankWolfe\nusing Test\nusing LinearAlgebra\nusing FrankWolfe: ActiveSet","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#The-trackers-for-primal-objective,-gradient-and-LMO.","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"The trackers for primal objective, gradient and LMO.","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"In order to count the number of function calls, a TrackingObjective is built from a standard objective function f, which will act in the same way as the original function does, but with an additional .counter field which tracks the number of calls.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"f(x) = norm(x)^2\ntf = FrankWolfe.TrackingObjective(f)\n@show tf.counter\ntf(rand(3))\n@show tf.counter\n# Resetting the counter\ntf.counter = 0;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"Similarly, the tgrad! function tracks the number of gradient calls:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"function grad!(storage, x)\n return storage .= 2x\nend\ntgrad! = FrankWolfe.TrackingGradient(grad!)\n@show tgrad!.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The tracking LMO operates in a similar fashion and tracks the number of compute_extreme_point calls.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"lmo_prob = FrankWolfe.ProbabilitySimplexOracle(1)\ntlmo_prob = FrankWolfe.TrackingLMO(lmo_prob)\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The tracking LMO can be applied for all types of LMOs and even in a nested way, which can be useful to track the number of calls to a lazified oracle. 
We can now pass the tracking versions tf, tgrad! and tlmo_prob to frank_wolfe and display their call counts after the optimization process.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"x0 = FrankWolfe.compute_extreme_point(tlmo_prob, ones(5))\nfw_results = FrankWolfe.frank_wolfe(\n tf,\n tgrad!,\n tlmo_prob,\n x0,\n max_iteration=1000,\n line_search=FrankWolfe.Agnostic(),\n callback=nothing,\n)\n\n@show tf.counter\n@show tgrad!.counter\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#Adding-a-custom-callback","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Adding a custom callback","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"A callback is a user-defined function called at every iteration of the algorithm with the current state passed as a named tuple.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"We can implement our own callback, for example with:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"Extended trajectory logging, similar to the trajectory = true option\nStop criterion after a certain number of calls to the primal objective function","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"To reuse the same tracking functions, let us first reset their counters:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"tf.counter = 0\ntgrad!.counter = 0\ntlmo_prob.counter = 0;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The storage variable records in the trajectory array the number of calls to each oracle at each iteration.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"storage = []","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"We now define our own trajectory logging function, which extends the five default logged elements (iterations, primal, dual, dual_gap, time) with the \".counter\" fields present in the tracking functions.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank 
Wolfe","text":"function push_tracking_state(state, storage)\n base_tuple = FrankWolfe.callback_state(state)\n if state.lmo isa FrankWolfe.CachedLinearMinimizationOracle\n complete_tuple = tuple(\n base_tuple...,\n state.gamma,\n state.f.counter,\n state.grad!.counter,\n state.lmo.inner.counter,\n )\n else\n complete_tuple = tuple(\n base_tuple...,\n state.gamma,\n state.f.counter,\n state.grad!.counter,\n state.lmo.counter,\n )\n end\n return push!(storage, complete_tuple)\nend","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"In case we want to stop the frank_wolfe algorithm prematurely after a certain condition is met, we can return a boolean stop criterion false. Here, we will implement a callback that terminates the algorithm if the primal objective function is evaluated more than 500 times.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"function make_callback(storage)\n return function callback(state, args...)\n push_tracking_state(state, storage)\n return state.f.counter < 500\n end\nend\n\ncallback = make_callback(storage)","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"We can show the difference between this standard run and the lazified conditional gradient algorithm which does not call the LMO at each iteration.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"FrankWolfe.lazified_conditional_gradient(\n tf,\n tgrad!,\n tlmo_prob,\n x0,\n max_iteration=1000,\n traj_data=storage,\n line_search=FrankWolfe.Agnostic(),\n callback=callback,\n)\n\ntotal_iterations = storage[end][1]\n@show total_iterations\n@show tf.counter\n@show tgrad!.counter\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"EditURL = \"../../../examples/docs_2_polynomial_regression.jl\"","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_2_polynomial_regression/#Polynomial-Regression","page":"Polynomial Regression","title":"Polynomial Regression","text":"","category":"section"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"The following example features the LMO for 
polynomial regression on the ℓ_1 norm ball. Given input/output pairs {(x_i, y_i)}_{i=1}^N and sparse coefficients c_j, where","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"y_i = ∑_{j=1}^m c_j f_j(x_i)","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"and f_j : ℝ^n → ℝ, the task is to recover those c_j that are non-zero alongside their corresponding values. Under certain assumptions, this problem can be convexified into","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"min_{c ∈ 𝒞} ‖y - Ac‖^2","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"for a convex set 𝒞. It can also be found as example 4.1 in the paper. In order to evaluate the polynomial, we generate a total of 1000 data points {x_i}_{i=1}^N from the standard multivariate Gaussian, with which we will compute the output variables {y_i}_{i=1}^N. Before evaluating the polynomial, these points will be contaminated with noise drawn from a standard multivariate Gaussian. We run the away_frank_wolfe and blended_conditional_gradient algorithms, and compare them to Projected Gradient Descent using a smoothness estimate. We will evaluate the output solution on test points drawn in a similar manner to the training points.","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"using FrankWolfe\n\nusing LinearAlgebra\nimport Random\n\nusing MultivariatePolynomials\nusing DynamicPolynomials\n\nusing Plots\n\nusing LaTeXStrings\n\nconst N = 10\n\nDynamicPolynomials.@polyvar X[1:15]\n\nconst max_degree = 4\ncoefficient_magnitude = 10\nnoise_magnitude = 1\n\nconst var_monomials = MultivariatePolynomials.monomials(X, 0:max_degree)\n\nRandom.seed!(42)\nconst all_coeffs = map(var_monomials) do m\n d = MultivariatePolynomials.degree(m)\n return coefficient_magnitude * rand() .* (rand() .> 0.95 * d / max_degree)\nend\n\nconst true_poly = dot(all_coeffs, var_monomials)\n\nconst training_data = map(1:500) do _\n x = 0.1 * randn(N)\n y = MultivariatePolynomials.subs(true_poly, Pair(X, x)) + noise_magnitude * randn()\n return (x, y.a[1])\nend\n\nconst extended_training_data = map(training_data) do (x, y)\n x_ext = MultivariatePolynomials.coefficient.(MultivariatePolynomials.subs.(var_monomials, X => x))\n return (x_ext, y)\nend\n\nconst test_data = map(1:1000) do _\n x = 0.4 * randn(N)\n y = MultivariatePolynomials.subs(true_poly, Pair(X, x)) + noise_magnitude * randn()\n return (x, y.a[1])\nend\n\nconst extended_test_data = map(test_data) do (x, y)\n x_ext = MultivariatePolynomials.coefficient.(MultivariatePolynomials.subs.(var_monomials, X => x))\n return (x_ext, y)\nend\n\nfunction f(coefficients)\n return 0.5 / length(extended_training_data) * sum(extended_training_data) do (x, y)\n return (dot(coefficients, x) - y)^2\n end\nend\n\nfunction f_test(coefficients)\n return 0.5 / length(extended_test_data) * sum(extended_test_data) do (x, y)\n return (dot(coefficients, x) - y)^2\n end\nend\n\nfunction coefficient_errors(coeffs)\n return 0.5 * sum(eachindex(all_coeffs)) do idx\n return (all_coeffs[idx] - coeffs[idx])^2\n end\nend\n\nfunction grad!(storage, coefficients)\n storage .= 0\n for (x, y) in 
extended_training_data\n p_i = dot(coefficients, x) - y\n @. storage += x * p_i\n end\n storage ./= length(training_data)\n return nothing\nend\n\nfunction build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(\n trajectory_arr,\n (FrankWolfe.callback_state(state)..., f_test(state.x), coefficient_errors(state.x)),\n )\n end\nend\n\ngradient = similar(all_coeffs)\n\nmax_iter = 10000\nrandom_initialization_vector = rand(length(all_coeffs))\n\nlmo = FrankWolfe.LpNormLMO{1}(0.95 * norm(all_coeffs, 1))\n\n# Estimating smoothness parameter\nnum_pairs = 1000\nL_estimate = -Inf\ngradient_aux = similar(gradient)\n\nfor i in 1:num_pairs # hide\n global L_estimate # hide\n x = compute_extreme_point(lmo, randn(size(all_coeffs))) # hide\n y = compute_extreme_point(lmo, randn(size(all_coeffs))) # hide\n grad!(gradient, x) # hide\n grad!(gradient_aux, y) # hide\n new_L = norm(gradient - gradient_aux) / norm(x - y) # hide\n if new_L > L_estimate # hide\n L_estimate = new_L # hide\n end # hide\nend # hide\n\nfunction projnorm1(x, τ)\n n = length(x)\n if norm(x, 1) ≤ τ\n return x\n end\n u = abs.(x)\n # simplex projection\n bget = false\n s_indices = sortperm(u, rev=true)\n tsum = zero(τ)\n\n @inbounds for i in 1:n-1\n tsum += u[s_indices[i]]\n tmax = (tsum - τ) / i\n if tmax ≥ u[s_indices[i+1]]\n bget = true\n break\n end\n end\n if !bget\n tmax = (tsum + u[s_indices[n]] - τ) / n\n end\n\n @inbounds for i in 1:n\n u[i] = max(u[i] - tmax, 0)\n u[i] *= sign(x[i])\n end\n return u\nend\nxgd = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector) # hide\ntraining_gd = Float64[] # hide\ntest_gd = Float64[] # hide\ncoeff_error = Float64[] # hide\ntime_start = time_ns() # hide\ngd_times = Float64[] # hide\nfor iter in 1:max_iter # hide\n global xgd # hide\n grad!(gradient, xgd) # hide\n xgd = projnorm1(xgd - gradient / L_estimate, lmo.right_hand_side) # hide\n push!(training_gd, f(xgd)) # hide\n push!(test_gd, f_test(xgd)) # hide\n push!(coeff_error, coefficient_errors(xgd)) # hide\n push!(gd_times, (time_ns() - time_start) * 1e-9) # hide\nend # hide\n\nx00 = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector) # hide\nx0 = deepcopy(x00) # hide\n\ntrajectory_lafw = [] # hide\ncallback = build_callback(trajectory_lafw) # hide\nx_lafw, v, primal, dual_gap, _ = FrankWolfe.away_frank_wolfe( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=max_iter, # hide\n line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n lazy=true, # hide\n gradient=gradient, # hide\n callback=callback, # hide\n) # hide\n\ntrajectory_bcg = [] # hide\ncallback = build_callback(trajectory_bcg) # hide\nx0 = deepcopy(x00) # hide\nx_bcg, v, primal, dual_gap, _, _ = FrankWolfe.blended_conditional_gradient( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=max_iter, # hide\n line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n weight_purge_threshold=1e-10, # hide\n callback=callback, # hide\n) # hide\nx0 = deepcopy(x00) # hide\ntrajectory_lafw_ref = [] # hide\ncallback = build_callback(trajectory_lafw_ref) # hide\n_, _, primal_ref, _, _ = FrankWolfe.away_frank_wolfe( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=2 * max_iter, # hide\n 
line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n lazy=true, # hide\n gradient=gradient, # hide\n callback=callback, # hide\n) # hide\n\n\nfor i in 1:num_pairs\n global L_estimate\n x = compute_extreme_point(lmo, randn(size(all_coeffs)))\n y = compute_extreme_point(lmo, randn(size(all_coeffs)))\n grad!(gradient, x)\n grad!(gradient_aux, y)\n new_L = norm(gradient - gradient_aux) / norm(x - y)\n if new_L > L_estimate\n L_estimate = new_L\n end\nend","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"We can now perform projected gradient descent:","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"xgd = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector)\ntraining_gd = Float64[]\ntest_gd = Float64[]\ncoeff_error = Float64[]\ntime_start = time_ns()\ngd_times = Float64[]\nfor iter in 1:max_iter\n global xgd\n grad!(gradient, xgd)\n xgd = projnorm1(xgd - gradient / L_estimate, lmo.right_hand_side)\n push!(training_gd, f(xgd))\n push!(test_gd, f_test(xgd))\n push!(coeff_error, coefficient_errors(xgd))\n push!(gd_times, (time_ns() - time_start) * 1e-9)\nend\n\nx00 = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector)\nx0 = deepcopy(x00)\n\ntrajectory_lafw = []\ncallback = build_callback(trajectory_lafw)\nx_lafw, v, primal, dual_gap, _ = FrankWolfe.away_frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n lazy=true,\n gradient=gradient,\n callback=callback,\n)\n\ntrajectory_bcg = []\ncallback = build_callback(trajectory_bcg)\n\nx0 = deepcopy(x00)\nx_bcg, v, primal, dual_gap, _, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n weight_purge_threshold=1e-10,\n callback=callback,\n)\n\nx0 = deepcopy(x00)\n\ntrajectory_lafw_ref = []\ncallback = build_callback(trajectory_lafw_ref)\n_, _, primal_ref, _, _ = FrankWolfe.away_frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=2 * max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n lazy=true,\n gradient=gradient,\n callback=callback,\n)\n\niteration_list = [\n [x[1] + 1 for x in trajectory_lafw],\n [x[1] + 1 for x in trajectory_bcg],\n collect(eachindex(training_gd)),\n]\ntime_list = [[x[5] for x in trajectory_lafw], [x[5] for x in trajectory_bcg], gd_times]\nprimal_list = [\n [x[2] - primal_ref for x in trajectory_lafw],\n [x[2] - primal_ref for x in trajectory_bcg],\n [x - primal_ref for x in training_gd],\n]\ntest_list = [[x[6] for x in trajectory_lafw], [x[6] for x in trajectory_bcg], test_gd]\nlabel = [L\"\\textrm{L-AFW}\", L\"\\textrm{BCG}\", L\"\\textrm{GD}\"]\ncoefficient_error_values =\n [[x[7] for x in trajectory_lafw], [x[7] for x in trajectory_bcg], coeff_error]\n\n\nplot_results(\n [primal_list, primal_list, test_list, test_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [L\"\\textrm{Iteration}\", L\"\\textrm{Time}\", 
L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [L\"\\textrm{Primal Gap}\", L\"\\textrm{Primal Gap}\", L\"\\textrm{Test loss}\", L\"\\textrm{Test loss}\"],\n xscalelog=[:log, :identity, :log, :identity],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"This page was generated using Literate.jl.","category":"page"}] +[{"location":"reference/3_backend/#Utilities-and-data-structures","page":"Utilities and data structures","title":"Utilities and data structures","text":"","category":"section"},{"location":"reference/3_backend/#Active-set","page":"Utilities and data structures","title":"Active set","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"active_set.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ActiveSet","page":"Utilities and data structures","title":"FrankWolfe.ActiveSet","text":"ActiveSet{AT, R, IT}\n\nRepresents an active set of extreme vertices collected in a FW algorithm, along with their coefficients (λ_i, a_i). R is the type of the λ_i, AT is the type of the atoms a_i. The iterate x = ∑λ_i a_i is stored in x with type IT.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#Base.copy-Union{Tuple{FrankWolfe.ActiveSet{AT, R, IT}}, Tuple{IT}, Tuple{R}, Tuple{AT}} where {AT, R, IT}","page":"Utilities and data structures","title":"Base.copy","text":"Copies an active set, the weight and atom vectors and the iterate. Individual atoms are not copied.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_argmin-Tuple{FrankWolfe.ActiveSet, Any}","page":"Utilities and data structures","title":"FrankWolfe.active_set_argmin","text":"active_set_argmin(active_set::ActiveSet, direction)\n\nComputes the linear minimizer in the direction on the active set. Returns (λ_i, a_i, i)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_argminmax-Tuple{FrankWolfe.ActiveSet, Any}","page":"Utilities and data structures","title":"FrankWolfe.active_set_argminmax","text":"active_set_argminmax(active_set::ActiveSet, direction)\n\nComputes the linear minimizer in the direction on the active set. 
Returns (λ_min, a_min, i_min, val_min, λ_max, a_max, i_max, val_max, val_max-val_min ≥ Φ)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_initialize!-Union{Tuple{R}, Tuple{AT}, Tuple{FrankWolfe.ActiveSet{AT, R, IT} where IT, Any}} where {AT, R}","page":"Utilities and data structures","title":"FrankWolfe.active_set_initialize!","text":"active_set_initialize!(as, v)\n\nResets the active set structure to a single vertex v with unit weight.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_update!","page":"Utilities and data structures","title":"FrankWolfe.active_set_update!","text":"active_set_update!(active_set::ActiveSet, lambda, atom)\n\nAdds the atom to the active set with weight lambda or adds lambda to existing atom.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.active_set_update_iterate_pairwise!-Union{Tuple{A}, Tuple{IT}, Tuple{IT, Real, A, A}} where {IT, A}","page":"Utilities and data structures","title":"FrankWolfe.active_set_update_iterate_pairwise!","text":"active_set_update_iterate_pairwise!(x, lambda, fw_atom, away_atom)\n\nOperates x ← x + λ a_fw - λ a_aw.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.active_set_update_scale!-Union{Tuple{IT}, Tuple{IT, Any, Any}} where IT","page":"Utilities and data structures","title":"FrankWolfe.active_set_update_scale!","text":"active_set_update_scale!(x, lambda, atom)\n\nOperates x ← (1-λ) x + λ a.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.compute_active_set_iterate!-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.compute_active_set_iterate!","text":"compute_active_set_iterate!(active_set::ActiveSet) -> x\n\nRecomputes from scratch the iterate x from the current weights and vertices of the active set. Returns the iterate x.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.get_active_set_iterate-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.get_active_set_iterate","text":"get_active_set_iterate(active_set)\n\nReturn the current iterate corresponding to the active set. Does not recompute it.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#Functions-and-gradients","page":"Utilities and data structures","title":"Functions and gradients","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"function_gradient.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ObjectiveFunction","page":"Utilities and data structures","title":"FrankWolfe.ObjectiveFunction","text":"ObjectiveFunction\n\nRepresents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least\n\ncompute_value(::ObjectiveFunction, x) for primal value evaluation\ncompute_gradient(::ObjectiveFunction, x) for gradient evaluation.\n\nand optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. 
{"location":"reference/3_backend/#Functions-and-gradients","page":"Utilities and data structures","title":"Functions and gradients","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"function_gradient.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ObjectiveFunction","page":"Utilities and data structures","title":"FrankWolfe.ObjectiveFunction","text":"ObjectiveFunction\n\nRepresents an objective function optimized by algorithms. Subtypes of ObjectiveFunction must implement at least\n\ncompute_value(::ObjectiveFunction, x) for primal value evaluation\ncompute_gradient(::ObjectiveFunction, x) for gradient evaluation.\n\nand optionally compute_value_gradient(::ObjectiveFunction, x) returning the (primal, gradient) pair. compute_gradient may always use the same storage and return a reference to it.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.SimpleFunctionObjective","page":"Utilities and data structures","title":"FrankWolfe.SimpleFunctionObjective","text":"SimpleFunctionObjective{F,G,S}\n\nAn objective function built from a separate primal objective f(x) and an in-place gradient function grad!(storage, x). It keeps an internal storage of type S used to evaluate the gradient in-place.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.StochasticObjective","page":"Utilities and data structures","title":"FrankWolfe.StochasticObjective","text":"StochasticObjective{F, G, XT, S}(f::F, grad!::G, xs::XT, storage::S)\n\nRepresents a composite function evaluated with stochastic gradient. f(θ, x) evaluates the loss for a single data point x and parameter θ. grad!(storage, θ, x) adds to storage the partial gradient with respect to data point x at parameter θ. xs must be an indexable iterable (Vector{Vector{Float64}} for instance). Functions using a StochasticObjective have optional keyword arguments rng, batch_size, and full_evaluation, the latter controlling whether the function should be evaluated over all data points.\n\nNote: grad! must not reset the storage to 0 before adding to it.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.compute_gradient","page":"Utilities and data structures","title":"FrankWolfe.compute_gradient","text":"compute_gradient(f::ObjectiveFunction, x; [kwargs...])\n\nComputes the gradient of f at x. May return a reference to an internal storage.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.compute_value","page":"Utilities and data structures","title":"FrankWolfe.compute_value","text":"compute_value(f::ObjectiveFunction, x; [kwargs...])\n\nComputes the objective f at x.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.compute_value_gradient-Tuple{FrankWolfe.ObjectiveFunction, Any}","page":"Utilities and data structures","title":"FrankWolfe.compute_value_gradient","text":"compute_value_gradient(f::ObjectiveFunction, x; [kwargs...])\n\nComputes in one call the pair (value, gradient) evaluated at x. By default, calls compute_value and compute_gradient with the keyword arguments kwargs passed down to both.\n\n\n\n\n\n","category":"method"},
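As a minimal sketch of the ObjectiveFunction interface documented above: a custom objective only needs the two required methods, and compute_value_gradient then falls back to calling both. The type name DistanceObjective is hypothetical, used purely for illustration.

```julia
using FrankWolfe
using LinearAlgebra

# Hypothetical objective type for f(x) = 1/2 * ||x - c||^2
struct DistanceObjective{V} <: FrankWolfe.ObjectiveFunction
    c::V
end

FrankWolfe.compute_value(f::DistanceObjective, x; kwargs...) = 0.5 * norm(x - f.c)^2
FrankWolfe.compute_gradient(f::DistanceObjective, x; kwargs...) = x - f.c
```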
{"location":"reference/3_backend/#Callbacks","page":"Utilities and data structures","title":"Callbacks","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.CallbackState","category":"page"},{"location":"reference/3_backend/#FrankWolfe.CallbackState","page":"Utilities and data structures","title":"FrankWolfe.CallbackState","text":"Main structure created before the run and passed to the callback as its first argument.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#Custom-vertex-storage","page":"Utilities and data structures","title":"Custom vertex storage","text":"","category":"section"},{"location":"reference/3_backend/#Custom-extreme-point-types","page":"Utilities and data structures","title":"Custom extreme point types","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"For some feasible sets, the extreme points returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. They are presented below:","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.ScaledHotVector\nFrankWolfe.RankOneMatrix","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ScaledHotVector","page":"Utilities and data structures","title":"FrankWolfe.ScaledHotVector","text":"ScaledHotVector{T}\n\nRepresents a vector of at most one value different from 0.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.RankOneMatrix","page":"Utilities and data structures","title":"FrankWolfe.RankOneMatrix","text":"RankOneMatrix{T, UT, VT}\n\nRepresents a rank-one matrix R = u * vt'. Composes like a charm.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"types.jl\"]","category":"page"},{"location":"reference/3_backend/#Utils","page":"Utilities and data structures","title":"Utils","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Modules = [FrankWolfe]\nPages = [\"utils.jl\"]","category":"page"},{"location":"reference/3_backend/#FrankWolfe.ConstantBatchIterator","page":"Utilities and data structures","title":"FrankWolfe.ConstantBatchIterator","text":"ConstantBatchIterator(batch_size)\n\nBatch iterator always returning a constant batch size.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.ConstantMomentumIterator","page":"Utilities and data structures","title":"FrankWolfe.ConstantMomentumIterator","text":"ConstantMomentumIterator{T}\n\nIterator for momentum with a fixed damping value, always returning the value and a dummy state.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.DeletedVertexStorage","page":"Utilities and data structures","title":"FrankWolfe.DeletedVertexStorage","text":"Vertex storage to store dropped vertices or find a suitable direction in lazy settings. 
The algorithm will look for at most return_kth suitable atoms before returning the best. See Extra-lazification with a vertex storage for usage.\n\nA vertex storage can be any type that implements two operations:\n\nBase.push!(storage, atom) to add an atom to the storage.\n\nNote that it is the storage type's responsibility to ensure uniqueness of the atoms present.\n\nstorage_find_argmin_vertex(storage, direction, lazy_threshold) -> (found, vertex)\n\nreturning whether a vertex with sufficient progress was found and the vertex. It is up to the storage to remove vertices (or not) when they have been picked up.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.ExpMomentumIterator","page":"Utilities and data structures","title":"FrankWolfe.ExpMomentumIterator","text":"ExpMomentumIterator{T}\n\nIterator for the momentum used in the variant of Stochastic Frank-Wolfe. Momentum coefficients are the values of the iterator: ρ_t = 1 - num / (offset + t)^exp\n\nThe state corresponds to the iteration count.\n\nSource: Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization, Aryan Mokhtari, Hamed Hassani, Amin Karbasi, JMLR 2020.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.IncrementBatchIterator","page":"Utilities and data structures","title":"FrankWolfe.IncrementBatchIterator","text":"IncrementBatchIterator(starting_batch_size, max_batch_size, [increment = 1])\n\nBatch size starting at starting_batch_size and incrementing by increment at every iteration.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe._unsafe_equal-Tuple{Array, Array}","page":"Utilities and data structures","title":"FrankWolfe._unsafe_equal","text":"_unsafe_equal(a, b)\n\nLike isequal on arrays but without the checks. Assumes a and b have the same axes.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.batchsize_iterate","page":"Utilities and data structures","title":"FrankWolfe.batchsize_iterate","text":"batchsize_iterate(iter::BatchSizeIterator) -> b\n\nMethod to implement for a batch size iterator of type BatchSizeIterator. Calling batchsize_iterate returns the next batch size and typically updates the internal state of iter.\n\n\n\n\n\n","category":"function"},
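As a sketch of the batchsize_iterate interface just described (the iterator type below is hypothetical), a schedule doubling the batch size up to a cap could be implemented as:

```julia
using FrankWolfe

# Hypothetical schedule: double the batch size at each call, up to a cap.
mutable struct DoublingBatchIterator <: FrankWolfe.BatchSizeIterator
    batch_size::Int
    max_batch_size::Int
end

function FrankWolfe.batchsize_iterate(iter::DoublingBatchIterator)
    b = iter.batch_size
    iter.batch_size = min(2b, iter.max_batch_size)  # update the internal state
    return b
end
```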
{"location":"reference/3_backend/#FrankWolfe.momentum_iterate","page":"Utilities and data structures","title":"FrankWolfe.momentum_iterate","text":"momentum_iterate(iter::MomentumIterator) -> ρ\n\nMethod to implement for a type MomentumIterator. Returns the next momentum value ρ and updates the iterator's internal state.\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Any, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"muladd_memory_mode(memory_mode::MemoryEmphasis, d, x, v)\n\nPerforms d = x - v in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Any, Real, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"(memory_mode::MemoryEmphasis, storage, x, gamma::Real, d)\n\nPerforms storage = x - gamma * d in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.muladd_memory_mode-Tuple{FrankWolfe.MemoryEmphasis, Any, Real, Any}","page":"Utilities and data structures","title":"FrankWolfe.muladd_memory_mode","text":"(memory_mode::MemoryEmphasis, x, gamma::Real, d)\n\nPerforms x = x - gamma * d in-place or not depending on MemoryEmphasis\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.storage_find_argmin_vertex-Tuple{FrankWolfe.DeletedVertexStorage, Any, Any}","page":"Utilities and data structures","title":"FrankWolfe.storage_find_argmin_vertex","text":"Gives the vertex v in the storage that minimizes s = direction ⋅ v and whether s achieves s ≤ lazy_threshold.\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#FrankWolfe.trajectory_callback-Tuple{Any}","page":"Utilities and data structures","title":"FrankWolfe.trajectory_callback","text":"trajectory_callback(storage)\n\nCallback pushing the state at each iteration to the passed storage. The state data is only the first 5 fields, usually: (t,primal,dual,dual_gap,time)\n\n\n\n\n\n","category":"method"},{"location":"reference/3_backend/#Oracle-counting-trackers","page":"Utilities and data structures","title":"Oracle counting trackers","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"The following structures wrap given oracles to behave like them while additionally tracking the number of calls.","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.TrackingObjective\nFrankWolfe.TrackingGradient\nFrankWolfe.TrackingLMO","category":"page"},{"location":"reference/3_backend/#FrankWolfe.TrackingObjective","page":"Utilities and data structures","title":"FrankWolfe.TrackingObjective","text":"A function acting like the normal objective f but tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.TrackingGradient","page":"Utilities and data structures","title":"FrankWolfe.TrackingGradient","text":"A function acting like the normal grad! 
but tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.TrackingLMO","page":"Utilities and data structures","title":"FrankWolfe.TrackingLMO","text":"TrackingLMO{LMO}(lmo)\n\nAn LMO wrapping another one and tracking the number of calls.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Also see the example \"Tracking number of calls to different oracles\".","category":"page"},{"location":"reference/3_backend/#Update-order-for-block-coordinate-methods","page":"Utilities and data structures","title":"Update order for block-coordinate methods","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Block-coordinate methods can be run with different update orders. All update orders are subtypes of FrankWolfe.BlockCoordinateUpdateOrder. They have to implement the method FrankWolfe.select_update_indices, which selects which blocks to update in what order.","category":"page"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"FrankWolfe.BlockCoordinateUpdateOrder\nFrankWolfe.select_update_indices\nFrankWolfe.FullUpdate\nFrankWolfe.CyclicUpdate\nFrankWolfe.StochasticUpdate","category":"page"},{"location":"reference/3_backend/#FrankWolfe.BlockCoordinateUpdateOrder","page":"Utilities and data structures","title":"FrankWolfe.BlockCoordinateUpdateOrder","text":"Update order for a block-coordinate method. A BlockCoordinateUpdateOrder must implement\n\nselect_update_indices(::BlockCoordinateUpdateOrder, l)\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.select_update_indices","page":"Utilities and data structures","title":"FrankWolfe.select_update_indices","text":"select_update_indices(::BlockCoordinateUpdateOrder, l)\n\nReturns a list of lists of indices, where l is the largest index, i.e., the number of blocks. Each sublist represents one round of updates in an iteration. The indices in a list show which blocks should be updated in parallel in one round. For example, a full update is given by [1:l] and a blockwise update by [[i] for i=1:l].\n\n\n\n\n\n","category":"function"},{"location":"reference/3_backend/#FrankWolfe.FullUpdate","page":"Utilities and data structures","title":"FrankWolfe.FullUpdate","text":"The full update initiates a parallel update of all blocks in one single round.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.CyclicUpdate","page":"Utilities and data structures","title":"FrankWolfe.CyclicUpdate","text":"The cyclic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is determined by the given order of the LMOs.\n\n\n\n\n\n","category":"type"},{"location":"reference/3_backend/#FrankWolfe.StochasticUpdate","page":"Utilities and data structures","title":"FrankWolfe.StochasticUpdate","text":"The stochastic update initiates a sequence of update rounds. In each round only one block is updated. The order of the blocks is random.\n\n\n\n\n\n","category":"type"},
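As a sketch of this interface with the signature documented above (the type ReverseCyclicUpdate is hypothetical), a blockwise order visiting the last block first could be written as:

```julia
using FrankWolfe

# Hypothetical update order: one block per round, last block first.
struct ReverseCyclicUpdate <: FrankWolfe.BlockCoordinateUpdateOrder end

# l is the number of blocks; each sublist is one round of updates.
FrankWolfe.select_update_indices(::ReverseCyclicUpdate, l) = [[i] for i in l:-1:1]
```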
{"location":"reference/3_backend/#Index","page":"Utilities and data structures","title":"Index","text":"","category":"section"},{"location":"reference/3_backend/","page":"Utilities and data structures","title":"Utilities and data structures","text":"Pages = [\"3_backend.md\"]","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"EditURL = \"https://github.com/ZIB-IOL/FrankWolfe.jl/blob/master/CONTRIBUTING.md\"","category":"page"},{"location":"contributing/#Contributing-to-FrankWolfe","page":"Contributing","title":"Contributing to FrankWolfe","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"First, thanks for taking the time to contribute. Contributions in any form, such as documentation, bug fixes, examples, or algorithms, are appreciated and welcome.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"We list below some guidelines to help you contribute to the package.","category":"page"},{"location":"contributing/#Community-Standards","page":"Contributing","title":"Community Standards","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Interactions on this repository must follow the Julia Community Standards including Pull Requests and issues.","category":"page"},{"location":"contributing/#Where-can-I-get-an-overview","page":"Contributing","title":"Where can I get an overview","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Check out the paper presenting the package for a high-level overview of the features and algorithms, and the documentation for more details.","category":"page"},{"location":"contributing/#I-just-have-a-question","page":"Contributing","title":"I just have a question","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If your question is related to Julia, its syntax or tooling, the best places to get help will be tied to the Julia community; see the Julia community page for a number of communication channels.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"For now, the best way to ask a question is to reach out to Mathieu Besançon or Sebastian Pokutta. You can also ask your question on discourse.julialang.org in the optimization topic or on the Julia Slack on #mathematical-optimization; see the Julia community page to gain access.","category":"page"},{"location":"contributing/#How-can-I-file-an-issue","page":"Contributing","title":"How can I file an issue","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you found a bug or want to propose a feature, we track our issues within the GitHub repository. 
Once opened, you can edit the issue or add new comments to continue the conversation.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you encounter a bug, send the stack trace (the lines printed after the error occurred, referencing source files) and ideally a Minimal Working Example (MWE), a small program that reproduces the bug.","category":"page"},{"location":"contributing/#How-can-I-contribute","page":"Contributing","title":"How can I contribute","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Contributions to the repository will likely be made through a Pull Request (PR). You will need to:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Fork the repository\nClone it on your machine to perform the changes\nCreate a branch for your modifications, based on the branch you want to merge into (typically master)\nPush to this branch on your fork\nThe GitHub web interface will then automatically suggest opening a PR onto the original repository.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"See the GitHub guide to creating PRs for more help on workflows using Git and GitHub.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"A PR should do a single thing to reduce the amount of code that must be reviewed. Do not run the formatter on the whole repository except if your PR is specifically about formatting.","category":"page"},{"location":"contributing/#Improve-the-documentation","page":"Contributing","title":"Improve the documentation","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"The documentation can be improved by changing the files in docs/src, for example to add a section in the documentation, expand a paragraph or add a plot. The documentation attached to a given type or function can be modified in the source files directly; it appears above the object you are documenting, enclosed in triple double quotation marks, like this:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"\"\"\"\nThis explains what the function `f` does, it supports markdown.\n\"\"\"\nfunction f(x)\n # ...\nend","category":"page"},{"location":"contributing/#Provide-a-new-example-or-test","page":"Contributing","title":"Provide a new example or test","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"If you fix a bug, one would typically expect to add a test that validates that the bug is gone. A test would be added in a file in the test/ folder, for which the entry point is runtests.jl.","category":"page"},
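For instance, such a test might look like the following sketch; the file name and testset are hypothetical, while UnitSimplexOracle and compute_extreme_point are part of the package API:

```julia
# Hypothetical file test/unit_simplex.jl, include()-d from test/runtests.jl
using Test
using FrankWolfe

@testset "Unit simplex vertex is feasible" begin
    lmo = FrankWolfe.UnitSimplexOracle(1.0)
    v = FrankWolfe.compute_extreme_point(lmo, -ones(5))
    @test sum(v) <= 1.0 + sqrt(eps())
    @test all(>=(0), v)
end
```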
{"location":"contributing/","page":"Contributing","title":"Contributing","text":"The examples/ folder features several examples covering different problem settings and algorithms. The examples are expected to run with the same environment and dependencies as the tests using TestEnv. If the example is lightweight enough, it can be added to the docs/src/examples/ folder, from which documentation pages are generated using Literate.jl.","category":"page"},{"location":"contributing/#Provide-a-new-feature","page":"Contributing","title":"Provide a new feature","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Contributions bringing new features are also welcome. If the feature is likely to impact performance, some benchmarks should be run with BenchmarkTools on several of the examples to assess the effect at different problem sizes. If the feature should only be active in some cases, a keyword should be added to the main algorithms to support it.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"Some typical features to implement are:","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"A new Linear Minimization Oracle (LMO)\nA new step size\nA new algorithm (less frequent) following the same API.","category":"page"},{"location":"contributing/#Code-style","page":"Contributing","title":"Code style","text":"","category":"section"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"We try to follow the Julia documentation guidelines. We run JuliaFormatter.jl on the repo in the way set in the .JuliaFormatter.toml file, which enforces a number of conventions.","category":"page"},{"location":"contributing/","page":"Contributing","title":"Contributing","text":"This contribution guide was inspired by ColPrac and the one in Manopt.jl.","category":"page"},{"location":"basics/#How-does-it-work?","page":"How does it work?","title":"How does it work?","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"FrankWolfe.jl contains generic routines to solve optimization problems of the form","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"min_x in mathcalC f(x)","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"where mathcalC is a compact convex set and f is a differentiable function. These routines work by solving a sequence of linear subproblems:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"min_x in mathcalC langle d_k x rangle quad textwhere quad d_k = nabla f(x_k)","category":"page"},{"location":"basics/#Linear-Minimization-Oracles","page":"How does it work?","title":"Linear Minimization Oracles","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. 
Given a direction d, it returns an optimal vertex of the feasible set:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"v in arg min_xin mathcalC langle dx rangle","category":"page"},{"location":"basics/#Custom-LMOs","page":"How does it work?","title":"Custom LMOs","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle and implement the following method:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"compute_extreme_point(lmo::LMO, direction; kwargs...) -> v","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"This method should minimize v mapsto langle d v rangle over the set mathcalC defined by the LMO. Note that this means the set mathcalC doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.","category":"page"},
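As a didactic sketch of this interface: the LMO below performs linear minimization over the Euclidean unit ball, for which the minimizer is available in closed form. The package's LpNormLMO already covers this set, so the type here is purely illustrative.

```julia
using FrankWolfe
using LinearAlgebra

# Illustrative LMO for the Euclidean unit ball {v : ||v||_2 <= 1};
# FrankWolfe.LpNormLMO{2} already provides this set.
struct UnitEuclideanBallLMO <: FrankWolfe.LinearMinimizationOracle end

function FrankWolfe.compute_extreme_point(::UnitEuclideanBallLMO, direction; kwargs...)
    return -direction / norm(direction)  # closed-form minimizer of ⟨direction, v⟩
end
```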
{"location":"basics/#Pre-defined-LMOs","page":"How does it work?","title":"Pre-defined LMOs","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"If you don't want to define your LMO manually, several common implementations are available out-of-the-box:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Simplices: unit simplex, probability simplex\nBalls in various norms\nPolytopes: K-sparse, Birkhoff","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"You can use an oracle defined via a Linear Programming solver (e.g. SCIP or HiGHS) with MathOptInterface: see FrankWolfe.MathOptLMO.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Finally, we provide wrappers to combine oracles easily, for example in a product.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.","category":"page"},{"location":"basics/#Optimization-algorithms","page":"How does it work?","title":"Optimization algorithms","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The package features several variants of Frank-Wolfe that share the same basic API.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).","category":"page"},{"location":"basics/#Standard-Frank-Wolfe-(FW)","page":"How does it work?","title":"Standard Frank-Wolfe (FW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Jaggi (2013) for an overview.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex() in the second case).","category":"page"},{"location":"basics/#Away-step-Frank-Wolfe-(AFW)","page":"How does it work?","title":"Away-step Frank-Wolfe (AFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the away_frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Lacoste-Julien, Jaggi (2015) for an overview.","category":"page"},{"location":"basics/#Stochastic-Frank-Wolfe-(SFW)","page":"How does it work?","title":"Stochastic Frank-Wolfe (SFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.stochastic_frank_wolfe function.","category":"page"},{"location":"basics/#Blended-Conditional-Gradients-(BCG)","page":"How does it work?","title":"Blended Conditional Gradients (BCG)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the blended_conditional_gradient function, with a built-in stability feature that temporarily increases accuracy.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Braun, Pokutta, Tu, Wright (2018).","category":"page"},{"location":"basics/#Blended-Pairwise-Conditional-Gradients-(BPCG)","page":"How does it work?","title":"Blended Pairwise Conditional Gradients (BPCG)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient function, with a minor modification to improve sparsity.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Tsuji, 
Tanaka, Pokutta (2021)","category":"page"},{"location":"basics/#Comparison","page":"How does it work?","title":"Comparison","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"The following table compares the characteristics of the algorithms presented in the package:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Algorithm Progress/Iteration Time/Iteration Sparsity Numerical Stability Active Set Lazifiable\nFW Low Low Low High No Yes\nAFW Medium Medium-High Medium Medium-High Yes Yes\nB(P)CG High Medium-High High Medium Yes By design\nSFW Low Low Low High No No","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set mathcalC, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"(Image: FW vs AFW).","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.","category":"page"},{"location":"basics/#Block-Coordinate-Frank-Wolfe-(BCFW)","page":"How does it work?","title":"Block-Coordinate Frank-Wolfe (BCFW)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.block_coordinate_frank_wolfe function.","category":"page"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.","category":"page"},{"location":"basics/#Alternating-Linear-Minimization-(ALM)","page":"How does it work?","title":"Alternating Linear Minimization (ALM)","text":"","category":"section"},{"location":"basics/","page":"How does it work?","title":"How does it work?","text":"It is implemented in the FrankWolfe.alternating_linear_minimization function.","category":"page"},{"location":"reference/2_lmo/#Linear-Minimization-Oracles","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"The Linear Minimization Oracle (LMO) is a key component called at each iteration of the FW algorithm. 
Given din mathcalX, it returns a vertex of the feasible set:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"vin argmin_xin mathcalC langle dx rangle","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"See Combettes, Pokutta 2021 for references on most LMOs implemented in the package and their comparison with projection operators.","category":"page"},{"location":"reference/2_lmo/#Interface-and-wrappers","page":"Linear Minimization Oracles","title":"Interface and wrappers","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"FrankWolfe.LinearMinimizationOracle","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.LinearMinimizationOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.LinearMinimizationOracle","text":"Supertype for linear minimization oracles.\n\nAll LMOs must implement compute_extreme_point(lmo::LMO, direction) and return a vector v of the appropriate type.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"All of them are subtypes of FrankWolfe.LinearMinimizationOracle and implement the following method:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"compute_extreme_point","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"compute_extreme_point(lmo::LinearMinimizationOracle, direction; kwargs...)\n\nComputes the point argmin_{v ∈ C} v ⋅ direction with C the set represented by the LMO. Most LMOs feature v as a keyword argument that allows for an in-place computation whenever v is dense. All LMOs should accept keyword arguments that they can ignore.\n\n\n\n\n\n","category":"function"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"We also provide some meta-LMOs wrapping another one with extended behavior:","category":"page"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"FrankWolfe.CachedLinearMinimizationOracle\nFrankWolfe.ProductLMO\nFrankWolfe.SingleLastCachedLMO\nFrankWolfe.MultiCacheLMO\nFrankWolfe.VectorCacheLMO","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.CachedLinearMinimizationOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.CachedLinearMinimizationOracle","text":"CachedLinearMinimizationOracle{LMO}\n\nOracle wrapping another one of type lmo. Subtypes of CachedLinearMinimizationOracle contain a cache of previous solutions.\n\nBy convention, the inner oracle is named inner. Cached optimizers are expected to implement Base.empty! 
and Base.length.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ProductLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.ProductLMO","text":"ProductLMO(lmos)\n\nLinear minimization oracle over the Cartesian product of multiple LMOs.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.SingleLastCachedLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.SingleLastCachedLMO","text":"SingleLastCachedLMO{LMO, VT}\n\nCaches only the last result from an LMO and stores it in last_vertex. Vertices of LMO have to be of type VT if provided.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.MultiCacheLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.MultiCacheLMO","text":"MultiCacheLMO{N, LMO, A}\n\nCache for a LMO storing up to N vertices in the cache, removed in FIFO style. oldest_idx keeps track of the oldest index in the tuple, i.e. to replace next. VT, if provided, must be the type of vertices returned by LMO\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.VectorCacheLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.VectorCacheLMO","text":"VectorCacheLMO{LMO, VT}\n\nCache for a LMO storing an unbounded number of vertices of type VT in the cache. VT, if provided, must be the type of vertices returned by LMO\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#Norm-balls","page":"Linear Minimization Oracles","title":"Norm balls","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"norm_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.EllipsoidLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.EllipsoidLMO","text":"EllipsoidLMO(A, c, r)\n\nLinear minimization over an ellipsoid centered at c of radius r:\n\nx: (x - c)^T A (x - c) ≤ r\n\nThe LMO stores the factorization F of A that is used to solve linear systems A⁻¹ x. The result of the linear system solve is stored in buffer. The ellipsoid is assumed to be full-dimensional -> A is positive definite.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.KNormBallLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.KNormBallLMO","text":"KNormBallLMO{T}(K::Int, right_hand_side::T)\n\nLMO with feasible set being the K-norm ball in the sense of 2010.07243, i.e., the convex hull over the union of an L1-ball with radius τ and an L∞-ball with radius τ/K:\n\nC_{K,τ} = conv { B_1(τ) ∪ B_∞(τ / K) }\n\nwith τ the right_hand_side parameter. The K-norm is defined as the sum of the largest K absolute entries in a vector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.LpNormLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.LpNormLMO","text":"LpNormLMO{T, p}(right_hand_side)\n\nLMO with feasible set being an L-p norm ball:\n\nC = {x ∈ R^n, norm(x, p) ≤ right_hand_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.NuclearNormLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.NuclearNormLMO","text":"NuclearNormLMO{T}(radius)\n\nLMO over matrices that have a nuclear norm less than radius. 
The LMO returns the best rank-one approximation matrix with singular value radius, computed with Arpack.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.SpectraplexLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.SpectraplexLMO","text":"SpectraplexLMO{T,M}(radius::T,gradient_container::M,ensure_symmetry::Bool=true)\n\nFeasible set\n\n{X ∈ 𝕊_n^+, trace(X) == radius}\n\ngradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.UnitSpectrahedronLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.UnitSpectrahedronLMO","text":"UnitSpectrahedronLMO{T,M}(radius::T, gradient_container::M)\n\nFeasible set of PSD matrices with bounded trace:\n\n{X ∈ 𝕊_n^+, trace(X) ≤ radius}\n\ngradient_container is used to store the symmetrized negative direction. ensure_symmetry indicates whether the linear function is made symmetric before computing the eigenvector.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#Simplex","page":"Linear Minimization Oracles","title":"Simplex","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"simplex_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.ProbabilitySimplexOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.ProbabilitySimplexOracle","text":"ProbabilitySimplexOracle(right_side)\n\nRepresents the scaled probability simplex:\n\nC = {x ∈ R^n_+, ∑x = right_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.UnitSimplexOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.UnitSimplexOracle","text":"UnitSimplexOracle(right_side)\n\nRepresents the scaled unit simplex:\n\nC = {x ∈ R^n_+, ∑x ≤ right_side}\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.compute_dual_solution-Union{Tuple{T}, Tuple{FrankWolfe.ProbabilitySimplexOracle{T}, Any, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_dual_solution","text":"Dual costs for a given primal solution to form a primal dual pair for scaled probability simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_dual_solution-Union{Tuple{T}, Tuple{FrankWolfe.UnitSimplexOracle{T}, Any, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_dual_solution","text":"Dual costs for a given primal solution to form a primal dual pair for scaled unit simplex. Returns two vectors. The first one is the dual costs associated with the constraints and the second is the reduced costs for the variables.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point-Union{Tuple{T}, Tuple{FrankWolfe.ProbabilitySimplexOracle{T}, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"LMO for scaled probability simplex. 
Returns a vector with one active value equal to the RHS in the most improving (or least degrading) direction.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#FrankWolfe.compute_extreme_point-Union{Tuple{T}, Tuple{FrankWolfe.UnitSimplexOracle{T}, Any}} where T","page":"Linear Minimization Oracles","title":"FrankWolfe.compute_extreme_point","text":"LMO for the scaled unit simplex: ∑ x_i ≤ τ. Returns either a vector of zeros or a vector with one active value equal to the RHS if there exists an improving direction.\n\n\n\n\n\n","category":"method"},{"location":"reference/2_lmo/#Polytope","page":"Linear Minimization Oracles","title":"Polytope","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"polytope_oracles.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.BirkhoffPolytopeLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.BirkhoffPolytopeLMO","text":"BirkhoffPolytopeLMO\n\nThe Birkhoff polytope encodes doubly stochastic matrices. Its extreme vertices are all permutation matrices of side-dimension dimension.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ConvexHullOracle","page":"Linear Minimization Oracles","title":"FrankWolfe.ConvexHullOracle","text":"ConvexHullOracle{AT,VT}\n\nConvex hull of a finite number of vertices of type AT, stored in a vector of type VT.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.KSparseLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.KSparseLMO","text":"KSparseLMO{T}(K::Int, right_hand_side::T)\n\nLMO for the K-sparse polytope:\n\nC = B_1(τK) ∩ B_∞(τ)\n\nwith τ the right_hand_side parameter. The LMO results in a vector with the K largest absolute values of direction, taking values -τ sign(x_i).\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ScaledBoundL1NormBall","page":"Linear Minimization Oracles","title":"FrankWolfe.ScaledBoundL1NormBall","text":"ScaledBoundL1NormBall(lower_bounds, upper_bounds)\n\nPolytope similar to an L1-ball with shifted bounds. It is the convex hull of two scaled and shifted unit vectors for each axis (shifted to the center of the polytope, i.e., the elementwise midpoint of the bounds). Lower and upper bounds are passed on as abstract vectors, possibly of different types. For the standard L1-ball, all lower and upper bounds would be -1 and 1.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.ScaledBoundLInfNormBall","page":"Linear Minimization Oracles","title":"FrankWolfe.ScaledBoundLInfNormBall","text":"ScaledBoundLInfNormBall(lower_bounds, upper_bounds)\n\nPolytope similar to an L-inf-ball with shifted bounds or general box constraints. Lower- and upper-bounds are passed on as abstract vectors, possibly of different types. For the standard L-inf ball, all lower- and upper-bounds would be -1 and 1.\n\n\n\n\n\n","category":"type"},
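For example, a two-dimensional box can be encoded with ScaledBoundLInfNormBall; the vertex returned by compute_extreme_point picks each coordinate bound according to the sign of the direction. A small sketch, with values following from minimizing the inner product coordinate-wise:

```julia
using FrankWolfe

# The box [0, 1] × [2, 5] as a shifted, scaled L∞ ball.
lmo = FrankWolfe.ScaledBoundLInfNormBall([0.0, 2.0], [1.0, 5.0])

# Minimizing ⟨d, v⟩ selects the lower bound where d > 0, the upper bound where d < 0.
v = FrankWolfe.compute_extreme_point(lmo, [1.0, -1.0])  # v == [0.0, 5.0]
```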
{"location":"reference/2_lmo/#MathOptInterface","page":"Linear Minimization Oracles","title":"MathOptInterface","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Modules = [FrankWolfe]\nPages = [\"moi_oracle.jl\"]","category":"page"},{"location":"reference/2_lmo/#FrankWolfe.MathOptLMO","page":"Linear Minimization Oracles","title":"FrankWolfe.MathOptLMO","text":"MathOptLMO{OT <: MOI.Optimizer} <: LinearMinimizationOracle\n\nLinear minimization oracle with feasible space defined through a MathOptInterface.Optimizer. The oracle call sets the direction and reruns the optimizer.\n\nThe direction vector has to be set in the same order of variables as the MOI.ListOfVariableIndices() getter.\n\nThe Boolean use_modify determines if the objective in compute_extreme_point is updated with MOI.modify(o, ::MOI.ObjectiveFunction, ::MOI.ScalarCoefficientChange) or with MOI.set(o, ::MOI.ObjectiveFunction, f). use_modify = true decreases the runtime and memory allocation for models created as an optimizer object and defined directly with MathOptInterface. use_modify = false should be used for CachingOptimizers.\n\n\n\n\n\n","category":"type"},{"location":"reference/2_lmo/#FrankWolfe.convert_mathopt","page":"Linear Minimization Oracles","title":"FrankWolfe.convert_mathopt","text":"convert_mathopt(lmo::LMO, optimizer::OT; kwargs...) -> MathOptLMO{OT}\n\nConverts the given LMO to its equivalent MathOptInterface representation using optimizer. Must be implemented by LMOs.\n\n\n\n\n\n","category":"function"},{"location":"reference/2_lmo/#Index","page":"Linear Minimization Oracles","title":"Index","text":"","category":"section"},{"location":"reference/2_lmo/","page":"Linear Minimization Oracles","title":"Linear Minimization Oracles","text":"Pages = [\"1_lmo.md\"]","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"EditURL = \"../../../examples/docs_1_mathopt_lmo.jl\"","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/#Comparison-with-MathOptInterface-on-a-Probability-Simplex","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"","category":"section"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"In this example, we project a random point onto a probability simplex with the Frank-Wolfe algorithm using either the specialized LMO defined in the package or a generic LP formulation using MathOptInterface.jl (MOI) and GLPK as the underlying LP solver. 
It can be found as Example 4.4 in the paper.","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"using FrankWolfe\n\nusing LinearAlgebra\nusing LaTeXStrings\n\nusing Plots\n\nusing JuMP\nconst MOI = JuMP.MOI\n\nimport GLPK\n\nn = Int(1e3)\nk = 10000\n\nxpi = rand(n);\ntotal = sum(xpi);\nconst xp = xpi ./ total;\n\nf(x) = norm(x - xp)^2\nfunction grad!(storage, x)\n @. storage = 2 * (x - xp)\n return nothing\nend\n\nlmo_radius = 2.5\nlmo = FrankWolfe.ProbabilitySimplexOracle(lmo_radius)\n\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\ngradient = collect(x00)\n\nx_lmo, v, primal, dual_gap, trajectory_lmo = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"Create a MathOptInterface Optimizer and build the same linear constraints:","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"o = GLPK.Optimizer()\nx = MOI.add_variables(o, n)\n\nfor xi in x\n MOI.add_constraint(o, xi, MOI.GreaterThan(0.0))\nend\n\nMOI.add_constraint(\n o,\n MOI.ScalarAffineFunction(MOI.ScalarAffineTerm.(1.0, x), 0.0),\n MOI.EqualTo(lmo_radius),\n)\n\nlmo_moi = FrankWolfe.MathOptLMO(o)\n\nx, v, primal, dual_gap, trajectory_moi = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_moi,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"Alternatively, we can use one of the modelling interfaces based on MOI to formulate the LP. 
The following example builds the same set of constraints using JuMP:","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"m = JuMP.Model(GLPK.Optimizer)\n@variable(m, y[1:n] ≥ 0)\n\n@constraint(m, sum(y) == lmo_radius)\n\nlmo_jump = FrankWolfe.MathOptLMO(m.moi_backend)\n\nx, v, primal, dual_gap, trajectory_jump = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_jump,\n collect(copy(x00)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\n\nx_lmo, v, primal, dual_gap, trajectory_lmo_blas = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x00,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\n\nx, v, primal, dual_gap, trajectory_jump_blas = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_jump,\n x00,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=k / 10,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n verbose=false,\n trajectory=true,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"We can now plot the results","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"iteration_list = [[x[1] + 1 for x in trajectory_lmo], [x[1] + 1 for x in trajectory_moi]]\ntime_list = [[x[5] for x in trajectory_lmo], [x[5] for x in trajectory_moi]]\nprimal_gap_list = [[x[2] for x in trajectory_lmo], [x[2] for x in trajectory_moi]]\ndual_gap_list = [[x[4] for x in trajectory_lmo], [x[4] for x in trajectory_moi]]\n\nlabel = [L\"\\textrm{Closed-form LMO}\", L\"\\textrm{MOI LMO}\"]\n\nplot_results(\n [primal_gap_list, primal_gap_list, dual_gap_list, dual_gap_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [\"\", \"\", L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [L\"\\textrm{Primal Gap}\", \"\", L\"\\textrm{Dual Gap}\", \"\"],\n xscalelog=[:log, :identity, :log, :identity],\n yscalelog=[:log, :log, :log, :log],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"","category":"page"},{"location":"examples/docs_1_mathopt_lmo/","page":"Comparison with MathOptInterface on a Probability Simplex","title":"Comparison with MathOptInterface on a Probability Simplex","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"EditURL = \"../../../examples/docs_6_spectrahedron.jl\"","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # 
hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Spectrahedron","page":"Spectrahedron","title":"Spectrahedron","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"This example shows an optimization problem over the spectraplex:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"S = X in mathbbS_+^n Tr(X) = 1","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"with mathbbS_+^n the set of positive semidefinite matrices. Linear optimization with symmetric objective D over the spetraplex consists in computing the leading eigenvector of D.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"The package also exposes UnitSpectrahedronLMO which corresponds to the feasible set:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"S_u = X in mathbbS_+^n Tr(X) leq 1","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"using FrankWolfe\nusing LinearAlgebra\nusing Random\nusing SparseArrays","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"The objective function will be the symmetric squared distance to a set of known or observed entries Y_ij of the matrix.","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"f(X) = sum_(ij) in L 12 (X_ij - Y_ij)^2","category":"page"},{"location":"examples/docs_6_spectrahedron/#Setting-up-the-input-data,-objective,-and-gradient","page":"Spectrahedron","title":"Setting up the input data, objective, and gradient","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Dimension, number of iterations and number of known entries:","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"n = 1500\nk = 5000\nn_entries = 1000\n\nRandom.seed!(41)\n\nconst entry_indices = unique!([minmax(rand(1:n, 2)...) for _ in 1:n_entries])\nconst entry_values = randn(length(entry_indices))\n\nfunction f(X)\n r = zero(eltype(X))\n for (idx, (i, j)) in enumerate(entry_indices)\n r += 1 / 2 * (X[i, j] - entry_values[idx])^2\n r += 1 / 2 * (X[j, i] - entry_values[idx])^2\n end\n return r / length(entry_values)\nend\n\nfunction grad!(storage, X)\n storage .= 0\n for (idx, (i, j)) in enumerate(entry_indices)\n storage[i, j] += (X[i, j] - entry_values[idx])\n storage[j, i] += (X[j, i] - entry_values[idx])\n end\n return storage ./= length(entry_values)\nend","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Note that the ensure_symmetry = false argument to SpectraplexLMO. It skips an additional step making the used direction symmetric. 
It is not necessary when the gradient is a LinearAlgebra.Symmetric (or more rarely a LinearAlgebra.Diagonal or LinearAlgebra.UniformScaling).","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"const lmo = FrankWolfe.SpectraplexLMO(1.0, n, false)\nconst x0 = FrankWolfe.compute_extreme_point(lmo, spzeros(n, n))\n\ntarget_tolerance = 1e-8;\nnothing #hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Running-standard-and-lazified-Frank-Wolfe","page":"Spectrahedron","title":"Running standard and lazified Frank-Wolfe","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"Xfinal, Vfinal, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.MonotonicStepSize(),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n)\n\nXfinal, Vfinal, primal, dual_gap, trajectory_lazy = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.MonotonicStepSize(),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_6_spectrahedron/#Plotting-the-resulting-trajectories","page":"Spectrahedron","title":"Plotting the resulting trajectories","text":"","category":"section"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"data = [trajectory, trajectory_lazy]\nlabel = [\"FW\", \"LCG\"]\nplot_trajectories(data, label, xscalelog=true)","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"","category":"page"},{"location":"examples/docs_6_spectrahedron/","page":"Spectrahedron","title":"Spectrahedron","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"EditURL = \"../../../examples/docs_9_extra_vertex_storage.jl\"","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/#Extra-lazification","page":"Extra-lazification","title":"Extra-lazification","text":"","category":"section"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"Sometimes the Frank-Wolfe algorithm will be run multiple times with slightly different settings under which vertices collected in a previous run are still valid.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The extra-lazification feature can be used for this purpose. It consists of a storage that can collect dropped vertices during a run, and the ability to use these vertices in another run, when they are not part of the current active set. The vertices that are part of the active set do not need to be duplicated in the extra-lazification storage. 
The extra vertices can be used instead of calling the LMO when the LMO call is relatively expensive.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"using FrankWolfe\nusing Test\nusing LinearAlgebra","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"We will use a parameterized objective function 1/2 ‖x - c‖^2 over the unit simplex.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"const n = 100\nconst center0 = 5.0 .+ 3 * rand(n)\nf(x) = 0.5 * norm(x .- center0)^2\nfunction grad!(storage, x)\n    return storage .= x .- center0\nend","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The TrackingLMO will let us count how many real calls to the LMO are performed by a single run of the algorithm.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"lmo = FrankWolfe.UnitSimplexOracle(4.3)\ntlmo = FrankWolfe.TrackingLMO(lmo)\nx0 = FrankWolfe.compute_extreme_point(lmo, randn(n));\nnothing #hide","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/#Adding-a-vertex-storage","page":"Extra-lazification","title":"Adding a vertex storage","text":"","category":"section"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"FrankWolfe offers a simple FrankWolfe.DeletedVertexStorage storage type, which is parameterized by return_kth, the number of good directions to find before returning the best one. A return_kth larger than the number of vertices means that the best-aligned vertex will be found; return_kth = 1 means the first acceptable vertex (with the specified threshold) is returned.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"See FrankWolfe.DeletedVertexStorage","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"vertex_storage = FrankWolfe.DeletedVertexStorage(typeof(x0)[], 5)\ntlmo.counter = 0\n\nresults = FrankWolfe.blended_pairwise_conditional_gradient(\n    f,\n    grad!,\n    tlmo,\n    x0,\n    max_iteration=4000,\n    verbose=true,\n    lazy=true,\n    epsilon=1e-5,\n    add_dropped_vertices=true,\n    extra_vertex_storage=vertex_storage,\n)","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"The counter indicates the number of initial calls to the LMO. We will now construct different objective functions based on new centers and call the BPCG algorithm while accumulating vertices in the storage, in addition to warm-starting with the active set of the previous iteration. 
This allows for a \"double-warmstarted\" algorithm, reducing the number of LMO calls from one problem to the next.","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"active_set = results[end]\ntlmo.counter\n\nfor iter in 1:10\n    center = 5.0 .+ 3 * rand(n)\n    f_i(x) = 0.5 * norm(x .- center)^2\n    function grad_i!(storage, x)\n        return storage .= x .- center\n    end\n    tlmo.counter = 0\n    FrankWolfe.blended_pairwise_conditional_gradient(\n        f_i,\n        grad_i!,\n        tlmo,\n        active_set,\n        max_iteration=4000,\n        lazy=true,\n        epsilon=1e-5,\n        add_dropped_vertices=true,\n        use_extra_vertex_storage=true,\n        extra_vertex_storage=vertex_storage,\n        verbose=false,\n    )\n    @info \"Number of LMO calls in iter $iter: $(tlmo.counter)\"\n    @info \"Vertex storage size: $(length(vertex_storage.storage))\"\nend","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"","category":"page"},{"location":"examples/docs_9_extra_vertex_storage/","page":"Extra-lazification","title":"Extra-lazification","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"EditURL = \"../../../examples/docs_7_shifted_norm_polytopes.jl\"","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide\nusing FrankWolfe\nusing LinearAlgebra\nusing LaTeXStrings\nusing Plots","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/#FrankWolfe-for-scaled,-shifted-\\ell1-and-\\ell{\\infty}-norm-balls","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"","category":"section"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"In this example, we run the vanilla Frank-Wolfe algorithm on scaled and shifted ℓ^1 and ℓ^∞ norm balls, using the ScaledBoundL1NormBall and ScaledBoundLInfNormBall LMOs. We shift both onto the point (1, 0) and then scale them by a factor of 2 along the x-axis. We project the point (2, 1) onto the polytopes.","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"n = 2\n\nk = 1000\n\nxp = [2.0, 1.0]\n\nf(x) = norm(x - xp)^2\n\nfunction grad!(storage, x)\n    @. 
storage = 2 * (x - xp)\n return nothing\nend\n\nlower = [-1.0, -1.0]\nupper = [3.0, 1.0]\n\nl1 = FrankWolfe.ScaledBoundL1NormBall(lower, upper)\n\nlinf = FrankWolfe.ScaledBoundLInfNormBall(lower, upper)\n\nx1 = FrankWolfe.compute_extreme_point(l1, zeros(n))\ngradient = collect(x1)\n\nx_l1, v_1, primal_1, dual_gap_1, trajectory_1 = FrankWolfe.frank_wolfe(\n f,\n grad!,\n l1,\n collect(copy(x1)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=50,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n);\n\nprintln(\"\\nFinal solution: \", x_l1)\n\nx2 = FrankWolfe.compute_extreme_point(linf, zeros(n))\ngradient = collect(x2)\n\nx_linf, v_2, primal_2, dual_gap_2, trajectory_2 = FrankWolfe.frank_wolfe(\n f,\n grad!,\n linf,\n collect(copy(x2)),\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2.0),\n print_iter=50,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n);\n\nprintln(\"\\nFinal solution: \", x_linf)","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"We plot the polytopes alongside the solutions from above:","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"xcoord1 = [1, 3, 1, -1, 1]\nycoord1 = [-1, 0, 1, 0, -1]\n\nxcoord2 = [3, 3, -1, -1, 3]\nycoord2 = [-1, 1, 1, -1, -1]\n\nplot(\n xcoord1,\n ycoord1,\n title=\"Visualization of scaled shifted norm balls\",\n lw=2,\n label=L\"\\ell^1 \\textrm{ norm}\",\n)\nplot!(xcoord2, ycoord2, lw=2, label=L\"\\ell^{\\infty} \\textrm{ norm}\")\nplot!(\n [x_l1[1]],\n [x_l1[2]],\n seriestype=:scatter,\n lw=5,\n color=\"blue\",\n label=L\"\\ell^1 \\textrm{ solution}\",\n)\nplot!(\n [x_linf[1]],\n [x_linf[2]],\n seriestype=:scatter,\n lw=5,\n color=\"orange\",\n label=L\"\\ell^{\\infty} \\textrm{ solution}\",\n legend=:bottomleft,\n)","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"","category":"page"},{"location":"examples/docs_7_shifted_norm_polytopes/","page":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","title":"FrankWolfe for scaled, shifted ell^1 and ell^infty norm balls","text":"This page was generated using Literate.jl.","category":"page"},{"location":"reference/4_linesearch/#Line-search-and-step-size-settings","page":"Line search and step size settings","title":"Line search and step size settings","text":"","category":"section"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"The step size dictates how far one traverses along a local descent direction. 
More specifically, the step size gamma_t is used at each iteration to determine how much the next iterate moves towards the new vertex:","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"x_{t+1} = x_t - gamma_t (x_t - v_t)","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"gamma_t = 1 means that the next iterate is exactly the vertex v_t, while gamma_t = 0 means that the iterate does not move.","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"The following are step size selection rules for Frank-Wolfe algorithms. Some methodologies (e.g. FixedStep and Agnostic) depend only on the iteration number and induce step-size sequences (gamma_t) that are independent of the problem data, while others (e.g. Goldenratio and Adaptive) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are Agnostic or Adaptive.","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search, which is called at every iteration to compute the step size gamma.","category":"page"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"FrankWolfe.LineSearchMethod\nFrankWolfe.perform_line_search","category":"page"},{"location":"reference/4_linesearch/#FrankWolfe.LineSearchMethod","page":"Line search and step size settings","title":"FrankWolfe.LineSearchMethod","text":"Line search method to apply once the direction is computed. A LineSearchMethod must implement\n\nperform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)\n\nwith d = x - v. It may also implement build_linesearch_workspace(x, gradient), which creates a workspace structure that is passed as the last argument to perform_line_search.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.perform_line_search","page":"Line search and step size settings","title":"FrankWolfe.perform_line_search","text":"perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)\n\nReturns the step size gamma for step size strategy ls.\n\n\n\n\n\n","category":"function"},
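{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"As a hedged sketch of this interface (a hypothetical strategy, not part of the package): a custom rule only needs a struct subtyping LineSearchMethod and a perform_line_search method returning the step size, capped by gamma_max.\n\nusing FrankWolfe\n\n# hypothetical constant-step strategy, for illustration only\nstruct ConstantStep <: FrankWolfe.LineSearchMethod\n    gamma::Float64\nend\n\nfunction FrankWolfe.perform_line_search(ls::ConstantStep, t, f, grad!, gradient, x, d, gamma_max, workspace)\n    # cap the constant step by the maximum feasible step size\n    return min(ls.gamma, gamma_max)\nend\n\n# it could then be passed to an algorithm via line_search=ConstantStep(0.01)","category":"page"},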
{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"Modules = [FrankWolfe]\nPages = [\"linesearch.jl\"]","category":"page"},{"location":"reference/4_linesearch/#FrankWolfe.Adaptive","page":"Line search and step size settings","title":"FrankWolfe.Adaptive","text":"Slight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)\n\nf(x_t + gamma_t (x_t - v_t)) - f(x_t) ≤ -alpha gamma_t ⟨∇f(x_t), x_t - v_t⟩ + alpha^2 gamma_t^2 ‖x_t - v_t‖^2 M / 2\n\nThe parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition:\n\n⟨∇f(x_t + gamma_t (x_t - v_t)) - ∇f(x_t), x_t - v_t⟩ ≤ gamma_t M ‖x_t - v_t‖^2\n\nThis condition yields potentially smaller and more stable estimates of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.\n\nIt is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether a temporary upgrade to BigFloat is used for extended precision.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Agnostic","page":"Line search and step size settings","title":"FrankWolfe.Agnostic","text":"Computes step size: l/(l + t) at iteration t, given l > 0.\n\nUsing l ≥ 4 is advised only for strongly convex sets, see:\n\nAcceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, 2023.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Backtracking","page":"Line search and step size settings","title":"FrankWolfe.Backtracking","text":"Backtracking(limit_num_steps, tol, tau)\n\nBacktracking line search strategy, see Pedregosa, Negiar, Askari, Jaggi (2018).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.FixedStep","page":"Line search and step size settings","title":"FrankWolfe.FixedStep","text":"Fixed step size strategy. The step size can still be truncated by the gamma_max argument.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Goldenratio","page":"Line search and step size settings","title":"FrankWolfe.Goldenratio","text":"Goldenratio\n\nSimple golden-ratio based line search (golden section search), based on the Combettes, Pokutta (2020) code and adapted.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.MonotonicNonConvexStepSize","page":"Line search and step size settings","title":"FrankWolfe.MonotonicNonConvexStepSize","text":"MonotonicNonConvexStepSize{F}\n\nRepresents a monotonic open-loop non-convex step size. Contains a halving factor N that is increased at each iteration until there is primal progress; the step size is gamma = 1 / sqrt(t + 1) * 2^(-N).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.MonotonicStepSize","page":"Line search and step size settings","title":"FrankWolfe.MonotonicStepSize","text":"MonotonicStepSize{F}\n\nRepresents a monotonic open-loop step size. 
Contains a halving factor N that is increased at each iteration until there is primal progress; the step size is gamma = 2 / (t + 2) * 2^(-N).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Nonconvex","page":"Line search and step size settings","title":"FrankWolfe.Nonconvex","text":"Computes a step size for nonconvex functions: 1/sqrt(t + 1).\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/#FrankWolfe.Shortstep","page":"Line search and step size settings","title":"FrankWolfe.Shortstep","text":"Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.\n\n\n\n\n\n","category":"type"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.","category":"page"},{"location":"reference/4_linesearch/#Index","page":"Line search and step size settings","title":"Index","text":"","category":"section"},{"location":"reference/4_linesearch/","page":"Line search and step size settings","title":"Line search and step size settings","text":"Pages = [\"4_linesearch.md\"]","category":"page"},{"location":"advanced/#Advanced-features","page":"Advanced features","title":"Advanced features","text":"","category":"section"},{"location":"advanced/#Multi-precision","page":"Advanced features","title":"Multi-precision","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"All algorithms can run in various precision modes: Float16, Float32, Float64, BigFloat, and also with rationals based on various integer types: Int32, Int64, BigInt (see e.g. the approximate Carathéodory example).","category":"page"},{"location":"advanced/#Step-size-computation","page":"Advanced features","title":"Step size computation","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For all Frank-Wolfe algorithms, a step size must be determined to move from the current iterate to the next one. This step size can be determined by exact line search or any other rule represented by a subtype of FrankWolfe.LineSearchMethod, which must implement FrankWolfe.perform_line_search.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Multiple line search and step size determination rules are already available. See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.","category":"page"},{"location":"advanced/#Callbacks","page":"Advanced features","title":"Callbacks","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"All top-level algorithms can take an optional callback argument, which must be a function taking a FrankWolfe.CallbackState struct and additional arguments:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"callback(state::FrankWolfe.CallbackState, args...)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The callback can be used to log additional information or store some values of interest in an external array.","category":"page"},
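{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"A minimal callback sketch, assuming (as the trajectory description below suggests) that the state exposes the iteration counter t and the primal value among its first fields:\n\nusing FrankWolfe\n\nhistory = Tuple{Int,Float64}[]\n\nfunction my_callback(state::FrankWolfe.CallbackState, args...)\n    # store (iteration, primal value) in an external array\n    push!(history, (state.t, state.primal))\n    return true  # assumption: a true return value lets the run continue\nend\n\n# passed to any top-level algorithm as callback=my_callback","category":"page"},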
{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"If a callback is passed, the trajectory keyword is ignored, since the trajectory is a special case of a callback pushing the first five elements of the state to an array returned from the algorithm.","category":"page"},{"location":"advanced/#Custom-extreme-point-types","page":"Advanced features","title":"Custom extreme point types","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For some feasible sets, the extreme points of the feasible set returned by the LMO possess a specific structure that can be represented in an efficient manner both for storage and for common operations like scaling and addition with an iterate. See for example FrankWolfe.ScaledHotVector and FrankWolfe.RankOneMatrix.","category":"page"},{"location":"advanced/#Active-set","page":"Advanced features","title":"Active set","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The active set represents an iterate as a convex combination of atoms (also referred to as extreme points or vertices). It maintains a vector of atoms, the corresponding weights, and the current iterate.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Note: the weights in the active set are currently defined as Float64 in the algorithm. This means that even with vertices using a lower precision, the iterate sum_i(lambda_i * v_i) will be upcast to Float64. One reason for keeping this as-is for now is the higher precision required by the computation of iterates from their barycentric decomposition.","category":"page"},{"location":"advanced/#Extra-lazification-with-a-vertex-storage","page":"Advanced features","title":"Extra-lazification with a vertex storage","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"One can pass the following keyword arguments to some active set-based Frank-Wolfe algorithms:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"add_dropped_vertices=true,\nuse_extra_vertex_storage=true,\nextra_vertex_storage=vertex_storage,","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"add_dropped_vertices activates feeding discarded vertices to the storage, while use_extra_vertex_storage determines whether vertices from the storage are used in the algorithm. See Extra-lazification for a complete example.","category":"page"},{"location":"advanced/#Miscellaneous","page":"Advanced features","title":"Miscellaneous","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Emphasis: All solvers support emphasis (parameter Emphasis) to either exploit vectorized linear algebra or be memory efficient, e.g., for large-scale instances\nVarious caching strategies for the lazy implementations. Unbounded cache sizes (can get slow), bounded cache sizes, as well as early returns once any sufficient vertex is found in the cache.\nOptionally all algorithms can be endowed with gradient momentum. 
This might help convergence especially in the stochastic context.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Coming soon: when the LMO can compute dual prices, then the Frank-Wolfe algorithms will return dual prices for the (approximately) optimal solutions (see Braun, Pokutta (2021)).","category":"page"},{"location":"advanced/#Rational-arithmetic","page":"Advanced features","title":"Rational arithmetic","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Example: examples/approximateCaratheodory.jl","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"We can solve the approximate Carathéodory problem with rational arithmetic to obtain rational approximations; see Combettes, Pokutta 2019 for some background about approximate Carathéodory and Conditional Gradients. We consider the simple instance of approximating the origin over the probability simplex here:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"min_{x ∈ Δ(n)} ‖x‖^2","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"with n = 100.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Vanilla Frank-Wolfe Algorithm.\nEMPHASIS: blas STEPSIZE: rationalshortstep EPSILON: 1.0e-7 max_iteration: 100 TYPE: Rational{BigInt}\n\n───────────────────────────────────────────────────────────────────────────────────\n Type Iteration Primal Dual Dual Gap Time\n───────────────────────────────────────────────────────────────────────────────────\n I 0 1.000000e+00 -1.000000e+00 2.000000e+00 1.540385e-01\n FW 10 9.090909e-02 -9.090909e-02 1.818182e-01 2.821186e-01\n FW 20 4.761905e-02 -4.761905e-02 9.523810e-02 3.027964e-01\n FW 30 3.225806e-02 -3.225806e-02 6.451613e-02 3.100331e-01\n FW 40 2.439024e-02 -2.439024e-02 4.878049e-02 3.171654e-01\n FW 50 1.960784e-02 -1.960784e-02 3.921569e-02 3.244207e-01\n FW 60 1.639344e-02 -1.639344e-02 3.278689e-02 3.326185e-01\n FW 70 1.408451e-02 -1.408451e-02 2.816901e-02 3.418239e-01\n FW 80 1.234568e-02 -1.234568e-02 2.469136e-02 3.518750e-01\n FW 90 1.098901e-02 -1.098901e-02 2.197802e-02 3.620287e-01\n Last 1.000000e-02 1.000000e-02 0.000000e+00 4.392171e-01\n───────────────────────────────────────────────────────────────────────────────────\n\n 0.600608 seconds (3.83 M allocations: 111.274 MiB, 12.97% gc time)\n\nOutput type of solution: Rational{BigInt}","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"As we can see, the solution returned is rational, and it is in fact the exactly optimal solution:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"x = Rational{BigInt}[1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100, 1//100]","category":"page"},
{"location":"advanced/#Large-scale-problems","page":"Advanced features","title":"Large-scale problems","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Example: examples/large_scale.jl","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The package is built to scale well for those conditional gradient variants that can scale well. For example, Away-Step Frank-Wolfe and Pairwise Conditional Gradients in most cases do not scale well because they need to maintain active sets, and maintaining them can be very expensive. Similarly, line search methods might become prohibitive at large sizes. However, if we consider scale-friendly variants, e.g., the vanilla Frank-Wolfe algorithm with the agnostic step size rule or the short step rule, then these algorithms can scale well to extreme sizes, essentially limited only by the amount of memory available. Even for these methods that tend to scale well, though, the allocation of memory itself can be very slow when you need to allocate gigabytes of memory for a single gradient computation.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The package is built to support extreme sizes with a special memory-efficient emphasis, emphasis=FrankWolfe.memory, which minimizes expensive memory allocations and performs as many operations in-place as possible.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Here is an example of a run with 1e9 variables, where each gradient is around 7.5 GB in size. The output of the run is broken down into pieces below:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Size of single vector (Float64): 7629.39453125 MB\nTesting f... 100% | Time: 0:00:23\nTesting grad... 100% | Time: 0:00:23\nTesting lmo... 100% | Time: 0:00:29\nTesting dual gap... 100% | Time: 0:00:46\nTesting update... (Emphasis: blas) 100% | Time: 0:01:35\nTesting update... 
(Emphasis: memory) 100% | Time: 0:00:58\n ──────────────────────────────────────────────────────────────────────────\n Time Allocations\n ────────────────────── ───────────────────────\n Tot / % measured: 278s / 31.4% 969GiB / 30.8%\n\n Section ncalls time %tot avg alloc %tot avg\n ──────────────────────────────────────────────────────────────────────────\n update (blas) 10 36.1s 41.3% 3.61s 149GiB 50.0% 14.9GiB\n lmo 10 18.4s 21.1% 1.84s 0.00B 0.00% 0.00B\n grad 10 12.8s 14.6% 1.28s 74.5GiB 25.0% 7.45GiB\n f 10 12.7s 14.5% 1.27s 74.5GiB 25.0% 7.45GiB\n update (memory) 10 5.00s 5.72% 500ms 0.00B 0.00% 0.00B\n dual gap 10 2.40s 2.75% 240ms 0.00B 0.00% 0.00B\n ──────────────────────────────────────────────────────────────────────────","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The above is the optional benchmarking of the oracles that we provide to understand how fast crucial parts of the algorithms are, most notably the oracle evaluations, the update of the iterate, and the computation of the dual gap. As you can see by comparing update (blas) and update (memory), the normal BLAS-based update requires an additional 14.9 GiB of memory on top of the gradient itself, whereas update (memory) (the memory emphasis mode) does not consume any extra memory. This is also reflected in the computational times: the BLAS version requires 3.61 seconds on average to update the iterate, while the memory emphasis version requires only 500ms. In fact, none of the crucial components of the algorithm allocate any memory when run in memory-efficient mode. Now let us look at the actual footprint of the whole algorithm:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Vanilla Frank-Wolfe Algorithm.\nEMPHASIS: memory STEPSIZE: agnostic EPSILON: 1.0e-7 MAXITERATION: 1000 TYPE: Float64\nMOMENTUM: nothing GRADIENTTYPE: Nothing\nWARNING: In memory emphasis mode iterates are written back into x0!\n\n─────────────────────────────────────────────────────────────────────────────────────────────────\n Type Iteration Primal Dual Dual Gap Time It/sec\n─────────────────────────────────────────────────────────────────────────────────────────────────\n I 0 1.000000e+00 -1.000000e+00 2.000000e+00 8.783523e+00 0.000000e+00\n FW 100 1.326732e-02 -1.326733e-02 2.653465e-02 4.635923e+02 2.157068e-01\n FW 200 6.650080e-03 -6.650086e-03 1.330017e-02 9.181294e+02 2.178342e-01\n FW 300 4.437059e-03 -4.437064e-03 8.874123e-03 1.372615e+03 2.185609e-01\n FW 400 3.329174e-03 -3.329180e-03 6.658354e-03 1.827260e+03 2.189070e-01\n FW 500 2.664003e-03 -2.664008e-03 5.328011e-03 2.281865e+03 2.191190e-01\n FW 600 2.220371e-03 -2.220376e-03 4.440747e-03 2.736387e+03 2.192672e-01\n FW 700 1.903401e-03 -1.903406e-03 3.806807e-03 3.190951e+03 2.193703e-01\n FW 800 1.665624e-03 -1.665629e-03 3.331253e-03 3.645425e+03 2.194532e-01\n FW 900 1.480657e-03 -1.480662e-03 2.961319e-03 4.099931e+03 2.195159e-01\n FW 1000 1.332665e-03 -1.332670e-03 2.665335e-03 4.554703e+03 2.195533e-01\n Last 1000 1.331334e-03 -1.331339e-03 2.662673e-03 4.559822e+03 2.195261e-01\n─────────────────────────────────────────────────────────────────────────────────────────────────\n\n4560.661203 seconds (7.41 M allocations: 112.121 GiB, 0.01% gc time)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"As you can 
see, the algorithm ran for about 4600 secs (single-thread run), allocating 112.121 GiB of memory throughout. How does this average out to the per-iteration memory cost? 112.121 GiB over 1000 iterations is about 0.112 GiB, i.e., roughly 115 MiB per iteration, or about 1.5% of the size of a single gradient; this is much less than the size of the gradient and in fact only stems from the reporting here.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"NB. This example also highlights one of the great features of first-order methods and conditional gradients in particular: we have dimension-independent convergence rates. In fact, we contract the primal gap as 2LD^2 / (t+2) (for the simple agnostic rule) and, e.g., if the feasible region is the probability simplex with D = sqrt(2) and the function has bounded Lipschitzness, e.g., the function || x - xp ||^2 has L = 2, then the convergence rate is completely independent of the input size. The only thing that limits scaling is how much memory you have available and whether you can stomach the (linear) per-iteration cost.","category":"page"},{"location":"advanced/#Iterate-and-atom-expected-interface","page":"Advanced features","title":"Iterate and atom expected interface","text":"","category":"section"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Frank-Wolfe can work with iterates beyond plain vectors, for example with any array-like object. Broadly speaking, the iterate type is assumed to behave as a member of a Hilbert space and, optionally, to be mutable. Assuming the iterate type is IT, some methods must be implemented, with their usual semantics:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"Base.similar(::IT)\nBase.similar(::IT, ::Type{T})\nBase.eltype(::IT)\nBase.copy(::IT)\n\nBase.:+(x1::IT, x2::IT)\nBase.:*(scalar::Real, x::IT)\nBase.:-(x1::IT, x2::IT)\nLinearAlgebra.dot(x1::IT, x2::IT)","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For methods using a FrankWolfe.ActiveSet, the atoms or individual extreme points of the feasible region are not necessarily of the same type as the iterate. They are assumed to be immutable and must implement LinearAlgebra.dot with a gradient object. See for example FrankWolfe.RankOneMatrix or FrankWolfe.ScaledHotVector.","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"The iterate type IT must be a broadcastable mutable object or implement FrankWolfe.compute_active_set_iterate!:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"FrankWolfe.compute_active_set_iterate!(active_set::FrankWolfe.ActiveSet{AT, R, IT}) where {AT, R}","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"which recomputes the iterate from the current convex decomposition, together with the following methods, FrankWolfe.active_set_update_scale! and FrankWolfe.active_set_update_iterate_pairwise!:","category":"page"},{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"FrankWolfe.active_set_update_scale!(x::IT, lambda, atom)\nFrankWolfe.active_set_update_iterate_pairwise!(x::IT, lambda, fw_atom, away_atom)","category":"page"},
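{"location":"advanced/","page":"Advanced features","title":"Advanced features","text":"For illustration, a hedged sketch of a custom iterate type implementing the methods listed above (a hypothetical wrapper around a plain vector; a real use case would pick a more economical representation):\n\nusing LinearAlgebra\n\n# hypothetical iterate type, for illustration only\nstruct MyIterate{T}\n    data::Vector{T}\nend\n\nBase.similar(x::MyIterate) = MyIterate(similar(x.data))\nBase.similar(x::MyIterate, ::Type{T}) where {T} = MyIterate(similar(x.data, T))\nBase.eltype(::MyIterate{T}) where {T} = T\nBase.copy(x::MyIterate) = MyIterate(copy(x.data))\n\nBase.:+(x1::MyIterate, x2::MyIterate) = MyIterate(x1.data + x2.data)\nBase.:*(scalar::Real, x::MyIterate) = MyIterate(scalar * x.data)\nBase.:-(x1::MyIterate, x2::MyIterate) = MyIterate(x1.data - x2.data)\nLinearAlgebra.dot(x1::MyIterate, x2::MyIterate) = dot(x1.data, x2.data)","category":"page"},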
{"location":"reference/1_algorithms/#Algorithms","page":"Algorithms","title":"Algorithms","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"This section contains all main algorithms of the package. These are the ones typical users will call.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"The typical signature for these algorithms is:","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"my_algorithm(f, grad!, lmo, x0)","category":"page"},{"location":"reference/1_algorithms/#Standard-algorithms","page":"Algorithms","title":"Standard algorithms","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"fw_algorithms.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.frank_wolfe-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.frank_wolfe","text":"frank_wolfe(f, grad!, lmo, x0; ...)\n\nSimplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:\n\nx final iterate\nv last vertex from the LMO\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.lazified_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.lazified_conditional_gradient","text":"lazified_conditional_gradient(f, grad!, lmo_base, x0; ...)\n\nSimilar to FrankWolfe.frank_wolfe but lazifying the LMO: each call is stored in a cache, which is looked up first for a good-enough direction. The cache used is a FrankWolfe.MultiCacheLMO or a FrankWolfe.VectorCacheLMO depending on whether the provided cache_size option is finite.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.stochastic_frank_wolfe-Tuple{FrankWolfe.StochasticObjective, Any, Any}","page":"Algorithms","title":"FrankWolfe.stochastic_frank_wolfe","text":"stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)\n\nStochastic version of Frank-Wolfe, evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.\n\nKeyword arguments include batch_size to pass a fixed batch_size or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator) for algorithms like Variance-reduced and projection-free stochastic optimization, E Hazan, H Luo, 2016.\n\nSimilarly, a constant momentum can be passed or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).\n\n\n\n\n\n","category":"method"},
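{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"A minimal end-to-end sketch of the typical signature above, on hypothetical data (a squared distance over the probability simplex):\n\nusing FrankWolfe\nusing LinearAlgebra\n\nn = 100\nxp = ones(n) / n  # hypothetical target point\nf(x) = 0.5 * norm(x - xp)^2\ngrad!(storage, x) = (storage .= x .- xp)\n\nlmo = FrankWolfe.ProbabilitySimplexOracle(1.0)\nx0 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\nx, v, primal, dual_gap, traj_data = FrankWolfe.frank_wolfe(f, grad!, lmo, x0; max_iteration=1000)","category":"page"},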
{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.block_coordinate_frank_wolfe","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.block_coordinate_frank_wolfe","page":"Algorithms","title":"FrankWolfe.block_coordinate_frank_wolfe","text":"block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}\n\nBlock-coordinate version of the Frank-Wolfe algorithm. Minimizes the objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated.\n\nThe method returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ninfeas sum of squared, pairwise distances between iterates\ntraj_data vector of trajectory information.\n\nSee S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels and S. Sabach 2015 for more details about Block-Coordinate Frank-Wolfe.\n\n\n\n\n\n","category":"function"},{"location":"reference/1_algorithms/#Active-set-based-methods","page":"Algorithms","title":"Active-set based methods","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"The following algorithms maintain the representation of the iterates as a convex combination of vertices.","category":"page"},{"location":"reference/1_algorithms/#Away-step","page":"Algorithms","title":"Away-step","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"afw.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.away_frank_wolfe-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.away_frank_wolfe","text":"away_frank_wolfe(f, grad!, lmo, x0; ...)\n\nFrank-Wolfe with away steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Blended-Conditional-Gradient","page":"Algorithms","title":"Blended Conditional Gradient","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"blended_cg.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex-Tuple{Any, Any, Any, Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex","text":"accelerated_simplex_gradient_descent_over_probability_simplex\n\nMinimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using Nesterov's accelerated gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.blended_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.blended_conditional_gradient","text":"blended_conditional_gradient(f, grad!, lmo, x0)\n\nEntry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al. \"Blended conditional gradients\" ICML 2019. 
The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0 and adding new vertices by calling the linear oracle in a lazy fashion.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.build_reduced_problem-Tuple{AbstractVector{var\"#s324\"} where var\"#s324\"<:FrankWolfe.ScaledHotVector, Any, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.build_reduced_problem","text":"build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)\n\nGiven an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates, of dimension equal to the cardinality of the active set, the objective function is:\n\nf(λ) = reduced_linear^T λ + 0.5 * λ^T reduced_hessian λ\n\nIn the case where we find that the current iterate has a Strong-Wolfe gap over the convex hull of the active set that is below the tolerance, we return nothing (as there is nothing to do).\n\n\n\n\n\n","category":"method"},
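{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"To see where the reduced problem comes from, consider a quadratic f(x) = b' x + x' H x / 2 and atoms stored as columns of a matrix A, so that x = A λ; then f(A λ) = (A' b)' λ + λ' (A' H A) λ / 2. A hedged sketch with hypothetical data:\n\nusing LinearAlgebra\n\nA = rand(4, 3)                # three hypothetical atoms in R^4, one per column\nH = Matrix{Float64}(I, 4, 4)  # constant Hessian\nb = rand(4)\n\nreduced_linear = A' * b\nreduced_hessian = A' * H * A  # quadratic part over the simplex of weights λ","category":"page"},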
{"location":"reference/1_algorithms/#FrankWolfe.lp_separation_oracle-Tuple{FrankWolfe.LinearMinimizationOracle, FrankWolfe.ActiveSet, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.lp_separation_oracle","text":"Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction), or the same tuple with y from the LMO.\n\ninplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.minimize_over_convex_hull!-Tuple{Any, Any, Any, FrankWolfe.ActiveSet, Any, Any, Any, Any}","page":"Algorithms","title":"FrankWolfe.minimize_over_convex_hull!","text":"minimize_over_convex_hull!\n\nGiven a function f with gradient grad! and an active set active_set, this function minimizes f over the convex hull of the active set until the Strong-Wolfe gap over the active set is below tolerance.\n\nIt will either directly minimize over the convex hull using simplex gradient descent, or it will transform the problem to barycentric coordinates and minimize over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.projection_simplex_sort-Tuple{Any}","page":"Algorithms","title":"FrankWolfe.projection_simplex_sort","text":"projection_simplex_sort(x; s=1.0)\n\nPerform a projection onto the probability simplex of radius s using a sorting algorithm.\n\n\n\n\n\n","category":"method"},
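{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"The classic sort-based simplex projection can be sketched as follows (a standalone illustration; FrankWolfe.projection_simplex_sort may differ in details):\n\n# project x onto the probability simplex of radius s by sorting\nfunction proj_simplex_sort(x; s=1.0)\n    u = sort(x; rev=true)  # sort in decreasing order\n    css = cumsum(u)\n    k = findlast(i -> u[i] + (s - css[i]) / i > 0, eachindex(u))\n    tau = (css[k] - s) / k  # threshold shifting the coordinates\n    return max.(x .- tau, 0)\nend","category":"page"},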
{"location":"reference/1_algorithms/#FrankWolfe.simplex_gradient_descent_over_convex_hull","page":"Algorithms","title":"FrankWolfe.simplex_gradient_descent_over_convex_hull","text":"simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)\n\nMinimizes an objective function over the convex hull of the active set until the Strong-Wolfe gap is below tolerance using simplex gradient descent.\n\n\n\n\n\n","category":"function"},{"location":"reference/1_algorithms/#FrankWolfe.simplex_gradient_descent_over_probability_simplex-Tuple{Any, Any, Any, Any, Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.simplex_gradient_descent_over_probability_simplex","text":"simplex_gradient_descent_over_probability_simplex\n\nMinimizes an objective function over the unit probability simplex until the Strong-Wolfe gap is below tolerance using gradient descent.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.strong_frankwolfe_gap-Tuple{Any}","page":"Algorithms","title":"FrankWolfe.strong_frankwolfe_gap","text":"Checks the strong Frank-Wolfe gap for the reduced problem.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.strong_frankwolfe_gap_probability_simplex-Tuple{Any, Any}","page":"Algorithms","title":"FrankWolfe.strong_frankwolfe_gap_probability_simplex","text":"strong_frankwolfe_gap_probability_simplex\n\nCompute the Strong-Wolfe gap over the unit probability simplex given a gradient.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Blended-Pairwise-Conditional-Gradient","page":"Algorithms","title":"Blended Pairwise Conditional Gradient","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"pairwise.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.blended_pairwise_conditional_gradient-NTuple{4, Any}","page":"Algorithms","title":"FrankWolfe.blended_pairwise_conditional_gradient","text":"blended_pairwise_conditional_gradient(f, grad!, lmo, x0; kwargs...)\n\nImplements the BPCG algorithm from Tsuji, Tanaka, Pokutta (2021). The method uses an active set of current vertices. Unlike the away-step algorithm, it transfers weight from an away vertex to another vertex of the active set.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.blended_pairwise_conditional_gradient-Tuple{Any, Any, Any, FrankWolfe.ActiveSet}","page":"Algorithms","title":"FrankWolfe.blended_pairwise_conditional_gradient","text":"blended_pairwise_conditional_gradient(f, grad!, lmo, active_set::ActiveSet; kwargs...)\n\nWarm-starts BPCG with a pre-defined active_set.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Alternating-Methods","page":"Algorithms","title":"Alternating Methods","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Problems over intersections of convex sets, i.e.,","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"min_{x ∈ ⋂_{i=1}^n P_i} f(x)","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"pose a challenge, as one has to combine the information of two or more LMOs.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.alternating_linear_minimization converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"FrankWolfe.alternating_projections solves feasibility problems over intersections of feasible regions.","category":"page"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Modules = [FrankWolfe]\nPages = [\"alternating_methods.jl\"]","category":"page"},{"location":"reference/1_algorithms/#FrankWolfe.alternating_linear_minimization-Union{Tuple{N}, Tuple{Any, Any, Any, Tuple{Vararg{FrankWolfe.LinearMinimizationOracle, N}}, Any}} where N","page":"Algorithms","title":"FrankWolfe.alternating_linear_minimization","text":"alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}\n\nAlternating Linear Minimization minimizes the objective f over the intersection of the feasible domains specified by lmos. Returns a tuple (x, v, primal, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\nprimal primal value f(x)\ndual_gap final Frank-Wolfe gap\ninfeas sum of squared, pairwise distances between iterates\ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#FrankWolfe.alternating_projections-Union{Tuple{N}, Tuple{Tuple{Vararg{FrankWolfe.LinearMinimizationOracle, N}}, Any}} where N","page":"Algorithms","title":"FrankWolfe.alternating_projections","text":"alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}\n\nComputes a point in the intersection of feasible domains specified by lmos. 
Returns a tuple (x, v, dual_gap, infeas, traj_data) with:\n\nx cartesian product of final iterates\nv cartesian product of last vertices of the LMOs\ndual_gap final Frank-Wolfe gap\ninfeas sum of squared, pairwise distances between iterates\ntraj_data vector of trajectory information.\n\n\n\n\n\n","category":"method"},{"location":"reference/1_algorithms/#Index","page":"Algorithms","title":"Index","text":"","category":"section"},{"location":"reference/1_algorithms/","page":"Algorithms","title":"Algorithms","text":"Pages = [\"1_algorithms.md\"]","category":"page"},{"location":"reference/0_reference/#API-Reference","page":"API Reference","title":"API Reference","text":"","category":"section"},{"location":"reference/0_reference/","page":"API Reference","title":"API Reference","text":"The pages in this section reference the documentation for specific types and functions.","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"EditURL = \"../../../examples/docs_5_blended_cg.jl\"","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_5_blended_cg/#Blended-Conditional-Gradients","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"","category":"section"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"The FW and AFW algorithms, as well as their lazy variants, share one feature: they attempt to make primal progress over a reduced set of vertices. The AFW algorithm does this through away steps (which do not increase the cardinality of the active set), and the lazy variants do this through the use of previously exploited vertices. A third strategy one can follow is to explicitly blend Frank-Wolfe steps with gradient descent steps over the convex hull of the active set (note that this can be done without requiring a projection oracle over C, thus making the algorithm projection-free). 
This results in the Blended Conditional Gradient (BCG) algorithm, which attempts to make as much progress as possible through the convex hull of the current active set S_t until it automatically detects that in order to make further progress it requires additional calls to the LMO.","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"See also Blended Conditional Gradients: the unconditioning of conditional gradients, Braun et al, 2019, https://arxiv.org/abs/1805.07311","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"using FrankWolfe\nusing LinearAlgebra\nusing Random\nusing SparseArrays\n\nn = 1000\nk = 10000\n\nRandom.seed!(41)\n\nmatrix = rand(n, n)\nhessian = transpose(matrix) * matrix\nlinear = rand(n)\nf(x) = dot(linear, x) + 0.5 * transpose(x) * hessian * x\nfunction grad!(storage, x)\n return storage .= linear + hessian * x\nend\nL = eigmax(hessian)","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"We run over the probability simplex and call the LMO to get an initial feasible point:","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"lmo = FrankWolfe.ProbabilitySimplexOracle(1.0);\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\n\ntarget_tolerance = 1e-5\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_accel_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n hessian=hessian,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n accelerated=true,\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n hessian=hessian,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n accelerated=false,\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_convex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n epsilon=target_tolerance,\n max_iteration=k,\n line_search=FrankWolfe.Adaptive(L_est=L),\n print_iter=k / 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=true,\n trajectory=true,\n lazy_tolerance=1.0,\n weight_purge_threshold=1e-10,\n)\n\ndata = [trajectoryBCG_accel_simplex, trajectoryBCG_simplex, trajectoryBCG_convex]\nlabel = [\"BCG (accel simplex)\", \"BCG (simplex)\", \"BCG (convex)\"]\nplot_trajectories(data, label, xscalelog=true)\n\n\n\nmatrix = rand(n, n)\nhessian = transpose(matrix) * matrix\nlinear = rand(n)\nf(x) = dot(linear, x) + 0.5 * transpose(x) * hessian * x + 10\nfunction grad!(storage, x)\n return storage .= linear + hessian * x\nend\nL = eigmax(hessian)\n\nlmo = FrankWolfe.KSparseLMO(100, 100.0)\nx00 = FrankWolfe.compute_extreme_point(lmo, zeros(n))\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_accel_simplex, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n 
epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    hessian=hessian,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    accelerated=true,\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_simplex, _ = FrankWolfe.blended_conditional_gradient(\n    f,\n    grad!,\n    lmo,\n    x0,\n    epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    hessian=hessian,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    accelerated=false,\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\nx0 = deepcopy(x00)\nx, v, primal, dual_gap, trajectoryBCG_convex, _ = FrankWolfe.blended_conditional_gradient(\n    f,\n    grad!,\n    lmo,\n    x0,\n    epsilon=target_tolerance,\n    max_iteration=k,\n    line_search=FrankWolfe.Adaptive(L_est=L),\n    print_iter=k / 10,\n    memory_mode=FrankWolfe.InplaceEmphasis(),\n    verbose=true,\n    trajectory=true,\n    lazy_tolerance=1.0,\n    weight_purge_threshold=1e-10,\n)\n\ndata = [trajectoryBCG_accel_simplex, trajectoryBCG_simplex, trajectoryBCG_convex]\nlabel = [\"BCG (accel simplex)\", \"BCG (simplex)\", \"BCG (convex)\"]\nplot_trajectories(data, label, xscalelog=true)","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"","category":"page"},{"location":"examples/docs_5_blended_cg/","page":"Blended Conditional Gradients","title":"Blended Conditional Gradients","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"EditURL = \"../../../examples/docs_10_alternating_methods.jl\"","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Alternating-methods","page":"Alternating methods","title":"Alternating methods","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"In this example we will compare FrankWolfe.alternating_linear_minimization and FrankWolfe.alternating_projections for a very simple feasibility problem.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"We consider the probability simplex","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"P = {x ∈ R^n : sum_{i=1}^n x_i = 1, x_i ≥ 0, i = 1, …, n}","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"and a scaled, shifted ℓ^∞ norm ball","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Q = [-1, 0]^n.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"The goal is to find either a point in the intersection, x ∈ P ∩ Q, or a pair of points, (x_P, x_Q) ∈ P × Q, which attains the minimal distance between P 
and Q,","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"||x_P - x_Q||_2 = min_{(x, y) in P × Q} ||x - y||_2","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"using FrankWolfe\ninclude(\"../examples/plot_utils.jl\")","category":"page"},{"location":"examples/docs_10_alternating_methods/#Setting-up-objective,-gradient-and-linear-minimization-oracles","page":"Alternating methods","title":"Setting up objective, gradient and linear minimization oracles","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Since we only consider the feasibility problem, the objective function as well as the gradient are zero.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"n = 20\n\nf(x) = 0\n\nfunction grad!(storage, x)\n @. storage = zero(x)\nend\n\n\nlmo1 = FrankWolfe.ProbabilitySimplexOracle(1.0)\nlmo2 = FrankWolfe.ScaledBoundLInfNormBall(-ones(n), zeros(n))\nlmos = (lmo1, lmo2)\n\nx0 = rand(n)\n\ntarget_tolerance = 1e-6\n\ntrajectories = [];\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Running-Alternating-Linear-Minimization","page":"Alternating methods","title":"Running Alternating Linear Minimization","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"We run Alternating Linear Minimization (ALM) with FrankWolfe.block_coordinate_frank_wolfe. This method allows three different update orders, FullUpdate, CyclicUpdate and StochasticUpdate. Accordingly, both blocks are updated either simultaneously, sequentially, or in random order.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"for order in [FrankWolfe.FullUpdate(), FrankWolfe.CyclicUpdate(), FrankWolfe.StochasticUpdate()]\n\n _, _, _, _, _, alm_trajectory = FrankWolfe.alternating_linear_minimization(\n FrankWolfe.block_coordinate_frank_wolfe,\n f,\n grad!,\n lmos,\n x0,\n update_order=order,\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n )\n push!(trajectories, alm_trajectory)\nend","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"As an alternative to Block-Coordinate Frank-Wolfe (BCFW), one can also run alternating linear minimization with the standard Frank-Wolfe algorithm. These methods then perform the full (simultaneous) update at each iteration. 
In this example we also use FrankWolfe.away_frank_wolfe.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"_, _, _, _, _, afw_trajectory = FrankWolfe.alternating_linear_minimization(\n FrankWolfe.away_frank_wolfe,\n f,\n grad!,\n lmos,\n x0,\n verbose=true,\n trajectory=true,\n epsilon=target_tolerance,\n)\npush!(trajectories, afw_trajectory);\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Running-Alternating-Projections","page":"Alternating methods","title":"Running Alternating Projections","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"Unlike ALM, Alternating Projections (AP) is only suitable for feasibility problems. One omits the objective and gradient as parameters.","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"_, _, _, _, ap_trajectory = FrankWolfe.alternating_projections(\n lmos,\n x0,\n trajectory=true,\n verbose=true,\n print_iter=100,\n epsilon=target_tolerance,\n)\npush!(trajectories, ap_trajectory);\nnothing #hide","category":"page"},{"location":"examples/docs_10_alternating_methods/#Plotting-the-resulting-trajectories","page":"Alternating methods","title":"Plotting the resulting trajectories","text":"","category":"section"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"labels = [\"BCFW - Full\", \"BCFW - Cyclic\", \"BCFW - Stochastic\", \"AFW\", \"AP\"]\n\nplot_trajectories(trajectories, labels, xscalelog=true)","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"","category":"page"},{"location":"examples/docs_10_alternating_methods/","page":"Alternating methods","title":"Alternating methods","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"EditURL = \"../../../examples/docs_4_rational_opt.jl\"","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_4_rational_opt/#Exact-Optimization-with-Rational-Arithmetic","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"","category":"section"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"This example can be found in section 4.3 in the paper. The package allows for exact optimization with rational arithmetic. For this, it suffices to set up the LMO to be rational and choose an appropriate step-size rule as detailed below. For the LMOs included in the package, this simply means initializing the radius with a rational-compatible element type, e.g., 1, rather than a floating-point number, e.g., 1.0. 
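To make this concrete, here is a minimal sketch contrasting the two setups (the variable names are illustrative; the rational oracle is the same one used in the code below):\n\nusing FrankWolfe\n\nlmo_float = FrankWolfe.ProbabilitySimplexOracle(1.0) # Float64 radius: vertices and iterates are floating-point\nlmo_exact = FrankWolfe.ProbabilitySimplexOracle{Rational{BigInt}}(1) # integer radius, rational element type: vertices stay exact\n\n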
Given that numerators and denominators can become quite large in rational arithmetic, it is strongly advised to build the rationals on extended-precision integer types such as BigInt, i.e., we use Rational{BigInt}.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"The second requirement ensuring that the computation runs in rational arithmetic is a rational-compatible step-size rule. The most basic step-size rule compatible with rational optimization is the agnostic step-size rule with gamma_t = 2/(2 + t). With this step-size rule, the gradient does not even need to be rational as long as the atom computed by the LMO is of a rational type. Assuming these requirements are met, all iterates and the computed solution will then be rational.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"using FrankWolfe\nusing LinearAlgebra\n\nn = 100\nk = n\n\nx = fill(big(1) // 100, n)\n\nf(x) = dot(x, x)\nfunction grad!(storage, x)\n @. storage = 2 * x\nend","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"pick the feasible region; the radius needs to be integer or rational","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"lmo = FrankWolfe.ProbabilitySimplexOracle{Rational{BigInt}}(1)","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"compute some initial vertex","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n));\n\nx, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.Agnostic(),\n print_iter=k / 10,\n verbose=true,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n);\n\nprintln(\"\\nOutput type of solution: \", eltype(x))","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"Another possible step-size rule is the rational short step, which computes the step size by minimizing the smoothness inequality as gamma_t = ⟨∇f(x_t), x_t - v_t⟩ / (2 L ||x_t - v_t||^2). 
However, as this step size depends on an upper bound on the Lipschitz constant L as well as the inner product with the gradient ∇f(x_t), both have to be of a rational type.","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"@time x, v, primal, dual_gap, trajectory = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=k,\n line_search=FrankWolfe.Shortstep(2 // 1),\n print_iter=k / 10,\n verbose=true,\n memory_mode=FrankWolfe.OutplaceEmphasis(),\n);\nnothing #hide","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"Note: at the last step, we exactly close the gap, finding the solution 1//n * ones(n).","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"","category":"page"},{"location":"examples/docs_4_rational_opt/","page":"Exact Optimization with Rational Arithmetic","title":"Exact Optimization with Rational Arithmetic","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"EditURL = \"../../../examples/docs_0_fw_visualized.jl\"","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_0_fw_visualized/#Visualization-of-Frank-Wolfe-running-on-a-2-dimensional-polytope","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"This example provides an intuitive view of the Frank-Wolfe algorithm by running it on a polyhedral set with a quadratic function. 
The Linear Minimization Oracle (LMO) corresponds to a call to a generic simplex solver from MathOptInterface.jl (MOI).","category":"page"},{"location":"examples/docs_0_fw_visualized/#Import-and-setup","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Import and setup","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We first import the necessary packages, including Polyhedra to visualize the feasible set.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"using LinearAlgebra\nusing FrankWolfe\n\nimport MathOptInterface\nconst MOI = MathOptInterface\nusing GLPK\n\nusing Polyhedra\nusing Plots","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We can then define the objective function, here the squared distance to a point in the plane, and its in-place gradient.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"n = 2\ny = [3.2, 0.5]\n\nfunction f(x)\n return 1 / 2 * norm(x - y)^2\nend\nfunction grad!(storage, x)\n @. storage = x - y\nend","category":"page"},{"location":"examples/docs_0_fw_visualized/#Custom-callback","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Custom callback","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"FrankWolfe.jl lets users define custom callbacks to record information about each iteration. Here, the callback copies the current iterate x, the current vertex v, and the current step size gamma to an array thanks to a closure. We then declare the array and the callback over this array. Each iteration will then push to this array.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"function build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(trajectory_arr, (copy(state.x), copy(state.v), state.gamma))\n end\nend\n\niterates_information_vector = []\ncallback = build_callback(iterates_information_vector)","category":"page"},{"location":"examples/docs_0_fw_visualized/#Creating-the-Linear-Minimization-Oracle","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Creating the Linear Minimization Oracle","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The LMO is defined as a call to a linear optimization solver: each iteration resets the objective and calls the solver. 
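Conceptually, one compute_extreme_point call on a FrankWolfe.MathOptLMO then amounts to the following simplified sketch (naive_moi_lmo is a hypothetical helper shown for illustration, not the package's actual implementation):\n\nfunction naive_moi_lmo(o, x_vars, direction)\n # replace the objective by ⟨direction, x⟩; the constraints are left untouched\n obj = sum(direction[i] * x_vars[i] for i in eachindex(x_vars))\n MOI.set(o, MOI.ObjectiveFunction{typeof(obj)}(), obj)\n MOI.set(o, MOI.ObjectiveSense(), MOI.MIN_SENSE)\n # re-solve the LP and read back an optimal vertex\n MOI.optimize!(o)\n return [MOI.get(o, MOI.VariablePrimal(), xi) for xi in x_vars]\nend\n\n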
The linear constraints must be defined only once at the beginning and remain identical across iterations. Here we use MathOptInterface directly, but the constraints could also be defined with JuMP or Convex.jl.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"o = GLPK.Optimizer()\nx = MOI.add_variables(o, n)\n\n# −x + y ≤ 2\nc1 = MOI.add_constraint(o, -1.0x[1] + x[2], MOI.LessThan(2.0))\n\n# x + 2 y ≤ 4\nc2 = MOI.add_constraint(o, x[1] + 2.0x[2], MOI.LessThan(4.0))\n\n# −2 x − y ≤ 1\nc3 = MOI.add_constraint(o, -2.0x[1] - x[2], MOI.LessThan(1.0))\n\n# x − 2 y ≤ 2\nc4 = MOI.add_constraint(o, x[1] - 2.0x[2], MOI.LessThan(2.0))\n\n# x ≤ 2\nc5 = MOI.add_constraint(o, x[1] + 0.0x[2], MOI.LessThan(2.0))","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The LMO is then built by wrapping the current MOI optimizer:","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"lmo_moi = FrankWolfe.MathOptLMO(o)","category":"page"},{"location":"examples/docs_0_fw_visualized/#Calling-Frank-Wolfe","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Calling Frank-Wolfe","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We can now compute an initial starting point from any direction and call the Frank-Wolfe algorithm. 
Note that we copy x0 before passing it to the algorithm because it is modified in-place by frank_wolfe.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"x0 = FrankWolfe.compute_extreme_point(lmo_moi, zeros(n))\n\nxfinal, vfinal, primal_value, dual_gap, traj_data = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo_moi,\n copy(x0),\n line_search=FrankWolfe.Adaptive(),\n max_iteration=10,\n epsilon=1e-8,\n callback=callback,\n verbose=true,\n print_iter=1,\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We now collect the iterates and vertices across iterations.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"iterates = Vector{Vector{Float64}}()\npush!(iterates, x0)\nvertices = Vector{Vector{Float64}}()\nfor s in iterates_information_vector\n push!(iterates, s[1])\n push!(vertices, s[2])\nend","category":"page"},{"location":"examples/docs_0_fw_visualized/#Plotting-the-algorithm-run","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Plotting the algorithm run","text":"","category":"section"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"We define another method for f adapted to plot its contours.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"function f(x1, x2)\n x = [x1, x2]\n return f(x)\nend\n\nxlist = collect(range(-1, 3, step=0.2))\nylist = collect(range(-1, 3, step=0.2))\n\nX = repeat(reshape(xlist, 1, :), length(ylist), 1)\nY = repeat(ylist, 1, length(xlist))","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"The feasible space is represented using Polyhedra.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"h =\n HalfSpace([-1, 1], 2) ∩ HalfSpace([1, 2], 4) ∩ HalfSpace([-2, -1], 1) ∩ HalfSpace([1, -2], 2) ∩\n HalfSpace([1, 0], 2)\n\np = polyhedron(h)\n\np1 = contour(xlist, ylist, f, fill=true, line_smoothing=0.85)\nplot(p1, opacity=0.5)\nplot!(\n p,\n ratio=:equal,\n opacity=0.5,\n label=\"feasible region\",\n framestyle=:zerolines,\n legend=true,\n color=:blue,\n);\nnothing #hide","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"Finally, we add all iterates and vertices to the plot.","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional 
polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"colors = [\"gold\", \"purple\", \"darkorange2\", \"firebrick3\"]\niterates = unique!(iterates)\nfor i in 1:3\n scatter!(\n [iterates[i][1]],\n [iterates[i][2]],\n label=string(\"x_\", i - 1),\n markersize=6,\n color=colors[i],\n )\nend\nscatter!(\n [last(iterates)[1]],\n [last(iterates)[2]],\n label=string(\"x_\", length(iterates) - 1),\n markersize=6,\n color=last(colors),\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"plot chosen vertices","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"scatter!([vertices[1][1]], [vertices[1][2]], m=:diamond, markersize=6, color=colors[1], label=\"v_1\")\nscatter!(\n [vertices[2][1]],\n [vertices[2][2]],\n m=:diamond,\n markersize=6,\n color=colors[2],\n label=\"v_2\",\n legend=:outerleft,\n colorbar=true,\n)","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"","category":"page"},{"location":"examples/docs_0_fw_visualized/","page":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","title":"Visualization of Frank-Wolfe running on a 2-dimensional polytope","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"EditURL = \"../../../examples/docs_3_matrix_completion.jl\"","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_3_matrix_completion/#Matrix-Completion","page":"Matrix Completion","title":"Matrix Completion","text":"","category":"section"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"We present another example: matrix completion. The idea is, given a partially observed matrix Y in R^{m×n}, to find X in R^{m×n} that minimizes the sum of squared errors on the observed entries while 'completing' the matrix Y, i.e. filling the unobserved entries to match Y as well as possible. A detailed explanation can be found in section 4.2 of the paper. We will try to solve","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"min_{||X||_* <= tau} sum_{(i,j) in I} (X_ij - Y_ij)^2","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"where tau > 0, ||X||_* is the nuclear norm of X, and I denotes the indices of the observed entries. We will use FrankWolfe.NuclearNormLMO and compare our Frank-Wolfe implementation with a Projected Gradient Descent (PGD) algorithm which, after each gradient descent step, projects the iterates back onto the nuclear norm ball. 
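The per-iteration cost difference is worth noting: the projection step of PGD requires a full SVD, whereas the Frank-Wolfe LMO only needs the leading singular vector pair of the gradient and returns a rank-one vertex. A small illustrative sketch (G and the radius 5.0 are arbitrary stand-ins):\n\nusing FrankWolfe\n\nG = randn(30, 20) # stand-in for a gradient matrix\nlmo_nuclear = FrankWolfe.NuclearNormLMO(5.0) # nuclear-norm ball of radius 5\nv = FrankWolfe.compute_extreme_point(lmo_nuclear, G) # rank-one vertex, cf. FrankWolfe.RankOneMatrix\n\n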
We use a movielens dataset for comparison.","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"using FrankWolfe\nusing ZipFile, DataFrames, CSV\n\nusing Random\nusing Plots\n\nusing Profile\n\nimport Arpack\nusing SparseArrays, LinearAlgebra\n\nusing LaTeXStrings\n\ntemp_zipfile = download(\"http://files.grouplens.org/datasets/movielens/ml-latest-small.zip\")\n\nzarchive = ZipFile.Reader(temp_zipfile)\n\nmovies_file = zarchive.files[findfirst(f -> occursin(\"movies\", f.name), zarchive.files)]\nmovies_frame = CSV.read(movies_file, DataFrame)\n\nratings_file = zarchive.files[findfirst(f -> occursin(\"ratings\", f.name), zarchive.files)]\nratings_frame = CSV.read(ratings_file, DataFrame)\n\nusers = unique(ratings_frame[:, :userId])\nmovies = unique(ratings_frame[:, :movieId])\n\n@assert users == eachindex(users)\nmovies_revert = zeros(Int, maximum(movies))\nfor (idx, m) in enumerate(movies)\n movies_revert[m] = idx\nend\nmovies_indices = [movies_revert[idx] for idx in ratings_frame[:, :movieId]]\n\nconst rating_matrix = sparse(\n ratings_frame[:, :userId],\n movies_indices,\n ratings_frame[:, :rating],\n length(users),\n length(movies),\n)\n\nmissing_rate = 0.05\n\nRandom.seed!(42)\n\nconst missing_ratings = Tuple{Int,Int}[]\nconst present_ratings = Tuple{Int,Int}[]\nlet\n (I, J, V) = SparseArrays.findnz(rating_matrix)\n for idx in eachindex(I)\n if V[idx] > 0\n if rand() <= missing_rate\n push!(missing_ratings, (I[idx], J[idx]))\n else\n push!(present_ratings, (I[idx], J[idx]))\n end\n end\n end\nend\n\nfunction f(X)\n r = 0.0\n for (i, j) in present_ratings\n r += 0.5 * (X[i, j] - rating_matrix[i, j])^2\n end\n return r\nend\n\nfunction grad!(storage, X)\n storage .= 0\n for (i, j) in present_ratings\n storage[i, j] = X[i, j] - rating_matrix[i, j]\n end\n return nothing\nend\n\nfunction test_loss(X)\n r = 0.0\n for (i, j) in missing_ratings\n r += 0.5 * (X[i, j] - rating_matrix[i, j])^2\n end\n return r\nend\n\nfunction project_nuclear_norm_ball(X; radius=1.0)\n U, sing_val, Vt = svd(X)\n if (sum(sing_val) <= radius)\n return X, -norm_estimation * U[:, 1] * Vt[:, 1]'\n end\n sing_val = FrankWolfe.projection_simplex_sort(sing_val, s=radius)\n return U * Diagonal(sing_val) * Vt', -norm_estimation * U[:, 1] * Vt[:, 1]'\nend\n\nnorm_estimation = 10 * Arpack.svds(rating_matrix, nsv=1, ritzvec=false)[1].S[1]\n\nconst lmo = FrankWolfe.NuclearNormLMO(norm_estimation)\nconst x0 = FrankWolfe.compute_extreme_point(lmo, ones(size(rating_matrix)))\nconst k = 10\n\ngradient = spzeros(size(x0)...)\ngradient_aux = spzeros(size(x0)...)\n\nfunction build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(trajectory_arr, (FrankWolfe.callback_state(state)..., test_loss(state.x)))\n end\nend","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"The smoothness constant is estimated:","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"num_pairs = 100\nL_estimate = -Inf\nfor i in 1:num_pairs\n global L_estimate\n u1 = rand(size(x0, 1))\n u1 ./= sum(u1)\n u1 .*= norm_estimation\n v1 = rand(size(x0, 2))\n v1 ./= sum(v1)\n x = FrankWolfe.RankOneMatrix(u1, v1)\n u2 = rand(size(x0, 1))\n u2 ./= sum(u2)\n u2 .*= norm_estimation\n v2 = rand(size(x0, 2))\n v2 ./= sum(v2)\n y = FrankWolfe.RankOneMatrix(u2, v2)\n grad!(gradient, x)\n grad!(gradient_aux, y)\n 
new_L = norm(gradient - gradient_aux) / norm(x - y)\n if new_L > L_estimate\n L_estimate = new_L\n end\nend","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"We can now perform projected gradient descent:","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"xgd = Matrix(x0)\nfunction_values = Float64[]\ntiming_values = Float64[]\nfunction_test_values = Float64[]\n\nls = FrankWolfe.Backtracking()\nls_storage = similar(xgd)\ntime_start = time_ns()\nfor _ in 1:k\n f_val = f(xgd)\n push!(function_values, f_val)\n push!(function_test_values, test_loss(xgd))\n push!(timing_values, (time_ns() - time_start) / 1e9)\n @info f_val\n grad!(gradient, xgd)\n xgd_new, vertex = project_nuclear_norm_ball(xgd - gradient / L_estimate, radius=norm_estimation)\n gamma = FrankWolfe.perform_line_search(\n ls,\n 1,\n f,\n grad!,\n gradient,\n xgd,\n xgd - xgd_new,\n 1.0,\n ls_storage,\n FrankWolfe.InplaceEmphasis(),\n )\n @. xgd -= gamma * (xgd - xgd_new)\nend\n\ntrajectory_arr_fw = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_fw)\nxfin, _, _, _, traj_data = FrankWolfe.frank_wolfe(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=10 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\ntrajectory_arr_lazy = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_lazy)\nxlazy, _, _, _, _ = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=10 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\n\ntrajectory_arr_lazy_ref = Vector{Tuple{Int64,Float64,Float64,Float64,Float64,Float64}}()\ncallback = build_callback(trajectory_arr_lazy_ref)\nxlazy, _, _, _, _ = FrankWolfe.lazified_conditional_gradient(\n f,\n grad!,\n lmo,\n x0;\n epsilon=1e-9,\n max_iteration=50 * k,\n print_iter=k / 10,\n verbose=false,\n line_search=FrankWolfe.Adaptive(),\n memory_mode=FrankWolfe.InplaceEmphasis(),\n gradient=gradient,\n callback=callback,\n)\n\nfw_test_values = getindex.(trajectory_arr_fw, 6)\nlazy_test_values = getindex.(trajectory_arr_lazy, 6)\n\nresults = Dict(\n \"svals_gd\" => svdvals(xgd),\n \"svals_fw\" => svdvals(xfin),\n \"svals_lcg\" => svdvals(xlazy),\n \"fw_test_values\" => fw_test_values,\n \"lazy_test_values\" => lazy_test_values,\n \"trajectory_arr_fw\" => trajectory_arr_fw,\n \"trajectory_arr_lazy\" => trajectory_arr_lazy,\n \"function_values_gd\" => function_values,\n \"function_values_test_gd\" => function_test_values,\n \"timing_values_gd\" => timing_values,\n \"trajectory_arr_lazy_ref\" => trajectory_arr_lazy_ref,\n)\n\nref_optimum = results[\"trajectory_arr_lazy_ref\"][end][2]\n\niteration_list = [\n [x[1] + 1 for x in results[\"trajectory_arr_fw\"]],\n [x[1] + 1 for x in results[\"trajectory_arr_lazy\"]],\n collect(1:1:length(results[\"function_values_gd\"])),\n]\ntime_list = [\n [x[5] for x in results[\"trajectory_arr_fw\"]],\n [x[5] for x in results[\"trajectory_arr_lazy\"]],\n results[\"timing_values_gd\"],\n]\nprimal_gap_list = [\n [x[2] - ref_optimum for x in results[\"trajectory_arr_fw\"]],\n [x[2] - ref_optimum for x in 
results[\"trajectory_arr_lazy\"]],\n [x - ref_optimum for x in results[\"function_values_gd\"]],\n]\ntest_list =\n [results[\"fw_test_values\"], results[\"lazy_test_values\"], results[\"function_values_test_gd\"]]\n\nlabel = [L\"\\textrm{FW}\", L\"\\textrm{L-CG}\", L\"\\textrm{GD}\"]\n\nplot_results(\n [primal_gap_list, primal_gap_list, test_list, test_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [L\"\\textrm{Iteration}\", L\"\\textrm{Time}\", L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [\n L\"\\textrm{Primal Gap}\",\n L\"\\textrm{Primal Gap}\",\n L\"\\textrm{Test Error}\",\n L\"\\textrm{Test Error}\",\n ],\n xscalelog=[:log, :identity, :log, :identity],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"","category":"page"},{"location":"examples/docs_3_matrix_completion/","page":"Matrix Completion","title":"Matrix Completion","text":"This page was generated using Literate.jl.","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/ZIB-IOL/FrankWolfe.jl/blob/master/README.md\"","category":"page"},{"location":"#FrankWolfe.jl","page":"Home","title":"FrankWolfe.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package is a toolbox for Frank-Wolfe and conditional gradient algorithms.","category":"page"},{"location":"#Overview","page":"Home","title":"Overview","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Frank-Wolfe algorithms were designed to solve optimization problems of the form min_{x in C} f(x), where f is a differentiable convex function and C is a convex and compact set. They are especially useful when we know how to optimize a linear function over C in an efficient way.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A paper presenting the package with mathematical explanations and numerous examples can be found here:","category":"page"},{"location":"","page":"Home","title":"Home","text":"FrankWolfe.jl: A high-performance and flexible toolbox for Frank-Wolfe algorithms and Conditional Gradients.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The most recent release is available via the Julia package manager, e.g., with","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.add(\"FrankWolfe\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"or the master branch:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Pkg.add(url=\"https://github.com/ZIB-IOL/FrankWolfe.jl\", rev=\"master\")","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Let's say we want to minimize the Euclidean norm over the probability simplex Δ. 
Using FrankWolfe.jl, this is what the code looks like (in dimension 3):","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using FrankWolfe\n\njulia> f(p) = sum(abs2, p) # objective function\n\njulia> grad!(storage, p) = storage .= 2p # in-place gradient computation\n\n# function d ↦ argmin ⟨p,d⟩ s.t. p ∈ Δ\njulia> lmo = FrankWolfe.ProbabilitySimplexOracle(1.)\n\njulia> p0 = [1., 0., 0.]\n\njulia> p_opt, _ = frank_wolfe(f, grad!, lmo, p0; verbose=true);\n\nVanilla Frank-Wolfe Algorithm.\nMEMORY_MODE: FrankWolfe.InplaceEmphasis() STEPSIZE: Adaptive EPSILON: 1.0e-7 MAXITERATION: 10000 TYPE: Float64\nMOMENTUM: nothing GRADIENTTYPE: Nothing\n[ Info: In memory_mode memory iterates are written back into x0!\n\n-------------------------------------------------------------------------------------------------\n Type Iteration Primal Dual Dual Gap Time It/sec\n-------------------------------------------------------------------------------------------------\n I 1 1.000000e+00 -1.000000e+00 2.000000e+00 0.000000e+00 Inf\n Last 24 3.333333e-01 3.333332e-01 9.488992e-08 1.533181e+00 1.565373e+01\n-------------------------------------------------------------------------------------------------\n\njulia> p_opt\n3-element Vector{Float64}:\n 0.33333334349923327\n 0.33333332783841896\n 0.3333333286623478","category":"page"},{"location":"#Documentation-and-examples","page":"Home","title":"Documentation and examples","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To explore the content of the package, go to the documentation.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Beyond those presented in the documentation, many more use cases are implemented in the examples folder. To run them, you will need to activate the test environment, which can be done simply with TestEnv.jl (we recommend you install it in your base Julia).","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> using TestEnv\n\njulia> TestEnv.activate()\n\"/tmp/jl_Ux8wKE/Project.toml\"\n\n# necessary for plotting\njulia> include(\"examples/plot_utils.jl\")\njulia> include(\"examples/linear_regression.jl\")\n...","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you need the plotting utilities in your own code, make sure Plots.jl is included in your current project and run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Plots\nusing FrankWolfe\n\ninclude(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\"))","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"EditURL = \"../../../examples/docs_8_callback_and_tracking.jl\"","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#Tracking,-counters-and-custom-callbacks-for-Frank-Wolfe","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and 
custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"In this example we will run the standard Frank-Wolfe algorithm while tracking the number of calls to the different oracles, namely objective function evaluations, gradient evaluations, and LMO calls. In order to track each of these metrics, a \"tracking\" version of the objective, gradient, and LMO has to be supplied to the frank_wolfe algorithm, each wrapping a standard one.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"using FrankWolfe\nusing Test\nusing LinearAlgebra\nusing FrankWolfe: ActiveSet","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#The-trackers-for-primal-objective,-gradient-and-LMO.","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"The trackers for primal objective, gradient and LMO.","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"In order to count the number of function calls, a TrackingObjective is built from a standard objective function f, which will act in the same way as the original function does, but with an additional .counter field which tracks the number of calls.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"f(x) = norm(x)^2\ntf = FrankWolfe.TrackingObjective(f)\n@show tf.counter\ntf(rand(3))\n@show tf.counter\n# Resetting the counter\ntf.counter = 0;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"Similarly, the tgrad! function tracks the number of gradient calls:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"function grad!(storage, x)\n return storage .= 2x\nend\ntgrad! = FrankWolfe.TrackingGradient(grad!)\n@show tgrad!.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The tracking LMO operates in a similar fashion and tracks the number of compute_extreme_point calls.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"lmo_prob = FrankWolfe.ProbabilitySimplexOracle(1)\ntlmo_prob = FrankWolfe.TrackingLMO(lmo_prob)\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The tracking LMO can be applied to all types of LMOs and even in a nested way, which can be useful to track the number of calls to a lazified oracle. 
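As a quick sanity check, a single oracle call increments the counter by one (we reset it right after, so the counts reported below are unaffected):\n\nFrankWolfe.compute_extreme_point(tlmo_prob, ones(5))\n@show tlmo_prob.counter\ntlmo_prob.counter = 0;\nnothing #hide\n\n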
We can now pass the tracking versions tf, tgrad! and tlmo_prob to frank_wolfe and display their call counts after the optimization process.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"x0 = FrankWolfe.compute_extreme_point(tlmo_prob, ones(5))\nfw_results = FrankWolfe.frank_wolfe(\n tf,\n tgrad!,\n tlmo_prob,\n x0,\n max_iteration=1000,\n line_search=FrankWolfe.Agnostic(),\n callback=nothing,\n)\n\n@show tf.counter\n@show tgrad!.counter\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/#Adding-a-custom-callback","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Adding a custom callback","text":"","category":"section"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"A callback is a user-defined function called at every iteration of the algorithm with the current state passed as a named tuple.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"We can implement our own callback, for example with:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"Extended trajectory logging, similar to the trajectory = true option\nStop criterion after a certain number of calls to the primal objective function","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"To reuse the same tracking functions, let us first reset their counters:","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"tf.counter = 0\ntgrad!.counter = 0\ntlmo_prob.counter = 0;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"The storage array records the number of calls to each oracle at each iteration of the trajectory.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"storage = []","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"Now define our own trajectory logging function that extends the five default logged elements (iterations, primal, dual, dual_gap, time) with \".counter\" field arguments present in the tracking functions.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank 
Wolfe","text":"function push_tracking_state(state, storage)\n base_tuple = FrankWolfe.callback_state(state)\n if state.lmo isa FrankWolfe.CachedLinearMinimizationOracle\n complete_tuple = tuple(\n base_tuple...,\n state.gamma,\n state.f.counter,\n state.grad!.counter,\n state.lmo.inner.counter,\n )\n else\n complete_tuple = tuple(\n base_tuple...,\n state.gamma,\n state.f.counter,\n state.grad!.counter,\n state.lmo.counter,\n )\n end\n return push!(storage, complete_tuple)\nend","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"If we want to stop the frank_wolfe algorithm prematurely once a certain condition is met, the callback can return false as a stop criterion. Here, we will implement a callback that terminates the algorithm if the primal objective function is evaluated more than 500 times.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"function make_callback(storage)\n return function callback(state, args...)\n push_tracking_state(state, storage)\n return state.f.counter < 500\n end\nend\n\ncallback = make_callback(storage)","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"We can show the difference between this standard run and the lazified conditional gradient algorithm, which does not call the LMO at each iteration.","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"FrankWolfe.lazified_conditional_gradient(\n tf,\n tgrad!,\n tlmo_prob,\n x0,\n max_iteration=1000,\n traj_data=storage,\n line_search=FrankWolfe.Agnostic(),\n callback=callback,\n)\n\ntotal_iterations = storage[end][1]\n@show total_iterations\n@show tf.counter\n@show tgrad!.counter\n@show tlmo_prob.counter;\nnothing #hide","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"","category":"page"},{"location":"examples/docs_8_callback_and_tracking/","page":"Tracking, counters and custom callbacks for Frank Wolfe","title":"Tracking, counters and custom callbacks for Frank Wolfe","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"EditURL = \"../../../examples/docs_2_polynomial_regression.jl\"","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"import FrankWolfe; include(joinpath(dirname(pathof(FrankWolfe)), \"../examples/plot_utils.jl\")) # hide","category":"page"},{"location":"examples/docs_2_polynomial_regression/#Polynomial-Regression","page":"Polynomial Regression","title":"Polynomial Regression","text":"","category":"section"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"The following example features the LMO for 
polynomial regression on the ℓ1 norm ball. Given input/output pairs {(x_i, y_i)}_{i=1}^N and sparse coefficients c_j, where","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"y_i = sum_{j=1}^m c_j f_j(x_i)","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"and f_j : R^n -> R, the task is to recover those c_j that are non-zero alongside their corresponding values. Under certain assumptions, this problem can be convexified into","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"min_{c in C} ||y - A c||^2","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"for a convex set C. It can also be found as example 4.1 in the paper. In order to evaluate the polynomial, we generate a total of 1000 data points {x_i}_{i=1}^N from the standard multivariate Gaussian, with which we will compute the output variables {y_i}_{i=1}^N. Before evaluating the polynomial, these points will be contaminated with noise drawn from a standard multivariate Gaussian. We run the away_frank_wolfe and blended_conditional_gradient algorithms, and compare them to Projected Gradient Descent using a smoothness estimate. We will evaluate the output solution on test points drawn in a similar manner to the training points.","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"using FrankWolfe\n\nusing LinearAlgebra\nimport Random\n\nusing MultivariatePolynomials\nusing DynamicPolynomials\n\nusing Plots\n\nusing LaTeXStrings\n\nconst N = 10\n\nDynamicPolynomials.@polyvar X[1:15]\n\nconst max_degree = 4\ncoefficient_magnitude = 10\nnoise_magnitude = 1\n\nconst var_monomials = MultivariatePolynomials.monomials(X, 0:max_degree)\n\nRandom.seed!(42)\nconst all_coeffs = map(var_monomials) do m\n d = MultivariatePolynomials.degree(m)\n return coefficient_magnitude * rand() .* (rand() .> 0.95 * d / max_degree)\nend\n\nconst true_poly = dot(all_coeffs, var_monomials)\n\nconst training_data = map(1:500) do _\n x = 0.1 * randn(N)\n y = MultivariatePolynomials.subs(true_poly, Pair(X, x)) + noise_magnitude * randn()\n return (x, y.a[1])\nend\n\nconst extended_training_data = map(training_data) do (x, y)\n x_ext = MultivariatePolynomials.coefficient.(MultivariatePolynomials.subs.(var_monomials, X => x))\n return (x_ext, y)\nend\n\nconst test_data = map(1:1000) do _\n x = 0.4 * randn(N)\n y = MultivariatePolynomials.subs(true_poly, Pair(X, x)) + noise_magnitude * randn()\n return (x, y.a[1])\nend\n\nconst extended_test_data = map(test_data) do (x, y)\n x_ext = MultivariatePolynomials.coefficient.(MultivariatePolynomials.subs.(var_monomials, X => x))\n return (x_ext, y)\nend\n\nfunction f(coefficients)\n return 0.5 / length(extended_training_data) * sum(extended_training_data) do (x, y)\n return (dot(coefficients, x) - y)^2\n end\nend\n\nfunction f_test(coefficients)\n return 0.5 / length(extended_test_data) * sum(extended_test_data) do (x, y)\n return (dot(coefficients, x) - y)^2\n end\nend\n\nfunction coefficient_errors(coeffs)\n return 0.5 * sum(eachindex(all_coeffs)) do idx\n return (all_coeffs[idx] - coeffs[idx])^2\n end\nend\n\nfunction grad!(storage, coefficients)\n storage .= 0\n for (x, y) in 
extended_training_data\n p_i = dot(coefficients, x) - y\n @. storage += x * p_i\n end\n storage ./= length(training_data)\n return nothing\nend\n\nfunction build_callback(trajectory_arr)\n return function callback(state, args...)\n return push!(\n trajectory_arr,\n (FrankWolfe.callback_state(state)..., f_test(state.x), coefficient_errors(state.x)),\n )\n end\nend\n\ngradient = similar(all_coeffs)\n\nmax_iter = 10000\nrandom_initialization_vector = rand(length(all_coeffs))\n\nlmo = FrankWolfe.LpNormLMO{1}(0.95 * norm(all_coeffs, 1))\n\n# Estimating smoothness parameter\nnum_pairs = 1000\nL_estimate = -Inf\ngradient_aux = similar(gradient)\n\nfor i in 1:num_pairs # hide\n global L_estimate # hide\n x = compute_extreme_point(lmo, randn(size(all_coeffs))) # hide\n y = compute_extreme_point(lmo, randn(size(all_coeffs))) # hide\n grad!(gradient, x) # hide\n grad!(gradient_aux, y) # hide\n new_L = norm(gradient - gradient_aux) / norm(x - y) # hide\n if new_L > L_estimate # hide\n L_estimate = new_L # hide\n end # hide\nend # hide\n\nfunction projnorm1(x, τ)\n n = length(x)\n if norm(x, 1) ≤ τ\n return x\n end\n u = abs.(x)\n # simplex projection\n bget = false\n s_indices = sortperm(u, rev=true)\n tsum = zero(τ)\n\n @inbounds for i in 1:n-1\n tsum += u[s_indices[i]]\n tmax = (tsum - τ) / i\n if tmax ≥ u[s_indices[i+1]]\n bget = true\n break\n end\n end\n if !bget\n tmax = (tsum + u[s_indices[n]] - τ) / n\n end\n\n @inbounds for i in 1:n\n u[i] = max(u[i] - tmax, 0)\n u[i] *= sign(x[i])\n end\n return u\nend\nxgd = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector) # hide\ntraining_gd = Float64[] # hide\ntest_gd = Float64[] # hide\ncoeff_error = Float64[] # hide\ntime_start = time_ns() # hide\ngd_times = Float64[] # hide\nfor iter in 1:max_iter # hide\n global xgd # hide\n grad!(gradient, xgd) # hide\n xgd = projnorm1(xgd - gradient / L_estimate, lmo.right_hand_side) # hide\n push!(training_gd, f(xgd)) # hide\n push!(test_gd, f_test(xgd)) # hide\n push!(coeff_error, coefficient_errors(xgd)) # hide\n push!(gd_times, (time_ns() - time_start) * 1e-9) # hide\nend # hide\n\nx00 = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector) # hide\nx0 = deepcopy(x00) # hide\n\ntrajectory_lafw = [] # hide\ncallback = build_callback(trajectory_lafw) # hide\nx_lafw, v, primal, dual_gap, _ = FrankWolfe.away_frank_wolfe( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=max_iter, # hide\n line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n lazy=true, # hide\n gradient=gradient, # hide\n callback=callback, # hide\n) # hide\n\ntrajectory_bcg = [] # hide\ncallback = build_callback(trajectory_bcg) # hide\nx0 = deepcopy(x00) # hide\nx_bcg, v, primal, dual_gap, _, _ = FrankWolfe.blended_conditional_gradient( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=max_iter, # hide\n line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n weight_purge_threshold=1e-10, # hide\n callback=callback, # hide\n) # hide\nx0 = deepcopy(x00) # hide\ntrajectory_lafw_ref = [] # hide\ncallback = build_callback(trajectory_lafw_ref) # hide\n_, _, primal_ref, _, _ = FrankWolfe.away_frank_wolfe( # hide\n f, # hide\n grad!, # hide\n lmo, # hide\n x0, # hide\n max_iteration=2 * max_iter, # hide\n 
line_search=FrankWolfe.Adaptive(L_est=L_estimate), # hide\n print_iter=max_iter ÷ 10, # hide\n memory_mode=FrankWolfe.InplaceEmphasis(), # hide\n verbose=false, # hide\n lazy=true, # hide\n gradient=gradient, # hide\n callback=callback, # hide\n) # hide\n\n\nfor i in 1:num_pairs\n global L_estimate\n x = compute_extreme_point(lmo, randn(size(all_coeffs)))\n y = compute_extreme_point(lmo, randn(size(all_coeffs)))\n grad!(gradient, x)\n grad!(gradient_aux, y)\n new_L = norm(gradient - gradient_aux) / norm(x - y)\n if new_L > L_estimate\n L_estimate = new_L\n end\nend","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"We can now perform projected gradient descent:","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"xgd = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector)\ntraining_gd = Float64[]\ntest_gd = Float64[]\ncoeff_error = Float64[]\ntime_start = time_ns()\ngd_times = Float64[]\nfor iter in 1:max_iter\n global xgd\n grad!(gradient, xgd)\n xgd = projnorm1(xgd - gradient / L_estimate, lmo.right_hand_side)\n push!(training_gd, f(xgd))\n push!(test_gd, f_test(xgd))\n push!(coeff_error, coefficient_errors(xgd))\n push!(gd_times, (time_ns() - time_start) * 1e-9)\nend\n\nx00 = FrankWolfe.compute_extreme_point(lmo, random_initialization_vector)\nx0 = deepcopy(x00)\n\ntrajectory_lafw = []\ncallback = build_callback(trajectory_lafw)\nx_lafw, v, primal, dual_gap, _ = FrankWolfe.away_frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n lazy=true,\n gradient=gradient,\n callback=callback,\n)\n\ntrajectory_bcg = []\ncallback = build_callback(trajectory_bcg)\n\nx0 = deepcopy(x00)\nx_bcg, v, primal, dual_gap, _, _ = FrankWolfe.blended_conditional_gradient(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n weight_purge_threshold=1e-10,\n callback=callback,\n)\n\nx0 = deepcopy(x00)\n\ntrajectory_lafw_ref = []\ncallback = build_callback(trajectory_lafw_ref)\n_, _, primal_ref, _, _ = FrankWolfe.away_frank_wolfe(\n f,\n grad!,\n lmo,\n x0,\n max_iteration=2 * max_iter,\n line_search=FrankWolfe.Adaptive(L_est=L_estimate),\n print_iter=max_iter ÷ 10,\n memory_mode=FrankWolfe.InplaceEmphasis(),\n verbose=false,\n lazy=true,\n gradient=gradient,\n callback=callback,\n)\n\niteration_list = [\n [x[1] + 1 for x in trajectory_lafw],\n [x[1] + 1 for x in trajectory_bcg],\n collect(eachindex(training_gd)),\n]\ntime_list = [[x[5] for x in trajectory_lafw], [x[5] for x in trajectory_bcg], gd_times]\nprimal_list = [\n [x[2] - primal_ref for x in trajectory_lafw],\n [x[2] - primal_ref for x in trajectory_bcg],\n [x - primal_ref for x in training_gd],\n]\ntest_list = [[x[6] for x in trajectory_lafw], [x[6] for x in trajectory_bcg], test_gd]\nlabel = [L\"\\textrm{L-AFW}\", L\"\\textrm{BCG}\", L\"\\textrm{GD}\"]\ncoefficient_error_values =\n [[x[7] for x in trajectory_lafw], [x[7] for x in trajectory_bcg], coeff_error]\n\n\nplot_results(\n [primal_list, primal_list, test_list, test_list],\n [iteration_list, time_list, iteration_list, time_list],\n label,\n [L\"\\textrm{Iteration}\", L\"\\textrm{Time}\", 
L\"\\textrm{Iteration}\", L\"\\textrm{Time}\"],\n [L\"\\textrm{Primal Gap}\", L\"\\textrm{Primal Gap}\", L\"\\textrm{Test loss}\", L\"\\textrm{Test loss}\"],\n xscalelog=[:log, :identity, :log, :identity],\n legend_position=[:bottomleft, nothing, nothing, nothing],\n)","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"","category":"page"},{"location":"examples/docs_2_polynomial_regression/","page":"Polynomial Regression","title":"Polynomial Regression","text":"This page was generated using Literate.jl.","category":"page"}] }