Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add R2N, R2DH #153

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

MohamedLaghdafHABIBOULLAH
Copy link
Contributor

Add still allocating versions of R2N and R2DH.
Update LM to give the possibility to use R2DH as subsolver

Copy link
Member

@dpo dpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update LM in a separate PR. It's unrelated to R2N.

src/LM_alg.jl Outdated
@@ -244,6 +248,7 @@ function LM(
jtprod_residual!(nls, xk, Fk, ∇fk)

σmax = opnorm(Jk)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add redundant blank lines.

src/LM_alg.jl Outdated
@@ -252,6 +257,7 @@ function LM(
σk = σk * γ
end
νInv = (1 + θ) * (σmax^2 + σk) # ‖J'J + σₖ I‖ = ‖J‖² + σₖ

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not here either.

@MohamedLaghdafHABIBOULLAH MohamedLaghdafHABIBOULLAH force-pushed the R2N-R2DH branch 3 times, most recently from 4700798 to b6cd089 Compare September 17, 2024 05:43
@MohamedLaghdafHABIBOULLAH
Copy link
Contributor Author

@dpo here I add a test in line 194 to discuss the case if $$mk(s)$$ is big enough

@MohamedLaghdafHABIBOULLAH
Copy link
Contributor Author

@dpo il est à noter que je me suis permis de changer R2_alg.jl afin de pouvoir récupérer le temps pour chaque iter et tracer la courbe objective vs temps

@dpo
Copy link
Member

dpo commented Sep 17, 2024

@dpo il est à noter que je me suis permis de changer R2_alg.jl afin de pouvoir récupérer le temps pour chaque iter et tracer la courbe objective vs temps

@dpo dpo closed this Sep 17, 2024
@dpo dpo reopened this Sep 17, 2024
@MohamedLaghdafHABIBOULLAH MohamedLaghdafHABIBOULLAH force-pushed the R2N-R2DH branch 2 times, most recently from acaad65 to 652701d Compare September 17, 2024 19:29
@MohamedLaghdafHABIBOULLAH MohamedLaghdafHABIBOULLAH changed the title Add R2N, R2DH and update LM Add R2N, R2DH Sep 27, 2024
src/R2DH.jl Outdated

* `x0::AbstractVector`: an initial guess (in the first calling form: default = `nlp.meta.x0`)
* `selected::AbstractVector{<:Integer}`: (default `1:length(x0)`).
* `Bk`: initial diagonal Hessian approximation (default: `(one(R) / options.ν) * I`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cette interface ne prend pas ce kwarg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix.

src/R2DH.jl Outdated
D.d .= summation ? D.d .+ σk : D.d .* σk
DNorm = norm(D.d, Inf)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all duplicate blank lines.

src/R2DH.jl Outdated
Hobj_hist[k] = hk
Mmonotone > 0 && (FHobj_hist[mod(k-1, Mmonotone) + 1] = fk + hk)

D.d .= max.(D.d, eps(R))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this?

src/R2_alg.jl Outdated
@@ -21,6 +21,7 @@ mutable struct R2Solver{
Fobj_hist::Vector{R}
Hobj_hist::Vector{R}
Complex_hist::Vector{Int}
Time_hist::Vector{R}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On veut se débarrasser de Fobj_hist, etc. Les callbacks sont là pour ça. Il vaut mieux utiliser le callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok je vais ouvrir une issue pour ça

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t change R2 in this PR.

src/R2_alg.jl Outdated
@@ -21,6 +21,7 @@ mutable struct R2Solver{
Fobj_hist::Vector{R}
Hobj_hist::Vector{R}
Complex_hist::Vector{Int}
Time_hist::Vector{R}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t change R2 in this PR.

@@ -9,6 +9,7 @@ mutable struct ROSolverOptions{R}
maxIter::Int # maximum amount of inner iterations
maxTime::Float64 #maximum time allotted to the algorithm in s
σmin::R # minimum σk allowed for LM/R2 method
σk::R # initial σk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LM did not need this. R2N and LM should be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast to the old code, we need an initialization $$\sigma_k$$, similar to $$\Delta_k$$ as well as a different $$\sigma_{min}$$ for which there is not really a relationship between them.

src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated
min φ(s; xₖ) + ψ(s; xₖ)

where φ(s ; xₖ) = f(xₖ) + ∇f(xₖ)ᵀs + ½ sᵀ (σₖ+Dₖ) s is a quadratic approximation of f about xₖ,
ψ(s; xₖ) = h(xₖ + s), ‖⋅‖ is a user-defined norm, Dₖ is a diagonal Hessian approximation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no norm here.

src/R2DH.jl Outdated

* `x0::AbstractVector`: an initial guess (in the first calling form: default = `nlp.meta.x0`)
* `selected::AbstractVector{<:Integer}`: (default `1:length(x0)`).
* `Bk`: initial diagonal Hessian approximation (default: `(one(R) / options.ν) * I`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix.

src/R2DH.jl Outdated
∇f!(∇fk, xk)
∇fk⁻ = copy(∇fk)
spectral_test = isa(D, SpectralGradient)
# D.d .= D.d .+ σk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove dead code.

src/R2DH.jl Outdated
end
mks = mk(s)

if mks < -1e5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is triggered when $m_{ks}$ is very small (approaching $-\infty$). In such cases, we do not need to compute $f(x_{kn})$, and we simply assume that the iteration is unsuccessful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t understand why we would need that. Firstly, -1e5 is completely arbitrary. And secondly, why would the iteration be unsuccessful if the model decrease is large?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this (page 7 of R2N)
Condition-rhok=0

So $$\rho_k = 0$$
Hence the iteration is automatically uncessuful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I agree that -1e5 might be somewhat arbitrary. Should we set it to -Inf instead?
We should also ensure that the ShiftedProximalOperators handle this appropriately when encountering $d[i] &lt; 0$.

My concern is that Julia might struggle with calculus involving -Inf and interpret it as NaN. However, this does not seem to be the case, except in scenarios like $\text{Inf} - \text{Inf}$ or $\frac{\text{Inf}}{\text{Inf}}$, which yield NaN. In the paper, the latter, which may be $\rho$, should be equal to 0 by convention, but Julia treats it as NaN.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be -Inf, and ShiftedProximalOperators should return -Inf.

@MohamedLaghdafHABIBOULLAH
Copy link
Contributor Author

Thanks @dpo for the comments. I tried to incorporate and answer all your comments.
Now I need to add some unitests

src/R2DH.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated
end
mks = mk(s)

if mks < -1e5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be -Inf, and ShiftedProximalOperators should return -Inf.

src/R2N.jl Outdated

min f(x) + h(x)

where f: ℝⁿ → ℝ is C¹ and h: ℝⁿ → ℝ is lower semi-continuous and proper.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and prox-bounded.

src/R2N.jl Show resolved Hide resolved
@MaxenceGollier
Copy link
Contributor

This PR has been open for a while now, can we merge it ? @MohamedLaghdafHABIBOULLAH @dpo
I will open a PR thereafter that will remove the allocations.

@dpo
Copy link
Member

dpo commented Dec 23, 2024

I would much prefer to review one PR instead of two. Could you please put your work together into a single PR? Maybe just close this one and open a new one?!

test/runtests.jl Outdated Show resolved Hide resolved
@dpo
Copy link
Member

dpo commented Jan 9, 2025

@MohamedLaghdafHABIBOULLAH Please rebase this branch against main.

Copy link
Member

@dpo dpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MohamedLaghdafHABIBOULLAH Let’s please finish this fast.

src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2N.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated Show resolved Hide resolved
src/R2DH.jl Outdated
ν = 1 / ((DNorm + σk) * (1 + θ))
@. mν∇fk = -ν * ∇fk
continue
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this bit for? If it is to increase σk until the prox returns a finite value, there should be a while. Otherwise, please remove it. It doesn’t appear in any other solver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is occurs when mks = -Inf: so we update DNorm and ν for spectral case.
Because there is a distinction between iprox in spectral (which is a prox) and other regularizers (e.g., PSB, Andrei), as rank and nuclear norm regularizers do not have an iprox.

src/R2DH.jl Outdated
if mks == -Inf
σk = σk * γ
Dkσk .= D.d .+ σk
DNorm = norm(D.d, Inf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DNorm = norm(D.d, Inf)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason #153 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D has not changed here. I don’t see why you would recompute its norm.

@MaxenceGollier
Copy link
Contributor

@dpo, are we planning to merge this first in the end ?

I would much prefer to review one PR instead of two. Could you please put your work together into a single PR? Maybe just close this one and open a new one?!

Some of your comments are already taken care of in my version

What is this bit for? If it is to increase σk until the prox returns a finite value, there should be a while. Otherwise, please remove it. It doesn’t appear in any other solver.

Co-authored-by: Dominique <[email protected]>
precise when to debug

Co-authored-by: Dominique <[email protected]>
correct $\Delta_{mod}$

Co-authored-by: Dominique <[email protected]>
Add some unit tests for R2DH/R2N/R2N_R2DH
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
I will add (-Inf) condition in ShiftedProximalOperators as well
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Dominique <[email protected]>
Co-authored-by: Maxence Gollier <[email protected]>
Add a documentation for second calling form of R2DH
@MohamedLaghdafHABIBOULLAH
Copy link
Contributor Author

Thanks very much @dpo and @MaxenceGollier for the comments and commit suggestions. Incorporated all suggestions in this PR.

@dpo
Copy link
Member

dpo commented Jan 10, 2025

@MaxenceGollier What are the errors in R2DH?

# take first proximal gradient step s1 and see if current xk is nearly stationary
# s1 minimizes φ1(s) + ‖s‖² / 2 / ν + ψ(s) ⟺ s1 ∈ prox{νψ}(-ν∇φ1(0)).

subsolver_options.ν = 1 / νInv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, R2N should initialize subsolver_options.σk if the subsolver is R2DH.

R2DH should receive an initial σk and compute ν based on that (currently, it does it the other way around).


subsolver_options.ϵa = k == 1 ? 1.0e-3 : min(sqrt_ξ1_νInv ^ (1.5) , sqrt_ξ1_νInv * 1e-3)
verbose > 0 && @debug "setting inner stopping tolerance to" subsolver_options.optTol
subsolver_args = subsolver == R2DH ? (SpectralGradient(νInv, f.meta.nvar),) : ()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpo what do you think about this line?

If I consider $$\sigma_k$$ as input to R2DH, should I let $$\nu$$ as the initialization of the diagonal Hessian approximation for the SpectralGradient as it is done here in TR in the case where TRDH is the sub solver https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl/blob/master/src/TR_alg.jl#L189

mk(d) = φ(d) + ψ(d)

if spectral_test
prox!(s, ψ, mν∇fk, ν)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t correspond to R2DH; I think this should use ν = 1 / (DNorm + σk).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the paper, we say that R2DH is R2N where $$B_k$$ is diagonal, so for the coherence, I thought we should stick with $$\nu$$ of R2N which is ν = θ / (DNorm + σk) where θis close to one.

@MohamedLaghdafHABIBOULLAH
Copy link
Contributor Author

@dpo @MaxenceGollier, if there are no further comments, we can proceed with the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants