Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIN_DIAGONAL and MAX_DIAGONAL constants depend on overall scaling ("units") #39

Closed
stevengj opened this issue Nov 8, 2024 · 1 comment

Comments

@stevengj
Copy link

stevengj commented Nov 8, 2024

One basic invariant that it is good to maintain if possible is that the algorithm should be insensitive to the overall scaling ("units") of the problem. e.g. if my variables x are in meters, and I change them to µm, which will scale the Jacobian matrix J by 1e6, the algorithm should take the same steps (scaled by 1e6). Unfortunately, the current implementation violates this assumption.

In particular, the "magic" constants MIN_DIAGONAL = 1e-6 and MAX_DIAGONAL = 1e32, which are used to limit the damping/regularization factor:

colsumabs2!(dtd, J)
clamp!(dtd, MIN_DIAGONAL, MAX_DIAGONAL)
rmul!(dtd, 1/Δ)

don't scale with the units. This is falls into a common floating-point fallacy, confusing floating-point arithmetic with fixed-point arithmetic — it assumes that numbers are scaled to be of order unity, so that 1e-6 is "small" and 1e32 is "large" in an "absolute" sense, which makes no sense since floating-point numbers have 15 digits of precision over 600 orders of magnitude.

Instead, you have to ask whether the diagonal elements are small or large compared to what. In this case, I think that a natural comparison would be to something related to norm(J)^2 == sum(colsumabs2!(dtd, J)), which sets the overall scale of the entries here and the matrix $J^T J$ that you are regularizing. In particular, a natural choice would be norm(J)^2 / n, which is simply the mean of the dtd entries before clamping. So maybe just change clamp!(dtd, MIN_DIAGONAL, MAX_DIAGONAL) to:

dtd_mean = sum(dtd) / length(dtd)
clamp!(dtd, MIN_DIAGONAL * dtd_mean, MAX_DIAGONAL * dtd_mean)

which would make the algorithm insensitive to overall scaling unless I'm missing something (your other "magic constants" seem to be dimensionless).

cc @mochen4

@matthieugomez
Copy link
Owner

matthieugomez commented Nov 8, 2024

Thanks a lot for the report! This is fixed by #40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants