One basic invariant that is good to maintain, if possible, is that the algorithm should be insensitive to the overall scaling ("units") of the problem. For example, if my variables x are in meters and I change them to µm, which will scale the Jacobian matrix J by 1e6, the algorithm should take the same steps (scaled by 1e6). Unfortunately, the current implementation violates this assumption.
In particular, the "magic" constants MIN_DIAGONAL = 1e-6 and MAX_DIAGONAL = 1e32, which are used to limit the damping/regularization factor (LeastSquaresOptim.jl/src/optimizer/levenberg_marquardt.jl, lines 87 to 89 at 2aea700),
don't scale with the units. This falls into a common floating-point fallacy, confusing floating-point arithmetic with fixed-point arithmetic: it assumes that numbers are scaled to be of order unity, so that 1e-6 is "small" and 1e32 is "large" in an "absolute" sense. That makes no sense, since floating-point numbers have about 15 digits of precision over more than 600 orders of magnitude.
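To make the failure concrete, here is a small standalone sketch in plain Julia (with a made-up 3×2 Jacobian; `colsumabs2!(dtd, J)` from the package is replaced by an explicit column sum of squares). It shows that whether the fixed bounds bind depends purely on the units, so the clamped damping diagonal is no longer a rescaled copy of the original:

```julia
# Hypothetical 3×2 Jacobian for some least-squares problem (made up for illustration).
J = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]

# dtd = diag(JᵀJ), i.e. what colsumabs2!(dtd, J) computes.
dtd = vec(sum(abs2, J; dims=1))               # [35.0, 56.0]

# Change the units of x so that J is scaled by c; diag(JᵀJ) then scales by c².
c = 1e-6
dtd_scaled = vec(sum(abs2, c .* J; dims=1))   # [3.5e-11, 5.6e-11]

# The fixed absolute clamp used by the current implementation:
clamp_abs(d) = clamp.(d, 1e-6, 1e32)

# In the original units the bounds never bind; in the rescaled units every
# entry is forced up to 1e-6, so undoing the unit change does not recover
# the original damping diagonal — the algorithm takes different steps.
@show clamp_abs(dtd)                 # [35.0, 56.0] (bounds never bind)
@show clamp_abs(dtd_scaled) ./ c^2   # ≈ [1.0e6, 1.0e6], not [35.0, 56.0]
```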
Instead, you have to ask whether the diagonal elements are small or large compared to what. In this case, I think a natural comparison would be to something related to norm(J)^2 == sum(colsumabs2!(dtd, J)), which sets the overall scale of the entries here and of the matrix $J^T J$ that you are regularizing. In particular, a natural choice would be norm(J)^2 / n, which is simply the mean of the dtd entries before clamping. So maybe just change clamp!(dtd, MIN_DIAGONAL, MAX_DIAGONAL) to clamp relative to that mean.
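A sketch of what such a relative clamp could look like (this is my guess at the shape of the change, not the package's actual code; the function name `clamp_relative!` is made up, and `mean(dtd)` stands in for `norm(J)^2 / n` since `dtd` holds the column sums of squares):

```julia
using Statistics: mean

# With a relative clamp, these become dimensionless ratios w.r.t. mean(dtd).
const MIN_DIAGONAL = 1e-6
const MAX_DIAGONAL = 1e32

# Hypothetical replacement for clamp!(dtd, MIN_DIAGONAL, MAX_DIAGONAL):
function clamp_relative!(dtd)
    m = mean(dtd)   # == norm(J)^2 / n, the natural scale of diag(JᵀJ)
    clamp!(dtd, MIN_DIAGONAL * m, MAX_DIAGONAL * m)
end

# Unit invariance: scaling dtd by c² (i.e. scaling J by c) scales both the
# entries and the clamp bounds by c², so the result rescales consistently.
dtd = [35.0, 1e-20, 56.0]   # middle entry is tiny and will be clamped up
c2  = 1e-12
a = clamp_relative!(copy(dtd))
b = clamp_relative!(dtd .* c2)
@assert b ≈ a .* c2          # same steps in either unit system
```

The clamp is now homogeneous in the data: rescaling the problem rescales the bounds by the same factor, which is exactly the invariance the fixed constants break.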
This would make the algorithm insensitive to overall scaling, unless I'm missing something (your other "magic constants" seem to be dimensionless).
cc @mochen4