[Do not merge] Fix 246 - QR ortho fallback for LOBPCG #247
base: master
Conversation
@mfherbst could you try this PR with your real application and see whether it does the trick? With this, I am getting a 100% success rate for your toy problem.
src/lobpcg.jl
Outdated
```diff
 @views gram_view = ortho!.gramVBV[1:sizeX, 1:sizeX]
 @views if useview
     mul!(gram_view, adjoint(X[:, 1:sizeX]), BX[:, 1:sizeX])
 else
     mul!(gram_view, adjoint(X), BX)
 end
 realdiag!(gram_view)
-cholf = cholesky!(Hermitian(gram_view))
+cholf = cholesky!(Hermitian(gram_view + eps(T)*I))
```
This creates a copy?
Yes, I will make it in-place.
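For reference, a minimal sketch of the in-place variant being discussed, under a helper name of my own (`add_diag_eps!` is not part of the PR): shift the diagonal of the existing Gram buffer instead of allocating `gram_view + eps(T)*I` as a temporary copy.

```julia
using LinearAlgebra

# Shift the diagonal of the Gram buffer in place; avoids the copy created by
# `gram_view + eps(T)*I`. Illustrative helper, not the PR's actual code.
function add_diag_eps!(G::AbstractMatrix{T}) where {T}
    ϵ = eps(real(T))
    @inbounds for i in axes(G, 1)
        G[i, i] += ϵ
    end
    return G
end

# cholf = cholesky!(Hermitian(add_diag_eps!(gram_view)))
```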
This method seems to produce unexpected results elsewhere. I will implement the QR orthogonalization.
Thanks for looking into this! This time I am travelling for the week 😄. I'll try to get to it asap.
@mschauer if tests pass, please give this a second review.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
- Coverage   90.52%   89.29%   -1.24%
==========================================
  Files          17       17
  Lines        1077     1093      +16
==========================================
+ Hits          975      976       +1
- Misses        102      117      +15
```

Continue to review full report at Codecov.
Can you briefly say in words what you are doing?
I added some comments to the PR. In short, the role of orthogonalization (technically orthonormalization) in LOBPCG is to make sure that the matrix X' B X stays equal to the identity. However, when X is (nearly) rank deficient, the Cholesky factorization of this Gram matrix can fail, which is what this PR works around with a QR fallback.
@mohamed82008 I've done a little testing in our application (DFTK.jl). The solver now runs a lot more stably, but I still seem to get some spurious (zero) eigenvalues. I'll take another, closer look.
@mohamed82008 For a possible future improvement consideration: columns in X in LOBPCG are often badly scaled, with norms ranging from 1e1 to 1e-16, which makes the matrix X' B X extremely ill-conditioned. The original LOBPCG implementation relies on the fact that many Cholesky codes (e.g., in LAPACK) are numerically invariant to scaling, so re-scaling of X has no effect. However, many home-made Cholesky implementations are not scale-invariant and fail - the most common cause of LOBPCG errors. In the latter case, it helps to first try making Q the same as X but with normalized columns, which is cheaper than QR. If this still fails, then the QR, which is implemented in this PR, is needed.
If you run LOBPCG with full internal output, printing the current approximate eigenvalues and residual norms, and post it here, I may be able to help with some advice. I have never seen LOBPCG produce spurious (zero) eigenvalues.
@lobpcg thanks for your advice. I will try the column normalization method. The nice thing about this approach is that no additional matrix multiplications would be required, since the Gram matrix X' B X is already computed. If done right, this method will only add a tiny overhead, since we can symmetrically normalize the tiny Gram matrix using its diagonal.
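To make the normalization idea concrete, here is a minimal sketch under hypothetical naming (`normalize_gram` is not from this PR): rescale the already-computed Gram matrix G = X' B X symmetrically by the inverse square roots of its diagonal, which corresponds to normalizing the columns of X in the B-inner product without any extra matrix-matrix products.

```julia
using LinearAlgebra

# Symmetric normalization of the (tiny) Gram matrix G = X' * B * X:
# D * G * D with D = Diagonal(diag(G))^(-1/2) has unit diagonal, as if the
# columns of X had been normalized in the B-inner product.
function normalize_gram(G::AbstractMatrix)
    d = sqrt.(real.(diag(G)))      # B-norms of the columns of X
    D = Diagonal(1 ./ d)
    return Hermitian(D * G * D), D
end

# Usage sketch: factorize the rescaled Gram matrix and fold the scaling back
# into the triangular factor when orthonormalizing X:
#   Gn, D = normalize_gram(gram_view)
#   cholf = cholesky!(Matrix(Gn))
#   X .= X * (D / cholf.U)         # X ← (X * D) * U⁻¹
```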
@mfherbst When you say you get spurious zero eigenvalues, why are they spurious? If the residuals' norms are small, the corresponding pairs are genuine eigenpairs rather than spurious ones.
If the Cholesky implementation is scale-invariant, re-scaling is just a waste, so you do not want to do it all the time.
Julia uses LAPACK by default for the supported number types. Any funky, e.g. high-precision, number type, though, will dispatch to a fallback implementation in pure Julia. Both seem scale-invariant.
Since users can link to external libraries, I recommend adding a Cholesky scale-invariance check in the LOBPCG initialization - that would likely save you from most troubles in the future. We did not do it in https://bitbucket.org/joseroman/blopex/src/master/ to my recollection, and I regret it.
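A rough sanity check along those lines might look like the sketch below (the function name, test matrix, and tolerance are my own assumptions): factorize a small SPD matrix and a badly rescaled copy of it, undo the scaling, and compare the factors.

```julia
using LinearAlgebra

# Rough check of Cholesky scale-invariance: for G = U'U, the factor of
# D*G*D should be U*D, so undoing the scaling recovers U if the
# implementation handles badly scaled input gracefully.
function cholesky_scale_invariant(; n = 8, tol = 1e-8)
    A = randn(n, n)
    G = Hermitian(A' * A + n * I)                      # well-conditioned SPD matrix
    D = Diagonal(10.0 .^ range(0, -12; length = n))    # scalings spanning 12 orders
    U  = cholesky(G).U
    Us = cholesky(Hermitian(D * G * D)).U              # factor of the rescaled copy
    return opnorm(Matrix(Us) / D - Matrix(U)) / opnorm(Matrix(U)) < tol
end
```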
That strategy (Cholesky, then QR if Cholesky fails) sounds reasonable, at least when B is relatively well-conditioned. https://arxiv.org/pdf/1704.07458.pdf recommends the SVQB algorithm, but that's probably more expensive, at least for a relatively large number of eigenpairs. Dividing by R as you do seems dangerous, though; this might be the cause of @mfherbst 's zero eigenvalues. A possibility is to drop columns of R (possibly with pivoting in the QR factorization) to ensure good conditioning. Another alternative is to try to save the Cholesky factorization by simply discarding everything after what went wrong. I played with that some time ago, but don't remember if it was a good idea or not...
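A sketch of the column-dropping idea with a pivoted QR (names and the drop tolerance are illustrative, not from this PR): keep only the leading pivoted columns whose diagonal entry of R is non-negligible, so the retained orthonormal block stays well conditioned.

```julia
using LinearAlgebra

# Pivoted QR with column dropping: |diag(R)| is non-increasing under column
# pivoting, so counting the entries above a threshold gives a rank estimate.
function ortho_qr_drop(X::AbstractMatrix; droptol = sqrt(eps(real(eltype(X)))))
    F = qr(X, ColumnNorm())                  # column-pivoted QR (Val(true) on older Julia)
    r = abs.(diag(F.R))
    k = count(>=(droptol * maximum(r)), r)   # numerical rank estimate
    return Matrix(F.Q)[:, 1:k], k            # orthonormal basis of the retained range
end
```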
Spurious eigenvalues are impossible if the output is separately verified for accuracy, which I believe is the case in this code. The present Julia implementation of LOBPCG, if I understand correctly, basically follows my https://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m, probably without "restarts" where the P is simply dropped. Instabilities are practically rare and usually result from users doing something adventurous. https://arxiv.org/pdf/1704.07458.pdf aims at very large block sizes >400, which brings multiple difficulties, still not perfectly addressed.
Yet this is the regime in which a number of codes work. In particular, some applications in electronic structure need quite stringent tolerances, where it's important to get stability right.
The LOBPCG code gives an option to calculate, e.g., 1000 eigenstates in smaller blocks, say of size 100, in the orthogonal complement to the previously found eigenstates. Not only is this more stable, but more importantly it controls the cubic (with the block size) cost increase. Increasing the block size improves LOBPCG efficiency, but only to a certain point.
It does cut the Rayleigh-Ritz costs, but not really the orthogonalizations. Also, it's trickier in a massively parallel setting. LOBPCG, like pretty much all iterative eigensolvers, is intrinsically an unstable algorithm for what is ultimately a well-conditioned problem (i.e., assuming a reasonable eigenvalue separation and spectral radius, the output is well-defined and not susceptible to roundoff errors); any high-quality general-purpose implementation should implement countermeasures to remedy this (a library cannot rely on users not "doing something adventurous"). In this case, I'd be very much surprised if dividing by R didn't cause some trouble down the line.
As requested by @lobpcg and @mohamed82008, I came up with another test case taken from our DFTK code as an example; see this gist. I just ran it on my machine using the implementation of this PR and got the following output:
The interesting step in the trace is 30 -> 31, where convergence worsens for one iteration, only for the algorithm to terminate in the following step. This behaviour is reproduced in each failing trace with the example problem. If I investigate the Ritz values from the iteration states, it shows that in iteration 30 everything still looks good and on track for convergence, but in iteration 31 a spurious value appears at the lower end of the spectrum, which is then "converged" to a numerical zero in iteration 32.
I amended the gist to allow for requesting a varying number of eigenvalues. For …
OK, I found the source of the "instability". The problem is that I am performing inner orthogonalization of … The solution to this is simple: detect the rank deficiency of the Gram matrix …
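One way to detect that rank deficiency, sketched here under assumed naming (not the PR's code), is a pivoted Cholesky factorization of the Gram matrix, which reports the numerical rank directly.

```julia
using LinearAlgebra

# Numerical rank of a Gram matrix G = X' * B * X via pivoted Cholesky.
# `RowMaximum()` requires a recent Julia; older versions use `Val(true)`.
# The tolerance mirrors the usual n*eps*max(diag) default and is illustrative.
function gram_rank(G::AbstractMatrix)
    tol = size(G, 1) * eps(real(eltype(G))) * maximum(abs, diag(G))
    F = cholesky(Hermitian(G), RowMaximum(); tol = tol, check = false)
    return F.rank    # columns beyond this index are numerically dependent
end
```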
Note that it is possible to prove that … While these may seem like limitations of the "basic" algorithm, assuming my understanding is correct, I think they are all completely avoidable using careful checks to prepare for the worst case when it happens. DFTK seems like a great test case for this algorithm, so thanks @mfherbst for sharing your pleasantly reproducible examples!
@mohamed82008 How has the spurious output passed the tolerance check at the end? This is a strange crash that I do not recollect normally seeing in my LOBPCG codes; I am curious to see how this example runs in my codes, e.g., MATLAB, but have no time to check, sorry. It appears that your code follows https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py
@mfherbst Thanks for documenting! This surely is odd behavior; there is something funny going on. P.S. Just noticed #246 (comment) by @mfherbst, stating that our scipy version runs fine. But our scipy version of LOBPCG uses Cholesky, so there's nothing wrong with Cholesky and dividing by R in this example. It looks like there's just some bug in the Julia code, unrelated to Cholesky or this proposed QR fix.
@lobpcg I don't have much time during the week to work on this. But I will try to give it some time this weekend, hopefully! Thanks @lobpcg for the links, I need to take a closer look when I get some time.
@lobpcg your Matlab code is LGPL-licensed. The code here is MIT-licensed, so I was careful not to adapt your Matlab version of the code, for legal reasons. I didn't even read it in depth, so as not to be influenced by it while writing my code. If you change that code's license to BSD like the Python version, or to MIT like IterativeSolvers.jl, then I can study and adapt it more closely. Alternatively, you can give me a license/permission here to look at it and adapt it in IterativeSolvers.jl. The email I get from GitHub will be my "license".
FWIW I've seen this slow down convergence significantly (esp. in the high-accuracy regime), so it'd be better to avoid it if possible. I've found convergence curves from a few years ago in ABINIT (which implemented a LOBPCG with restarts based on your Matlab version). The first curve is with the default version, the second with additional orthogonalizations (I don't remember which ones, sorry) to avoid the restarts. (Never mind the y axis; the residuals in ABINIT were squared for some reason.)
I have changed the license to MIT at https://github.com/lobpcg/blopex, which includes the MATLAB file.
Of course. The main issue is that the decision there to remove P is based not on an actual try/catch error, but on guessing using the condition numbers of the Gram matrices and a threshold, which may be overly conservative. I do not recollect doing additional orthogonalizations in ABINIT and am unsure where your figures may come from, but this work was a decade ago...
I added those (half a decade ago). Indeed, I just checked, and restarts were based on estimating the conditioning. I haven't tried restarting based on failed factorizations; it'd be interesting to see if that fixes stability issues without degrading performance too much.
@antoine-levitt Sorry, until your last message I did not realize that I was actually familiar with your ABINIT contribution of the Chebyshev filter. I did not know that our original LOBPCG code in ABINIT had been modified with extra orthogonalizations to improve stability.
Let me ping an expert, @joseeroman, in case he has some advice.
@mohamed82008 please see scipy/scipy#10258 (comment)
I did some arithmetic-count analysis of this a few years back, and IIRC, while it doesn't change the complexity, using a smaller block size does still improve the constant factor. Using a larger block size has other advantages for cache and parallel communications, but at the time and for my particular problem (this was 20 years ago), the optimal block size seemed to be ≈ 11, FWIW.
Fixes #246 by implementing QR orthogonalization as a fallback when CholQR fails.
CC: @mfherbst, @haampie, @lobpcg
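For readers landing here, a minimal sketch of the fallback strategy the PR title describes (illustrative names and structure, not the actual diff): attempt CholQR on the B-Gram matrix first and, if that Cholesky factorization fails, fall back to an ordinary QR of X followed by one CholQR pass in the B-inner product.

```julia
using LinearAlgebra

# Cholesky-based B-orthonormalization with a QR fallback. Illustrative
# sketch only; the PR's actual code lives in src/lobpcg.jl.
function b_orthonormalize!(X::AbstractMatrix, B)
    BX = B * X
    G  = Hermitian(X' * BX)              # Gram matrix X' * B * X
    F  = cholesky(G; check = false)
    if issuccess(F)
        rdiv!(X, F.U)                    # CholQR step: X ← X * U⁻¹
    else
        Q = Matrix(qr(X).Q)              # fallback: ordinary QR of X
        G = Hermitian(Q' * (B * Q))
        rdiv!(Q, cholesky(G).U)          # make Q orthonormal in the B-inner product
        copyto!(X, Q)
    end
    return X
end
```

A call would look like `b_orthonormalize!(X, B)`, with `B = I` for the standard eigenvalue problem.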