LOBPCG stability: Suggestion of improvements #246
Note: the original scipy implementation does not suffer from this problem. For example,

```python
from scipy.sparse.linalg import lobpcg
from scipy.linalg import orth  # provides the orthonormal random guess below
import numpy as np

A = np.sort(np.random.rand(100))  # placeholder: the issue's actual diagonal entries were elided
lobpcg(np.diag(A), orth(np.random.randn(len(A), 5)), largest=False)
```

returns the correct result each time I tried it.
Seems like a duplicate of #223.

I am traveling this week, but I will give it a look next week. Thanks for the issue.
Thanks for looking into this. Indeed, this shows some similarity with #223, e.g. the role of the block size. I can assure you, however, that we have similarly frequent problems in our application code, where the matrices are much larger relative to the block size than in the example shown. I'll try to come up with a reproducible example for you.

I came up with a larger example that still illustrates the problem. On my machine this has a success rate of about 97%. This is not good enough for our needs, since we need on the order of hundreds, potentially even thousands, of such eigensolves.
OK, this is interesting. I managed to reproduce this with a specific seed. I will look into it.

@mohamed82008 see …

It is definitely possible that it is a bug in my code, but given the number of passed test cases, I am very curious where I might have gone wrong. I can do a close comparison when I get some time.

@mohamed82008 this may be a bug in the core LOBPCG algorithm after all, also found by @mfherbst in scipy, see scipy/scipy#10258 (comment). Let me think how to fix it easily...

@mohamed82008 (related to scipy/scipy#10258) I added a Julia version to the example repository mfherbst/issue_scipy_lobpcg_guess for convenience and out of curiosity. As it turns out, and as suspected by @lobpcg, the Julia implementation of LOBPCG also fails on this example, in fact both with and without the guess vectors.
The naive Julia implementation referenced in scipy/scipy#10258 actually does pretty well on the example in mfherbst/issue_scipy_lobpcg_guess. Even converging down to a 1e-14 tolerance in a reasonable number of iterations (some 250 without a preconditioner and 100 with one) is possible.
@mfherbst I think full orthogonalization of the basis with QR is an almost sure-fire way to make LOBPCG as stable as the QR algorithm (there is a loophole when using constraints). The only problem is the complexity the QR approach introduces in the form of additional matrix-matrix multiplications, e.g. https://gist.github.com/antoine-levitt/f8ac3637c5d5d778a448780172e63e28#file-lobpcg-jl-L30, which is why I suggested this as a final measure in #247 if nothing else works. The QR "fix" in #247 is actually a less extreme measure than full orthogonalization, as it only does matrix-matrix multiplications of …
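For concreteness, a minimal sketch of the full-orthogonalization idea being discussed, assuming the combined trial basis S = [X W P] is stored as a dense matrix; the function name and shapes are illustrative, not this package's internals:

```julia
using LinearAlgebra

# Sketch: re-orthonormalize the combined LOBPCG trial basis S = [X W P]
# with Householder QR before the Rayleigh-Ritz step. This is backward
# stable, but materializing the thin Q costs additional matrix-matrix
# work, which is the complexity objection raised above.
orthonormalize_qr(S::AbstractMatrix) = Matrix(qr(S).Q)  # thin Q, same column span as S

# Usage in an iteration (X, W, P are the current blocks):
# S = orthonormalize_qr(hcat(X, W, P))
```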
QR is also very bad for parallel execution, which is the actual reason I avoid it in the original LOBPCG algorithm, also implemented in many parallel libraries.
I think the following approach has potential. Let …

Note that the sizes of … Similarly, if using QR, I think we need to make sure not to include any additional basis from … The eigenvalue decomposition approach above avoids the need for additional matrix-matrix multiplications involving the matrices … Comments? @lobpcg @antoine-levitt
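Though the formulas above were lost in extraction, the eigenvalue-decomposition idea reads much like SVQB-style orthonormalization (Stathopoulos and Wu; also used in the Duersch et al. paper cited later in the thread): diagonalize the small Gram matrix and drop nearly dependent directions rather than letting them break a Cholesky factorization. A hedged sketch, with all names illustrative:

```julia
using LinearAlgebra

# Sketch: orthonormalize the basis S via an eigen-decomposition of its
# m×m Gram matrix, discarding nearly dependent directions instead of
# failing. Only one extra n×m product with S is needed at the end.
function orthonormalize_dropping(S::AbstractMatrix; droptol = 1e-12)
    λ, V = eigen(Symmetric(S' * S))     # ascending eigenvalues of the Gram matrix
    keep = λ .> droptol * maximum(λ)    # columns to retain
    return S * (V[:, keep] * Diagonal(1 ./ sqrt.(λ[keep])))
end
```

The retained columns come out orthonormal, and the basis simply shrinks when W or P has become linearly dependent, which is the truncation behaviour discussed above.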
As soon as you discard information in this way, you will slow down convergence (essentially, a downgrade to LOBCG, i.e. without the P) for precisions smaller than ~1e-8 (try it, you'll see), which are important for a number of applications.
Well, if the whole …
Perhaps both options can be provided once the constraint issue for QR is figured out, not that I am volunteering to implement both! If this turns out to be too much work, I may not have the time to do it any time soon; this is a somewhat busy period for me. But let's see.
The constrained case for QR seems simple to handle. We just need to check that any additional column of …

So it seems a similar method to the one proposed above was also proposed in https://arxiv.org/pdf/1704.07458.pdf with the name …

We discussed this paper already, e.g., #247 (comment)
Dropping P for the rest of the iterations is extreme (I do it in MATLAB under some conditions) and of course results in a dramatic slowdown, as already discussed in #247 (comment). To fix the current test case, you need to drop P only on iterations where P is linearly dependent, as I now do in https://github.com/lobpcg/blopex/blob/master/blopex_tools/matlab/lobpcg/lobpcg.m, but not yet in scipy. Please see the updated scipy/scipy#10258 (comment).
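A hedged sketch of that per-iteration fallback, assuming a Cholesky-based orthonormalization of the trial basis; the function and variable names are placeholders, not the internals of lobpcg.m or of this package:

```julia
using LinearAlgebra

# Sketch: try the full trial basis [X W P]; if its Gram matrix is
# numerically rank deficient (Cholesky fails), drop P for this
# iteration only and retry, rather than for all remaining iterations.
function trial_basis(X, W, P)
    S = hcat(X, W, P)
    F = cholesky(Symmetric(S' * S); check = false)
    issuccess(F) && return S
    return hcat(X, W)   # P is linearly dependent here; skip it this step
end
```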
Let me ping an expert, @joseeroman, in case he has some advice.

I have my doubts about this "fix", since it doesn't really change the overall basis. I would be very surprised if it actually fixed all of @mfherbst's test cases.
I have updated my comment: to fix the current test case, you need to drop P on iterations where P is linearly dependent, as I now do in https://github.com/lobpcg/blopex/blob/master/blopex_tools/matlab/lobpcg/lobpcg.m, but not yet in scipy. Please see scipy/scipy#10258 (comment).
When I was implementing a related algorithm (a cruder predecessor to LOBPCG, since at that time I didn't realize you could perform the line minimizations by solving a Ritz problem) many years ago (https://www.osapublishing.org/oe/fulltext.cfm?uri=oe-8-3-173&id=63584), I found that it was sufficient to re-orthogonalize occasionally (not every step). Maybe you could estimate the condition number from the Cholesky factors (which can be done cheaply, I think?) in order to decide whether to re-orthogonalize? In that work I used the polar decomposition A = QP. This can be computed relatively straightforwardly in parallel since …
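A sketch of that heuristic: if G = SᵀS = RᵀR is the Cholesky factorization of the Gram matrix, then cond₂(S)² = cond(G), so a cheap triangular condition estimate on R can decide whether to re-orthogonalize. The threshold and names below are illustrative guesses:

```julia
using LinearAlgebra

# Sketch: O(m²) reciprocal condition estimate (LAPACK trcon) on the
# Cholesky factor R of the Gram matrix, used to trigger occasional
# re-orthogonalization only when the basis is getting ill-conditioned.
function needs_reorth(R::UpperTriangular; threshold = 1e6)
    rcond = LAPACK.trcon!('O', 'U', 'N', copy(parent(R)))  # 1-norm estimate
    return inv(rcond) > threshold
end
```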
Oh wow, that's an old thread. Since then, we ended up writing our own implementation in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl. It uses a number of tricks for maximal stability, we haven't been able to make it break yet (and not for lack of trying :-)), and it keeps the full convergence rate even when converging close to machine precision, which I have never been able to do with any other LOBPCG implementation. It also fixes a very tricky issue whereby locking degraded convergence, which took me a good while to figure out; I think no other implementation has that fix.

I wanted to add some other tricks (like avoiding a full diagonalization of X'AX using perturbation theory when close to convergence, which gives quite a nice speedup in certain circumstances) and possibly make a paper out of it, but other things got in the way. If somebody is interested in picking this up again I'd be happy to share.

The choice there was to only use Cholesky for orthogonalization, because it's the fastest in a parallel setting (it's like the polar decomposition, but Choleskys are faster than square roots). It's also very unstable, so we do two things: 1) we add something to the diagonal when the Cholesky fails, and 2) we re-orthogonalize when needed, indeed using the condition number of the Cholesky factors as you suggest @stevengj (see https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl#L77).
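As I read it, the first trick amounts to a shifted Cholesky; a sketch under that assumption (the shift growth policy is a guess, not DFTK's exact constants):

```julia
using LinearAlgebra

# Sketch: Cholesky with a diagonal shift as a fallback. If the Gram
# matrix G fails to factor, add a small multiple of the identity and
# retry, growing the shift until the factorization succeeds.
function shifted_cholesky(G::AbstractMatrix; base = eps(real(eltype(G))) * norm(G))
    τ = zero(base)
    for _ in 1:20
        F = cholesky(Symmetric(G) + τ * I; check = false)
        issuccess(F) && return F
        τ = τ == zero(τ) ? base : 10τ   # grow the shift and retry
    end
    error("Gram matrix too ill-conditioned even after shifting")
end
```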
Would it be useful for others to bring your implementation into this package?
As I said on the Discourse thread, our implementation at its current stage is not a complete drop-in replacement for the lobpcg of this package (e.g. generalised eigensolves are not tested very thoroughly, only the smallest eigenvalues are implemented, etc.), so getting it to fully replace the existing implementation would be a bit of work. Coexistence might be a little easier short-term, I suppose. Other than that, it would lower the burden of maintaining it on our side, so I would not mind helping to get it integrated elsewhere 😄. What do you think @antoine-levitt?
Since my last comment in this thread, I have updated … a year ago
I already put it up at https://github.com/antoine-levitt/LOBPCG last year. It's the version that DFTK's is based on. Anybody is welcome to do what they want with it. It's definitely more stable than the one in IterativeSolvers, and therefore I would recommend it as a first attempt. Generalized eigenvalue problems are OK (definitely more stable than the implementation in IterativeSolvers) as long as B is not horribly badly conditioned; there's an alternative tweak that doesn't assume B to be relatively well conditioned, but it needs additional B multiplications for stability, so I didn't put it in by default. I did not benchmark extensively outside of DFTK's operating point, so I can't say for sure that the implementation is unambiguously better than the one in IterativeSolvers regarding performance: there might be more matmuls or memory usage in our version; to be checked.

LOBPCG, like most Krylov methods, is a very delicate algorithm that is fundamentally unstable, and implementations have to walk a fine line between stability and efficiency, so I think it's fine to have several implementations coexist. If someone does the work of comparing implementations and finds that one (possibly after tweaks) is unambiguously better than the others, then it might make sense to have just one, or possibly two Pareto-optimal implementations, one geared more towards stability and one more towards performance. I can help, but I don't have time to actually do that non-trivial amount of work.
Let's get a GSoC on this.
If there is a write-up somewhere describing the algorithm to be implemented, it would make it easier for the student to do it. Otherwise we would need a more mathematically adept student than the average GSoC participant to make sense of the arguments and scripts posted here. If a paper can come out of it, perhaps a graduate student might also be interested in volunteering and doing all the comparisons and benchmarks.
The problem is that there are several papers describing different competing tricks, and correspondingly several different actual implementations of LOBPCG in various packages; see https://en.wikipedia.org/wiki/LOBPCG. In exact arithmetic, different implementations should give exactly the same results, although with quite different computational costs and memory requirements. In double precision, and especially in single precision, implementations vary greatly in numerical stability and may simply fail depending on the input. There is no universally accepted "best" implementation.

Arguably, the Python version https://github.com/scipy/scipy/blob/v1.5.4/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py is the most used and tested by various users. If you just want to modify the existing Julia version to literally follow every detail of this Python version, it is purely a coding exercise, easy for a CS student to do, basically just translating Python into Julia. If the goal is first to determine the "best" version for implementation in Julia, it is a difficult, possibly open-ended, PhD-level math/CS research project.
Either way, we can put it on the GSoC project list and, depending on the level of the student, we can do good old scope creep.

Agreed with Andrew here: the pure coding exercise is not very interesting, as one can just use PyCall to use scipy's version from Julia. The interesting part requires serious numerical skills.

A GSoC student for this would be pretty cool. Let me know if there is something I can do to help when it gets to it!

I think the main thing to do is to list the project here and add yourself as a mentor (with any other potential co-mentors): https://julialang.org/jsoc/projects/ I think it is soon going to be time for students to start visiting this list, so best to put it up immediately.
We currently employ the `lobpcg` solver from this Julia package in our DFTK.jl electronic structure theory code. In our observations, the current Cholesky-QR-based LOBPCG implementation can become numerically unstable and sometimes even produce spurious eigenvalues. As an illustrative example, run the following Julia code:
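(The original code block was lost; this is a hedged reconstruction based on the scipy analogue quoted at the top of the thread: a diagonal matrix, five smallest eigenpairs, random orthonormal guess. The diagonal values are placeholders, not the issue's exact data.)

```julia
using IterativeSolvers, LinearAlgebra, Random

Random.seed!(0)
# Placeholder spectrum with a doubly degenerate zero eigenvalue,
# consistent with the two zero approximations described below.
A  = Diagonal(vcat(zeros(2), 1.0:98.0))
X0 = Matrix(qr(randn(size(A, 1), 5)).Q)   # orthonormal guess, like scipy's orth(...)

res = lobpcg(A, false, X0)                # false → smallest eigenvalues
@show res.λ[1] res.λ[2]
@show dot(res.X[:, 1], res.X[:, 2])       # the orthogonality check discussed below
```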
In my experiments with this simple test problem, it fails about every second time with a `PosDefException` from the Cholesky factorization. In the remaining cases, it returns two approximations to the zero eigenvalue at `res.λ[1]` and `res.λ[2]`. In all such instances I further observed the first two eigenvectors to be non-orthogonal; that is to say, their dot product is numerically non-zero, of order of magnitude `0.0001`.

A couple of strategies to improve the numerical stability of LOBPCG have been discussed in the literature, e.g. in https://arxiv.org/abs/1704.07458. As far as I can judge from reading through the code, the suggested basis truncation and selection strategies are not yet present, and it might be advantageous to take a look at implementing them.