nc::linalg::lstsq very slow #85
Comments
Hello, please feel free to scavenge things from my project; it might help you get rid of bottlenecks. I suspect a too-low tolerance setting: https://github.com/moe123/macadam/tree/master/macadam/details/numa
@dpilger26 not really working on that for now; however, the parameter description is missing, and I should add that. Check your implementation first: you might be trying to converge with too low a tolerance value, or with too-high expectations, which would explain why the reduction is very slow and runs to a very high number of iterations for nothing. BTW, by any standard an 8905 x 316 matrix is something very big, and beyond the dimensions, the very nature of the data must be known: is it well conditioned? The OP should reconsider his mathematical approach to the problem; maybe these are not the right tools here.
The NumPy lstsq function is faster. Benchmark details: lstsq of A (201, 200) and B (201, 1) takes 6 seconds in NumPy (Python) and about 98 seconds in NumCpp.
@Aditya-11 that does not mean anything; the two underlying implementations differ by a lot. What are your calling arguments, especially the tolerance?
Thanks @moe123, I will check this inTolerance parameter. Compiler used: clang. If the conditioning were bad, then the NumPy answer should also come out slower, I guess.
@Aditya-11 no, the NumPy implementation differs by a lot (it is a ~20-year-old project, so older than you): it also takes rcond into account, selects different strategies for decompositions, applies residual refinement steps, and so on. Conditioning is never "good" or "bad"; numerical analysis is not about pseudo-moralistic judgments but about facts; it is what it is, contextually. So your guess is just as good as your previous one, and a million thumbs-up won't change facts. What we are evidently talking about here is the logical relationship between a tolerance threshold and the nature of the input data, for a given algorithm using an SVD solver. The implied tolerance should not exceed something around ~1E-7 in the most optimistic scenario within the context of this implementation. What are your compiler flags? loop-vectorize? optimization for size? Also, is it too much to ask you all to upload your actual benchmarks? Until now, the discussion has been nothing but unverifiable assertions.
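To make the tolerance discussion concrete, here is an illustrative sketch (NumPy as a stand-in, not NumCpp code) of how the rcond-style cutoff of an SVD-based least-squares solver works: singular values below `rcond * sigma_max` are treated as zero, so the cutoff controls the effective rank the solver works with. The data is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((201, 200))   # same shape as the benchmark in this thread
b = rng.standard_normal((201, 1))

# numpy.linalg.lstsq exposes the singular-value cutoff directly via rcond:
# singular values smaller than rcond * largest_singular_value are zeroed.
x_loose, res, rank_loose, sv = np.linalg.lstsq(A, b, rcond=1e-7)
x_tight, _, rank_tight, _ = np.linalg.lstsq(A, b, rcond=None)  # machine-precision default

# Effective rank under each cutoff; for well-conditioned random data the
# cutoff changes nothing, but for ill-conditioned data it can differ a lot.
print(rank_loose, rank_tight)
```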
@moe123, I have changed the inTolerance argument to 1e-6, and it is still much slower than the Python implementation. Benchmark details: n = 200, a_1 (200 x 200), b_1 (200 x 1). In C++: 100-160 s; in Python: 6-8 s. Is there anything else to change? I had previously implemented the GMRES algorithm in Python using NumPy, then converted it to C++ using NumCpp, and it was slower, which is why I suspected lstsq might be the slow part.
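For reference, the 200 x 200 case above can be made reproducible on the NumPy side with a few lines; random data is an assumption here, since the actual matrices were never shared in the thread.

```python
import time
import numpy as np

n = 200
rng = np.random.default_rng(42)
a_1 = rng.standard_normal((n, n))     # stand-in for the OP's a_1 (200 x 200)
b_1 = rng.standard_normal((n, 1))     # stand-in for the OP's b_1 (200 x 1)

t0 = time.perf_counter()
x, residuals, rank, sv = np.linalg.lstsq(a_1, b_1, rcond=None)
elapsed = time.perf_counter() - t0

print(f"numpy lstsq: {elapsed:.4f} s, rank={rank}")
# sanity check: for a full-rank square system the least-squares
# solution solves the system exactly (up to rounding)
print(np.allclose(a_1 @ x, b_1))
```

Posting the equivalent NumCpp snippet and its timing alongside this would give the apples-to-apples comparison the thread keeps asking for.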
@Aditya-11 do you know the definition of science? The very definition is a set of methods such that what is done by one can be reproduced by another.* What is your Valgrind output or similar? Where does the CPU time actually go? (*) So would you share the content of your data, so that an apples-to-apples comparison can take place? The same remark goes for the OP. To address an issue, you need to understand it, which means literally circumscribing it; this is true of anything in this world, but it requires making an effort, and we are not there yet. -1. Does the fact that NumPy runs faster help by itself? No; it is a given reference which we would very much like to reproduce or approach. -2. Does the substance of this issue reside in repeating, over and over, that one is faster and the other slower, without exposing the entire problem properly and openly, in a context that can be shared and that anyone can reproduce? No; the real work begins once that simple basic step is achieved and resolved (for now, you have failed at it, and the same goes for the OP): namely, what set of rational options do we have for improving the actual algorithm?
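On the "where does the CPU time go" question: for the C++ side one would use Valgrind/callgrind or perf, but the idea can be sketched with Python's built-in profiler on the NumPy baseline (the profiler choice and random data are assumptions, not the OP's setup).

```python
import cProfile
import io
import pstats

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((201, 200))
b = rng.standard_normal((201, 1))

# Profile a single lstsq call and report where cumulative time is spent.
pr = cProfile.Profile()
pr.enable()
np.linalg.lstsq(A, b, rcond=None)
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
report = s.getvalue()
print(report)
```

A comparable callgrind report for the NumCpp run would show whether the time sits in the SVD iteration itself or somewhere avoidable.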
Hi @Aditya-11, can you provide the flags used in CMake? It might be something related to the compilation of the library/executable itself, threading, etc. Both NumPy and this package rely on LAPACK, which, as others mentioned, is quite old and stable.
Describe the bug
It takes forever to do SVD::decompose()
To Reproduce
Run linalg.lstsq with a big NdArray; mine was 8905 rows and 316 cols.
Expected behavior
It should be much faster.