Benchmarking Scripts to tune the default algorithm choices #166
Comments
This is a continuation of the benchmarking I presented in #159. There, I presented the results of running the
(RecursiveFactorization) pkg> status
Project RecursiveFactorization v0.2.11
Status `D:\peter\Documents\julia\dev\RecursiveFactorization\Project.toml`
[a93c6f00] DataFrames v1.3.4
[bdcacae8] LoopVectorization v0.12.120
[33e6dc65] MKL v0.5.0
[f517fe37] Polyester v0.6.13
[7792a7ef] StrideArraysCore v0.3.15
[d5829a12] TriangularSolve v0.1.12
[3d5dd08c] VectorizationBase v0.21.42
[112f6efa] VegaLite v2.6.0
[37e2e46d] LinearAlgebra
julia> versioninfo(verbose=true)
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
Microsoft Windows [Version 10.0.22000.795]
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz:
speed user nice sys idle irq
#1 3000 MHz 2431453 0 19001781 516223109 13460718 ticks
#2 3000 MHz 5596203 0 3129484 528930375 187468 ticks
#3 3000 MHz 3831859 0 3086890 530737312 40078 ticks
#4 3000 MHz 3733187 0 1884296 532038578 28156 ticks
#5 3000 MHz 2444562 0 2263937 532947562 36484 ticks
#6 3000 MHz 2220062 0 1327734 534108265 28578 ticks
#7 3000 MHz 2176875 0 1391343 534087843 28593 ticks
#8 3000 MHz 2917562 0 1682937 533055546 53328 ticks
Memory: 31.85821533203125 GB (22317.58984375 MB free)
Uptime: 537656.0 sec
Load Avg: 0.0 0.0 0.0
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = runemacs.exe
CHOCOLATEYLASTPATHUPDATE = 132198172845121191
HOME = D:\peter\Documents
HOMEDRIVE = C:
HOMEPATH = \Users\peter
MIC_LD_LIBRARY_PATH = C:\Program Files (x86)\Common Files\Intel\Shared Libraries\compiler\lib\intel64_win_mic
PATH = C:\Program Files\ImageMagick-7.1.0-Q16-HDRI;C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\mpi\intel64\bin;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64_win\mpirt;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64_win\compiler;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32_win\mpirt;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32_win\compiler;C:\ProgramData\Oracle\Java\javapath;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\mpirt;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32\mpirt;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\ia32\compiler;C:\Program Files (x86)\Common Files\Microsoft Shared\VSA\10.0\VsaEnv;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows Live;C:\Program Files\MiKTeX 2.9\miktex\bin\x64;C:\Windows\twain_32\MP830;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;c:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;C:\Program Files (x86)\Windows Live\Shared;C:\Program Files\gs\gs8.64\bin;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files\Calibre2\;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\;C:\Program Files\Microsoft SQL Server\110\Tools\Binn\;C:\Program Files (x86)\Git\cmd;C:\Program Files (x86)\Git\bin;C:\Program Files\TortoiseGit\bin;C:\ProgramData\chocolatey\bin;C:\Program Files\MATLAB\R2022a\bin;C:\Program 
Files (x86)\Calibre2\;C:\Program Files\Microsoft VS Code\bin;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Git\cmd;C:\Program Files\nodejs\;C:\Users\peter\AppData\Local\Microsoft\WindowsApps;c:\usr\local\bin;C:\Users\peter\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\;C:\Users\peter\AppData\Local\GitHubDesktop\bin;C:\Users\peter\AppData\Local\Pandoc\;C:\cygwin64\usr\i686-w64-mingw32\sys-root\mingw\lib;C:\Program Files (x86)\Aspell\bin;C:\Users\peter\AppData\Local\gitkraken\bin;C:\Users\peter\AppData\Roaming\npm;C:\Users\peter\AppData\Local\Microsoft\WindowsApps
PATHEXT = .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.JL;.CPL
PSMODULEPATH = D:\peter\Documents\WindowsPowerShell\Modules;C:\Program Files\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules
First, the result of running the script after starting Julia with
Next, the 8-core result after
Ah, re what I said earlier about using RF with 1 vs multiple threads: OpenBLAS is dramatically faster with
These days, should we ever go back to OpenBLAS? There are so many cases where it's just... bad.
What do you get on your AMD machine? I'm guessing MKL is still much better there for
For Apple silicon, we could use Accelerate. It does quite well with a single core:
But that single core gets to use their single matrix multiplier. I should double check if Accelerate can benefit from multiple cores on the M1; maybe I just didn't set it. While it has only a single matrix multiplier, there's probably a lot of other things that can be done on the cores.
This was with 4 cores on a Mac mini. RF wins below 100x100, at least.
@YingboMa and I should look into an algorithm better suited to threading.
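The "RF wins below 100x100" observation amounts to locating a crossover size between two timing curves. A minimal sketch of that bookkeeping follows; the function name `crossover_size` is hypothetical, the sizes are assumed to be sorted in increasing order, and the numbers in the usage lines are illustrative placeholders, not measurements from this thread.

```julia
# Hypothetical helper: scan sizes in increasing order and return the largest
# size up to which algorithm A is strictly faster than algorithm B.
# Returns `nothing` if A is not faster even at the smallest size.
function crossover_size(sizes, a_seconds, b_seconds)
    last_win = nothing
    for (N, ta, tb) in zip(sizes, a_seconds, b_seconds)
        ta < tb || break      # stop at the first size where A no longer wins
        last_win = N
    end
    return last_win
end

# Placeholder timings (NOT real benchmark data), only to show the call shape.
sizes     = [10, 50, 100, 200]
a_seconds = [1.0, 2.0, 5.0, 9.0]
b_seconds = [2.0, 3.0, 4.0, 6.0]
crossover = crossover_size(sizes, a_seconds, b_seconds)   # 50
```

Stopping at the first loss keeps the answer conservative: a noisy win at a larger size does not extend the range in which A would be chosen as the default.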
How do you make use of Accelerate?
https://github.com/chriselrod/AppleAccelerateLinAlgWrapper.jl
I also didn't bother wrapping anything other than what I wanted to test (matmul, lu, ldiv, and rdiv).
We might as well add it to LinearSolve.jl if we can do it in a way that doesn't hit LibBLASTrampoline.
Currently, a problem with it is that despite hitting LibBLASTrampoline, it doesn't actually replace any existing methods, so we still need to manually use
But, yeah, we should probably just call Accelerate directly to reduce the risk of something going wrong.
Why does OpenBLAS perform so much worse on Windows than on Linux?
Added.
We should put together a benchmark script and have a bunch of people run it. It should just run LUFactorization, RFLUFactorization, and FastLUFactorization (and MKLFactorization when that exists).
It would be nice for this to have an option for what kind of matrix is generated as a function of some N, so for example it can be used to generate the matrices from the Brusselator equation for testing the sparse factorizations.
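One way such a script could be organized is sketched below, using only the `LinearAlgebra` and `Random` standard libraries. Everything named here (`run_benchmark`, `dense_matrix`, `tridiag_matrix`, the solver labels) is a hypothetical illustration, not an existing API. The solvers are plain `(A, b) -> x` closures, so calls into the LinearSolve.jl algorithms named above could be substituted for them; that wiring is not shown. The tridiagonal generator is only a simple structured stand-in, not the Brusselator matrix.

```julia
using LinearAlgebra
using Random

# Hypothetical generator: dense random N×N matrix, shifted by N*I so it is
# well conditioned and the solve is numerically uneventful.
dense_matrix(N; rng = Random.default_rng()) = rand(rng, N, N) + N * I

# Hypothetical generator: dense copy of a tridiagonal (1-D Laplacian-like)
# matrix. A simple structured stand-in, NOT the Brusselator matrix.
tridiag_matrix(N) =
    Matrix(Tridiagonal(fill(-1.0, N - 1), fill(2.0, N), fill(-1.0, N - 1)))

# Time every solver on the matrix `gen(N)` for each N in `sizes`.
# `solvers` is a collection of `label => (A, b) -> x` pairs.
function run_benchmark(solvers, gen, sizes; repeats = 3)
    results = NamedTuple{(:solver, :N, :seconds, :residual),
                         Tuple{String, Int, Float64, Float64}}[]
    for N in sizes
        A = gen(N)
        b = rand(N)
        for (name, solve_fn) in solvers
            solve_fn(copy(A), copy(b))   # warm-up so compilation is not timed
            best = Inf
            x = similar(b)
            for _ in 1:repeats
                Ac, bc = copy(A), copy(b)   # solvers may overwrite their inputs
                t = @elapsed begin
                    x = solve_fn(Ac, bc)
                end
                best = min(best, t)
            end
            push!(results, (solver = name, N = N, seconds = best,
                            residual = norm(A * x - b)))
        end
    end
    return results
end

# Two stdlib solvers as placeholders for the algorithms to be compared.
solvers = [
    "lu! then solve" => (A, b) -> lu!(A) \ b,
    "backslash"      => (A, b) -> A \ b,
]

results = run_benchmark(solvers, dense_matrix, (8, 32, 128))
for r in results
    println(r.solver, "  N=", r.N, "  t=", r.seconds, " s  residual=", r.residual)
end
```

Passing the generator as an argument is what makes the matrix type an option: `run_benchmark(solvers, tridiag_matrix, sizes)` reuses the same timing loop on a different family of matrices, and recording the residual alongside the time guards against comparing a fast but wrong solve.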