FastALS method #39

alexmul1114 · 2024-02-23T19:37:06Z

Continued from #38.

…on from Faster MTTKRP PR.

codecov · 2024-02-23T19:41:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (845ad27) to head (6c68d82).
Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##            master       #39   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           12        13    +1     
  Lines          262       336   +74     
=========================================
+ Hits           262       336   +74


alexmul1114 · 2024-02-23T20:07:08Z

Looks like there is probably minimal difference between using `hcat` and `stack`:

Benchmark Report for `GCPDecompositions`

Job Properties

Time of benchmarks:
- Target: 23 Feb 2024 - 14:59
- Baseline: 23 Feb 2024 - 15:04
Package commits:
- Target: cf7fa6
- Baseline: 47a06e
Julia commits:
- Target: 312098
- Baseline: 312098
Julia command flags:
- Target: None
- Baseline: None
Environment variables:
- Target: GCP_BENCHMARK_SUITES => leastsquares
- Baseline: GCP_BENCHMARK_SUITES => leastsquares

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=1"]`	1.28 (5%) ❌	1.00 (1%)
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=10"]`	1.05 (5%) ❌	1.00 (1%)
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=50"]`	1.13 (5%) ❌	0.99 (1%)
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=1"]`	1.24 (5%) ❌	1.01 (1%)
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=10"]`	0.73 (5%) ✅	1.00 (1%)
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=50"]`	0.87 (5%) ✅	1.00 (1%)
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=1"]`	1.75 (5%) ❌	1.00 (1%)
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=10"]`	0.67 (5%) ✅	1.00 (1%)
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=50"]`	1.75 (5%) ❌	1.00 (1%)
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=1"]`	1.18 (5%) ❌	1.00 (1%)
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=10"]`	1.06 (5%) ❌	1.00 (1%)
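The marking rule described above the table can be sketched in a few lines of Julia. This is a minimal illustration, not the benchmarking tool's own code: the function name `classify` is hypothetical, and the exact handling of values that sit right on the tolerance boundary is an assumption.

```julia
# Hypothetical re-statement of the rule in the report text (illustration only):
# a ratio is flagged only when it leaves the band [1 - tolerance, 1 + tolerance].
function classify(ratio::Real, tolerance::Real)
    ratio > 1 + tolerance && return :regression    # shown as ❌ in the report
    ratio < 1 - tolerance && return :improvement   # shown as ✅ in the report
    return :invariant                              # no marker
end

classify(1.28, 0.05)  # :regression
classify(0.73, 0.05)  # :improvement
classify(1.00, 0.01)  # :invariant
```

The reported ratios are rounded to two decimals, so a printed value equal to the band edge (for example `1.05 (5%)`) can still be flagged if the unrounded ratio lies just outside the band.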

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["leastsquares"]

Julia versioninfo

Target

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz   10622009            0      5760461    1393113133       190103  ticks
  Memory: 31.726390838623047 GB (14341.58984375 MB free)
  Uptime: 182098.406 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Baseline

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz   10882853            0      5871399    1397152068       196039  ticks
  Memory: 31.726390838623047 GB (14281.48828125 MB free)
  Uptime: 182374.078 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores
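
For readers unfamiliar with the two functions compared in this comment, here is a minimal sketch of how `hcat` and `stack` can build the same matrix from a list of column vectors. It is not the code from this PR; the variable names are hypothetical, and `stack` requires Julia 1.9 or later.

```julia
# Hypothetical illustration: two ways to assemble a matrix from column vectors.
cols = [rand(4) for _ in 1:3]

A_hcat  = reduce(hcat, cols)   # concatenates the vectors side by side as columns
A_stack = stack(cols)          # stacks the vectors along a new trailing dimension

A_hcat == A_stack              # both are the same 4×3 matrix
```

Since both calls produce the same result, the choice between them comes down to measured performance, which is what the report above compares.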

alexmul1114 · 2024-03-14T12:32:49Z

Benchmark report for the first implementation of FastALS:

Benchmark Report for `GCPDecompositions`

Job Properties

Time of benchmarks:
- Target: 13 Mar 2024 - 19:45
- Baseline: 13 Mar 2024 - 19:50
Package commits:
- Target: f5182f
- Baseline: 7cb16a
Julia commits:
- Target: 312098
- Baseline: 312098
Julia command flags:
- Target: None
- Baseline: None
Environment variables:
- Target: GCP_BENCHMARK_SUITES => leastsquares
- Baseline: GCP_BENCHMARK_SUITES => leastsquares

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=1"]`	0.46 (5%) ✅	1.87 (1%) ❌
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=10"]`	0.41 (5%) ✅	2.39 (1%) ❌
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=50"]`	1.04 (5%)	1.40 (1%) ❌
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=1"]`	0.32 (5%) ✅	2.66 (1%) ❌
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=10"]`	0.22 (5%) ✅	3.26 (1%) ❌
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=50"]`	0.20 (5%) ✅	2.99 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=1"]`	0.35 (5%) ✅	1.23 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=10"]`	0.26 (5%) ✅	6.05 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=50"]`	0.22 (5%) ✅	5.69 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=1"]`	0.64 (5%) ✅	2.11 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=10"]`	0.34 (5%) ✅	2.31 (1%) ❌
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=50"]`	0.28 (5%) ✅	2.13 (1%) ❌
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=1"]`	2.00 (5%) ❌	3.66 (1%) ❌
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=10"]`	1.42 (5%) ❌	7.84 (1%) ❌
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=50"]`	1.14 (5%) ❌	3.18 (1%) ❌
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=1"]`	1.30 (5%) ❌	8.61 (1%) ❌
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=10"]`	1.30 (5%) ❌	23.17 (1%) ❌
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=50"]`	1.05 (5%) ❌	8.76 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=1"]`	0.88 (5%) ✅	23.22 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=10"]`	0.87 (5%) ✅	59.28 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=50"]`	1.00 (5%)	24.93 (1%) ❌

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["leastsquares"]

Julia versioninfo

Target

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  113738462            0     85039194    14930681228      2317212  ticks
  Memory: 31.726390838623047 GB (9885.97265625 MB free)
  Uptime: 1.837269765e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Baseline

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  114113166            0     85186915    14934994760      2324227  ticks
  Memory: 31.726390838623047 GB (10180.65234375 MB free)
  Uptime: 1.837572e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

alexmul1114 · 2024-03-14T12:33:58Z

Added buffers for the data saved between modes. Here is the comparison against the previous FastALS implementation:

Benchmark Report for `GCPDecompositions`

Job Properties

Time of benchmarks:
- Target: 14 Mar 2024 - 08:29
- Baseline: 14 Mar 2024 - 08:32
Package commits:
- Target: a6bd46
- Baseline: f5182f
Julia commits:
- Target: 312098
- Baseline: 312098
Julia command flags:
- Target: None
- Baseline: None
Environment variables:
- Target: GCP_BENCHMARK_SUITES => leastsquares
- Baseline: GCP_BENCHMARK_SUITES => leastsquares

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=1"]`	0.85 (5%) ✅	0.58 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=10"]`	1.09 (5%) ❌	0.51 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=50"]`	1.18 (5%) ❌	0.58 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=1"]`	0.99 (5%)	0.40 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=10"]`	1.09 (5%) ❌	0.38 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=50"]`	0.91 (5%) ✅	0.41 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=1"]`	1.32 (5%) ❌	0.68 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=10"]`	0.94 (5%) ✅	0.36 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=50"]`	0.88 (5%) ✅	0.37 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=1"]`	0.64 (5%) ✅	0.50 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=10"]`	0.58 (5%) ✅	0.47 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=50"]`	1.16 (5%) ❌	0.50 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=1"]`	0.86 (5%) ✅	0.59 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=10"]`	0.80 (5%) ✅	0.46 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=50"]`	1.01 (5%)	0.57 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=1"]`	0.77 (5%) ✅	0.45 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=10"]`	0.88 (5%) ✅	0.39 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=50"]`	0.90 (5%) ✅	0.43 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=1"]`	0.78 (5%) ✅	0.38 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=10"]`	0.80 (5%) ✅	0.36 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=50"]`	0.90 (5%) ✅	0.37 (1%) ✅

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["leastsquares"]

Julia versioninfo

Target

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  115790899            0     86968679    15205469148      2397432  ticks
  Memory: 31.726390838623047 GB (9448.2734375 MB free)
  Uptime: 1.883099078e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Baseline

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  115897541            0     87008212    15207638227      2398041  ticks
  Memory: 31.726390838623047 GB (9670.80078125 MB free)
  Uptime: 1.883243781e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

alexmul1114 · 2024-03-15T00:39:28Z

Benchmark report after adding buffers for all of the Khatri-Rao products. Getting a good speedup and memory improvement for the order-4 and order-5 cases, but still using a little extra memory in most of the order-3 cases.
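
As a rough sketch of the buffering idea (not the implementation in this PR), a Khatri-Rao product can be written into a preallocated matrix so that repeated calls do not allocate a new result. The name `khatrirao!` and the column ordering (column `r` equals `kron(A[:, r], B[:, r])`) are assumptions made for this illustration.

```julia
using LinearAlgebra  # for kron, used only in the reference check at the end

# Hypothetical helper: fill the preallocated buffer K with the Khatri-Rao
# (column-wise Kronecker) product of A and B.
function khatrirao!(K::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    m, R = size(A)
    n, RB = size(B)
    R == RB || throw(DimensionMismatch("A and B need the same number of columns"))
    size(K) == (m * n, R) || throw(DimensionMismatch("buffer K has the wrong size"))
    for r in 1:R, i in 1:m, j in 1:n
        # Row index follows kron(A[:, r], B[:, r]): the index into B varies fastest.
        K[(i - 1) * n + j, r] = A[i, r] * B[j, r]
    end
    return K
end

A, B = rand(3, 2), rand(4, 2)
K = Matrix{Float64}(undef, 3 * 4, 2)   # allocated once, reusable across iterations
khatrirao!(K, A, B)

# Reference result built from the column-wise Kronecker definition.
reference = reduce(hcat, [kron(A[:, r], B[:, r]) for r in 1:2])
K ≈ reference
```

In an iterative solver the buffer would be created once before the loop and passed to every call, which is where the memory savings come from.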

Benchmark Report for `GCPDecompositions`

Job Properties

Time of benchmarks:
- Target: 14 Mar 2024 - 20:31
- Baseline: 14 Mar 2024 - 20:34
Package commits:
- Target: 18f857
- Baseline: 7cb16a
Julia commits:
- Target: 312098
- Baseline: 312098
Julia command flags:
- Target: None
- Baseline: None
Environment variables:
- Target: GCP_BENCHMARK_SUITES => leastsquares
- Baseline: GCP_BENCHMARK_SUITES => leastsquares

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=1"]`	0.38 (5%) ✅	0.38 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=10"]`	0.28 (5%) ✅	0.33 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=50"]`	0.11 (5%) ✅	0.32 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=1"]`	0.32 (5%) ✅	0.18 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=10"]`	0.17 (5%) ✅	0.16 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=50"]`	0.13 (5%) ✅	0.28 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=1"]`	0.39 (5%) ✅	0.04 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=10"]`	0.25 (5%) ✅	0.19 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=50"]`	0.22 (5%) ✅	0.26 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=1"]`	0.33 (5%) ✅	0.17 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=10"]`	0.18 (5%) ✅	0.12 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=50"]`	0.17 (5%) ✅	0.25 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=1"]`	0.82 (5%) ✅	0.94 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=10"]`	0.97 (5%)	1.26 (1%) ❌
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=50"]`	0.97 (5%)	1.10 (1%) ❌
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=1"]`	0.58 (5%) ✅	0.99 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=10"]`	0.70 (5%) ✅	1.41 (1%) ❌
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=50"]`	0.94 (5%) ✅	1.16 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=1"]`	0.46 (5%) ✅	1.06 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=10"]`	0.47 (5%) ✅	1.50 (1%) ❌
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=50"]`	0.73 (5%) ✅	1.22 (1%) ❌

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["leastsquares"]

Julia versioninfo

Target

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  118963244            0     89680258    15625943959      2490480  ticks
  Memory: 31.726390838623047 GB (10910.5078125 MB free)
  Uptime: 1.926389953e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Baseline

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  119260867            0     89742947    15629176119      2491945  ticks
  Memory: 31.726390838623047 GB (10286.14453125 MB free)
  Uptime: 1.926614484e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

alexmul1114 · 2024-03-15T12:51:28Z

Modified the helper function to use views more; it now uses no more memory than the baseline in any case. When I benchmark the one case that regresses in runtime (size(X)=(60, 70, 80), rank(X)=10) on its own, it is faster with FastALS, so that regression is likely experimental noise.
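
To illustrate why views reduce memory use, here is a small sketch under the assumption that the helper previously indexed with slices. The functions `slice_sum` and `view_sum` are hypothetical and are not taken from the PR.

```julia
# Hypothetical example: indexing with a slice copies data, a view does not.
X = rand(60, 70, 80)

slice_sum(X, k) = sum(X[:, :, k])         # X[:, :, k] allocates a new 60×70 matrix
view_sum(X, k)  = sum(@view X[:, :, k])   # the view reads X's memory in place

# Run each function once so that compilation is not counted below.
slice_sum(X, 1)
view_sum(X, 1)

bytes_slice = @allocated slice_sum(X, 1)
bytes_view  = @allocated view_sum(X, 1)
```

Here `bytes_slice` includes the copied 60×70 block of `Float64` values, while `bytes_view` should be far smaller; the exact numbers depend on the Julia version.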

Benchmark Report for `GCPDecompositions`

Job Properties

Time of benchmarks:
- Target: 15 Mar 2024 - 08:45
- Baseline: 15 Mar 2024 - 08:50
Package commits:
- Target: 6a163e
- Baseline: 7cb16a
Julia commits:
- Target: 312098
- Baseline: 312098
Julia command flags:
- Target: None
- Baseline: None
Environment variables:
- Target: GCP_BENCHMARK_SUITES => leastsquares
- Baseline: GCP_BENCHMARK_SUITES => leastsquares

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=1"]`	0.41 (5%) ✅	0.32 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=10"]`	0.29 (5%) ✅	0.23 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30), rank(X)=50"]`	0.89 (5%) ✅	0.47 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=1"]`	0.32 (5%) ✅	0.14 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=10"]`	0.31 (5%) ✅	0.10 (1%) ✅
`["leastsquares", "least-squares-size(X)=(15, 20, 25, 30, 35), rank(X)=50"]`	0.35 (5%) ✅	0.23 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=1"]`	0.40 (5%) ✅	0.37 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=10"]`	0.22 (5%) ✅	0.11 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 30, 30, 30, 30), rank(X)=50"]`	0.30 (5%) ✅	0.19 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=1"]`	0.48 (5%) ✅	0.13 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=10"]`	0.25 (5%) ✅	0.07 (1%) ✅
`["leastsquares", "least-squares-size(X)=(30, 40, 50, 60), rank(X)=50"]`	0.47 (5%) ✅	0.21 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=1"]`	0.78 (5%) ✅	0.84 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=10"]`	0.85 (5%) ✅	0.93 (1%) ✅
`["leastsquares", "size(X)=(15, 20, 25), rank(X)=50"]`	0.87 (5%) ✅	0.99 (1%)
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=1"]`	0.57 (5%) ✅	0.83 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=10"]`	0.70 (5%) ✅	0.89 (1%) ✅
`["leastsquares", "size(X)=(30, 40, 50), rank(X)=50"]`	0.74 (5%) ✅	0.98 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=1"]`	0.42 (5%) ✅	0.81 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=10"]`	1.24 (5%) ❌	0.84 (1%) ✅
`["leastsquares", "size(X)=(60, 70, 80), rank(X)=50"]`	0.70 (5%) ✅	0.95 (1%) ✅

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["leastsquares"]

Julia versioninfo

Target

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  120025868            0     90644744    15910842369      2513869  ticks
  Memory: 31.726390838623047 GB (10260.5546875 MB free)
  Uptime: 1.97043814e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Baseline

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
      Microsoft Windows [Version 10.0.22621.3155]
  CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz: 
                 speed         user         nice          sys         idle          irq
       #1-16  2304 MHz  120422368            0     90844071    15915169694      2521088  ticks
  Memory: 31.726390838623047 GB (10327.34765625 MB free)
  Uptime: 1.970745843e6 sec
  Load Avg:  0.0  0.0  0.0
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

dahong67

Looking good! Please go through the requested changes.

Now that we have an efficient and working implementation (yay!), let's work on simplifying it. Will message you on Slack to discuss.

benchmark/suites/leastsquares.jl

src/gcp-algorithms.jl

src/gcp-algorithms/fastals.jl

dahong67 · 2024-03-20T15:36:18Z

Looking good! Merging. 🎊

alexmul1114 added 29 commits February 14, 2024 10:21

Addresses dahong67#17, start with latest MTTKRP and Khatri-Rao functi…

e183ca7

…on from Faster MTTKRP PR.

Rough implementation with ALS

1afffa6

Change helper function to modify U in place

d18df82

Add mttkrps benchmarks, old implementation to benchmark against

9c7a693

Add mttkrps_testing function for benchmarking

42a4274

Add initializations

c089078

Fix typo

36d68c1

new mttkrps for testing

aca4396

Refactor test function into benchmarks, old implementation

490e2d6

Fix typos

ac13227

new mttkrps for testing

5668bf3

Fix typos (old implementation)

58780f9

Fix typos (new)

4f9fa20

Refactor benchmark (baseline)

daf9cfd

Uncomment new function

3100f18

Fix errors in benchmark (baseline)

1a7133f

Fix benchmark errors (baseline)

4348895

Fix benchmark errors (baseline)

5424644

Refactor benchmarks (old)

990946c

Change benchmarks (old)

6d11656

Fix typos (old)

dfd0601

Fix typos (old)

ce58417

Add more benchmarks (old)

0432e13

Ready for benchmarking (new)

d1f64dc

Add fast ALS algorithm, make it default for least squares

f067535

Add least squares benchmark suite (baseline)

fc47162

Add least squares benchmark suite (FastALS)

d510a4f

Replace slices with selectdim

c4b8e37

Use view instead of selectdim

47a06e7

dahong67 and others added 11 commits March 6, 2024 12:33

Merge branch 'master' into pr/alexmul1114/39

aee50f5

Merge branch 'master' into FastALS-Method

e7c3175

Change default ls algorithm to ALS for benchmark baseline

a67f72c

Make FastALS default for benchmark target

50769fc

First working FastALS implementation (baseline)

1b6844d

First working FastALS implementation (target)

5521d75

Update leastsquares benchmark (baseline)

7cb16ac

Update leastsquares benchmark (target)

f5182f2

Use buffers for saved data between modes

a17afb0

Finish buffers

900bbed

Fix error in n<n_star case

a6bd460

Add buffers

18f8573

Use views in helper function

6a163e3

dahong67 force-pushed the FastALS-Method branch from dff97b0 to 6a163e3 Compare March 15, 2024 20:11

dahong67 added 2 commits March 15, 2024 16:13

Use benchmark/Manifest.toml from master

ada19ee

Run JuliaFormatter

0344998

dahong67 reviewed Mar 15, 2024

alexmul1114 and others added 6 commits March 16, 2024 16:10

Apply suggestions from code review

ea71633

Co-authored-by: David Hong <[email protected]>

Remove mttkrps benchmarks from this PR

199157f

Factor out outer kr products helper func into right-left and left-rig…

18c9a46

…ht func, reduce number of args

Add 5-way LS test case for code coverage

c07b942

Remove one of old ALS tests, add another 5-way test

c3e9252

Run formatter

6c68d82

dahong67 merged commit 24886d8 into dahong67:master Mar 20, 2024
12 checks passed
