CUDA Support for ALS #37
Open: alexmul1114 wants to merge 24 commits into dahong67:master from alexmul1114:CUDA-Support
24 commits
d48f0f1 (alexmul1114): Fixes #36 (with simple MTTKRP implementation)
4596484 (alexmul1114): Change to Arrays AbstractArrays
6e2a476 (alexmul1114): Replace skipmissing
01e2ab9 (alexmul1114): Use ismissing in mapreduce instead of skipmissing
1e6097e (alexmul1114): Add new _gcp method for CuArray
fd71ca4 (alexmul1114): Add CUDA gcp as extension, use simple MTTKRP implementation and khatr…
e5b30ea (alexmul1114): Add CUDA to deps for backward compatiblity
c34cc3c (alexmul1114): Move CUDA to extras
966e30b (alexmul1114): Change CUDA version compatibility
9938c46 (alexmul1114): Modify project.toml
e8077a3 (alexmul1114): Test with new MTTKRP from #35
fb1d0ce (alexmul1114): Change .= back to =
e368433 (alexmul1114): Fix typos
614929d (alexmul1114): Use latest MTTKRP from #35
7854ec8 (alexmul1114): collect selectdim for now
df803e1 (alexmul1114): Try reusing inner
2a7f0c2 (alexmul1114): Use copy in MTTKRP middle mode Rn computation for now
b72f6e9 (alexmul1114): Change abstract arrays back to arrays
12f0004 (alexmul1114): Remove typo
e9680f6 (alexmul1114): Switch khatrirao to accept abstractmatrix
ebc5455 (alexmul1114): Fix typo
ab40583 (alexmul1114): Temporarily use Float32 for comparison against GPU
f4696c0 (alexmul1114): Replace norm function
d7f68ed (dahong67): Merge branch 'master' into pr/alexmul1114/37
@@ -4,6 +4,7 @@ authors = ["David Hong <[email protected]> and contributors"]
version = "0.1.2"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
IntervalSets = "8197267c-284f-5f27-9208-e0e47529a953"

@@ -13,12 +14,15 @@ LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7"

[extensions]
CUDAExt = "CUDA"
LossFunctionsExt = "LossFunctions"

[compat]
CUDA = ">= 4.4.1"
Compat = "3.42, 4"
ForwardDiff = "0.10.36"
IntervalSets = "0.7.7"

@@ -27,3 +31,6 @@ LinearAlgebra = "1.6"
LossFunctions = "0.11.1"
Random = "1.6"
julia = "1.6"

[extras]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
@@ -0,0 +1,47 @@
module CUDAExt

using GCPDecompositions, CUDA

GCPDecompositions.gcp(
    X::CuArray,
    r,
    loss = LeastSquaresLoss();
    constraints = (),
    algorithm = GCPAlgorithms.ALS(),
) = _gcp(X, r, loss, constraints, algorithm)
function _gcp(
    X::CuArray{TX,N},
    r,
    loss::LeastSquaresLoss,
    constraints::Tuple{},
    algorithm::GCPAlgorithms.ALS,
) where {TX<:Real,N}
    T = promote_type(TX, Float32)

    # Random initialization
    M0 = CPD(ones(T, r), rand.(T, size(X), r))
    M0norm = sqrt(sum(abs2, M0[I] for I in CartesianIndices(size(M0))))
    Xnorm = sqrt(mapreduce(x -> isnan(x) ? 0 : abs2(x), +, X, init=0f0))
    for k in Base.OneTo(N)
        M0.U[k] .*= (Xnorm / M0norm)^(1 / N)
    end
    λ, U = M0.λ, collect(M0.U)

    # Move λ, U to gpu
    λ = CuArray(λ)
    U = [CuArray(U[i]) for i in 1:N]

    # Inefficient but simple implementation
    for _ in 1:algorithm.maxiters
        for n in 1:N
            V = reduce(.*, U[i]'U[i] for i in setdiff(1:N, n))
            U[n] = GCPDecompositions.mttkrp(X, U, n) / V
            λ = vec(sqrt.(sum(abs2, U[n]; dims=1)))
            U[n] = U[n] ./ permutedims(λ)
        end
    end

    return CPD(Array(λ), Tuple([Array(U[i]) for i in 1:N]))
end

end
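For reference, the inner loop of `_gcp` above can be read as the following per-mode update. The notation is introduced here for illustration and is not taken from the PR: $\ast$ denotes the elementwise (Hadamard) product, $r$ is the rank, and $\operatorname{MTTKRP}(X, U, n)$ stands for the result of `GCPDecompositions.mttkrp(X, U, n)`.

```latex
\begin{aligned}
V_n &= \mathop{\ast}_{i \neq n} U_i^\top U_i, \\
U_n &\leftarrow \operatorname{MTTKRP}(X, U, n)\, V_n^{-1}, \\
\lambda_j &= \lVert U_n[:, j] \rVert_2, \qquad j = 1, \dots, r, \\
U_n[:, j] &\leftarrow U_n[:, j] / \lambda_j .
\end{aligned}
```

In the code, the second step is written as a right division (`/ V`) rather than an explicit inverse, and `λ` is overwritten after each mode, so the returned weights are the column norms from the last mode updated.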
Added CUDA as an extension; the extension defines a gcp method for CuArray input. Right now M0 is created and normalized on the CPU, then moved to the GPU for ALS (and moved back to the CPU to be returned at the end). Need to figure out how to rewrite line 24 without scalar indexing so that M0 can be created directly as a CuArray.
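One possible way to avoid scalar indexing, offered as a sketch rather than as the approach this PR will take: the norm of a CP model can be computed from the factor Gram matrices, since ‖M‖² = λ' (U₁'U₁ .* ⋯ .* U_N'U_N) λ. The helper name `cpd_norm` below is hypothetical and not part of GCPDecompositions. It uses only matrix products, broadcasts, and `dot`, so it may also work with `CuArray` factors, but that is an assumption that has not been tested here; the check at the end runs on the CPU only.

```julia
using LinearAlgebra

# Hypothetical helper, not part of GCPDecompositions.
# Computes the Frobenius norm of the tensor with weights λ and factor
# matrices U[1], ..., U[N] without indexing individual tensor entries.
function cpd_norm(λ::AbstractVector, U)
    # Hadamard product of the r×r Gram matrices U[k]'U[k]
    V = reduce((A, B) -> A .* B, [Uk' * Uk for Uk in U])
    # ‖M‖² = λ' V λ; clamp at zero to guard against round-off
    return sqrt(max(dot(λ, V * λ), zero(eltype(V))))
end

# CPU check against the elementwise definition on a small example
λ = [1.0, 2.0]
U = (rand(3, 2), rand(4, 2), rand(5, 2))
dense = [sum(λ[j] * U[1][i1, j] * U[2][i2, j] * U[3][i3, j] for j in 1:2)
         for i1 in 1:3, i2 in 1:4, i3 in 1:5]
@assert cpd_norm(λ, U) ≈ norm(dense)
```

With something like this, `M0norm` would no longer need `M0[I]`, which is the part that triggers scalar indexing when the factors live on the GPU.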