Better support for unified and host memory #2138
Conversation
Instead of tracking a single
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2138      +/-   ##
==========================================
+ Coverage   72.21%   72.29%   +0.08%
==========================================
  Files         159      159
  Lines       14340    14444     +104
==========================================
+ Hits        10356    10443      +87
- Misses       3984     4001      +17
```

View full report in Codecov by Sentry.
```julia
if is_unified(xs) && sizeof(xs) > 0 && !is_capturing()
    buf = xs.data[]
    subbuf = Mem.UnifiedBuffer(buf.ctx, pointer(xs), sizeof(xs))
    Mem.prefetch(subbuf)
```
@pmccormick you were asking me about this yesterday.
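For context, a minimal sketch of what this prefetching amounts to from the user's side. Note the `unified=true` keyword is an assumption here, not confirmed by this diff; `xs.data[]` and `Mem.prefetch` follow the snippet above:

```julia
using CUDA

# Sketch: prefetch a unified array's backing buffer to the active device
# before launching work on it, so the kernel doesn't fault pages on demand.
xs = cu(rand(Float32, 1024); unified=true)  # `unified=true` is an assumption
buf = xs.data[]                             # underlying Mem.UnifiedBuffer
Mem.prefetch(buf)                           # migrate pages to the device
CUDA.@sync xs .+= 1f0                       # kernel touches resident memory
```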
src/array.jl (Outdated)
```julia
haskey(tls, :CUDA_ASYNC_BUFFERS) || return
async_buffers = tls[:CUDA_ASYNC_BUFFERS]::Vector{Mem.UnifiedBuffer}
```
I really need to finish TaskLocalValues.jl xD
This PR includes a couple of unified memory improvements that are long overdue. With most vendors providing mature unified memory architectures, we really should be exploring defaulting to unified memory, as it greatly helps with two major issues that users run into: scalar indexing errors due to AbstractArray fallbacks, and out-of-memory situations.
Scalar iteration
Building on JuliaGPU/GPUArrays.jl#499, CUDA.jl now implements efficient scalar indexing for unified arrays. The performance difference compared to old-style `@allowscalar` is huge.
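To illustrate the difference, a hedged sketch; the `unified=true` keyword is assumed from this PR, and no timings are claimed here:

```julia
using CUDA

# Device-memory array: scalar reads are disallowed by default, and each
# `@allowscalar` access performs a synchronizing device-to-host copy.
xs = CuArray(collect(1:1024))
CUDA.@allowscalar xs[1]       # slow path: one copy per element access

# Unified-memory array: the host can dereference the memory directly,
# so scalar indexing only needs a synchronization check, not a copy.
ys = cu(collect(1:1024); unified=true)  # keyword assumed
ys[1]                                   # fast path: plain host-side load
```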
The `dirty` bit is a bit naive though, as it breaks down when multiple tasks access a single array concurrently. Maybe that's fine, though, and users ought to perform manual synchronization when using multiple tasks or streams.

Default unified memory
I also added a preference to switch the default memory type to `unified`, and added a CI job to test whether the test suite supports that. We should consider whether we actually want to switch: for one, unified memory doesn't have a stream-ordered allocator, so it may make allocations and GC pauses worse. At the same time, it would allow the user to use much more memory, so maybe that helps with GC pressure...

TODO: consider adding a MemAdvise or Prefetch (see https://developer.nvidia.com/blog/maximizing-unified-memory-performance-cuda/)
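If the preference is wired through Preferences.jl, enabling it might look like the following. The `default_memory` key is an assumption; check the PR diff for the actual preference name:

```julia
using Preferences, CUDA

# Hypothetical: persist the memory-type preference for CUDA.jl in
# LocalPreferences.toml; it takes effect on the next Julia session.
set_preferences!(CUDA, "default_memory" => "unified")
```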
cc @vchuravy