diff --git a/docs/Project.toml b/docs/Project.toml
index f921b5887b9..3ae09eda05c 100644
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -47,4 +47,4 @@ Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
 
 [compat]
-CUDA = "= 3.12.0"
+CUDA = "4"
diff --git a/docs/src/showcase/massively_parallel_gpu.md b/docs/src/showcase/massively_parallel_gpu.md
index 4186fa139dc..4d4ade85bb4 100644
--- a/docs/src/showcase/massively_parallel_gpu.md
+++ b/docs/src/showcase/massively_parallel_gpu.md
@@ -17,6 +17,20 @@ use GPUs to parallelize over different parameters and initial conditions. In oth
 This showcase will focus on the latter case. For the former, see the
 [massively parallel GPU ODE solving showcase](@ref gpuspde).
 
+## Supported GPUs
+
+SciML's GPU support extends to a wide array of hardware, including:
+
+| GPU Manufacturer | GPU Kernel Language | Julia Support Package                              | Backend Type             |
+|:---------------- |:------------------- |:-------------------------------------------------- |:------------------------ |
+| NVIDIA           | CUDA                | [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl)     | `CUDA.CUDABackend()`     |
+| AMD              | ROCm                | [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl) | `AMDGPU.ROCBackend()`    |
+| Intel            | oneAPI              | [oneAPI.jl](https://github.com/JuliaGPU/oneAPI.jl) | `oneAPI.oneAPIBackend()` |
+| Apple (M-Series) | Metal               | [Metal.jl](https://github.com/JuliaGPU/Metal.jl)   | `Metal.MetalBackend()`   |
+
+This tutorial demonstrates the CUDA backend for NVIDIA GPUs, though any of the other GPUs can be
+used by loading the corresponding package and swapping in its backend type from the table above.
+
 ## Problem Setup
 
 Let's say we wanted to quantify the uncertainty in the solution of a differential equation.
@@ -41,7 +55,7 @@ Let's implement the Lorenz equation out-of-place. If you don't know what that me
 see the [getting started with DifferentialEquations.jl](https://docs.sciml.ai/DiffEqDocs/stable/getting_started/)
 
 ```@example diffeqgpu
-using DiffEqGPU, OrdinaryDiffEq, StaticArrays
+using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA
 function lorenz(u, p, t)
     σ = p[1]
     ρ = p[2]
@@ -76,14 +90,14 @@ sol = solve(monteprob, Tsit5(), EnsembleThreads(), trajectories = 10_000, saveat
 Now uhh, we just change `EnsembleThreads()` to `EnsembleGPUArray()`
 
 ```@example diffeqgpu
-sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10_000, saveat = 1.0f0)
+sol = solve(monteprob, Tsit5(), EnsembleGPUArray(CUDA.CUDABackend()), trajectories = 10_000, saveat = 1.0f0)
 ```
 
 Or for a more efficient version, `EnsembleGPUKernel()`. But that requires special solvers,
 so we also change to `GPUTsit5()`.
 
 ```@example diffeqgpu
-sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(), trajectories = 10_000)
+sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()), trajectories = 10_000)
 ```
 
 Okay, so that was anticlimactic, but that's the point: if it were harder than that, it