diff --git a/docs/articles/opencl.html b/docs/articles/opencl.html
deleted file mode 100644
index 38f536e07..000000000
--- a/docs/articles/opencl.html
+++ /dev/null
@@ -1,263 +0,0 @@
-Running Stan on the GPU with OpenCL • cmdstanr

-Introduction

-

This vignette demonstrates how to use the OpenCL capabilities of CmdStan with CmdStanR. The functionality described in this vignette requires CmdStan 2.26.1 or newer.

-

As of version 2.26.1, users can expect speedups with OpenCL when using vectorized probability distribution functions (functions with the _lpdf or _lpmf suffix) and when the input variables contain at least 20,000 elements.
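As a rough illustration of the size guideline above, the following base R sketch counts the elements in each entry of a Stan data list and flags entries with at least 20,000 elements. The helper name `opencl_size_hint` and the example data are hypothetical and not part of CmdStanR, and applying the guideline separately to each variable is an assumption made here for illustration.

```r
# Hypothetical helper (not part of cmdstanr): count the elements of each
# entry in a data list and flag those at or above a size threshold.
opencl_size_hint <- function(data, threshold = 20000) {
  sizes <- vapply(data, function(x) as.numeric(length(x)), numeric(1))
  data.frame(
    variable = names(data),
    elements = unname(sizes),
    large_enough = unname(sizes >= threshold),
    stringsAsFactors = FALSE
  )
}

# Small illustrative data list shaped like a regression data set
n <- 25000
k <- 4
example_data <- list(
  k = k,
  n = n,
  X = matrix(0, nrow = n, ncol = k),  # n * k = 100,000 elements
  y = integer(n)                      # 25,000 elements
)

hint <- opencl_size_hint(example_data)
print(hint)
```

A check like this only says whether the inputs are large enough for a speedup to be plausible. Whether a speedup actually appears still depends on which lpdf/lpmf functions the model uses.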

-

The actual speedup for a model will depend on the particular lpdf/lpmf functions used and whether the lpdf/lpmf functions are the bottlenecks of the model. The more computationally complex the function is, the larger the expected speedup. The biggest speedups are expected when using the specialized GLM functions.

-

To identify the bottlenecks in your model, we recommend using profiling, which was introduced in Stan version 2.26.0.

-
-
-

-OpenCL runtime

-

OpenCL is supported on most modern CPUs and GPUs. In order to use OpenCL in CmdStanR, an OpenCL runtime for the target device must be installed. A guide for the most common devices is available in the CmdStan manual’s chapter on parallelization.

-
-
-

-Compiling a model with OpenCL

-

By default, models in CmdStanR are compiled without OpenCL support. Once OpenCL support is enabled, a CmdStan model will make use of OpenCL if the functions in the model support it. Strictly speaking, no changes to a model are required to support OpenCL, since the choice of using OpenCL is handled by the compiler. It can still be useful to make a model more OpenCL-friendly by vectorizing calls to probability distributions as much as possible.

-

Consider a simple logistic regression with parameters alpha and beta, covariates X, and outcome y.

-
data {
-  int<lower=1> k;
-  int<lower=0> n;
-  matrix[n, k] X;
-  int y[n];
-}
-parameters {
-  vector[k] beta;
-  real alpha;
-}
-model {
-  target += std_normal_lpdf(beta);
-  target += std_normal_lpdf(alpha);
-  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
-}
-

Some fake data will be useful to run this model:

-
-library(cmdstanr)
-
-# Generate some fake data
-n <- 250000
-k <- 20
-X <- matrix(rnorm(n * k), ncol = k)
-y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
-mdata <- list(k = k, n = n, y = y, X = X)
-

In this model, most of the computation will be handled by the bernoulli_logit_glm_lpmf function. Because this is a supported GPU function, it should be possible to accelerate it with OpenCL. Check here for a list of functions with OpenCL support.

-

To build the model with OpenCL support, add cpp_options = list(stan_opencl = TRUE) at the compilation step.

-
-# Compile the model with STAN_OPENCL=TRUE
-mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
-                        cpp_options = list(stan_opencl = TRUE))
-
-
-

-Running models with OpenCL

-

Running a model with OpenCL requires specifying the OpenCL platform and device on which to run it, since a system can have more than one of each. If the system has one GPU and no OpenCL CPU runtime, the platform and device IDs of the GPU are typically both 0, but the clinfo tool can be used to confirm which devices are available.

-

On an Ubuntu system with both CPU and GPU OpenCL support, clinfo -l outputs:

-
Platform #0: AMD Accelerated Parallel Processing
- `-- Device #0: gfx906+sram-ecc
-Platform #1: Intel(R) CPU Runtime for OpenCL(TM) Applications
- `-- Device #0: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
-

On this system the GPU is platform ID 0 and device ID 0, while the CPU is platform ID 1, device ID 0. These can be specified with the opencl_ids argument when running a model. The opencl_ids argument is supplied as a vector of length 2, where the first element is the platform ID and the second element is the device ID.
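The mapping from the clinfo -l listing to a platform and device ID pair can be sketched in base R as follows. The helper `parse_clinfo` is hypothetical and not part of CmdStanR. The input lines are copied from the listing above, and the parsing assumes that clinfo -l keeps the `Platform #N:` / `Device #N:` layout shown there.

```r
# Sample `clinfo -l` output, copied from the listing above
clinfo_lines <- c(
  "Platform #0: AMD Accelerated Parallel Processing",
  " `-- Device #0: gfx906+sram-ecc",
  "Platform #1: Intel(R) CPU Runtime for OpenCL(TM) Applications",
  " `-- Device #0: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"
)

# Hypothetical helper (not part of cmdstanr): turn `clinfo -l` lines into a
# table with one row per device, carrying the ID of the platform it sits under.
parse_clinfo <- function(lines) {
  rows <- list()
  platform_id <- NA_integer_
  platform_name <- NA_character_
  for (line in lines) {
    if (grepl("^Platform #[0-9]+:", line)) {
      # Remember the current platform for the device lines that follow
      platform_id <- as.integer(sub("^Platform #([0-9]+):.*$", "\\1", line))
      platform_name <- sub("^Platform #[0-9]+: *", "", line)
    } else if (grepl("Device #[0-9]+:", line)) {
      rows[[length(rows) + 1]] <- data.frame(
        platform_id = platform_id,
        device_id = as.integer(sub("^.*Device #([0-9]+):.*$", "\\1", line)),
        platform = platform_name,
        device = sub("^.*Device #[0-9]+: *", "", line),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

devices <- parse_clinfo(clinfo_lines)
print(devices)

# Select the AMD GPU and build the length-2 vector: platform ID, then device ID
gpu <- devices[grepl("AMD", devices$platform), ]
opencl_ids <- c(gpu$platform_id[1], gpu$device_id[1])
print(opencl_ids)
```

On a real system the input lines could come from `system2("clinfo", "-l", stdout = TRUE)`, assuming clinfo is installed and on the path. The resulting vector is then passed as the opencl_ids argument.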

-
-fit_cl <- mod_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
-                        opencl_ids = c(0, 0), refresh = 0)
-
## Running MCMC with 4 parallel chains...
-## 
-## Chain 2 finished in 60.2 seconds.
-## Chain 1 finished in 60.4 seconds.
-## Chain 3 finished in 60.4 seconds.
-## Chain 4 finished in 60.4 seconds.
-## 
-## All 4 chains finished successfully.
-## Mean chain execution time: 60.4 seconds.
-## Total execution time: 62.9 seconds.
-

We’ll also run a version without OpenCL and compare the run times.

-
-# no OpenCL version
-mod <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan")
-fit_cpu <- mod$sample(data = mdata, chains = 4, parallel_chains = 4, refresh = 0)
-
## Running MCMC with 4 parallel chains...
-## 
-## Chain 2 finished in 564.9 seconds.
-## Chain 4 finished in 565.3 seconds.
-## Chain 1 finished in 565.3 seconds.
-## Chain 3 finished in 571.1 seconds.
-## 
-## All 4 chains finished successfully.
-## Mean chain execution time: 566.6 seconds.
-## Total execution time: 573.0 seconds.
-

The speedup of the OpenCL model is:

-
-fit_cpu$time()$total / fit_cl$time()$total
-
## [1] 9.114909
-

The speedup depends on the particular GPU and CPU used, the input problem sizes (data as well as parameters), and whether the model uses functions that can be run on the GPU or other OpenCL devices.
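The ratio can also be checked by hand from the run times printed above. This short sketch uses the rounded values from the console output, so its result differs slightly from the 9.114909 computed from the unrounded times. The variable names are illustrative only.

```r
# Run times in seconds, copied from the console output above
total_cpu <- 573.0  # total execution time without OpenCL
total_cl <- 62.9    # total execution time with OpenCL
mean_cpu <- 566.6   # mean chain execution time without OpenCL
mean_cl <- 60.4     # mean chain execution time with OpenCL

# Speedup based on total wall time and on mean per-chain time
speedup_total <- total_cpu / total_cl
speedup_chain <- mean_cpu / mean_cl

print(round(c(total = speedup_total, per_chain = speedup_chain), 2))
```

Both ratios come out a little above 9 for this run, which matches the reported speedup of roughly 9x.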

-
-
diff --git a/docs/articles/opencl_files/accessible-code-block-0.0.1/empty-anchor.js b/docs/articles/opencl_files/accessible-code-block-0.0.1/empty-anchor.js
deleted file mode 100644
index ca349fd6a..000000000
--- a/docs/articles/opencl_files/accessible-code-block-0.0.1/empty-anchor.js
+++ /dev/null
@@ -1,15 +0,0 @@
-// Hide empty tag within highlighted CodeBlock for screen reader accessibility (see https://github.com/jgm/pandoc/issues/6352#issuecomment-626106786) -->
-// v0.0.1
-// Written by JooYoung Seo (jooyoung@psu.edu) and Atsushi Yasumoto on June 1st, 2020.
-
-document.addEventListener('DOMContentLoaded', function() {
-  const codeList = document.getElementsByClassName("sourceCode");
-  for (var i = 0; i < codeList.length; i++) {
-    var linkList = codeList[i].getElementsByTagName('a');
-    for (var j = 0; j < linkList.length; j++) {
-      if (linkList[j].innerHTML === "") {
-        linkList[j].setAttribute('aria-hidden', 'true');
-      }
-    }
-  }
-});
diff --git a/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.css b/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.css
deleted file mode 100644
index 07aee5fcb..000000000
--- a/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.css
+++ /dev/null
@@ -1,4 +0,0 @@
-/* Styles for section anchors */
-a.anchor-section {margin-left: 10px; visibility: hidden; color: inherit;}
-a.anchor-section::before {content: '#';}
-.hasAnchor:hover a.anchor-section {visibility: visible;}
diff --git a/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.js b/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.js
deleted file mode 100644
index 570f99a0a..000000000
--- a/docs/articles/opencl_files/anchor-sections-1.0/anchor-sections.js
+++ /dev/null
@@ -1,33 +0,0 @@
-// Anchor sections v1.0 written by Atsushi Yasumoto on Oct 3rd, 2020.
-document.addEventListener('DOMContentLoaded', function() {
-  // Do nothing if AnchorJS is used
-  if (typeof window.anchors === 'object' && anchors.hasOwnProperty('hasAnchorJSLink')) {
-    return;
-  }
-
-  const h = document.querySelectorAll('h1, h2, h3, h4, h5, h6');
-
-  // Do nothing if sections are already anchored
-  if (Array.from(h).some(x => x.classList.contains('hasAnchor'))) {
-    return null;
-  }
-
-  // Use section id when pandoc runs with --section-divs
-  const section_id = function(x) {
-    return ((x.classList.contains('section') || (x.tagName === 'SECTION'))
-            ? x.id : '');
-  };
-
-  // Add anchors
-  h.forEach(function(x) {
-    const id = x.id || section_id(x.parentElement);
-    if (id === '') {
-      return null;
-    }
-    let anchor = document.createElement('a');
-    anchor.href = '#' + id;
-    anchor.classList = ['anchor-section'];
-    x.classList.add('hasAnchor');
-    x.appendChild(anchor);
-  });
-});
diff --git a/docs/articles/opencl_files/header-attrs-2.7/header-attrs.js b/docs/articles/opencl_files/header-attrs-2.7/header-attrs.js
deleted file mode 100644
index dd57d92e0..000000000
--- a/docs/articles/opencl_files/header-attrs-2.7/header-attrs.js
+++ /dev/null
@@ -1,12 +0,0 @@
-// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
-// be compatible with the behavior of Pandoc < 2.8).
-document.addEventListener('DOMContentLoaded', function(e) {
-  var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
-  var i, h, a;
-  for (i = 0; i < hs.length; i++) {
-    h = hs[i];
-    if (!/^h[1-6]$/i.test(h.tagName)) continue;  // it should be a header h1-h6
-    a = h.attributes;
-    while (a.length > 0) h.removeAttribute(a[0].name);
-  }
-});