---
title: "Uncompressed HDF5"
output: html_document
---
<style>
pre {
overflow-x: auto;
}
pre code {
word-wrap: normal;
white-space: pre;
}
</style>
```{r global_options, echo = FALSE, include = FALSE}
options(width = 150)
```
Here we use the default versions of **rhdf5** and **DelayedArray**, but with an uncompressed version of the counts table on disk.
```{r, setup, echo=FALSE, message=FALSE}
if (packageVersion('rhdf5') > "2.23.1") {
    BiocInstaller::biocLite('grimbough/rhdf5', ref = "6713b80", suppressUpdates = TRUE, ask = FALSE)
}
if (packageVersion('DelayedArray') > "0.5.6") {
    BiocInstaller::biocLite('Bioconductor/DelayedArray', ref = "82d7d8b", suppressUpdates = TRUE, ask = FALSE)
    BiocInstaller::biocLite("Bioconductor/HDF5Array", ref = "bd5760c", suppressUpdates = TRUE, ask = FALSE)
}
```
```{r, load-data, message=FALSE}
library(TENxBrainData)
library(microbenchmark)
tenx <- TENxBrainData()
options(DelayedArray.block.size=2e9)
```
The code below shows how to create the HDF5 file containing the uncompressed dataset. It requires holding the entire matrix in memory (possibly twice, because of internal copying in R), so you need roughly 300 GB of RAM.
```{r, create-dataset, eval=FALSE}
library(rhdf5)
## Read the full counts table into memory as an ordinary matrix
tenx.inmem <- as.matrix(counts(tenx))
h5File <- '/tmpdata/msmith/tenx_uncompressed.h5'
h5createFile(h5File)
## level = 0 disables compression; the dataset is still chunked (100 x 100)
h5createDataset(file = h5File, dataset = "counts", dims = dim(tenx.inmem), chunk = c(100, 100),
                level = 0, storage.mode = "integer")
h5write(tenx.inmem, file = h5File, name = "counts")
```
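The memory requirement quoted above can be checked with some quick arithmetic. This is a rough sketch that assumes the counts matrix has 27,998 rows and 1,306,127 columns (the dimensions reported for the full TENxBrainData dataset; confirm with `dim(counts(tenx))`) and that each value is held as a 4-byte integer. The variable names are illustrative only.

```r
# Assumed dimensions of the full counts matrix; verify with dim(counts(tenx)).
n_genes <- 27998
n_cells <- 1306127
bytes_per_int <- 4  # R stores integer values in 4 bytes

# Numeric (double) literals are used here, so the product does not overflow.
n_elements <- n_genes * n_cells

# Size of one in-memory copy, and of two copies, in GB (1 GB = 1e9 bytes).
one_copy_gb <- n_elements * bytes_per_int / 1e9
two_copies_gb <- 2 * one_copy_gb

round(c(one_copy = one_copy_gb, two_copies = two_copies_gb), 1)
```

Under these assumptions a single copy takes about 146 GB, and two copies come to just under 300 GB.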
Now we create a new *SingleCellExperiment* object using the uncompressed HDF5Array as the counts table.
```{r, subset-data}
h5.uncmp <- HDF5Array(file = '/tmpdata/msmith/tenx_uncompressed.h5',
                      name = "counts")
tenx.uncmp <- SingleCellExperiment(
    list(counts = h5.uncmp), rowData = rowData(tenx), colData = colData(tenx)
)
tenx.sub.uncmp <- tenx.uncmp[,1:13000]
```
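To relate the 13,000-column subset to the block size set earlier, the sketch below compares the two. It assumes that `DelayedArray.block.size` is measured in bytes, that the matrix has 27,998 rows, and that values are 4-byte integers; how DelayedArray actually divides the data into blocks is not shown here, so treat this as back-of-the-envelope arithmetic rather than a description of its internals.

```r
block_size_bytes <- 2e9  # value passed to options(DelayedArray.block.size = ...)
bytes_per_int <- 4       # assumed storage size of one count
n_genes <- 27998         # assumed row count; verify with nrow(tenx)
n_cols_subset <- 13000   # columns kept in tenx.sub.uncmp

# In-memory size of the subset, in bytes.
subset_bytes <- n_genes * n_cols_subset * bytes_per_int

# Number of whole columns that would fit in one block of the configured size.
cols_per_block <- floor(block_size_bytes / (n_genes * bytes_per_int))

subset_bytes / 1e9                # subset size in GB
cols_per_block                    # whole columns per block
subset_bytes <= block_size_bytes  # TRUE if the subset fits in one block
```

If these assumptions hold, the subset occupies about 1.46 GB, so it would fit within a single 2 GB block.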
Run the first iteration and save the results.
```{r, save-results}
systime <- system.time(res2 <- colSums(counts(tenx.sub.uncmp)))
save(res2, file = "res2.rda")
systime
```
Benchmark five more times and prepend the timing of the first run.
```{r, step2-run-benchmark}
bm2 <- microbenchmark(colSums(counts(tenx.sub.uncmp)),
                      times = 5L, unit = "s",
                      control = list(order = "block", warmup = 0))
## microbenchmark stores times in nanoseconds, so convert the elapsed seconds of the first run
bm2 <- rbind(data.frame(expr = "", time = as.numeric(systime[3]) * 1e9), bm2)
bm2$expr <- "2GB block size2"
save(bm2, file = "bm2.rda")
bm2
```
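The factor of `1e9` in the chunk above is there because `system.time()` reports elapsed time in seconds while **microbenchmark** stores its timings in nanoseconds. The sketch below illustrates the conversion with made-up numbers; the timing values and variable names are hypothetical and are not results from this benchmark.

```r
# Hypothetical timings: one elapsed value in seconds (as from system.time())
# and five values in nanoseconds (as stored by microbenchmark).
first_run_s <- 12.5
bench_ns <- c(11.9e9, 12.1e9, 12.0e9, 12.3e9, 11.8e9)

# Convert the first run to nanoseconds so all six timings share one unit.
all_ns <- c(first_run_s * 1e9, bench_ns)

# Convert back to seconds for reporting.
all_s <- all_ns / 1e9
summary(all_s)
```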
```{r, sessionInfo}
devtools::session_info(c("HDF5Array", "DelayedArray"))
```