-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
214 lines (171 loc) · 7.79 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
---
title: "Welcome to UcscHubgeneratoR"
output: rmarkdown::github_document
---
```{r setup, include=FALSE, cache = FALSE, echo = FALSE}
knitr::opts_chunk$set(echo = TRUE,
base.url = 'https://github.com/msubirana/ucschubgenerator/raw/master/')
```
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
Displaying and managing custom tracks into a Genome Browser (UCSC) can be annoying.
![](vignettes/img/giphy.gif)
For this reason we have created a R package that facilitates the creation and upload of custom tracks based on [track hubs](https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html). A track hub is way of organizing large numbers of genome-wide data sets, configured with a set of plain-text files that determine the organization, UI, labels, color, and other details. The data underlying the tracks and optional sequence in a hub reside on the remote server of the data provider rather than at UCSC. Genomic annotations are stored in compressed binary indexed files in bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM, HAL or VCF format that contain the data at several resolutions. For more information about the types and parameters visit [trackDb information](https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html). Currently the allowed parameters are: type, visibility, color and autoScale, shortLabel and longLabel.
Ucscgenerator package requires a GATTACA SSH Key of the GATTACA user saved in the host where it will be run. For a detailed guide please check [SSH Key](https://www.ssh.com/ssh/keygen/).
## Installation
The following code demonstrates a track hub built out of all bigWig files found in a directory.
```{r}
library(devtools)
install_github("msubirana/ucschubgenerator")
```
## Basic Example
### Download example dataset
```{r}
# Import libraries
library(ucschubgenerator)
# Download example data
path <- downloadUcschubgenerator()
```
### Create the trackhub
In this example, a trackhub with the same type of tracks and parameters will be created. In the following example will be face trackhubs with different file types and parameters.
In this case we will select the '.vcf.gz' of the 'example_data' download previously and saved in the 'path'
variable. All the 'vcf.gz' will present the same parameters defined for this particular hub.
```{r}
# Import libraries
library(ucschubgenerator)
# Define variables
path_tracks <- path
pattern_tracks <- '.vcf.gz$'
hub_name <- 'example_hub_unique'
path_local_hub <- file.path(path, 'hubs')
dir.create(path_local_hub)
hub_short_label <- hub_name
hub_long_label <- 'Example of ucschubgenerator using only one type of file with a unique set of parameters'
email_address <- '[email protected]'
assembly_database <- 'hg38'
gattaca_folder_hub <- 'exampleHub'
type <- 'vcfTabix'
visibility <- 'dense'
color <- '0, 0, 0'
gattaca_user <- 'msubirana@gattaca'
quote_label <- 'example'
batchHubGenerator(path_tracks = path_tracks,
pattern_tracks = pattern_tracks,
path_local_hub = path_local_hub,
hub_name = hub_name,
hub_short_label = hub_short_label,
hub_long_label = hub_long_label,
email_address = email_address,
assembly_database = assembly_database,
gattaca_folder_hub = gattaca_folder_hub,
type = type,
visibility = visibility,
color = color,
quote_label = quote_label,
gattaca_user = gattaca_user)
```
Once the script ends, you can upload the hub in the [USCS web](https://genome.ucsc.edu/cgi-bin/hgHubConnect?hubCheckUrl=http%3A%2F%2Fgattaca.imppc.org%2Fgenome_browser%2Flplab%2FmarcHubs%2FparserProve%2Fhub.txt&hgsid=759593043_P7LJWoCqmm0BrJx4xCRwGGvMCkRt). selecting My Hubs and pasting the link generated (server link of the hub.txt file).
To visualize it, is necessary to open the appropriate human assembly and the track hub will appear under the Custom Tracks.
![](vignettes/img/Screenshot-from-2019-09-17-15-25-59.png)
This time we will create a hub with different types of tracks and different parameters.
```{r}
path_tracks <- path
hub_name <- 'example_hub_multiple'
path_local_hub <- file.path(path, 'hubs')
dir.create(path_local_hub)
hub_short_label <- hub_name
hub_long_label <- 'Example of ucschubgenerator using different type of files and parameters'
email_address <- '[email protected]'
assembly_database <- 'hg38'
gattaca_folder_hub <- 'exampleHub'
gattaca_user <- 'msubirana@gattaca'
# Generation of the basic hub structure
hubGenerator(path_local_hub = path_local_hub,
hub_name = hub_name,
hub_short_label = hub_short_label,
hub_long_label = hub_long_label,
email_address = email_address,
description_url = NULL,
assembly_database = assembly_database,
gattaca_folder_hub = gattaca_folder_hub)
# Delete old trackDb if exists (avoid duplicates in the file)
path_local_hub_name <- file.path(path_local_hub, hub_name)
unlink(file.path(path_local_hub_name, assembly_database, 'trackDb.txt'))
# Upload "vcf.gz" files
pattern_tracks <- '.vcf.gz$'
tracks <- list.files(path_tracks,
pattern = pattern_tracks,
full.names = T)
quote_label = 'vcf_examples'
type <- 'vcfTabix'
visibility <- 'dense'
color <- '0, 0, 0'
for(track in tracks){
trackhubTrack(path_local_hub = path_local_hub,
hub_name = hub_name,
gattaca_folder_hub = gattaca_folder_hub,
assembly_database = assembly_database,
track = track,
short_label = short_label,
long_label = long_label,
type = type,
visibility = visibility,
color = color,
quote_label = quote_label)
}
# Upload "bw" files with parameters 1
pattern_tracks <- '_t1.bw$'
tracks <- list.files(path_tracks,
pattern = pattern_tracks,
full.names = T)
quote_label = '_bw_examples1'
type <- 'bigWig'
visibility <- 'full'
color <- '255, 204, 153'
autoScale <- 'on'
for(track in tracks){
trackhubTrack(path_local_hub = path_local_hub,
hub_name = hub_name,
gattaca_folder_hub = gattaca_folder_hub,
assembly_database = assembly_database,
track = track,
short_label = short_label,
long_label = long_label,
type = type,
visibility = visibility,
color = color,
quote_label = quote_label)
}
# Upload "bw" files with parameters 2
pattern_tracks <- '_t2.bw$'
tracks <- list.files(path_tracks,
pattern = pattern_tracks,
full.names = T)
quote_label = '_bw_examples2'
type <- 'bigWig'
visibility <- 'full'
color <- '153, 204, 255'
autoScale <- 'on'
for(track in tracks){
trackhubTrack(path_local_hub = path_local_hub,
hub_name = hub_name,
gattaca_folder_hub = gattaca_folder_hub,
assembly_database = assembly_database,
track = track,
short_label = short_label,
long_label = long_label,
type = type,
visibility = visibility,
color = color,
quote_label = quote_label)
}
# rsync to the gattaca server and generate the UCSC link
rsyncHub (gattaca_folder_hub = gattaca_folder_hub,
gattaca_user = gattaca_user,
path_local_hub = path_local_hub,
hub_name = hub_name)
```