-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathPancCanNet-workflow1.Rmd
310 lines (228 loc) · 15.4 KB
/
PancCanNet-workflow1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
---
title: "PancCanNet-workflow1.Rmd"
author: "mkutmon"
date: "2021-11-26"
output:
md_document:
variant: markdown_github
always_allow_html: true
---
## Introduction
In this workflow, you will explore the PancCanNet pathway resource in detail and study the pathways related to pancreatic cancer. You will integrate knowledge from the Human Protein Atlas and TCGA about pancreatic cancer prognostic markers and pancreatic cancer specific genes. Lastly, you can visualize your gene or protein expression data in the context of pancreatic cancer related processes.
The PancCanNet project is a collaboration between the Erasmus MC, Maastricht University and Omnigen. The pathway portal is integrated in the collaborative pathway database WikiPathways (http://panccannet.wikipathways.org) and all analysis workflows are available on Github (https://panccannet.github.io).
Pathway curation is an ongoing effort and we are continuously improving current pathway models and adding new pathway models. The workflow will always use the most-up-to-date information and results can slightly differ to the information on the project website.
## R environment setup
First, we need to make sure all required R-packages are installed. Run the following code snippet to install and load the libraries.
```{r setup, include=FALSE, echo = FALSE}
# check if libraries are installed > otherwise install
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if(!"rstudioapi" %in% installed.packages()) BiocManager::install("rstudioapi")
if(!"rWikiPathways" %in% installed.packages()) BiocManager::install("rWikiPathways")
if(!"RCy3" %in% installed.packages()) BiocManager::install("RCy3")
if(!"data.table" %in% installed.packages()) BiocManager::install("data.table")
if(!"heatmaply" %in% installed.packages()) BiocManager::install("heatmaply")
if(!"igraph" %in% installed.packages()) BiocManager::install("igraph")
if(!"RColorBrewer" %in% installed.packages()) BiocManager::install("RColorBrewer")
# load required libraries
library(rstudioapi)
library(rWikiPathways)
library(RCy3)
library(data.table)
library(heatmaply)
library(igraph)
library(RColorBrewer)
# set working environment to source file
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
```
## Explore PancCanNet pathway resources
The following section will download the current PancCanNet pathway collection and provide basic statistics and information about the pathways.
```{r collection}
# retrieve pathway list
pwys <- rWikiPathways::getPathwayIdsByCurationTag("Curation:PancCanNet")
print(paste0("The PancCanNet pathway portal currently contains ",length(pwys)," pathways."))
# retrieve gene list per pathway
mylist <- list()
x <- 1
for(p in pwys) {
genes <- rWikiPathways::getXrefList(p,"En")
for(g in genes) {
mylist[[x]] <- c(p,g)
x <- x+1
}
}
pwy.genes <- as.data.frame(do.call("rbind",mylist))
colnames(pwy.genes) <- c("wpid","gene")
rm(mylist,x,genes,p,g)
# report pathway size distribution
hist(table(pwy.genes$wpid), breaks=15, xlab="Pathway size (num of genes)", ylab = "Num of pathways", main="Pathway size distribution", col="#FFAAAA",border="#550000")
# report unique number of genes in PancCanNet pathways
print(paste0("The pathways in the PancCanNet collection currently contain ",length(unique(pwy.genes$gene))," unique genes."))
```
Let's explore the overlap and crosstalk between the pathways in the collection. We will first calculate the Jaccard similarity score (based on gene content) and additionally, we will visualize the pathway-gene network in Cytoscape which allows us to see which genes are present in many pathways.
```{r collection jaccard}
# calculate Jaccard index
jaccard <- matrix(ncol=length(pwys), nrow=length(pwys))
for(i in 1:length(pwys)) {
for(j in i:length(pwys)) {
genes1 <- pwy.genes[pwy.genes$wpid == pwys[i],2]
genes2 <- pwy.genes[pwy.genes$wpid == pwys[j],2]
intersection = length(intersect(genes1, genes2))
union = length(genes1) + length(genes2) - intersection
score <- (intersection/union)
jaccard[i,j] <- score
jaccard[j,i] <- score
}
}
colnames(jaccard) <- pwys
rownames(jaccard) <- pwys
heatmaply(jaccard,main="Similarity plot PancCanNet pathways",plot_method = "plotly", colors = viridis(n = 256, option = "magma"))
rm(genes1,genes2,i,j,intersection,union, score)
```
**Jaccard similarity heatmap**
Lighter areas in the heatmap show pathways with similar gene content (high similarity score). Overall, the PancCanNet collections contains pathways with very different gene content (low similarity score).
For the next step, please make sure that you have Cytoscape v3.8.0+ running. We want to explore the cross-talk between the pathway and see which central genes are present in many different pathways.
```{r collection network}
# create pathway-gene network
graph <- igraph::graph_from_edgelist(as.matrix(pwy.genes), directed = TRUE)
suid <- RCy3::createNetworkFromIgraph(graph, title = "PancCanNet pathway gene network", collection = "PancCanNet")
table <- RCy3::getTableColumns(columns = c("name"))
table$type <- ifelse(startsWith(table$name, "WP"), 'Pathway', 'Gene')
RCy3::loadTableData(table, data.key.column = "name", table.key.column = "name")
RCy3::mapTableColumn("name",species="Human",map.from="Ensembl", map.to = "HGNC")
# analyse network properties
RCy3::analyzeNetwork(directed=TRUE)
# adapt visual style
toggleGraphicsDetails()
RCy3::copyVisualStyle("default","crosstalk")
RCy3::setVisualStyle("crosstalk")
RCy3::deleteStyleMapping("crosstalk","NODE_LABEL")
RCy3::lockNodeDimensions(TRUE, style.name="crosstalk")
RCy3::setNodeShapeMapping('type', c('Gene','Pathway'), c("ellipse","hexagon"), style.name="crosstalk")
RCy3::setNodeSizeMapping('type', c('Gene','Pathway'), c(25,40), mapping.type = "d", style.name = "crosstalk")
setNodeColorMapping("type", c('Gene','Pathway'), c("#EBCD9E","#70A689"), mapping.type = 'd', default.color = "#99FF99", style.name = "crosstalk")
# visualize hub nodes in pathway gene network
suid.hubs <- RCy3::cloneNetwork(suid)
RCy3::setCurrentNetwork(suid.hubs)
RCy3::renameNetwork("PancCanNet pathway-gene network: Hubs")
# adapt visual style
toggleGraphicsDetails()
RCy3::copyVisualStyle("crosstalk","hubs")
RCy3::setVisualStyle("hubs")
# identify most connected genes
degree <- RCy3::getTableColumns(columns = c("name","HGNC","Indegree"))
degree <- degree[degree$Indegree > 0,]
colnames(degree) <- c("Gene","GeneName","NumPathways")
degree <- degree[order(-degree$NumPathways),]
hist(degree$NumPathways, breaks=14, xlab="Node degree (genes)", ylab = "Num of genes", main="Distribution number of pathways", col="#FFAAAA",border="#550000",xlim=c(1,15))
hub.node.cutoff <- 10
hubs <- degree[degree$NumPathways > hub.node.cutoff,]
hubs
RCy3::setNodeBorderColorBypass(hubs$Gene, new.colors = "#052549")
RCy3::setNodeBorderWidthBypass(hubs$Gene, new.sizes = 10)
rm(runningRemote,graph,table,notebookIsRunning)
```
Explore the network in Cytoscape. You can see a lot of crosstalk and overlap between the pathways in the portal. The hub nodes are highlighted with a thick border. You can change the hub degree cutoff if you want to be more strict or less strict. Even though the Jaccard index indicated that there are very few similar pathways, there is still a lot of crosstalk between the pathways in the collection.
## Integrate basic pancreatic cancer knowledge
In this section, you will integrate current knowledge about pancreatic cancer from the Human Protein Atlas and TCGA. Let's first investigate known prognostic markers for pancreatic cancer and assess their presence in the PancCanNet pathways.
```{r prognsis}
# favorable prognostic markers
link <- "https://www.proteinatlas.org/api/search_download.php?search=prognostic%3Apancreatic%20cancer%3BFavorable%20AND%20sort_by%3Aprognostic%20pancreatic%20cancer&columns=g,eg,up,rnacas,rnacad,rnacass,rnacasm,prognostic_pancreatic_cancer,t_RNA_pancreas,sc_RNA_Pancreatic_endocrine_cells&compress=no&format=tsv"
fav.markers <- read.table(file = link, sep = '\t', header = TRUE)
cat(paste0("There are currently ", length(fav.markers$Ensembl), " known favorable prognostic markers for pancreatic cancer."))
# unfavorable prognostic markers
link <- "https://www.proteinatlas.org/api/search_download.php?search=prognostic%3Apancreatic%20cancer%3BUnfavorable%20AND%20sort_by%3Aprognostic%20pancreatic%20cancer&columns=g,eg,up,rnacas,rnacad,rnacass,rnacasm,prognostic_pancreatic_cancer,t_RNA_pancreas,sc_RNA_Pancreatic_endocrine_cells&compress=no&format=tsv"
unfav.markers <- read.table(file = link, sep = '\t', header = TRUE)
cat(paste0("There are currently ", length(unfav.markers$Ensembl), " known unfavorable prognostic markers for pancreatic cancer."))
# how many pathways contain prognostic markers
fav.markers.pwy <- fav.markers[fav.markers$Ensembl %in% pwy.genes$gene,]
unfav.markers.pwy <- unfav.markers[unfav.markers$Ensembl %in% pwy.genes$gene,]
cat(paste0("Number of favorable markers in PancCanNet pathways: ",length(fav.markers.pwy$Ensembl),"\n","Number of unfavorable markers in PancCanNet pathways: ",length(unfav.markers.pwy$Ensembl)))
# visualize prognostic markers in pathway gene network
suid.markers <- RCy3::cloneNetwork(suid)
RCy3::setCurrentNetwork(suid.markers)
RCy3::renameNetwork("PancCanNet pathway-gene network: Markers")
# adapt visual style
toggleGraphicsDetails()
RCy3::copyVisualStyle("crosstalk","markers")
RCy3::setVisualStyle("markers")
RCy3::setNodeBorderColorBypass(fav.markers.pwy$Ensembl, new.colors = "#902D10")
RCy3::setNodeBorderWidthBypass(fav.markers.pwy$Ensembl, new.sizes = 10)
RCy3::setNodeBorderColorBypass(unfav.markers.pwy$Ensembl, new.colors = "#052549")
RCy3::setNodeBorderWidthBypass(unfav.markers.pwy$Ensembl, new.sizes = 10)
```
Explore the network in Cytoscape. You will see that most pathways contain at least a couple of marker genes, both favorable (red border color) and unfavorable (blue border color).
###
Next, we will use the knowledge from the pathology information from the Human Protein Atlas and TCGA to identify and visualize pancreatic cancer-enriched genes and cancer-enriched genes.
```{r cancer-enriched}
# genes enriched only in pancreatic cancer
link <- "https://www.proteinatlas.org/api/search_download.php?search=cancer_category_rna%3Apancreatic%20cancer%3BDetected%20in%20single&columns=g,eg&compress=no&format=tsv"
pancreatic.cancer.only <- read.table(file = link, sep = '\t', header = TRUE)
# genes enriched in only a few cancers incl. pancreatic cancer
link <- "https://www.proteinatlas.org/api/search_download.php?search=cancer_category_rna%3Apancreatic%20cancer%3BDetected%20in%20some&columns=g,eg&compress=no&format=tsv"
pancreatic.cancer.some <- read.table(file = link, sep = '\t', header = TRUE)
# genes enriched in many cancers
link <- "https://www.proteinatlas.org/api/search_download.php?search=cancer_category_rna%3Apancreatic%20cancer%3BDetected%20in%20many&columns=g,eg&compress=no&format=tsv"
pancreatic.cancer.many <- read.table(file = link, sep = '\t', header = TRUE)
# genes enriched in all cancers
link <- "https://www.proteinatlas.org/api/search_download.php?search=cancer_category_rna%3Apancreatic%20cancer%3BDetected%20in%20all&columns=g,eg&compress=no&format=tsv"
pancreatic.cancer.all <- read.table(file = link, sep = '\t', header = TRUE)
pancreatic.cancer.only.pwy <- pancreatic.cancer.only[pancreatic.cancer.only$Ensembl %in% pwy.genes$gene,]
pancreatic.cancer.some.pwy <- pancreatic.cancer.some[pancreatic.cancer.some$Ensembl %in% pwy.genes$gene,]
pancreatic.cancer.many.pwy <- pancreatic.cancer.many[pancreatic.cancer.many$Ensembl %in% pwy.genes$gene,]
pancreatic.cancer.all.pwy <- pancreatic.cancer.all[pancreatic.cancer.all$Ensembl %in% pwy.genes$gene,]
cat(paste0("There are ",length(pancreatic.cancer.only$Ensembl), " genes only detected in pancreatic cancer of which ",length(pancreatic.cancer.only.pwy$Ensembl), " are in PancCanNet pathways.\n\n","There are ",length(pancreatic.cancer.some$Ensembl), " genes detected in some cancers incl. pancreatic cancer of which ",length(pancreatic.cancer.some.pwy$Ensembl), " are in PancCanNet pathways.\n\n","There are ",length(pancreatic.cancer.many$Ensembl), " genes detected in many cancers incl. pancreatic cancer of which ",length(pancreatic.cancer.many.pwy$Ensembl), " are in PancCanNet pathways.\n\n","There are ",length(pancreatic.cancer.all$Ensembl), " detected in all cancers of which ",length(pancreatic.cancer.all.pwy$Ensembl), " are in PancCanNet pathways."))
# visualize specificity in pathway gene network
suid.spec <- RCy3::cloneNetwork(suid)
RCy3::setCurrentNetwork(suid.spec)
RCy3::renameNetwork("PancCanNet pathway-gene network: Cancer specificity")
genes <- RCy3::getTableColumns(columns = c("name","type"))
genes <- genes[genes$type=="Gene",]
genes[,"Specificity"] <- "none"
genes[genes$name %in% pancreatic.cancer.only$Ensembl,"Specificity"] <- "only"
genes[genes$name %in% pancreatic.cancer.some$Ensembl,"Specificity"] <- "some"
genes[genes$name %in% pancreatic.cancer.many$Ensembl,"Specificity"] <- "many"
genes[genes$name %in% pancreatic.cancer.all$Ensembl,"Specificity"] <- "all"
RCy3::loadTableData(genes,data.key.column = "name", table.key.column = "name")
# adapt visual style
RCy3::copyVisualStyle("crosstalk","specificity")
RCy3::setVisualStyle("specificity")
RCy3::toggleGraphicsDetails()
RCy3::setNodeColorMapping("Specificity", c('only','some','many','all','none'), c("#0594F4","#17285E","#4D5B88","#737C9D","#000000"), mapping.type = 'd', default.color = "#FFF7BC", style.name = "specificity")
```
Explore the network in Cytoscape. The darker the color the more specific the expression is pancreatic cancer.
## Visualize your own data
Now let's load and explore your differential gene or protein expression dataset and visualize it on the pathway-gene network in Cytoscape.
Example dataset:
pancreatic ductal adenocarcinoma vs. normal
Downloaded from Expression Atlas: https://www.ebi.ac.uk/gxa/experiments/E-GEOD-15471/
Ensembl gene id, gene name, log2FC, p-value
```{r vis}
data <- read.delim("data/E-GEOD-15471.tsv")
# overview table > pathway, genes, measured genes, DEGs+, DEGs-
pwy.data <- merge(pwy.genes, data, by.x="gene", by.y="Gene.ID")
print(paste0(length(pwy.data$Gene.Name)," out of ", length(pwy.genes$wpid), " genes in PancCanNet pathways are measured in the dataset."))
# heatmap (pathways, genes)
#heatmaply(pwy.data[,c(1,2,4)],main="",plot_method = "plotly", colors = viridis(n = 256, option = "magma"))
# pathway-gene network
suid.viz <- RCy3::cloneNetwork(suid)
RCy3::setCurrentNetwork(suid.viz)
RCy3::renameNetwork("PancCanNet pathway-gene network: Data visualization")
RCy3::loadTableData(pwy.data, data.key.column = "gene", table.key.column = "name")
# adapt visual style
RCy3::copyVisualStyle("crosstalk","data")
RCy3::setVisualStyle("data")
RCy3::toggleGraphicsDetails()
node.colors <- c(rev(brewer.pal(3, "RdBu")))
setNodeColorMapping("log2foldchange", c(-1,0,1), node.colors, default.color = "#99FF99", style.name = "data")
RCy3::setNodeLabelMapping(table.column = "Gene.Name", style.name = "data")
```
You can now explore the gene expression changes in the PancCanNet pathways. (blue = down-regulated, white = not changed, red = up-regulated)
## Save session
```{r session}
cys.file <- file.path(getwd(), "output/workflow1.cys")
saveSession(cys.file)
```
## Clear environment
```{r}
rm(list=ls())
```