CD4.Tcells.Rmd

---
title: Analysis of CD4+ T cells in ccRCC
author: 
- name: Nick Borcherding
  email: ncborch@gmail.com
  affiliation: Washington University in St. Louis, School of Medicine, St. Louis, MO, USA
date: "August, 1, 2020"
output:
  BiocStyle::html_document:
    toc_float: true

---

```{r, echo=FALSE, results="hide", message=FALSE}
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocStyle)
```


# Preperation

## Loading Libaries

In general I like to load libraries here that we will use universally, and then call other libraries when we need them in the code chunks that are relevant. 

```{r}
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
```

## Loading Data

```{r}
integrated <- readRDS("./data/Processed/integrated_Cluster.rds")
load("./data/Processed/completeMeta.rda")
```

## Selecting Color Palette

```{r setup, include=FALSE}
colorblind_vector <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))
```

***

# Initial Subsetting and Analysis

We will use the **Minor** variable in the integrated Seurat object meta data that we defined in the integration step, filtering for the "CD5.T.Cell" and "Tregs". 

```{r}
Tcells <- subset(integrated, Minor == "CD4.T.Cell" | Minor ==  "Treg")
Tcells <- NormalizeData(Tcells)
rm(integrated)
```

We are also going to create a sub-directory for all T cell analyses:
```{r}
dir.create("./DataAnalysis/CellType/CD4.Tcells")
```

## Reclustering 

Isolating just the T cells, stripping the Seurat object of non-"RNA" values using *DietSeurat()* and then splitting the object to create a list that can be fed into **SCTransform()**

```{r eval=FALSE}
Tcells@meta.data$Run <- paste0(Tcells@meta.data$sample, "_", Tcells@meta.data$type) #Define the sequence run
Tcells[["percent.mt"]] <- PercentageFeatureSet(Tcells, pattern = "^MT-") #Calculate %mito

Tcells <- DietSeurat(Tcells, assays = "RNA")
T.list <- SplitObject(object = Tcells , split.by = "Run")

for (i in 1:length(T.list)) {
    T.list[[i]] <- SCTransform(T.list[[i]], verbose = FALSE, conserve.memory=TRUE)
}
rm(Tcells)
```

Integrating the data as above with the whole data set.

```{r eval = FALSE}
options(future.globals.maxSize= 2621440000)
features <- SelectIntegrationFeatures(object.list = T.list, nfeatures = 3000)
T.list <- PrepSCTIntegration(object.list = T.list, anchor.features = features,
verbose = FALSE)
T.anchors <- FindIntegrationAnchors(object.list = T.list, normalization.method = "SCT",
anchor.features = features, verbose = FALSE, k.filter = 40)
Tcells <- IntegrateData(anchorset = T.anchors, normalization.method = "SCT", verbose = FALSE)
rm(T.list)
rm(T.anchors)
saveRDS(Tcells, file = "./data/Processed/CD4.Tcells_Precluster.rds")
readRDS("./data/Processed/CD4.Tcells_Precluster.rds")
```


Calculating the UMAP and finding clusters.

```{r eval = FALSE}
Tcells <- ScaleData(object = Tcells, verbose = FALSE)
Tcells <- RunPCA(object = Tcells, npcs = 40, verbose = FALSE)
Tcells <- RunUMAP(object = Tcells, reduction = "pca", 
    dims = 1:35)
Tcells <- FindNeighbors(object = Tcells, dims = 1:40, force.recalc = T)
Tcells <- FindClusters(object = Tcells, resolution = 0.4, force.recalc=T)
Tcells <- subset(Tcells, idents = c(0,1,2,3,4,5,6,7))
Tcells <- readRDS("./data/Processed/CD4.Tcells_FullCluster.rds")
```

### Visualizing the new sub-clusters

```{r}
Tcells@meta.data$newCluster <- Tcells@active.ident
DimPlot(object = Tcells, reduction = 'umap', label = T) + NoLegend()
ggsave(path = "./DataAnalysis/CellType/CD4.Tcells", filename="IntegratedObject_byCluster.eps", width=3.5, height=3)
DimPlot(object = Tcells, reduction = 'umap', group.by = "type") 
ggsave(path = "./DataAnalysis/CellType/CD4.Tcells", filename="IntegratedObject_byType.eps", width=3.75, height=3)

DimPlot(object = Tcells, reduction = 'umap', group.by = "orig.ident") 
ggsave(path = "./DataAnalysis/CellType/CD8.Tcells", filename="IntegratedObject_byOrigIdent.eps", width=3.75, height=3)
```

Examining the relative proportion on the new clusters
```{r}
freq_table <- table(Tcells@active.ident, Tcells$type)

tTotal <- 12239
pTotal <- 21160
kTotal <- 1778
totals <- c(kTotal,pTotal,tTotal)

for (i in 1:3) { 
    freq_table[,i] <- freq_table[,i]/totals[i]
}

freq_table <- reshape2::melt(freq_table)

ggplot(freq_table, aes(x=as.factor(Var1), y=value, fill=Var2)) + 
  geom_bar(stat="identity", position="fill", color="black", lwd=0.25) + 
  theme(axis.title.x = element_blank()) + 
   scale_fill_manual(values=rev(colorblind_vector(3))) + 
  theme_classic() + 
    xlab("Clusters") + 
    coord_flip() 
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/", file = "relativeContribution_byClusterType_scaled.pdf", height=5, width=5)


ggplot(freq_table, aes(x=as.factor(Var1), y=value, fill=Var2)) + 
  geom_bar(stat="identity", color="black", lwd=0.25) + 
  theme(axis.title.x = element_blank()) + 
   scale_fill_manual(values=rev(colorblind_vector(3))) + 
  theme_classic() + 
    xlab("Clusters") + 
    coord_flip()
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/", file = "totalContribution_byClusterType_scaled.pdf", height=5, width=5)

```

## Cell Cycle 

Cell cycle regression as described in the [Satija lab website](https://satijalab.org/seurat/v3.1/cell_cycle_vignette.html).  

```{r}
cc.genes <- Seurat::cc.genes.updated.2019
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
```

Now we can perform cell cycle scoring with the genes. For now, I am not going to regress using the assignments or save the integrated object with the calculations.

```{r}
Tcells <- CellCycleScoring(Tcells, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)
```

Like above with the contribution by condition (**type**), we can now look at the phases assignments by cluster and condition. We do not need to scale if we separate **type** into separate bar graphs.

```{r}
dir.create("./DataAnalysis/CellType/CD4.Tcells/CellCycle")
freq_table <- Tcells[[]]
freq_table <- freq_table[,c("type", "Final_clusters", "Phase")]
freq_table <- subset(freq_table, Phase != "Undecided") #removing undecided phases
freq_table <- freq_table %>%
    group_by(type, Final_clusters, Phase) %>%
    summarise(n = n())

freq_table$Phase <- factor(freq_table$Phase, levels = c("G1", "S", "G2M")) #ordering phases
freq_table$Final_clusters <- factor(freq_table$Final_clusters, levels = c(1,2,3,8,12,13,14,17,18,21,23,24))

ggplot(freq_table, aes(x=Final_clusters, y=n, fill=Phase)) + 
  geom_bar(stat="identity", position="fill", color="black", lwd=0.25) + 
  theme(axis.title.x = element_blank())+ 
    facet_grid(type ~.) + 
   scale_fill_manual(values=colorblind_vector(3)) + 
  theme_classic()
ggsave(path = "./DataAnalysis/CellType/CD4.Tcells/CellCycle", file = "CellCycle_byCluster_byType.pdf", height=6, width=6)

ggplot(freq_table, aes(x=Final_clusters, y=n, fill=Phase)) + 
  stat_summary(geom="bar", position="fill", color="black", lwd=0.25) + 
  theme(axis.title.x = element_blank())+ 
   scale_fill_manual(values=colorblind_vector(3)) + 
  theme_classic()
ggsave(path = "./DataAnalysis/CellType/CD4.Tcells/CellCycle", file = "CellCycle_byCluster.pdf", height=2, width=6)
```


```{r}
Tcells$Phase <- factor(Tcells$Phase, levels = c("G1", "S", "G2M")) #ordering phases
DimPlot(object = Tcells, reduction = 'umap', group.by = "Phase", split.by = "type") +
    scale_color_manual(values=colorblind_vector(3)) 


ggsave(path = "./DataAnalysis/CellType/CD4.Tcells/CellCycle", filename="CellCycle_UMAP_byType.eps", width=10.5, height=3)
```

###Expression Markers in T cells

Like above we also want to look at the expression markers that define clusters and T cell subtypes. First we will make the schex-based reductions. 

```{r}
suppressPackageStartupMessages(library(schex))
Tcells <- make_hexbin(Tcells, 40, dimension_reduction = "UMAP")
```

Here is a list I use in T cell identification, we can run these first and store them in the selected folder.

```{r}
genelist <- c("CD8a", "Cd8b1", "Cd4", "Cd3d", "Foxp3", "Il2ra", "Il7r", "Ccr7", "Ccl4", "Gata3", "Tbx21",  "Cd44", "Cd28", "Sell", "Fas", "Ctla4", "Pdcd1", "Icos", "Havcr2", "Entpd1", "Tigit", "Cd244", "Eomes", "Cd160", "Il10", "Smad3", "Klrg1", "Itga4", "Ifna1", "Ifng", "Cxcr3", "Ccr5", "Il12rb2", "Il18ra", "Il27a", "Stat1", "Stat4", "Ccr3", "Ccr4", "Ccr6", "Ccr10", "Ccr8", "Stat5", "Stat6", "Il4", "Il5", "Il6", "Il9", "Il10", "Il13", "Il17a", "Il17f", "Il21", "Il22", "Tgfbr2", "Ccl20", "Ccl22", "Rorgt", "Rora", "Rorc", "Stat3", "Tnfsf8", "Cxcr5", "Cxcr6", "Maf", "Bcl6", "Gzmb", "Prf1", "Ms4a4b", "Nfat", "Blimp1", "Batf", "Trgc1", "Trgc2", "Trdc", "Il23r", "Top2a", "Birc5", "Cxcl13", "Trac", "Trbc1", "Trbc2", "Cd52", "Nkg7", "Gapdh", "Igfbp4", "Nfkbia", "Ifi27l2a", "Tnfrsf4", "TCF7")
genelist <- toupper(unique(genelist)) #Use toupper() to convert these to Human nomenclature


dir.create("DataAnalysis/CellType/CD4.Tcells/markers")
dir.create("DataAnalysis/CellType/CD4.Tcells/markers/selected")

DefaultAssay(Tcells) <- "RNA"

for (i in seq_along(genelist)) {
        if (length(which(rownames(Tcells@assays$RNA@counts) == genelist[i])) == 0){
            next() #Need to loop here because plot_hexbin_feature() does not have a built-in function to deal with absence of selected gene
        } else {
        plot <- plot_hexbin_feature(Tcells, feature = genelist[i], type = "counts", action = "prop_0")+ 
             guides(fill=F, color = F) + 
                scale_fill_gradientn(colors = rev(colorblind_vector(13))) 
        ggsave(path = "DataAnalysis/CellType/CD4.Tcells/markers/selected", file = paste0( genelist[i], "_prop.pdf"), plot, height=3, width=3.25)
        }
    }

```

We can also look at the top markers for each cluster by using the *FindAllMarkers()* call.

```{r}
All.markers <- FindAllMarkers(Tcells, assay = "RNA", pseudocount.use = 0.1, only.pos = T) 
write.table(All.markers, file = "DataAnalysis/CellType/CD4.Tcells/markers/FindAllMarkers_output.txt", col.names=NA, sep="\t",append=F)
```

Graphing the top markers for each cluster onto the UMAP using schex from the above *FindAllMarkers()* call.

```{r}
All.markers <- read.delim("./DataAnalysis/CellType/CD4.Tcells/markers/FindAllMarkers_output.txt")
top10 <- All.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
top10 <- top10$gene #just want the IDs

dir.create("DataAnalysis/CellType/CD4.Tcells/markers/TopClusterMarkers")
for (i in seq_along(top10)) {
    plot <- plot_hexbin_feature(Tcells, feature = top10[i], type = "counts", action = "prop_0")+ 
             guides(fill=F, color = F) + 
                scale_fill_gradientn(colors = rev(colorblind_vector(13))) 
        ggsave(path = "DataAnalysis/CellType/CD4.Tcells/markers/TopClusterMarkers", file = paste0("Top10markers", "_", top10[i], "_prop.pdf"), plot, height=3, width=3.25)
}
```

Graphing the top markers for each cluster using the *DotPlot()* function from the above *FindAllMarkers()* call.

```{r}
top10 <- All.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DotPlot(Tcells, features = unique(top10$gene)) + coord_flip() + 
    scale_color_gradientn(colors = rev(colorblind_vector(11))) + 
    guides(color = F, size = F) +
    scale_size(range = c(0.5,3.5))
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/markers", file = "Top10_dotplot.eps", height=8, width=3)
```

Here are some selected markers for T cell biology that are interesting that I also want to take a look at by cluster. 

```{r}
genes <- c("CCR7", "LEF1", "SELL", "TCF7", "CD27", "CD28", "CD40LG", "CD44", "CD69", "IL2RA", "TFRC","TNFRSF4", "TNFRSF8", "CCL3", "CCL4","GZMA", "GZMB", "GZMK", "IFNG","NKG7", "PRF1", "CTLA4", "HAVCR2", "LAG3", "PDCD1", "TIGIT")

DotPlot(Tcells, features = genes) + coord_flip() + 
    scale_color_gradientn(colors = rev(colorblind_vector(11))) + 
    guides(color = F, size = F) +
    scale_size(range = c(0.5,3.5))
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/markers", file = "SelectMarkers_dotplot.eps", height=6, width=2.5)
```

***

# Differential Gene Expression

## Comparing tumor vs peripheral blood in just the ccRCC samples

### Overall comparison

```{r}
dir.create("DataAnalysis/CellType/CD4.Tcells/DGE")
TumorOnly <- subset(Tcells, orig.ident == "P1" | orig.ident == "P2" | orig.ident == "P3")
TumorOnly  <- NormalizeData(TumorOnly , assay = "RNA")
TumorOnly <- SetIdent(TumorOnly, value = TumorOnly@meta.data$type)
markers <- FindMarkers(object = TumorOnly, 
                only.pos = F, ident.1 = "T", ident.2="P", min.diff.pct = -Inf, 
                logfc.threshold = -Inf, min.pct = -Inf)
markers$names <- rownames(markers)
markers  <- markers  %>%
        mutate(Difference = pct.1 - pct.2)
mat <- data.frame(markers)
  write.table(mat, file="./DataAnalysis/CellType/CD4.Tcells/DGE/Overall_differentialMarkers_TvPB.txt",
                sep="\t",append=F, row.names = FALSE)
```

### By Cluster comparison
```{r}

DefaultAssay(TumorOnly) <- "RNA"
TumorOnly <- subset(Tcells, orig.ident == "P1" | orig.ident == "P2" | orig.ident == "P3")
clusters <- as.character(unique(TumorOnly@active.ident))
mat <- NULL
for (i in c(1,2,4,5,8)) {
    tmp <- subset(TumorOnly, idents = clusters[i])
    tmp <- NormalizeData(tmp, assay = "RNA") #after subset always normalize RNA
    tmp <- SetIdent(tmp, value = tmp@meta.data$type)
    markers <- FindMarkers(object = tmp, 
                only.pos = F, ident.1 = "T", ident.2="P", min.diff.pct = -Inf, 
                logfc.threshold = -Inf, min.pct = -Inf)
    markers$names <- rownames(markers)
    markers  <- markers  %>%
        mutate(Difference = pct.1 - pct.2)
    
    mat <- data.frame(markers)
    write.table(mat, file=paste("./DataAnalysis/CellType/CD4.Tcells/DGE/TvPB_differentialMarkers_", clusters[i], ".txt", sep=""),
                sep="\t",append=F, row.names = FALSE)
    mat=NULL
}
```

## Comparing tumor vs normal kidney parenchyma

### Overall comparison

```{r}
dir.create("DataAnalysis/CellType/CD4.Tcells/DGE")
subset <- subset(Tcells, Run == "P1_T" |  Run == "P2_T" |  Run == "P3_T" |  Run == "N2_K" |  Run == "N3_K" |  Run == "N4_K")
subset <- NormalizeData(subset  , assay = "RNA")
subset <- SetIdent(subset, value = subset@meta.data$type)
markers <- FindMarkers(object = subset , 
                only.pos = F, ident.1 = "T", ident.2="K", min.diff.pct = -Inf, 
                logfc.threshold = -Inf, min.pct = -Inf)
markers$names <- rownames(markers)
markers  <- markers  %>%
        mutate(Difference = pct.1 - pct.2)
mat <- data.frame(markers)
  write.table(mat, file="./DataAnalysis/CellType/CD4.Tcells/DGE/Overall_differentialMarkers_TvK.txt",
                sep="\t",append=F, row.names = FALSE)
```

### By Cluster comparison
```{r}
subset <- subset(Tcells, Run == "P1_T" |  Run == "P2_T" |  Run == "P3_T" |  Run == "N2_K" |  Run == "N3_K" |  Run == "N4_K")
clusters <- as.character(unique(subset@active.ident))
DefaultAssay(subset) <- "RNA"


mat <- NULL
for (i in seq_along(clusters)) {
    tmp <- subset(subset, idents = clusters[i])
    tmp <- NormalizeData(tmp, assay = "RNA") #after subset always normalize RNA
    tmp <- SetIdent(tmp, value = tmp@meta.data$type)
    markers <- FindMarkers(object = tmp, 
                only.pos = F, ident.1 = "T", ident.2="K", min.diff.pct = -Inf, 
                logfc.threshold = -Inf, min.pct = -Inf)
    markers$names <- rownames(markers)
    markers  <- markers  %>%
        mutate(Difference = pct.1 - pct.2)
    
    mat <- data.frame(markers)
    write.table(mat, file=paste("./DataAnalysis/CellType/CD4.Tcells/DGE/TvK_differentialMarkers_", clusters[i], ".txt", sep=""),
                sep="\t",append=F, row.names = FALSE)
    mat=NULL
}
```

## Comparing peripheral blood of patients to peripheral blood of normal healthy

```{r}
subset <- subset(Tcells, Run == "P1_P" |  Run == "P2_P" |  Run == "P3_P" |  Run == "N1_P")
subset@meta.data$ind <- ifelse(subset@meta.data$Run == "N1_P", "Con", "Tum")
subset <- NormalizeData(subset  , assay = "RNA")
subset <- SetIdent(subset, value = subset@meta.data$ind)
markers <- FindMarkers(object = subset , 
                only.pos = F, ident.1 = "Tum", ident.2="Con", min.diff.pct = -Inf, 
                logfc.threshold = -Inf, min.pct = -Inf)
markers$names <- rownames(markers)
markers  <- markers  %>%
        mutate(Difference = pct.1 - pct.2)
mat <- data.frame(markers)
  write.table(mat, file="./DataAnalysis/CellType/CD4.Tcells/DGE/Overall_differentialMarkers_TumvCon.txt",
                sep="\t",append=F, row.names = FALSE)
```

## Visualizing the differential gene expression

Here we are going to load the differential results from above into a list - marker_list and loop through the visualizations. 

```{r}

file_list <- list.files("./DataAnalysis/CellType/CD4.Tcells/DGE/", pattern = ".txt")
files <- file.path(paste0("./DataAnalysis/CellType/CD4.Tcells/DGE/", file_list))

marker_list <- list()
for (i in 1:length(files)) {
    marker_list[[i]] <- read.delim(files[i])
}

prefix <- suffix <- stringr::str_split(file_list, "_", simplify = T)[,1]
suffix <- stringr::str_split(file_list, "_", simplify = T)[,3] #Isolate just the cell type
suffix <- stringr::str_remove(suffix, ".txt") #remove .txt
suffix <- stringr::str_remove(suffix, " ") #remoce spaces
names <- paste0(prefix, "_", suffix)
names(marker_list) <- names #assign the cell type to each element of the list
```

This loop will 
+  filter for significant genes that are upregulated or downregulated (not filtered by log-fold change).
+  select the top 20 genes will be selected based on the weight of log-fold change and 4*percent difference.
+  visualize the differential genes as 1) scatter plot with percent difference vs. log-fold change and 2) traditional volcano plot.

```{r}
library(ggrepel)
dir.create("DataAnalysis/CellType/CD4.Tcells/DGE/visualizations")

for (i in seq_along(marker_list)) {
tmp <- marker_list[[i]]
tmp <- tmp %>%
    mutate(Trend = ifelse(p_val_adj <= 0.05 & avg_logFC > 0, "Up",
                    ifelse(p_val_adj <= 0.05 & avg_logFC < 0, "Down", "None")))
filter <- subset(tmp, p_val_adj <= 0.05 & avg_logFC > 0)
top20 <- filter %>% top_n(n =20, wt = avg_logFC + 4*Difference)

ggplot(tmp, aes(x=Difference, y=avg_logFC)) + 
    geom_point(data=subset(tmp, Trend == "Up" | Trend == "Down"), aes(color = Trend), size=0.75) + 
    geom_point(data=subset(tmp, Trend != "Up" & Trend != "Down"), size=0.5, color="#999999") + 
  theme_classic() + 
    geom_hline(yintercept = 0, lty = 2) + 
    geom_vline(xintercept = 0, lty = 2)  + 
    geom_text_repel(data=subset(tmp, names %in% top20$names), aes(label=names), segment.size = 0.25, size=2.5) + 
    scale_color_manual(values = rev(colorblind_vector(2)))+
    guides(color=F)
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/DGE/visualizations", file = paste0("pctDifference_vs_foldchange_", names(marker_list[i]), ".pdf"), height = 3.5, width=2.5) 

tmp <- tmp %>%
    mutate(p_val_adj = ifelse(p_val_adj == 0, min(p_val_adj), p_val_adj)) #mutate the p-values that are lower than detection

ggplot(tmp, aes(x=avg_logFC, y=-log10(p_val_adj))) + 
    geom_point(size=0.5, color="#999999") + 
    geom_point(data=subset(tmp, Trend == "Up" | Trend == "Down"), aes(color = Trend), size=0.75) + 
    theme_classic() + 
    scale_y_sqrt() +
    geom_vline(xintercept = 0, lty = 2) + 
    geom_hline(yintercept = 1.3, lty = 2) + 
    geom_text_repel(data=subset(tmp, names %in% top20$names), aes(label=names), segment.size = 0.25, size=2.5) + 
    scale_color_manual(values = rev(colorblind_vector(2))) + 
    guides(color = F)
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/DGE/visualizations", file = paste0("VolcanoPlot_", names(marker_list[i]), ".pdf"), height = 3.5, width=2.5)
}
```

***

# Trajectory

Here we are going to use the embedding of the subclusters to generate a trajectory using the slingshot R package. Slingshot does not work directly with Seurat, so we will convert into the Bioconductor format, singlecellexperiment and then use slingshot.

```{r}
library(slingshot)
dir.create("./DataAnalysis/CellType/CD4.Tcells/slingshot")

sce <- as.SingleCellExperiment(Tcells) 

sds <- slingshot(Embeddings(Tcells, "umap"), clusterLabels = Tcells@active.ident, allow.breaks = TRUE, stretch = 0, reducedDim = "UMAP") #Calcualting the trajectory
```

In order to visualize the UMAP with the computed trajectories overlaid, we need a quick function cell_pal and then can assign cluster colors by using the scales *hue_pal()* function.

```{r}
library(scales)
#Making plots more siminmar to ggplot outputs of Seurat
cell_pal <- function(cell_vars, pal_fun,...) {
  if (is.numeric(cell_vars)) {
    pal <- pal_fun(100, ...)
    return(pal[cut(cell_vars, breaks = 100)])
  } else {
    categories <- sort(unique(cell_vars))
    pal <- setNames(pal_fun(length(categories), ...), categories)
    return(pal[cell_vars])
  }
}
#We need color palettes Leiden clusters. These would be the same colors seen in the Seurat plots.

cell_colors_clust <- cell_pal(Tcells@active.ident, hue_pal())

pdf("./DataAnalysis/CellType/CD4.Tcells/slingshot/Trajectory.pdf", height=4, width=4)
plot(reducedDim(sds), col = cell_colors_clust, pch = 16, cex = 0.25)
lines(sds, lwd = 2, type = 'lineages', col = 'black')
dev.off()
pdf("./DataAnalysis/CellType/CD4.Tcells/slingshot/Trajectory2.pdf", height=4, width=4)
plot(reducedDim(sds), col = cell_colors_clust, pch = 16, cex = 0.5)
lines(sds, lwd = 2, col = 'black')
dev.off()
```

We can also see the pseudotime variables across each of the curves. 

```{r}
nc <- 2
pt <- slingPseudotime(sds)
nms <- colnames(pt)
nr <- ceiling(length(nms)/nc)
pal <- colorblind_vector(100)
par(mfrow = c(nr, nc))
pdf("./DataAnalysis/CellType/CD4.Tcells/slingshot/Trajectory2_pseudotime.pdf", height=4, width=4)
for (i in nms) {
  colors <- pal[cut(pt[,i], breaks = 100)]
  plot(reducedDim(sds), col = colors, pch = 16, cex = 0.5, main = i)
  lines(sds, lwd = 2, col = 'black')
}
dev.off()
```

***

# Clonotype Analysis

During the integration step, clonotypes have already been assigned across all T cells. We can take advantage of this using the [scRepertoire](https://github.com/ncborcherding/scRepertoire) package we wrote. 

```{r}
library(scRepertoire)
dir.create("./DataAnalysis/CellType/CD4.Tcells/clonotype")
```

Using the clonotype groupings calculated in the integration step to view along the UMAP of the CD4+ T cells.

```{r}
Tcells@meta.data$cloneType <- factor(Tcells@meta.data$cloneType, levels = c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)", "Small (1 < X <= 5)", "Single (0 < X <= 1)", NA))

DimPlot(Tcells, group.by = "cloneType") + scale_color_manual(values = c(colorblind_vector(5)), na.value="grey")
ggsave(path = "DataAnalysis/CellType/CD4.Tcells/clonotype", filename="IntegratedObject_byClonotypeFreq.eps", width=6.5, height=3)
```


Now we can get a little more granularity on CD4+ T cells clonotype dynamics - I will isolate both all cells and the CD4+ T cells from the tumor-only. Run will be just the sequencing run - so 6 total P1_P, P1_T, etc

```{r}
TumorOnly <- subset(Tcells, orig.ident == "P1" | orig.ident ==  "P2" | orig.ident ==  "P3")
CD4 <- expression2List(Tcells, group = "cluster")
TumorOnly  <- expression2List(TumorOnly,  group = "cluster")
Run  <- expression2List(TumorOnly,  group = "Run")
```

Please see the manuscript that outlines the [scRepertoire package](https://f1000research.com/articles/9-47/v2), these are just the basic visualizations. 

```{r}
quantContig(CD4, cloneCall="gene+nt", scale = TRUE) + guides(fill = F)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/UniqueClonotypes.pdf", height=2, width=2.5)

quantContig(TumorOnly, cloneCall="gene+nt", scale = TRUE) + guides(fill = F)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/UniqueClonotype_TumorOnlys.pdf", height=2, width=2.5)

compareClonotypes(CD4, numbers = 15, 
                    cloneCall="gene+nt", graph = "alluvial")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Alluvial_compare.pdf", height=4, width=20)

compareClonotypes(TumorOnly, numbers = 15, 
                    cloneCall="gene+nt", graph = "alluvial")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Alluvial_compare_TumorOnly.pdf", height=4, width=20)

clonalHomeostasis(CD4, cloneCall = "gene+nt")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_homeostasis.pdf", height=2, width=5)

clonalProportion(CD4, cloneCall = "gene+nt")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_proportion.pdf", height=2, width=5)

clonalOverlap(CD4, cloneCall = "gene+nt", method = "morisita")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_overlap_morisita.pdf", height=3, width=4)

clonalOverlap(CD4, cloneCall = "gene+nt", method = "overlap")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_overlap_overlap.pdf", height=3, width=4)

clonalHomeostasis(TumorOnly, cloneCall = "gene+nt")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_homeostasis_TumorOnly.pdf", height=2, width=5)

clonalProportion(TumorOnly, cloneCall = "gene+nt")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_proportion_TumorOnly.pdf", height=2, width=5)

clonalOverlap(TumorOnly, cloneCall = "gene+nt", method = "morisita")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_overlap_morisita_TumorOnly.pdf", height=3, width=4)

clonalOverlap(TumorOnly, cloneCall = "gene+nt", method = "overlap")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_overlap_overlap_TumorOnly.pdf", height=3, width=4)

clonalDiversity(CD4, cloneCall = "gene+nt", group = "samples")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_Diversity.pdf", height=3, width=4)

clonalDiversity(TumorOnly, cloneCall = "gene+nt", group = "samples")
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_Diversity_TumorOnly.pdf", height=3, width=4)

alluvialClonotypes(Tcells, cloneCall = "gene", 
                   y.axes = c("Frequency", "type", "newCluster"), 
                   color = "newCluster") 
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_Alluvial.pdf", height=3, width=5)

TumorOnly <- subset(Tcells, orig.ident == c("P1", "P2", "P3"))
alluvialClonotypes(TumorOnly, cloneCall = "gene", 
                   y.axes = c("Frequency", "type", "newCluster"), 
                   color = "newCluster") 
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Clonal_Alluvial_TumorOnly.pdf", height=3, width=5)

quantContig(Run, cloneCall = "gene+nt", scale = T)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/PtUnique_TumorOnly.pdf", height=3, width=5)

clonalProportion(Run, cloneCall = "gene+nt")

CD4_overlap <- clonalOverlap(Run, cloneCall = "gene+nt", exportTable = T)

ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/PtProportion_TumorOnly.pdf", height=3, width=5)

compareClonotypes(Run, samples = c("P1_P", "P1_T"), numbers = 10)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Pt1_alluvialCompare_TumorOnly.pdf", height=3, width=20)

compareClonotypes(Run, samples = c("P2_P", "P2_T"), numbers = 10)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Pt2_alluvialCompare_TumorOnly.pdf", height=3, width=20)

compareClonotypes(Run, samples = c("P3_P", "P3_T"), numbers = 10)
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/Pt3_alluvialCompare_TumorOnly.pdf", height=3, width=20)

clonalOverlap(Run, cloneCall = "gene+nt") + 
    scale_fill_gradientn(colors = rev(colorblind_vector(11)), na.value = "white", limits = c(0,0.15))
ggsave("DataAnalysis/CellType/CD4.Tcells/clonotype/PT_OVERLAP.pdf", height=4, width = 5)

```

***

# ssGSEA

Setting up the directory and loading the escape R package. This is a package we built for single-cell GSEA analysis - can find it on [GitHub](https://github.com/ncborcherding/escape) and has been accepted for Biocondutor.

```{r}
dir.create("./DataAnalysis/CellType/CD4.Tcells/GSEA")
library(escape)
```


Here we can load a delimited text document with columns consisting of list of genes to be formed into gene sets. The loop will separate each column into a list, remove empty and duplicated values and convert each into a gene set using the GSEABase R package. 

```{r}
sign <- read.delim("./data/immune_sign.txt")[,1:20]
full <- as.list(sign)

unique <- names(full) 
list <- list()
for (i in seq_along(unique)) {
    tmp <- full[[i]]
    tmp <- tmp[tmp != ""]
    tmp <- unique(toupper(tmp))
    tmp <- GSEABase::GeneSet(tmp, setName=paste(unique[i]))
    list[[i]] <- tmp
    
}

list <- GSEABase::GeneSetCollection(list)
```


Next we perform the actual ssGSEA step using the *enrichIt()* function, separating the single-cells into groups of 1000.

```{r}
ES2 <- enrichIt(obj = Tcells, gene.sets = list, groups = 1000, cores = 2)
save(ES2, file = "./DataAnalysis/CellType/CD8.Tcells/GSEA/Requested_enrichment.rda")
```

I am also interested in using more established gene sets, so I am going to do the same thing with the Hallmark and C5 library from the from the [Molecular Signature Database](https://www.gsea-msigdb.org/gsea/index.jsp).

```{r}
GS <- getGeneSets(library = c("H", "C5"))
ES <- enrichIt(obj = Tcells, gene.sets = GS, groups = 1000, cores = 4)
save(ES, file = "./DataAnalysis/CellType/CD4.Tcells/GSEA/H_C5_enrichment.rda")
```


We can take the output of *enrichIt()* and either *AddMetaData()* or just merge, chose merge here so the meta data does not get too busy. I isolate just the hallmark pathways and bind with our selected pathways.

```{r}
ES2 <- get(load("./DataAnalysis/CellType/CD4.Tcells/GSEA/Requested_enrichment.rda"))
filter <- ES[,grepl("HALLMARK",colnames(ES))] #Isolate only Hallmark Pathways
filter <- cbind(filter,ES2)
meta <- Tcells[[]]
meta <- merge(meta, filter, by = "row.names")
```

Next we can take the values and get the median values per cluster to display in a heatmap.

```{r}
heatmap <- meta[, c("newCluster", colnames(filter))]
melted <- reshape2::melt(heatmap, id.vars = c("newCluster"))
meanvalues <- melted %>%
  group_by(newCluster, variable) %>%
  summarise(median(value))

matrix <- reshape2::dcast(meanvalues, newCluster ~ variable)
rownames(matrix) <- matrix[,1]
matrix <- matrix[,-1]

pdf("test.pdf", height=12, width=5)
pheatmap::pheatmap(t(matrix), color = rev(colorblind_vector(50)), scale = "row", fontsize_row = 3, cluster_rows = T, cluster_cols = T)
dev.off()

GOI <- c("HALLMARK_INFLAMMATORY_RESPONSE", "T1_Interferon", "T2_Interferon", "Activated", "Exhuasted", "T_Cell_Terminal_Differentiation", "HALLMARK_HYPOXIA", "HALLMARK_IL2_STAT5_SIGNALING","HALLMARK_IL6_JAK_STAT3_SIGNALING","HALLMARK_NOTCH_SIGNALING", "HALLMARK_TGF_BETA_SIGNALING","HALLMARK_HEDGEHOG_SIGNALING","HALLMARK_DNA_REPAIR", "Cytolytic", "Glycolysis", "TCA_cycle", "Treg")
sub <- matrix[,colnames(matrix) %in% GOI]

pdf("./DataAnalysis/CellType/CD4.Tcells/GSEA/SelectGSEAheatmap.pdf", height=4, width=4)
pheatmap::pheatmap(t(sub), color = rev(colorblind_vector(50)), scale = "row", fontsize_row = 3, cluster_rows = T, cluster_cols = F)
dev.off()
```