diff --git a/.DS_Store b/.DS_Store index 5008ddf..3e5a709 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/README.Rmd b/README.Rmd index 846ae7e..19f9055 100644 --- a/README.Rmd +++ b/README.Rmd @@ -77,8 +77,19 @@ We provide several vignettes demonstrating the different types of analysis that We recommend users to start with the following vignette, which demonstrates the different steps in the analysis without too many details yet. This is the recommended vignette to learn the basics of MultiNicheNet. +TEST +TEST +TEST +TEST + +* [MultiNicheNet analysis: MIS-C threewise comparison - step-by-step](vignettes/basic_analysis_steps_MISC.html): `vignette("basic_analysis_steps_MISC", package="multinichenetr")` * [MultiNicheNet analysis: MIS-C threewise comparison - step-by-step](vignettes/basic_analysis_steps_MISC.md): `vignette("basic_analysis_steps_MISC", package="multinichenetr")` +TEST +TEST +TEST +TEST + This vignette provides an example of a comparison between 3 groups. The following vignettes demonstrate how to analyze cell-cell communication differences in other settings. For sake of simplicity, these vignettes also use a MultiNicheNet wrapper function, which encompasses the different steps demonstrated in the previous vignette. These vignettes are the best vignettes to learn how to apply MultiNicheNet to different datastes for addressing different questions. * [MultiNicheNet analysis: MIS-C pairwise comparison - wrapper function](vignettes/pairwise_analysis_MISC.md): `vignette("pairwise_analysis_MISC.Rmd", package="multinichenetr")` diff --git a/README.md b/README.md index 4459619..6b8892c 100644 --- a/README.md +++ b/README.md @@ -111,8 +111,15 @@ with: devtools::install_github("saeyslab/nichenetr") devtools::install_github("saeyslab/multinichenetr") +It is possible that during installation the following warning is thrown: + +“glmmTMB was built with TMB version 1.9.4” “Current TMB version is 1.9.5 + +This warning can be safely ignored since this does not affect +multinichenetr’s installation and functionalities. + multinichenetr is tested via Github Actions version control on Windows, -Linux (Ubuntu) and Mac (most recently tested R version: R 4.3.0.). +Linux (Ubuntu) and Mac (most recently tested R version: R 4.3.1.). ## Learning to use multinichenetr @@ -125,10 +132,17 @@ demonstrates the different steps in the analysis without too many details yet. This is the recommended vignette to learn the basics of MultiNicheNet. +TEST TEST TEST TEST + +- [MultiNicheNet analysis: MIS-C threewise comparison - + step-by-step](vignettes/basic_analysis_steps_MISC.html): + `vignette("basic_analysis_steps_MISC", package="multinichenetr")` - [MultiNicheNet analysis: MIS-C threewise comparison - step-by-step](vignettes/basic_analysis_steps_MISC.md): `vignette("basic_analysis_steps_MISC", package="multinichenetr")` +TEST TEST TEST TEST + This vignette provides an example of a comparison between 3 groups. The following vignettes demonstrate how to analyze cell-cell communication differences in other settings. For sake of simplicity, these vignettes diff --git a/vignettes/.DS_Store b/vignettes/.DS_Store new file mode 100644 index 0000000..7aebd86 Binary files /dev/null and b/vignettes/.DS_Store differ diff --git a/vignettes/add_proteomics_MISC.Rmd b/vignettes/add_proteomics_MISC.Rmd index 5adde27..d74702b 100644 --- a/vignettes/add_proteomics_MISC.Rmd +++ b/vignettes/add_proteomics_MISC.Rmd @@ -457,8 +457,8 @@ Of this data, we will only keep information about ligands and receptors ```{r} olink_df = xlsx::read.xlsx( - "/Users/robinb/Work/current_projects/MISC OLINK/data/summary_difference_diorioMIS-C-vs-HC.xlsx", 1 - ) %>% as_tibble() %>% + url("https://zenodo.org/records/10908003/files/summary_difference_diorioMIS-C-vs-HC.xlsx"), + 1) %>% as_tibble() %>% dplyr::rename(gene = variables, logFC = mean_diff_d0, pval = adj_p_values) %>% dplyr::select(gene, logFC, pval) %>% dplyr::mutate(gene = gene %>% make.names()) %>% diff --git a/vignettes/basic_analysis_steps_MISC.Rmd b/vignettes/basic_analysis_steps_MISC.Rmd index a79ac21..bcf5efc 100644 --- a/vignettes/basic_analysis_steps_MISC.Rmd +++ b/vignettes/basic_analysis_steps_MISC.Rmd @@ -4,6 +4,7 @@ author: "Robin Browaeys" package: "`r BiocStyle::pkg_ver('multinichenetr')`" output: BiocStyle::html_document +output_dir: "/Users/robinb/Work/multinichenetr/vignettes" vignette: > %\VignetteIndexEntry{MultiNicheNet analysis: MIS-C threewise comparison - step-by-step} %\VignetteEngine{knitr::rmarkdown} @@ -26,6 +27,10 @@ knitr::opts_chunk$set( library(BiocStyle) ``` + + In this vignette, you can learn how to perform a MultiNicheNet analysis to compare cell-cell communication between conditions of interest. A MultiNicheNet analysis can be performed if you have multi-sample, multi-condition/group single-cell data. We strongly recommend having at least 4 samples in each of the groups/conditions you want to compare. With less samples, the benefits of performing a pseudobulk-based DE analysis are less clear. For those datasets, you can check and run our alternative workflow that makes use of cell-level sample-agnostic differential expression tools. As input you need a SingleCellExperiment object containing at least the raw count matrix and metadata providing the following information for each cell: the **group**, **sample** and **cell type**. @@ -647,11 +652,11 @@ In multi-sample datasets, we have the opportunity to look whether expression of ```{r} lr_target_prior_cor = lr_target_prior_cor_inference( - receivers_oi = multinichenet_output$prioritization_tables$group_prioritization_tbl$receiver %>% unique(), + receivers_oi = prioritization_tables$group_prioritization_tbl$receiver %>% unique(), abundance_expression_info = abundance_expression_info, celltype_de = celltype_de, grouping_tbl = grouping_tbl, - prioritization_tables = multinichenet_output$prioritization_tables, + prioritization_tables = prioritization_tables, ligand_target_matrix = ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, diff --git a/vignettes/basic_analysis_steps_MISC.html b/vignettes/basic_analysis_steps_MISC.html index 626bcec..ec06285 100644 --- a/vignettes/basic_analysis_steps_MISC.html +++ b/vignettes/basic_analysis_steps_MISC.html @@ -1,1747 +1,1829 @@ - + - - + - + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - + + + + + + -

MultiNicheNet -analysis: MIS-C threewise comparison - step-by-step

-

Robin Browaeys 2023-06-06

- - -

In this vignette, you can learn how to perform an all-vs-all -MultiNicheNet analysis. In this vignette, we start from one -SingleCellExperiment object containing cells from both sender and -receiver cell types and from different patients.

-

A MultiNicheNet analysis can be performed if you have multi-sample, -multi-group single-cell data. MultiNicheNet will look for cell-cell -communication between the cell types in your data for each sample, and -compare the cell-cell communication patterns between the groups of -interest. Therefore, the absolute minimum of meta data you need to have, -are following columns indicating for each cell: the -group, sample and cell -type.

-

As example expression data of interacting cells, we will here use -scRNAseq data of immune cells in MIS-C patients and healthy siblings -from this paper of Hoste et al.: TIM3+ -TRBV11-2 T cells and IFNγ signature in patrolling monocytes and CD16+ NK -cells delineate MIS-C DOI . MIS-C (multisystem inflammatory syndrome in children) -is a novel rare immunodysregulation syndrome that can arise after -SARS-CoV-2 infection in children. We will use NicheNet to explore immune -cell crosstalk enriched in MIS-C compared to healthy siblings.

-

In this vignette, we will prepare the data and analysis parameters, -and then perform the MultiNicheNet analysis.

-

The different steps of the MultiNicheNet analysis are the -following:

+ +
+ + + + + + +

Contents

+
+ +
+ + +

In this vignette, you can learn how to perform a MultiNicheNet analysis to compare cell-cell communication between conditions of interest. A MultiNicheNet analysis can be performed if you have multi-sample, multi-condition/group single-cell data. We strongly recommend having at least 4 samples in each of the groups/conditions you want to compare. With less samples, the benefits of performing a pseudobulk-based DE analysis are less clear. For those datasets, you can check and run our alternative workflow that makes use of cell-level sample-agnostic differential expression tools.

+

As input you need a SingleCellExperiment object containing at least the raw count matrix and metadata providing the following information for each cell: the group, sample and cell type.

+

As example expression data of interacting cells, we will here use scRNAseq data of immune cells in MIS-C patients and healthy siblings from this paper of Hoste et al.: TIM3+ TRBV11-2 T cells and IFNγ signature in patrolling monocytes and CD16+ NK cells delineate MIS-C DOI. +MIS-C (multisystem inflammatory syndrome in children) is a novel rare immunodysregulation syndrome that can arise after SARS-CoV-2 infection in children. We will use MultiNicheNet to explore immune cell crosstalk enriched in MIS-C compared to healthy siblings and adult COVID-19 patients.

+

In this vignette, we will first prepare the MultiNicheNet core analysis, then run the several steps in the MultiNicheNet core analysis, and finally interpret the output.

+
+

1 Preparation of the MultiNicheNet core analysis

+
library(SingleCellExperiment)
+library(dplyr)
+library(ggplot2)
+library(nichenetr)
+library(multinichenetr)
+
+

1.1 Load NicheNet’s ligand-receptor network and ligand-target matrix

+

MultiNicheNet builds upon the NicheNet framework and uses the same prior knowledge networks (ligand-receptor network and ligand-target matrix, currently v2 version).

+

The Nichenet v2 networks and matrices for both mouse and human can be downloaded from Zenodo DOI.

+

We will read these object in for human because our expression data is of human patients. +Gene names are here made syntactically valid via make.names() to avoid the loss of genes (eg H2-M3) in downstream visualizations.

+
organism = "human"
+
options(timeout = 120)
+
+if(organism == "human"){
+  
+  lr_network_all = 
+    readRDS(url(
+      "https://zenodo.org/record/10229222/files/lr_network_human_allInfo_30112033.rds"
+      )) %>% 
+    mutate(
+      ligand = convert_alias_to_symbols(ligand, organism = organism), 
+      receptor = convert_alias_to_symbols(receptor, organism = organism))
+  
+  lr_network_all = lr_network_all  %>% 
+    mutate(ligand = make.names(ligand), receptor = make.names(receptor)) 
+  
+  lr_network = lr_network_all %>% 
+    distinct(ligand, receptor)
+  
+  ligand_target_matrix = readRDS(url(
+    "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds"
+    ))
+  
+  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  
+  lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix))
+  ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()]
+  
+} else if(organism == "mouse"){
+  
+  lr_network_all = readRDS(url(
+    "https://zenodo.org/record/10229222/files/lr_network_mouse_allInfo_30112033.rds"
+    )) %>% 
+    mutate(
+      ligand = convert_alias_to_symbols(ligand, organism = organism), 
+      receptor = convert_alias_to_symbols(receptor, organism = organism))
+  
+  lr_network_all = lr_network_all  %>% 
+    mutate(ligand = make.names(ligand), receptor = make.names(receptor)) 
+  lr_network = lr_network_all %>% 
+    distinct(ligand, receptor)
+  
+  ligand_target_matrix = readRDS(url(
+    "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds"
+    ))
+  
+  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  
+  lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix))
+  ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()]
+  
+}
+
+
+

1.2 Read in SingleCellExperiment Object

+

In this vignette, we will load in a subset of the scRNAseq data of the MIS-C DOI. For the sake of demonstration, this subset only contains 3 cell types. These celltypes are some of the cell types that were found to be most interesting related to MIS-C according to Hoste et al. 

+

If you start from a Seurat object, you can convert it easily to a SingleCellExperiment object via sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA").

+

Because the NicheNet 2.0. networks are in the most recent version of the official gene symbols, we will make sure that the gene symbols used in the expression data are also updated (= converted from their “aliases” to official gene symbols). Afterwards, we will make them again syntactically valid.

+
sce = readRDS(url(
+  "https://zenodo.org/record/8010790/files/sce_subset_misc.rds"
+  ))
+sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE()
+
+
+

1.3 Prepare the settings of the MultiNicheNet cell-cell communication analysis

+

In this step, we will formalize our research question into MultiNicheNet input arguments.

+
+

1.3.1 Define in which metadata columns we can find the group, sample and cell type IDs

+

In this case study, we want to study differences in cell-cell communication patterns between MIS-C patients (M), their healthy siblings (S) and adult patients with severe covid (A). The meta data columns that indicate this disease status(=group/condition of interest) is MIS.C.AgeTier.

+

Cell type annotations are indicated in the Annotation_v2.0 column, and the sample is indicated by the ShortID column. +If your cells are annotated in multiple hierarchical levels, we recommend using a relatively high level in the hierarchy. This for 2 reasons: 1) MultiNicheNet focuses on differential expression and not differential abundance, and 2) there should be sufficient cells per sample-celltype combination (see later).

+
sample_id = "ShortID"
+group_id = "MIS.C.AgeTier"
+celltype_id = "Annotation_v2.0"
+

Important: It is required that each sample-id is uniquely assigned to only one condition/group of interest. See the vignettes about paired and multifactorial analysis to see how to define your analysis input when you have multiple samples (and conditions) per patient.

+

If you would have batch effects or covariates you can correct for, you can define this here as well. However, this is not applicable to this dataset. Therefore we will use the following NA settings:

+
covariates = NA
+batches = NA
+

Important: for categorical covariates and batches, there should be at least one sample for every group-batch combination. If one of your groups/conditions lacks a certain level of your batch, you won’t be able to correct for the batch effect because the model is then not able to distinguish batch from group/condition effects.

+

Important: The column names of group, sample, cell type, batches and covariates should be syntactically valid (make.names)

+

Important: All group, sample, cell type, batch and covariate names should be syntactically valid as well (make.names) (eg through SummarizedExperiment::colData(sce)$ShortID = SummarizedExperiment::colData(sce)$ShortID %>% make.names())

+
+
+

1.3.2 Define the contrasts of interest.

+

Here, we want to compare each patient group to the other groups, so the MIS-C (M) group vs healthy control siblings (S) and adult COVID19 patients (A) (= M vs S+A) and so on. We want to know which cell-cell communication patterns are specific for the M vs A+S group, the A vs M+S group and the S vs A+M group.

+

To perform this comparison, we need to set the following contrasts:

+
contrasts_oi = c("'M-(S+A)/2','S-(M+A)/2','A-(S+M)/2'")
+

Very Important Note the format to indicate the contrasts! This formatting should be adhered to very strictly, and white spaces are not allowed! Check ?get_DE_info for explanation about how to define this well. The most important points are that: +each contrast is surrounded by single quotation marks +contrasts are separated by a comma without any white space +*all contrasts together are surrounded by double quotation marks.

+

If you compare against two groups, you should divide by 2 (as demonstrated here), if you compare against three groups, you should divide by 3 and so on.

+

For downstream visualizations and linking contrasts to their main condition, we also need to run the following: +This is necessary because we will also calculate cell-type+condition specificity of ligands and receptors.

+
contrast_tbl = tibble(contrast = 
+                        c("M-(S+A)/2","S-(M+A)/2", "A-(S+M)/2"), 
+                      group = c("M","S","A"))
+

If you want to compare only two groups (eg M vs S), you can use the following: +contrasts_oi = c("'M-S','S-M'") +contrast_tbl = tibble(contrast = c("M-S","S-M"), group = c("M","S"))

+

Other vignettes will demonstrate how to formalize different types of research questions.

+
+
+

1.3.3 Define the sender and receiver cell types of interest.

+

If you want to focus the analysis on specific cell types (e.g. because you know which cell types reside in the same microenvironments based on spatial data), you can define this here. If you have sufficient computational resources and no specific idea of cell-type colocalzations, we recommend to consider all cell types as potential senders and receivers. Later on during analysis of the output it is still possible to zoom in on the cell types that interest you most, but your analysis is not biased to them.

+

Here we will consider all cell types in the data:

+
senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
+receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
+sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% 
+            c(senders_oi, receivers_oi)
+          ]
+

In case you would have samples in your data that do not belong to one of the groups/conditions of interest, we recommend removing them and only keeping conditions of interst:

+
conditions_keep = c("M", "S", "A")
+sce = sce[, SummarizedExperiment::colData(sce)[,group_id] %in% 
+            conditions_keep
+          ]
+
+
+
+
+

2 Running the MultiNicheNet core analysis

+

Now we will run the core of a MultiNicheNet analysis. This analysis consists of the following steps:

-

In this vignette, we will demonstrate all these steps one by one.

-

After the MultiNicheNet analysis is done, we will explore the output -of the analysis with different ways of visualization.

-

Step -0: Preparation of the analysis: load packages, NicheNet LR network & -ligand-target matrix, single-cell expression data

-

Step -0.1: Load required packages and NicheNet ligand-receptor network and -ligand-target matrix

-
library(SingleCellExperiment)
-library(dplyr)
-library(ggplot2)
-library(multinichenetr)
-

The Nichenet v2 networks and matrices for both mouse and human can be -downloaded from Zenodo DOI.

-

We will read these object in for human because our expression data is -of human patients. Gene names are here made syntactically valid via -make.names() to avoid the loss of genes (eg H2-M3) in -downstream visualizations.

-
organism = "human"
-if(organism == "human"){
-  lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_human_21122021.rds"))
-  lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor) %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor))
-  ligand_target_matrix = readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds"))
-  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% make.names()
-  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% make.names()
-} else if(organism == "mouse"){
-  lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_mouse_21122021.rds"))
-  lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor) %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor))
-  ligand_target_matrix = readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds"))
-  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% make.names()
-  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% make.names()
-}
-

Step 0.2: Read in -SingleCellExperiment Objects

-

In this vignette, sender and receiver cell types are in the same -SingleCellExperiment object, which we will load here. In this vignette, -we will load in a subset of the scRNAseq data of the MIS-C DOI. For the sake of demonstration, this subset only -contains 3 cell types. These celltypes are some of the cell types that -were found to be most interesting related to MIS-C according to Hoste et -al. 

-

If you start from a Seurat object, you can convert it easily to a -SingleCellExperiment via -sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA").

-

Because the NicheNet 2.0. networks are in the most recent version of -the official gene symbols, we will make sure that the gene symbols used -in the expression data are also updated (= converted from their -“aliases” to official gene symbols). Afterwards, we will make them again -syntactically valid.

-
sce = readRDS(url("https://zenodo.org/record/8010790/files/sce_subset_misc.rds"))
-sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE()
-## [1] "there are provided symbols that are not in the alias annotation table: "
-##    [1] "AL627309.1"  "AL669831.5"  "AL645608.8"  "AL645608.2"  "AL162741.1"  "AL391244.3"  "AL645728.1"  "AL691432.2"  "FO704657.1"  "AL109917.1"  "AL590822.2"  "AL139246.5"  "AL139246.3"  "AL513320.1"  "AL805961.2" 
-##   [16] "AL365255.1"  "AL031848.2"  "Z98884.1"    "AL034417.4"  "AL034417.3"  "AL096855.1"  "AL139424.2"  "AL354956.1"  "AL139423.1"  "AL109811.3"  "AL109811.1"  "AL109811.2"  "AC243836.1"  "AC254633.1"  "AL031283.1" 
-##   [31] "AL121992.3"  "AL121992.1"  "AL450998.2"  "AL137802.2"  "AL021920.3"  "BX284668.2"  "BX284668.5"  "BX284668.6"  "AL035413.1"  "AL031727.2"  "AL020998.1"  "AL031005.1"  "AL031728.1"  "AL031281.2"  "AL445686.2" 
-##   [46] "AL445471.1"  "AL445471.2"  "AL031432.5"  "AL031432.3"  "AL606491.1"  "AL031280.1"  "AL020996.1"  "AL033528.2"  "AL391650.1"  "AL512408.1"  "BX293535.1"  "AL512288.1"  "AL353622.1"  "AL353622.2"  "AL360012.1" 
-##   [61] "AC114488.2"  "AC114488.3"  "AL136115.2"  "AL445248.1"  "AL049795.1"  "AL662907.3"  "AC114490.3"  "AC114490.2"  "AC004865.2"  "AL591845.1"  "AL139260.1"  "AL033527.3"  "AL033527.5"  "AL050341.2"  "AL603839.3" 
-##   [76] "AC098484.3"  "AL512353.1"  "AL139289.2"  "AL139289.1"  "AL451062.3"  "AL357079.1"  "AL139220.2"  "AL592166.1"  "AC104170.1"  "AL050343.1"  "AL445685.1"  "AL606760.2"  "AL606760.3"  "AL606760.1"  "AC119428.2" 
-##   [91] "AL161644.1"  "AC105277.1"  "AC099791.2"  "AL357078.1"  "AL583808.1"  "AC118549.1"  "AC103591.3"  "AL359504.2"  "AL078459.1"  "AL049597.2"  "AC099063.4"  "AC099063.1"  "AC099568.2"  "AC104836.1"  "AC105942.1" 
-##  [106] "AC092802.1"  "AC118553.1"  "AC118553.2"  "AC104506.1"  "AC093157.2"  "AC093157.1"  "AL109741.1"  "AL390036.1"  "AL359258.2"  "AL356488.3"  "AL356488.2"  "SARS"        "AC000032.1"  "AL450468.2"  "AL160006.1" 
-##  [121] "AL355488.1"  "AL365361.1"  "AL360270.1"  "AL360270.3"  "AL355816.2"  "AL354760.1"  "AL603832.1"  "AL357055.3"  "AL137856.1"  "AL390066.1"  "AL445231.1"  "AL157902.1"  "AL359915.2"  "AC244453.2"  "AC244021.1" 
-##  [136] "AL592494.3"  "AC239800.2"  "AC239800.3"  "AC245595.1"  "AC246785.3"  "FP700111.1"  "AC245014.3"  "AC245014.1"  "AC243547.3"  "AC243547.2"  "NOTCH2NL"    "AC239799.2"  "AC242426.2"  "AC239809.3"  "AC245297.3" 
-##  [151] "AC245297.2"  "AC239868.3"  "AC239868.2"  "AL590133.2"  "AL391069.2"  "AL391069.3"  "AL391335.1"  "AL589765.7"  "AL450992.2"  "AL162258.2"  "AL358472.5"  "AL358472.4"  "AL358472.3"  "AL358472.2"  "AL451085.1" 
-##  [166] "AL451085.2"  "AC234582.1"  "AC234582.2"  "AL353807.2"  "AL355388.2"  "AL139412.1"  "AL590560.1"  "AL590560.2"  "AL121987.2"  "AL139011.1"  "AL138930.1"  "AL121985.1"  "AL591806.3"  "AL590714.1"  "AL592295.4" 
-##  [181] "AL359541.1"  "AL596325.1"  "AL356441.1"  "AL451074.2"  "AL451050.2"  "AL359962.2"  "AL031733.2"  "AL021068.1"  "AL031599.1"  "AL645568.1"  "AL121983.1"  "AL121983.2"  "Z99127.1"    "AL513329.1"  "AL359265.3" 
-##  [196] "AL137796.1"  "AL449106.1"  "AL353708.3"  "AL353708.1"  "AL162431.2"  "AL445228.2"  "AL078644.2"  "AL133383.1"  "AL390957.1"  "AL136987.1"  "AL136454.1"  "AL157402.2"  "AC096677.1"  "AC099676.1"  "AC098934.4" 
-##  [211] "AL512306.3"  "AL512306.2"  "AC244034.2"  "AL137789.1"  "AL031316.1"  "AL606468.1"  "AL451060.1"  "AL360091.3"  "AC092803.2"  "AL590648.3"  "AC096642.1"  "AL513314.2"  "AL592148.3"  "AL392172.1"  "AL359979.2" 
-##  [226] "AC138393.3"  "AC092809.4"  "AC092811.2"  "AL591895.1"  "AL512343.2"  "AL353593.1"  "AL353593.2"  "AL670729.1"  "AL117350.1"  "AL121990.1"  "AL445524.1"  "AL355472.3"  "AL355472.4"  "AL355472.1"  "AL160408.2" 
-##  [241] "AL391832.2"  "AL391832.3"  "AL359921.2"  "AL359921.1"  "BX323046.1"  "AL356512.1"  "AL591848.3"  "AL390728.6"  "AL390728.5"  "AC138089.1"  "AC098483.1"  "AC092159.2"  "AC093390.2"  "AC231981.1"  "AC108488.1" 
-##  [256] "AC017076.1"  "AC010904.2"  "AC092580.2"  "AC073195.1"  "AC010969.2"  "AC104794.2"  "AC007240.1"  "AC007249.2"  "AC062028.1"  "AC079148.1"  "AC013400.1"  "AC079145.1"  "AC098828.2"  "AC012065.3"  "AC012065.4" 
-##  [271] "AC018742.1"  "AC012073.1"  "AC012074.1"  "AC104699.1"  "AC013472.3"  "AC013403.2"  "AC074117.1"  "AC093690.1"  "AC104695.2"  "AC104695.3"  "AC092164.1"  "AC016907.2"  "AL121652.1"  "AL121658.1"  "AL133245.1" 
-##  [286] "AC007378.1"  "AC006369.1"  "AC074366.1"  "AC019171.1"  "AC007388.1"  "AC083949.1"  "AC010883.1"  "AC019129.2"  "AC018682.1"  "AC016722.3"  "AC073283.1"  "AC079807.1"  "AC092650.1"  "AC093635.1"  "AC008280.3" 
-##  [301] "AC093110.1"  "AC012358.3"  "AC015982.1"  "AC007743.1"  "AC016727.1"  "AC107081.2"  "AC009501.1"  "AC012368.1"  "AC008074.3"  "AC007365.1"  "AC017083.3"  "AC017083.2"  "AC017083.1"  "AC022201.2"  "AC007881.3" 
-##  [316] "AC007878.1"  "AC073046.1"  "AC073263.1"  "AC073263.2"  "AC006030.1"  "AC005041.1"  "AC019069.1"  "AC104135.1"  "AC005034.3"  "AC009309.1"  "AC015971.1"  "AC133644.2"  "AC104134.1"  "AC062029.1"  "AC103563.7" 
-##  [331] "AC092835.1"  "AC009237.14" "AC021188.1"  "AC092683.1"  "AC109826.1"  "AC079447.1"  "AC092587.1"  "AC018690.1"  "AC092667.1"  "AC016738.1"  "AC104655.1"  "AC012360.1"  "AC012360.3"  "AC010978.1"  "AC013271.1" 
-##  [346] "AC108463.3"  "AC068491.3"  "AC017002.3"  "AC017002.1"  "AC079922.2"  "AC016683.1"  "AC016745.2"  "AC104653.1"  "AC110769.2"  "AC009312.1"  "AC009404.1"  "AC093901.1"  "AC012447.1"  "AC068282.1"  "AC012306.2" 
-##  [361] "AC073869.3"  "AC011893.1"  "AC114763.1"  "AC092620.1"  "AC007364.1"  "AC091488.1"  "AC009506.1"  "AC009961.1"  "AC009299.3"  "AC008063.1"  "AC009495.3"  "AC007405.3"  "AC078883.2"  "AC078883.3"  "AC078883.1" 
-##  [376] "AC010894.2"  "AC010894.4"  "AC093459.1"  "AC079305.1"  "AC073834.1"  "AC009948.1"  "AC009948.4"  "AC009948.3"  "AC010680.3"  "AC010680.2"  "AC010680.1"  "AC068196.1"  "AC009962.1"  "AC096667.1"  "AC017071.1" 
-##  [391] "AC013468.1"  "AC108047.1"  "AC005540.1"  "AC064834.1"  "AC114760.2"  "AC020571.1"  "AC013264.1"  "AC010746.1"  "AC011997.1"  "AC007163.1"  "AC005037.1"  "AC064836.3"  "AC007383.2"  "AC007383.3"  "AC008269.1" 
-##  [406] "AC007879.2"  "AC009226.1"  "AC007038.2"  "AC007038.1"  "AC079610.2"  "AC079610.1"  "AC068051.1"  "AC016708.1"  "AC098820.2"  "AC021016.3"  "AC012510.1"  "AC009974.1"  "AC097468.3"  "AC068946.1"  "AC053503.2" 
-##  [421] "AC079834.2"  "AC097461.1"  "AC097662.1"  "AC009950.1"  "AC012507.1"  "AC012507.4"  "AC073254.1"  "AC105760.1"  "AC105760.2"  "AC112721.2"  "AC104667.1"  "AC012485.3"  "AC016999.1"  "AC062017.1"  "SEPT2"      
-##  [436] "AC114730.1"  "AC131097.4"  "AC093642.1"  "AC024060.1"  "AC018816.1"  "AC026202.2"  "AC026202.3"  "AC026191.1"  "AC018809.2"  "AC034198.2"  "AC093495.1"  "AC090948.1"  "AC090948.2"  "AC090948.3"  "AC116096.1" 
-##  [451] "AC112220.4"  "AC123023.1"  "AC097359.3"  "AC097359.2"  "AC092053.3"  "AC092053.2"  "AC099541.1"  "AC006059.1"  "AC099329.3"  "AC104184.1"  "AC104187.1"  "AC099669.1"  "AC124045.1"  "AC098613.1"  "AC099778.1" 
-##  [466] "AC134772.1"  "AC137630.2"  "AC137630.1"  "AC137630.3"  "QARS"        "AC105935.1"  "U73166.1"    "AC096920.1"  "AC006252.1"  "AC006254.1"  "AC096887.1"  "AC012467.2"  "AC135507.1"  "AC012557.2"  "AC109587.1" 
-##  [481] "AC097634.3"  "AC097634.1"  "AC104435.2"  "AC107204.1"  "AC069222.1"  "AC128688.2"  "AC074044.1"  "AC078785.1"  "AC093010.2"  "AC073352.1"  "AC073352.2"  "AC092910.3"  "AC083798.2"  "AC092903.2"  "AC079848.1" 
-##  [496] "AC112484.3"  "AC112484.1"  "AC108673.2"  "AC108673.3"  "AC107027.1"  "AC117382.2"  "AC096992.2"  "AC097103.2"  "AC026304.1"  "AC020636.1"  "AC108718.1"  "AC084036.1"  "AC106707.1"  "AC080013.6"  "AC080013.1" 
-##  [511] "AC080013.5"  "AC069224.1"  "AC112491.1"  "AC078802.1"  "AC078795.1"  "AC078795.2"  "AC008040.1"  "AC008040.5"  "AC007823.1"  "AC007620.2"  "AC125618.1"  "AC069431.1"  "AC092953.2"  "AC131235.3"  "AC068631.1" 
-##  [526] "AC112907.3"  "AC069213.1"  "AC233280.1"  "AC055764.2"  "AC055764.1"  "AC107464.1"  "AC107464.3"  "AC139887.4"  "AC139887.2"  "AC139887.1"  "AC092535.4"  "AC147067.1"  "AC016773.1"  "AL158068.2"  "AC105415.1" 
-##  [541] "AC093323.1"  "AC097382.2"  "AC104825.1"  "AC105345.1"  "AC099550.1"  "AC116651.1"  "AC006160.1"  "AC133961.1"  "AC106047.1"  "AC104078.1"  "AC108471.2"  "AC095057.2"  "AC111006.1"  "AC096734.2"  "AC096586.1" 
-##  [556] "AC107068.1"  "AC027271.1"  "AC110792.3"  "AC068620.1"  "AC068620.2"  "AC104806.2"  "AC053527.1"  "AC093677.2"  "AC098818.2"  "AC124016.2"  "AC124016.1"  "AC093827.5"  "AC093827.4"  "AC083829.2"  "AC019077.1" 
-##  [571] "AC114811.2"  "AC019131.2"  "AP002026.1"  "AC097460.1"  "AP001816.1"  "AC098487.1"  "AF213884.3"  "AC018797.2"  "AC004069.1"  "AC109361.1"  "AC096564.1"  "AC096564.2"  "AC004067.1"  "AC126283.1"  "AC109347.1" 
-##  [586] "AC106864.1"  "AC110079.2"  "AC108866.1"  "AC108062.1"  "AC096711.3"  "AC097376.2"  "AC112236.2"  "AC096733.2"  "AC097504.2"  "AC139720.1"  "AC104596.1"  "AC093835.1"  "AC095055.1"  "AC097375.1"  "AC106882.1" 
-##  [601] "AC105285.1"  "AC097534.1"  "AC097534.2"  "AC019163.1"  "AC078881.1"  "AC107214.1"  "AC084871.2"  "AC106897.1"  "AC097652.1"  "AC021087.1"  "AC106772.1"  "AC026740.1"  "AC116351.1"  "AC026412.3"  "AC091965.1" 
-##  [616] "AC091965.4"  "AC034229.4"  "AC012640.2"  "AC010491.1"  "AC022113.1"  "AC022113.2"  "AC026785.2"  "AC025181.2"  "AC034231.1"  "AC026801.2"  "AC137810.1"  "AC008957.1"  "AC026741.1"  "AC008875.1"  "AC025171.3" 
-##  [631] "AC025171.5"  "AC025171.2"  "AC025171.4"  "AC114947.2"  "AC114956.1"  "AC093297.2"  "AC008966.1"  "AC008914.1"  "AC008892.1"  "AC016596.1"  "AC008937.1"  "AC008937.3"  "AC092343.1"  "AC104113.1"  "AC092354.2" 
-##  [646] "AC025442.1"  "AC024581.1"  "AC010280.2"  "AC146944.4"  "AC035140.1"  "AC008972.2"  "AC099522.2"  "AC010501.1"  "AC116337.3"  "AC008897.2"  "AC010245.2"  "AC026725.1"  "AC012636.1"  "AC010260.1"  "AC008771.1" 
-##  [661] "AC008434.1"  "AC104118.1"  "AC018754.1"  "AC123595.1"  "AC008840.1"  "AC020900.1"  "AC008906.1"  "AC008906.2"  "AC009126.1"  "AC008522.1"  "AC021086.1"  "AC008467.1"  "AC008494.3"  "AC010226.1"  "AC008549.2" 
-##  [676] "AC034236.2"  "AC008629.1"  "AC093535.1"  "AC011416.3"  "AC116366.3"  "AC116366.2"  "AC116366.1"  "AC004775.1"  "AC010240.3"  "AC008608.2"  "AC104109.3"  "AC104109.2"  "AC109454.2"  "AC026691.1"  "AC106791.1" 
-##  [691] "AC113382.1"  "AC011405.1"  "AC135457.1"  "AC008667.1"  "AC008438.1"  "AC005618.1"  "AC008781.1"  "AC091959.3"  "AC091948.1"  "AC131025.1"  "AC008641.1"  "AC011337.1"  "AC011374.2"  "AC091982.3"  "AC009185.1" 
-##  [706] "AC008676.1"  "AC134043.1"  "AC008691.1"  "AC113414.1"  "AC034199.1"  "AC022217.3"  "AC008429.1"  "AC008429.3"  "AC008378.1"  "AC139491.1"  "AC139491.7"  "AC139795.3"  "AC139795.2"  "AC106795.2"  "AC136604.2" 
-##  [721] "AC136604.3"  "AC008393.1"  "AC008610.1"  "AC008443.5"  "AC008443.6"  "AC138035.1"  "AL357054.4"  "AL357054.2"  "AL133351.1"  "AL031963.3"  "AL138831.1"  "AL162718.1"  "AL359643.3"  "AL136307.1"  "AL024498.1" 
-##  [736] "AL157373.2"  "AL008729.1"  "AL008729.2"  "AL441883.1"  "AL109914.1"  "AL138720.1"  "AL136162.1"  "AL137003.1"  "AL137003.2"  "AL138724.1"  "AL031775.1"  "U91328.2"    "U91328.1"    "HIST1H2BC"   "AL353759.1" 
-##  [751] "HIST1H2BE"   "AL031777.3"  "HIST1H2BF"   "HIST1H2BG"   "HIST1H2BI"   "AL121936.1"  "AL513548.3"  "AL513548.1"  "AL009179.1"  "AL121944.1"  "AL358933.1"  "AL049543.1"  "AL645929.2"  "AL662797.1"  "AL662844.4" 
-##  [766] "AL645933.2"  "AL662899.2"  "AL662884.4"  "AL662796.1"  "AL669918.1"  "AL645941.3"  "AL645940.1"  "AL451165.2"  "Z84485.1"    "AL353597.1"  "AL121574.1"  "AL365205.1"  "AL513008.1"  "AL035587.1"  "AL133375.1" 
-##  [781] "AL355802.2"  "AL096865.1"  "AL035701.1"  "AL355353.1"  "AL021368.1"  "AL021368.2"  "AL135905.1"  "AL354719.2"  "AL391807.1"  "AL365226.2"  "AL365226.1"  "AC019205.1"  "AL121972.1"  "AL590428.1"  "AL080250.1" 
-##  [796] "AL359715.3"  "AL359715.1"  "AL359715.2"  "AL139274.2"  "AL049697.1"  "AL353135.1"  "AL096678.1"  "AL121787.1"  "AL132996.1"  "AL589740.1"  "AL513550.1"  "AL137784.2"  "AL133338.1"  "AL133406.3"  "AL133406.2" 
-##  [811] "AL024507.2"  "AL390208.1"  "AL359711.2"  "AL512303.1"  "AC002464.1"  "AL080317.1"  "AL080317.2"  "AL080317.3"  "Z97989.1"    "AL050331.2"  "AL365275.1"  "AL096711.2"  "AL590006.1"  "AL355581.1"  "AL353596.1" 
-##  [826] "AL024508.2"  "AL356234.2"  "AL357060.1"  "AL590617.2"  "AL031772.1"  "AL158850.1"  "AL035446.1"  "AL023806.1"  "AL356599.1"  "AL031056.1"  "AL358852.1"  "AL355312.4"  "AL080276.2"  "AL590867.1"  "AL596202.1" 
-##  [841] "AL355297.3"  "AL355297.4"  "AL391863.1"  "AL035634.1"  "AL627422.2"  "AL035530.2"  "AL356417.3"  "AL356417.1"  "AL139393.2"  "AL022069.1"  "AL159163.1"  "Z94721.1"    "AL121935.1"  "AL121935.2"  "BX322234.1" 
-##  [856] "AL109910.2"  "AC093627.5"  "AC093627.4"  "AC147651.1"  "AC147651.4"  "AC073957.3"  "AC073957.2"  "AC091729.3"  "AC102953.2"  "AC104129.1"  "AC004906.1"  "AC024028.1"  "AC092171.4"  "AC092171.3"  "AC073343.2" 
-##  [871] "AC004982.2"  "AC004948.1"  "AC006042.4"  "AC006042.2"  "AC007029.1"  "AC073333.1"  "AC073332.1"  "AC004130.1"  "AC073072.1"  "AC005082.1"  "MPP6"        "AC005165.1"  "AC004520.1"  "AC004540.1"  "AC004593.3" 
-##  [886] "AC007285.1"  "AC007036.1"  "AC005154.6"  "GARS"        "AC018647.2"  "AC007349.3"  "AC006033.2"  "AC011290.1"  "AC072061.1"  "AC004951.1"  "AC004854.2"  "AC073115.2"  "AC020743.1"  "AC074351.1"  "AC118758.3" 
-##  [901] "AC092634.3"  "AC073349.1"  "AC068533.4"  "AC008267.5"  "AC027644.3"  "AC073335.2"  "AC006480.2"  "AC211476.2"  "AC004921.1"  "AC005076.1"  "AC003991.2"  "AC002383.1"  "AC007566.1"  "AC002454.1"  "AC002451.1" 
-##  [916] "AC004834.1"  "AC004922.1"  "AC073842.2"  "AC073842.1"  "AC092849.1"  "AC069281.1"  "AC105446.1"  "AC006329.1"  "AC093668.3"  "AC005064.1"  "AC007384.1"  "AC005070.3"  "AC073073.2"  "AC007032.1"  "AC004492.1" 
-##  [931] "AC004839.1"  "AC002467.1"  "AC005046.1"  "AC006333.2"  "AC073934.1"  "AC090114.2"  "AC018638.7"  "AC078846.1"  "AC073320.1"  "AC087071.1"  "AC007938.2"  "AC007938.3"  "AC016831.7"  "AC016831.1"  "AC016831.5" 
-##  [946] "AC058791.1"  "AC008264.2"  "AC009275.1"  "AC083862.2"  "AC009542.1"  "AC024084.1"  "AC078845.1"  "AC083880.1"  "AC004918.1"  "AC093673.1"  "AC004889.1"  "AC074386.1"  "AC005229.4"  "AC073314.1"  "AC004877.1" 
-##  [961] "AC092681.3"  "AC005586.1"  "AC073111.5"  "AC021097.1"  "AC006017.1"  "AC093726.2"  "AC144652.1"  "AC009403.1"  "AC005534.1"  "AC005481.1"  "AC011899.2"  "AC011899.3"  "AL732314.6"  "AL732314.4"  "AL672277.1" 
-##  [976] "AL683807.1"  "BX890604.1"  "AC073529.1"  "AC131011.1"  "CXorf21"     "AC108879.1"  "AC234772.3"  "AC234772.2"  "AF196972.1"  "AC115618.1"  "AC231533.1"  "AC232271.1"  "AC233976.1"  "AL445472.1"  "AL354793.1" 
-##  [991] "AL022157.1"  "AL034397.2"  "AL034397.3"  "AL590764.1"  "AL353804.1"  "Z68871.1"    "AL121601.1"  "AL683813.1"  "AL078639.1"  "AC244197.3" 
-##  [ reached getOption("max.print") -- omitted 2569 entries ]
-## [1] "they are added to the alias annotation table, so they don't get lost"
-## [1] "following are the official gene symbols of input aliases: "
-##              symbol            alias
-## 1             AARS1             AARS
-## 2           ABITRAM          FAM206A
-## 3              ACP3             ACPP
-## 4          ACTN1-DT        ACTN1-AS1
-## 5             ADPRS          ADPRHL2
-## 6             ADSS1           ADSSL1
-## 7             ADSS2             ADSS
-## 8        ANKRD20A2P        ANKRD20A2
-## 9        ANKRD20A3P        ANKRD20A3
-## 10       ANKRD20A4P        ANKRD20A4
-## 11        ANKRD40CL        LINC00483
-## 12           ANTKMT          FAM173A
-## 13            AOPEP           C9orf3
-## 14           ARLNC1        LINC02170
-## 15           ATP5MJ          ATP5MPL
-## 16           ATP5MK           ATP5MD
-## 17         ATPSCKMT          FAM173B
-## 18          B3GALT9          B3GNT10
-## 19             BBLN          C9orf16
-## 20           BMERB1         C16orf45
-## 21            BPNT2           IMPAD1
-## 22            BRME1         C19orf57
-## 23      CARNMT1-AS1      C9orf41-AS1
-## 24            CARS1             CARS
-## 25        CARS1-AS1         CARS-AS1
-## 26             CBY2            SPERT
-## 27          CCDC200        LINC00854
-## 28      CCDC28A-AS1            GVQW2
-## 29             CCN2             CTGF
-## 30             CCN3              NOV
-## 31             CCNP            CNTD2
-## 32            CDIN1         C15orf41
-## 33          CENATAC           CCDC84
-## 34            CEP20            FOPNL
-## 35            CEP43          FGFR1OP
-## 36            CERT1         COL4A3BP
-## 37         CFAP20DC          C3orf67
-## 38          CFAP251            WDR66
-## 39          CFAP410          C21orf2
-## 40           CFAP91           MAATS1
-## 41           CFAP92         KIAA1257
-## 42           CIAO2A           FAM96A
-## 43           CIAO2B           FAM96B
-## 44            CIAO3            NARFL
-## 45           CIBAR1           FAM92A
-## 46           CIBAR2           FAM92B
-## 47            CILK1              ICK
-## 48        CLLU1-AS1          CLLU1OS
-## 49             COA8           APOPT1
-## 50            CRACD         KIAA1211
-## 51           CRACDL        KIAA1211L
-## 52            CRPPA             ISPD
-## 53            CYRIA           FAM49A
-## 54            CYRIB           FAM49B
-## 55             CZIB         C1orf123
-## 56            DARS1             DARS
-## 57        DARS1-AS1         DARS-AS1
-## 58          DENND10           FAM45A
-## 59          DENND11         KIAA1147
-## 60          DENND2B              ST5
-## 61           DIPK1A           FAM69A
-## 62           DIPK1B           FAM69B
-## 63           DIPK2A          C3orf58
-## 64           DMAC2L            ATP5S
-## 65          DNAAF10            WDR92
-## 66          DNAAF11            LRRC6
-## 67           DNAAF8         C16orf71
-## 68           DNAAF9        C20orf194
-## 69            DNAI3            WDR63
-## 70            DNAI4            WDR78
-## 71            DNAI7            CASC1
-## 72        DOCK8-AS1          C9orf66
-## 73            DOP1A           DOPEY1
-## 74            DOP1B           DOPEY2
-## 75           DUSP29            DUPD1
-## 76          DYNC2I1            WDR60
-## 77          DYNC2I2            WDR34
-## 78           DYNLT2            TCTE3
-## 79          DYNLT2B         TCTEX1D2
-## 80           DYNLT4         TCTEX1D4
-## 81           DYNLT5         TCTEX1D1
-## 82            ECRG4          C2orf40
-## 83          ELAPOR1         KIAA1324
-## 84          ELAPOR2        KIAA1324L
-## 85            EOLA1         CXorf40A
-## 86            EOLA2         CXorf40B
-## 87            EPRS1             EPRS
-## 88            EZHIP          CXorf67
-## 89         FAM153CP          FAM153C
-## 90          FAM174C         C19orf24
-## 91         FAM86C1P          FAM86C1
-## 92             FCSK              FUK
-## 93           FHIP1A         FAM160A1
-## 94        FHIP1A-DT      FAM160A1-DT
-## 95           FHIP1B         FAM160A2
-## 96           FHIP2A         FAM160B1
-## 97           FHIP2B         FAM160B2
-## 98           FLACC1         ALS2CR12
-## 99      FMC1-LUC7L2   C7orf55-LUC7L2
-## 100          GARRE1         KIAA0355
-## 101          GASK1A          FAM198A
-## 102          GASK1B          FAM198B
-## 103      GASK1B-AS1      FAM198B-AS1
-## 104           GCNT4        LINC01336
-## 105        GDF5-AS1           GDF5OS
-## 106            GET1              WRB
-## 107            GET3            ASNA1
-## 108            GFUS            TSTA3
-## 109           GOLM2            CASC4
-## 110            H1-0             H1F0
-## 111            H1-1         HIST1H1A
-## 112           H1-10             H1FX
-## 113       H1-10-AS1         H1FX-AS1
-## 114            H1-2         HIST1H1C
-## 115            H1-3         HIST1H1D
-## 116            H1-4         HIST1H1E
-## 117            H1-5         HIST1H1B
-## 118            H1-6         HIST1H1T
-## 119           H2AB1           H2AFB1
-## 120          H2AC11        HIST1H2AG
-## 121          H2AC12        HIST1H2AH
-## 122          H2AC13        HIST1H2AI
-## 123          H2AC14        HIST1H2AJ
-## 124          H2AC15        HIST1H2AK
-## 125          H2AC16        HIST1H2AL
-## 126          H2AC17        HIST1H2AM
-## 127          H2AC18       HIST2H2AA3
-## 128          H2AC19       HIST2H2AA4
-## 129          H2AC20        HIST2H2AC
-## 130          H2AC21        HIST2H2AB
-## 131           H2AC4        HIST1H2AB
-## 132           H2AC6        HIST1H2AC
-## 133           H2AC8        HIST1H2AE
-## 134            H2AJ            H2AFJ
-## 135            H2AW         HIST3H2A
-## 136            H2AX            H2AFX
-## 137           H2AZ1            H2AFZ
-## 138           H2AZ2            H2AFV
-## 139          H2BC11        HIST1H2BJ
-## 140          H2BC12        HIST1H2BK
-## 141          H2BC13        HIST1H2BL
-## 142          H2BC14        HIST1H2BM
-## 143          H2BC15        HIST1H2BN
-## 144          H2BC17        HIST1H2BO
-## 145          H2BC18        HIST2H2BF
-## 146          H2BC21        HIST2H2BE
-## 147           H2BC3        HIST1H2BB
-## 148           H2BC5        HIST1H2BD
-## 149           H2BC9        HIST1H2BH
-## 150           H2BU1        HIST3H2BB
-## 151            H3-2       HIST2H3PS2
-## 152           H3-3A            H3F3A
-## 153           H3-3B            H3F3B
-## 154            H3-5            H3F3C
-## 155            H3C1         HIST1H3A
-## 156           H3C10         HIST1H3H
-## 157           H3C11         HIST1H3I
-## 158           H3C12         HIST1H3J
-## 159           H3C13         HIST2H3D
-## 160            H3C2         HIST1H3B
-## 161            H3C3         HIST1H3C
-## 162            H3C6         HIST1H3E
-## 163            H3C7         HIST1H3F
-## 164            H3C8         HIST1H3G
-## 165           H4-16          HIST4H4
-## 166            H4C1         HIST1H4A
-## 167           H4C11         HIST1H4J
-## 168           H4C13         HIST1H4L
-## 169           H4C14         HIST2H4A
-## 170           H4C15         HIST2H4B
-## 171            H4C2         HIST1H4B
-## 172            H4C3         HIST1H4C
-## 173            H4C4         HIST1H4D
-## 174            H4C5         HIST1H4E
-## 175            H4C6         HIST1H4F
-## 176            H4C8         HIST1H4H
-## 177            H4C9         HIST1H4I
-## 178           HARS1             HARS
-## 179            HEXD            HEXDC
-## 180            HROB         C17orf53
-## 181       HSDL2-AS1         C9orf147
-## 182           IARS1             IARS
-## 183           IFTAP         C11orf74
-## 184           ILRUN         C6orf106
-## 185           IRAG1            MRVI1
-## 186       IRAG1-AS1        MRVI1-AS1
-## 187           IRAG2             LRMP
-## 188         ITPRID2            SSFA2
-## 189           KARS1             KARS
-## 190           KASH5          CCDC155
-## 191          KATNIP         KIAA0556
-## 192           KICS2         C12orf66
-## 193           KIFBP           KIF1BP
-## 194       KRT10-AS1           TMEM99
-## 195           LARS1             LARS
-## 196           LDAF1          TMEM159
-## 197       LINC02481       LINC002481
-## 198       LINC02693         C17orf51
-## 199       LINC02694         C15orf53
-## 200       LINC02870         C10orf91
-## 201       LINC02875         C17orf82
-## 202       LINC02899          C5orf17
-## 203       LINC02901          C6orf99
-## 204       LINC02908         C9orf139
-## 205       LINC02909         C12orf77
-## 206       LINC02910        C20orf197
-## 207       LINC02913         C9orf106
-## 208       LNCTAM34A        LINC01759
-## 209          LRATD2           FAM84B
-## 210            LTO1           ORAOV1
-## 211           MACIR          C5orf30
-## 212       MACROH2A1            H2AFY
-## 213       MACROH2A2           H2AFY2
-## 214           MAILR        AZIN1-AS1
-## 215         MARCHF1           MARCH1
-## 216        MARCHF10          MARCH10
-## 217         MARCHF2           MARCH2
-## 218         MARCHF3           MARCH3
-## 219         MARCHF4           MARCH4
-## 220         MARCHF5           MARCH5
-## 221         MARCHF6           MARCH6
-## 222         MARCHF7           MARCH7
-## 223         MARCHF8           MARCH8
-## 224         MARCHF9           MARCH9
-## 225           MEAK7            TLDC1
-## 226        METTL25B           RRNAD1
-## 227         MICOS10           MINOS1
-## 228         MICOS13         C19orf70
-## 229          MIDEAS          ELMSAN1
-## 230          MINAR1         KIAA1024
-## 231       MIR3667HG         C22orf34
-## 232        MIR9-1HG          C1orf61
-## 233           MIX23           CCDC58
-## 234            MMUT              MUT
-## 235          MROCKI        LINC01268
-## 236           MRTFA             MKL1
-## 237           MRTFB             MKL2
-## 238          MTARC1            MARC1
-## 239          MTARC2            MARC2
-## 240            MTLN           SMIM37
-## 241          MTRES1         C6orf203
-## 242           MTRFR         C12orf65
-## 243           MTSS2           MTSS1L
-## 244            MYG1         C12orf10
-## 245           NARS1             NARS
-## 246        NCBP2AS2        NCBP2-AS2
-## 247       NDUFV1-DT         C11orf72
-## 248          NIBAN1          FAM129A
-## 249          NIBAN2          FAM129B
-## 250          NIBAN3          FAM129C
-## 251        NOPCHAP1         C12orf45
-## 252           NTAQ1           WDYHV1
-## 253           NUP42            NUPL2
-## 254            OBI1           RNF219
-## 255        OBI1-AS1       RNF219-AS1
-## 256           ODAD1          CCDC114
-## 257           ODAD2            ARMC4
-## 258           ODAD3          CCDC151
-## 259           ODAD4            TTC25
-## 260          PABIR1          FAM122A
-## 261          PABIR2          FAM122B
-## 262          PABIR3          FAM122C
-## 263           PACC1          TMEM206
-## 264      PALM2AKAP2            AKAP2
-## 265      PALM2AKAP2      PALM2-AKAP2
-## 266           PALS1             MPP5
-## 267       PAXIP1-DT       PAXIP1-AS1
-## 268           PEDS1          TMEM189
-## 269    PEDS1-UBE2V1   TMEM189-UBE2V1
-## 270         PELATON           SMIM25
-## 271           PGAP4          TMEM246
-## 272           PGAP6           TMEM8A
-## 273           PHAF1         C16orf70
-## 274      PIK3IP1-DT      PIK3IP1-AS1
-## 275       PITX1-AS1          C5orf66
-## 276         PLA2G2C       UBXN10-AS1
-## 277          PLAAT1           HRASLS
-## 278          PLAAT2          HRASLS2
-## 279          PLAAT3          PLA2G16
-## 280          PLAAT4          RARRES3
-## 281          PLAAT5          HRASLS5
-## 282         PLEKHG7         C12orf74
-## 283         POGLUT2           KDELC1
-## 284         POGLUT3           KDELC2
-## 285          POLR1F          TWISTNB
-## 286          POLR1G           CD3EAP
-## 287          POLR1H            ZNRD1
-## 288     PPP1R13B-DT        LINC00637
-## 289          PRANCR        LINC01481
-## 290         PRECSIT        LINC00346
-## 291           PRORP         KIAA0391
-## 292         PRSS42P           PRSS42
-## 293          PRXL2A          FAM213A
-## 294          PRXL2B          FAM213B
-## 295          PRXL2C            AAED1
-## 296        PSME3IP1          FAM192A
-## 297        RAB30-DT        RAB30-AS1
-## 298            RADX          CXorf57
-## 299           RAMAC           RAMMET
-## 300           RARS1             RARS
-## 301            RBIS          C8orf59
-## 302           RELCH         KIAA1468
-## 303           RESF1         KIAA1551
-## 304            RO60           TROVE2
-## 305        RPL34-DT        RPL34-AS1
-## 306            RRM2          C2orf48
-## 307            RSKR           SGK494
-## 308           RUSF1         C16orf58
-## 309           SANBR         KIAA1841
-## 310           SCAT1        LINC02081
-## 311         SEPTIN1            SEPT1
-## 312        SEPTIN10           SEPT10
-## 313        SEPTIN11           SEPT11
-## 314         SEPTIN3            SEPT3
-## 315         SEPTIN4            SEPT4
-## 316     SEPTIN4-AS1        SEPT4-AS1
-## 317         SEPTIN5            SEPT5
-## 318         SEPTIN6            SEPT6
-## 319         SEPTIN7            SEPT7
-## 320      SEPTIN7-DT        SEPT7-AS1
-## 321         SEPTIN8            SEPT8
-## 322         SEPTIN9            SEPT9
-## 323            SHFL         C19orf66
-## 324           SHOC1          C9orf84
-## 325         SLC49A4            DIRC2
-## 326         SLC66A1            PQLC2
-## 327         SLC66A2            PQLC1
-## 328         SLC66A3            PQLC3
-## 329         SMC5-DT         SMC5-AS1
-## 330          SMIM43          TMEM155
-## 331          SNHG30        LINC02001
-## 332          SNHG32          C6orf48
-## 333         SPRING1         C12orf49
-## 334           SSPOP             SSPO
-## 335         STAM-DT         STAM-AS1
-## 336          STEEP1          CXorf56
-## 337  STIMATE-MUSTN1   TMEM110-MUSTN1
-## 338          STING1          TMEM173
-## 339        STX17-DT        STX17-AS1
-## 340           TAFA1          FAM19A1
-## 341           TAFA2          FAM19A2
-## 342         TAMALIN            GRASP
-## 343           TARS1             TARS
-## 344           TARS3           TARSL2
-## 345           TASOR          FAM208A
-## 346          TASOR2          FAM208B
-## 347        TBC1D29P          TBC1D29
-## 348   TIMM23B-AGAP6        LINC00843
-## 349          TLCD3A           FAM57A
-## 350          TLCD3B           FAM57B
-## 351           TLCD4           TMEM56
-## 352           TLCD5          TMEM136
-## 353            TLE5              AES
-## 354 TM4SF19-DYNLT2B TM4SF19-TCTEX1D2
-## 355        TMCC1-DT        TMCC1-AS1
-## 356       TOLLIP-DT       TOLLIP-AS1
-## 357        TRAPPC14          C7orf43
-## 358          TUBB8B         TUBB8P12
-## 359       USP27X-DT       USP27X-AS1
-## 360        USP46-DT        USP46-AS1
-## 361           VARS1             VARS
-## 362           WARS1             WARS
-## 363            YAE1           YAE1D1
-## 364           YARS1             YARS
-## 365           YJU2B          CCDC130
-## 366       YTHDF3-DT       YTHDF3-AS1
-## 367            ZFTA         C11orf95
-## 368       ZNF22-AS1         C10orf25
-## 369      ZNF407-AS1        LINC00909
-## 370       ZNF516-DT         C18orf65
-## 371       ZNF582-DT       ZNF582-AS1
-## 372          ZNF875             HKR1
-## 373           ZNRD2           SSSCA1
-

Step -0.3: Prepare settings of the MultiNicheNet cell-cell communication -analysis

-

Define -in which metadata columns we can find the group, -sample and cell type IDs

-

In this case study, we want to study differences in cell-cell -communication patterns between MIS-C patients (M), their healthy -siblings (S) and adult patients with severe covid (A). The meta data -columns that indicate this disease status is -MIS.C.AgeTier.

-

Cell type annotations are indicated in the -Annotation_v2.0 column, and the sample is indicated by the -ShortID column. If your cells are annotated in multiple -hierarchical levels, we recommend using a high level in the hierarchy. -This for 2 reasons: 1) MultiNicheNet focuses on differential expression -and not differential abundance, and 2) there should be sufficient cells -per sample-celltype combination.

-

If you would have batch effects or covariates you can correct for, -you can define this here as well.

-

Important: for categorical covariates and batches, there should be at -least one sample for every group-batch combination. If one of your -groups/conditions lacks a certain level of your batch, you won’t be able -to correct for the batch effect because the model is then not able to -distinguish batch from group/condition effects.

-

Important: the column names of group, sample, cell type, batches and -covariates should be syntactically valid (make.names)

-

Important: All group, sample, cell type, batch and covariate names -should be syntactically valid as well (make.names) (eg -through -SummarizedExperiment::colData(sce)$ShortID = SummarizedExperiment::colData(sce)$ShortID %>% make.names())

-
sample_id = "ShortID"
-group_id = "MIS.C.AgeTier"
-celltype_id = "Annotation_v2.0"
-covariates = NA
-batches = NA
-

Sender and receiver cell types also need to be defined. Both are here -all cell types in the dataset because we are interested in an All-vs-All -analysis.

-
senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
-receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
-

If the user wants it, it is possible to use only a subset of senders -and receivers. Senders and receivers can be entirely different, but also -overlapping, or the same. If you don’t use all the cell types in your -data, we recommend to continue with a subset of your data.

-
sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% c(senders_oi, receivers_oi)]
-

Now we will go to the first real step of the MultiNicheNet -analysis

-

Step -1: Extract cell type abundance and expression information from receiver -and sender cell types, and link this expression information for ligands -of the sender cell types to the corresponding receptors of the receiver -cell types

-

Since MultiNicheNet will infer group differences at the sample level -for each cell type (currently via Muscat - pseudobulking + EdgeR), we -need to have sufficient cells per sample of a cell type, and this for -both groups. In the following analysis we will set this minimum number -of cells per cell type per sample at 10 (recommended minimum).

-
min_cells = 10
-

Now we will calculate abundance and expression information for each -cell type / sample / group combination with the following functions. In -the output of this function, you can also find some ‘Cell type abundance -diagnostic plots’ that will the users which celltype-sample combinations -will be left out later on for DE calculation because the nr of cells is -lower than de defined minimum defined here above. If too many -celltype-sample combinations don’t pass this threshold, we recommend to -define your cell types in a more general way (use one level higher of -the cell type ontology hierarchy) (eg TH17 CD4T cells –> CD4T -cells).

-
abundance_expression_info = get_abundance_expression_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network, batches = batches)
+

Following these steps, one can optionally +* 7. Calculate the across-samples expression correlation between ligand-receptor pairs and target genes +* 8. Prioritize communication patterns involving condition-specific cell types through an alternative prioritization scheme

+

After these steps, the output can be further explored as we will demonstrate in the “Downstream analysis of the MultiNicheNet output” section.

+

In this vignette, we will demonstrate these steps one-by-one, which offers the most flexibility to the user to assess intermediary results. Other vignettes will demonstrate the use of the multi_nichenet_analysis wrapper function.

+
+

2.1 Cell-type filtering: determine which cell types are sufficiently present

+

In this step we will calculate and visualize cell type abundances. This will give an indication about which cell types will be retained in the analysis, and which cell types will be filtered out.

+

Since MultiNicheNet will infer group differences at the sample level for each cell type (currently via Muscat - pseudobulking + EdgeR), we need to have sufficient cells per sample of a cell type, and this for all groups. In the following analysis we will set this minimum number of cells per cell type per sample at 10. Samples that have less than min_cells cells will be excluded from the analysis for that specific cell type.

+
min_cells = 10
+

We recommend using min_cells = 10, except for datasets with several lowly abundant cell types of interest. For those datasets, we recommend using min_cells = 5.

+
abundance_info = get_abundance_info(
+  sce = sce, 
+  sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, 
+  min_cells = min_cells, 
+  senders_oi = senders_oi, receivers_oi = receivers_oi, 
+  batches = batches
+  )

First, we will check the cell type abundance diagnostic plots.

-

Interpretation of -cell type abundance information

-

The first plot visualizes the number of cells per celltype-sample -combination, and indicates which combinations are removed during the DE -analysis because there are less than min_cells in the -celltype-sample combination.

-
abundance_expression_info$abund_plot_sample
-

-The red dotted line indicates the required minimum of cells as defined -above in min_cells. We can see here that some -sample-celltype combinations are left out. For the DE analysis in the -next step, only cell types will be considered if there are at least two -samples per group with a sufficient number of cells.

-

Important: Based on the cell type abundance -diagnostics, we recommend users to change their analysis settings if -required (eg changing cell type annotation level, batches, …), before -proceeding with the rest of the analysis.

-

Step -2: Perform genome-wide differential expression analysis of receiver and -sender cell types to define DE genes between the conditions of interest. -Based on this analysis, we can define the logFC/p-value of ligands in -senders and receptors in receivers, and define the set of affected -target genes in the receiver.

-

Now we will go over to the multi-group, multi-sample differential -expression (DE) analysis (also called ‘differential state’ analysis by -the developers of Muscat).

-

Define -the contrasts and covariates of interest for the DE analysis.

-

Here, we want to compare each patient group to the other groups, so -the MIS-C (M) group vs healthy control siblins (S) and adult COVID19 -patients (A) etcetera. To do this comparison, we need to set the -following contrasts:

-
contrasts_oi = c("'M-(S+A)/2','S-(M+A)/2','A-(S+M)/2'")
-

Very Important Note the format to indicate the -contrasts! This formatting should be adhered to very strictly, and white -spaces are not allowed! Check ?get_DE_info for explanation -about how to define this well. The most important things are that: each -contrast is surrounded by single quotation marks, contrasts are -separated by a comma without any whitespace, and alle contrasts together -are surrounded by double quotation marks. If you compare against two -groups, you should divide by 2, if you compare against three groups, you -should divide by 3 etcetera.

-

For downstream visualizations and linking contrasts to their main -group, you need to run the following:

-
contrast_tbl = tibble(contrast = 
-                        c("M-(S+A)/2","S-(M+A)/2", "A-(S+M)/2"), 
-                      group = c("M","S","A"))
-

If you want to compare only two groups (eg M vs S), you can do like -this: contrasts_oi = c("'M-S','S-M'") -contrast_tbl = tibble(contrast = c("M-S","S-M"), group = c("M","S"))

-

Perform the DE -analysis for each cell type.

-
DE_info = get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, batches = batches, covariates = covariates, contrasts_oi = contrasts_oi, min_cells = min_cells)
-

Check DE results

-

Table with logFC and p-values for each gene-celltype-contrast:

-
DE_info$celltype_de$de_output_tidy %>% arrange(p_adj) %>% head()
-## # A tibble: 6 x 9
-##   gene     cluster_id             logFC logCPM     F    p_val     p_adj.loc         p_adj contrast 
-##   <chr>    <chr>                  <dbl>  <dbl> <dbl>    <dbl>         <dbl>         <dbl> <chr>    
-## 1 TRBV11.2 L_T_TIM3._CD38._HLADR.  4.73   9.84   246 4.33e-13 0.00000000283 0.00000000283 M-(S+A)/2
-## 2 IGLV3.25 L_NK_CD56._CD16.        7.54   5      218 6.04e-13 0.00000000516 0.00000000516 A-(S+M)/2
-## 3 IGHV3.11 L_NK_CD56._CD16.        5.72   3.09   182 6.56e-11 0.00000028    0.00000028    A-(S+M)/2
-## 4 IGHV1.46 L_NK_CD56._CD16.        4.58   4.12   125 1.44e-10 0.000000411   0.000000411   A-(S+M)/2
-## 5 IGLV6.57 L_NK_CD56._CD16.        4.76   3.83   112 3.83e-10 0.000000817   0.000000817   A-(S+M)/2
-## 6 IGKV1.27 L_NK_CD56._CD16.        4.13   3.35   108 5.5 e-10 0.000000939   0.000000939   A-(S+M)/2
-
celltype_de = DE_info$celltype_de$de_output_tidy
-

In the next step, we will combine the DE information of senders and -receivers by linking their ligands and receptors together:

-

Combine -DE information for ligand-senders and receptors-receivers (similar to -step1 - -abundance_expression_info$sender_receiver_info)

-
sender_receiver_de = combine_sender_receiver_de(
-  sender_de = celltype_de,
-  receiver_de = celltype_de,
-  senders_oi = senders_oi,
-  receivers_oi = receivers_oi,
-  lr_network = lr_network
-)
-
sender_receiver_de %>% head(20)
-## # A tibble: 20 x 12
-##    contrast  sender                 receiver               ligand receptor lfc_ligand lfc_receptor ligand_receptor_lfc_avg p_val_ligand p_adj_ligand p_val_receptor p_adj_receptor
-##    <chr>     <chr>                  <chr>                  <chr>  <chr>         <dbl>        <dbl>                   <dbl>        <dbl>        <dbl>          <dbl>          <dbl>
-##  1 M-(S+A)/2 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16        CCL3   CCR1          2.89        0.8                       1.85   0.00053          0.0881     0.0568             0.308 
-##  2 M-(S+A)/2 L_T_TIM3._CD38._HLADR. L_T_TIM3._CD38._HLADR. CCL3   CCR5          2.89        0.4                       1.64   0.00053          0.0881     0.267              0.821 
-##  3 M-(S+A)/2 L_NK_CD56._CD16.       M_Monocyte_CD16        S100A8 CD69          1.68        1.53                      1.60   0.031            0.428      0.00275            0.0893
-##  4 A-(S+M)/2 M_Monocyte_CD16        M_Monocyte_CD16        HLA.G  KLRD1         1.46        1.6                       1.53   0.00266          0.268      0.0991             0.644 
-##  5 M-(S+A)/2 M_Monocyte_CD16        M_Monocyte_CD16        C1QB   CD33          2.99        0.0287                    1.51   0.00000638       0.0131     0.926              0.992 
-##  6 A-(S+M)/2 M_Monocyte_CD16        M_Monocyte_CD16        C1QB   LRP1          2.21        0.578                     1.39   0.000377         0.152      0.0223             0.518 
-##  7 M-(S+A)/2 M_Monocyte_CD16        M_Monocyte_CD16        SLAMF7 SLAMF7        1.38        1.38                      1.38   0.0318           0.239      0.0318             0.239 
-##  8 M-(S+A)/2 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16        GZMB   IGF2R         2.28        0.423                     1.35   0.0000633        0.0414     0.304              0.64  
-##  9 M-(S+A)/2 M_Monocyte_CD16        M_Monocyte_CD16        C1QB   LRP1          2.99       -0.326                     1.33   0.00000638       0.0131     0.0727             0.346 
-## 10 M-(S+A)/2 L_T_TIM3._CD38._HLADR. L_T_TIM3._CD38._HLADR. GZMB   IGF2R         2.28        0.376                     1.33   0.0000633        0.0414     0.198              0.755 
-## 11 M-(S+A)/2 M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. IL18   IL18R1        1.9         0.726                     1.31   0.0637           0.327      0.0692             0.55  
-## 12 A-(S+M)/2 M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. LGALS3 LAG3          0.915       1.71                      1.31   0.00129          0.224      0.0012             0.334 
-## 13 M-(S+A)/2 L_NK_CD56._CD16.       L_T_TIM3._CD38._HLADR. FAM3C  ADGRG5        0.101       2.52                      1.31   0.576            0.932      0.00294            0.173 
-## 14 M-(S+A)/2 M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. HLA.A  KLRD1         0.799       1.78                      1.29   0.0000086        0.0133     0.000741           0.0969
-## 15 M-(S+A)/2 M_Monocyte_CD16        L_NK_CD56._CD16.       IL18   IL18R1        1.9         0.673                     1.29   0.0637           0.327      0.0275             0.414 
-## 16 S-(M+A)/2 M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. RETN   GPR25        -0.209       2.75                      1.27   0.666            0.962      0.00000532         0.0171
-## 17 A-(S+M)/2 L_NK_CD56._CD16.       M_Monocyte_CD16        SPON2  ITGAM         0.442       2.09                      1.27   0.0424           0.569      0.0111             0.416 
-## 18 M-(S+A)/2 M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. TNF    TNFRSF1A      1.78        0.727                     1.25   0.021            0.205      0.0745             0.561 
-## 19 A-(S+M)/2 L_NK_CD56._CD16.       M_Monocyte_CD16        ICAM2  ITGAM         0.397       2.09                      1.24   0.0228           0.446      0.0111             0.416 
-## 20 A-(S+M)/2 M_Monocyte_CD16        L_NK_CD56._CD16.       HLA.G  LILRB1        1.46        0.996                     1.23   0.00266          0.268      0.00673            0.282
-

Step -3: Predict NicheNet ligand activities and NicheNet ligand-target links -based on these differential expression results

-

Define -the parameters for the NicheNet ligand activity analysis

-

Here, we need to define the thresholds that will be used to consider -genes as differentially expressed or not (logFC, p-value, decision -whether to use adjusted or normal p-value, minimum fraction of cells -that should express a gene in at least one sample in a group, whether to -use the normal p-values or empirical p-values).

-

NicheNet ligand activity will then be calculated as the enrichment of -predicted target genes of ligands in this set of DE genes compared to -the genomic background. Here we choose for a minimum logFC of 0.50, -maximum p-value of 0.05, and minimum fraction of expression of 0.05.

-
logFC_threshold = 0.50
-p_val_threshold = 0.05
-fraction_cutoff = 0.05
-

We will here choose for applying the p-value cutoff on the normal -p-values, and not on the p-values corrected for multiple testing. This -choice was made here because this dataset has only a few samples per -group and we might have a lack of statistical power due to -pseudobulking. In case of more samples per group, and a sufficient high -number of DE genes per group-celltype (> 50), we would recommend -using the adjusted p-values.

-
# p_val_adj = TRUE 
-p_val_adj = FALSE 
-

For the NicheNet ligand-target inference, we also need to select -which top n of the predicted target genes will be considered (here: top -250 targets per ligand).

-
top_n_target = 250
-

The NicheNet ligand activity analysis can be run in parallel for each -receiver cell type, by changing the number of cores as defined here. -This is only recommended if you have many receiver cell type.

-
verbose = TRUE
-cores_system = 8
-n.cores = min(cores_system, sender_receiver_de$receiver %>% unique() %>% length()) # use one core per receiver cell type
-

Run the NicheNet -ligand activity analysis

-

(this might take some time)

-
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings(get_ligand_activities_targets_DEgenes(
-  receiver_de = celltype_de,
-  receivers_oi = receivers_oi,
-  ligand_target_matrix = ligand_target_matrix,
-  logFC_threshold = logFC_threshold,
-  p_val_threshold = p_val_threshold,
-  p_val_adj = p_val_adj,
-  top_n_target = top_n_target,
-  verbose = verbose, 
-  n.cores = n.cores
-)))
-

Check the DE genes used for the activity analysis

-
ligand_activities_targets_DEgenes$de_genes_df %>% head(20)
-## # A tibble: 20 x 6
-##    gene         receiver         logFC    p_val  p_adj contrast 
-##    <chr>        <chr>            <dbl>    <dbl>  <dbl> <chr>    
-##  1 TNFRSF14.AS1 L_NK_CD56._CD16. 0.957 0.00935  0.276  M-(S+A)/2
-##  2 ACOT7        L_NK_CD56._CD16. 0.711 0.00449  0.218  M-(S+A)/2
-##  3 KDM1A        L_NK_CD56._CD16. 0.632 0.0268   0.408  M-(S+A)/2
-##  4 SMAP2        L_NK_CD56._CD16. 0.56  0.00332  0.198  M-(S+A)/2
-##  5 P3H1         L_NK_CD56._CD16. 0.58  0.00937  0.276  M-(S+A)/2
-##  6 CPT2         L_NK_CD56._CD16. 0.632 0.0256   0.401  M-(S+A)/2
-##  7 DHCR24       L_NK_CD56._CD16. 1.15  0.00182  0.165  M-(S+A)/2
-##  8 GADD45A      L_NK_CD56._CD16. 0.538 0.0224   0.382  M-(S+A)/2
-##  9 S100A9       L_NK_CD56._CD16. 1.57  0.0315   0.431  M-(S+A)/2
-## 10 S100A8       L_NK_CD56._CD16. 1.68  0.031    0.428  M-(S+A)/2
-## 11 XCL1         L_NK_CD56._CD16. 0.896 0.0467   0.473  M-(S+A)/2
-## 12 SELL         L_NK_CD56._CD16. 0.576 0.00131  0.15   M-(S+A)/2
-## 13 IER5         L_NK_CD56._CD16. 0.736 0.0152   0.338  M-(S+A)/2
-## 14 MIR181A1HG   L_NK_CD56._CD16. 0.885 0.0153   0.339  M-(S+A)/2
-## 15 CD55         L_NK_CD56._CD16. 0.58  0.0251   0.4    M-(S+A)/2
-## 16 UTP25        L_NK_CD56._CD16. 0.697 0.0479   0.475  M-(S+A)/2
-## 17 H2AW         L_NK_CD56._CD16. 0.677 0.000416 0.109  M-(S+A)/2
-## 18 RNF187       L_NK_CD56._CD16. 0.661 0.0032   0.198  M-(S+A)/2
-## 19 LPIN1        L_NK_CD56._CD16. 0.555 0.000124 0.0778 M-(S+A)/2
-## 20 POMC         L_NK_CD56._CD16. 0.59  0.015    0.335  M-(S+A)/2
-

Check the output of the activity analysis

-
ligand_activities_targets_DEgenes$ligand_activities %>% head(20)
-## # A tibble: 20 x 8
-## # Groups:   receiver, contrast [1]
-##    ligand activity contrast  target   ligand_target_weight receiver         direction_regulation activity_scaled
-##    <chr>     <dbl> <chr>     <chr>                   <dbl> <chr>            <fct>                          <dbl>
-##  1 A2M      0.0224 M-(S+A)/2 CD55                  0.00649 L_NK_CD56._CD16. up                             0.823
-##  2 A2M      0.0224 M-(S+A)/2 FKBP5                 0.00723 L_NK_CD56._CD16. up                             0.823
-##  3 A2M      0.0224 M-(S+A)/2 FOS                   0.0146  L_NK_CD56._CD16. up                             0.823
-##  4 A2M      0.0224 M-(S+A)/2 GADD45A               0.0110  L_NK_CD56._CD16. up                             0.823
-##  5 A2M      0.0224 M-(S+A)/2 H2AC6                 0.00747 L_NK_CD56._CD16. up                             0.823
-##  6 A2M      0.0224 M-(S+A)/2 H2BC12                0.00692 L_NK_CD56._CD16. up                             0.823
-##  7 A2M      0.0224 M-(S+A)/2 ISG20                 0.00738 L_NK_CD56._CD16. up                             0.823
-##  8 A2M      0.0224 M-(S+A)/2 LMNB1                 0.00699 L_NK_CD56._CD16. up                             0.823
-##  9 A2M      0.0224 M-(S+A)/2 MYC                   0.0199  L_NK_CD56._CD16. up                             0.823
-## 10 A2M      0.0224 M-(S+A)/2 NFKB1                 0.00895 L_NK_CD56._CD16. up                             0.823
-## 11 A2M      0.0224 M-(S+A)/2 NFKBIZ                0.00661 L_NK_CD56._CD16. up                             0.823
-## 12 A2M      0.0224 M-(S+A)/2 PPP1R15A              0.00776 L_NK_CD56._CD16. up                             0.823
-## 13 A2M      0.0224 M-(S+A)/2 SLC1A5                0.00710 L_NK_CD56._CD16. up                             0.823
-## 14 A2M      0.0224 M-(S+A)/2 SMAD3                 0.00776 L_NK_CD56._CD16. up                             0.823
-## 15 A2M      0.0224 M-(S+A)/2 SOCS3                 0.00968 L_NK_CD56._CD16. up                             0.823
-## 16 A2M      0.0224 M-(S+A)/2 TFRC                  0.00712 L_NK_CD56._CD16. up                             0.823
-## 17 A2M      0.0224 M-(S+A)/2 WARS1                 0.00743 L_NK_CD56._CD16. up                             0.823
-## 18 A2M      0.0224 M-(S+A)/2 ZFP36                 0.00732 L_NK_CD56._CD16. up                             0.823
-## 19 AANAT    0.0215 M-(S+A)/2 FKBP5                 0.00514 L_NK_CD56._CD16. up                             0.652
-## 20 AANAT    0.0215 M-(S+A)/2 FOS                   0.00863 L_NK_CD56._CD16. up                             0.652
-

Step -4: Use the information collected above to prioritize all -sender-ligand—receiver-receptor pairs.

-

In the 3 previous steps, we calculated expression, differential -expression and NicheNet activity information. Now we will combine these -different types of information in one prioritization scheme.

-

MultiNicheNet allows the user to define the weights of the following -criteria to prioritize ligand-receptor interactions:

+
+

2.1.1 Interpretation of cell type abundance information

+

The first plot visualizes the number of cells per celltype-sample combination, and indicates which combinations are removed during the DE analysis because there are less than min_cells in the celltype-sample combination.

+
abundance_info$abund_plot_sample
+

+

The red dotted line indicates the required minimum of cells as defined above in min_cells. We can see here that some sample-celltype combinations are left out. For the DE analysis in the next step, only cell types will be considered if there are at least two samples per group with a sufficient number of cells. But as we can see here: all cell types will be considered for the analysis and there are no condition-specific cell types.

+

Important: Based on the cell type abundance diagnostics, we recommend users to change their analysis settings if required (eg changing cell type annotation level, batches, …), before proceeding with the rest of the analysis. If too many celltype-sample combinations don’t pass this threshold, we recommend to define your cell types in a more general way (use one level higher of the cell type ontology hierarchy) (eg TH17 CD4T cells –> CD4T cells) or use min_cells = 5 if this would not be possible.

+
+
+

2.1.2 Cell type filtering based on cell type abundance information

+

Running the following block of code can help you determine which cell types are condition-specific and which cell types are absent.

+
sample_group_celltype_df = abundance_info$abundance_data %>% 
+  filter(n > min_cells) %>% 
+  ungroup() %>% 
+  distinct(sample_id, group_id) %>% 
+  cross_join(
+    abundance_info$abundance_data %>% 
+      ungroup() %>% 
+      distinct(celltype_id)
+    ) %>% 
+  arrange(sample_id)
+
+abundance_df = sample_group_celltype_df %>% left_join(
+  abundance_info$abundance_data %>% ungroup()
+  )
+
+abundance_df$n[is.na(abundance_df$n)] = 0
+abundance_df$keep[is.na(abundance_df$keep)] = FALSE
+abundance_df_summarized = abundance_df %>% 
+  mutate(keep = as.logical(keep)) %>% 
+  group_by(group_id, celltype_id) %>% 
+  summarise(samples_present = sum((keep)))
+
+celltypes_absent_one_condition = abundance_df_summarized %>% 
+  filter(samples_present == 0) %>% pull(celltype_id) %>% unique() 
+# find truly condition-specific cell types by searching for cell types 
+# truely absent in at least one condition
+
+celltypes_present_one_condition = abundance_df_summarized %>% 
+  filter(samples_present >= 2) %>% pull(celltype_id) %>% unique() 
+# require presence in at least 2 samples of one group so 
+# it is really present in at least one condition
+
+condition_specific_celltypes = intersect(
+  celltypes_absent_one_condition, 
+  celltypes_present_one_condition)
+
+total_nr_conditions = SummarizedExperiment::colData(sce)[,group_id] %>% 
+  unique() %>% length() 
+
+absent_celltypes = abundance_df_summarized %>% 
+  filter(samples_present < 2) %>% 
+  group_by(celltype_id) %>% 
+  count() %>% 
+  filter(n == total_nr_conditions) %>% 
+  pull(celltype_id)
+  
+print("condition-specific celltypes:")
+## [1] "condition-specific celltypes:"
+print(condition_specific_celltypes)
+## character(0)
+  
+print("absent celltypes:")
+## [1] "absent celltypes:"
+print(absent_celltypes)
+## character(0)
+

Absent cell types will be filtered out, condition-specific cell types can be filtered out if you as a user do not want to run the alternative workflow for condition-specific cell types in the optional step 8 of the core MultiNicheNet analysis.

+
analyse_condition_specific_celltypes = FALSE
+
if(analyse_condition_specific_celltypes == TRUE){
+  senders_oi = senders_oi %>% setdiff(absent_celltypes)
+  receivers_oi = receivers_oi %>% setdiff(absent_celltypes)
+} else {
+  senders_oi = senders_oi %>% 
+    setdiff(union(absent_celltypes, condition_specific_celltypes))
+  receivers_oi = receivers_oi %>% 
+    setdiff(union(absent_celltypes, condition_specific_celltypes))
+}
+
+sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% 
+            c(senders_oi, receivers_oi)
+          ]
+
+
+
+

2.2 Gene filtering: determine which genes are sufficiently expressed in each present cell type

+

Before running the DE analysis, we will determine which genes are not sufficiently expressed and should be filtered out. +We will perform gene filtering based on a similar procedure as used in edgeR::filterByExpr. However, we adapted this procedure to be more interpretable for single-cell datasets.

+

For each cell type, we will consider genes expressed if they are expressed in at least a min_sample_prop fraction of samples in the condition with the lowest number of samples. By default, we set min_sample_prop = 0.50, which means that genes should be expressed in at least 2 samples if the group with lowest nr. of samples has 4 samples like this dataset.

+
min_sample_prop = 0.50
+

But how do we define which genes are expressed in a sample? For this we will consider genes as expressed if they have non-zero expression values in a fraction_cutoff fraction of cells of that cell type in that sample. By default, we set fraction_cutoff = 0.05, which means that genes should show non-zero expression values in at least 5% of cells in a sample.

+
fraction_cutoff = 0.05
+

We recommend using these default values unless there is specific interest in prioritizing (very) weakly expressed interactions. In that case, you could lower the value of fraction_cutoff. We explicitly recommend against using fraction_cutoff > 0.10.

+

Now we will calculate the information required for gene filtering with the following command:

+
frq_list = get_frac_exprs(
+  sce = sce, 
+  sample_id = sample_id, celltype_id =  celltype_id, group_id = group_id, 
+  batches = batches, 
+  min_cells = min_cells, 
+  fraction_cutoff = fraction_cutoff, min_sample_prop = min_sample_prop)
+## [1] "Samples are considered if they have more than 10 cells of the cell type of interest"
+## [1] "Genes with non-zero counts in at least 5% of cells of a cell type of interest in a particular sample will be considered as expressed in that sample."
+## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: L_NK_CD56._CD16."
+## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: L_T_TIM3._CD38._HLADR."
+## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: M_Monocyte_CD16"
+## [1] "6621 genes are considered as expressed in the cell type: L_NK_CD56._CD16."
+## [1] "8461 genes are considered as expressed in the cell type: L_T_TIM3._CD38._HLADR."
+## [1] "8817 genes are considered as expressed in the cell type: M_Monocyte_CD16"
+

Now only keep genes that are expressed by at least one cell type:

+
genes_oi = frq_list$expressed_df %>% 
+  filter(expressed == TRUE) %>% pull(gene) %>% unique() 
+sce = sce[genes_oi, ]
+
+
+

2.3 Pseudobulk expression calculation: determine and normalize per-sample pseudobulk expression levels for each expressed gene in each present cell type

+

After filtering out absent cell types and genes, we will continue the analysis by calculating the different prioritization criteria that we will use to prioritize cell-cell communication patterns.

+

First, we will determine and normalize per-sample pseudobulk expression levels for each expressed gene in each present cell type. The function process_abundance_expression_info will link this expression information for ligands of the sender cell types to the corresponding receptors of the receiver cell types. This will later on allow us to define the cell-type specicificy criteria for ligands and receptors.

+
abundance_expression_info = process_abundance_expression_info(
+  sce = sce, 
+  sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, 
+  min_cells = min_cells, 
+  senders_oi = senders_oi, receivers_oi = receivers_oi, 
+  lr_network = lr_network, 
+  batches = batches, 
+  frq_list = frq_list, 
+  abundance_info = abundance_info)
+

Normalized pseudobulk expression values per gene/celltype/sample can be inspected by:

+
abundance_expression_info$celltype_info$pb_df %>% head()
+## # A tibble: 6 × 4
+##   gene  sample pb_sample celltype              
+##   <chr> <chr>      <dbl> <fct>                 
+## 1 A1BG  A1          3.10 L_T_TIM3._CD38._HLADR.
+## 2 AAAS  A1          4.01 L_T_TIM3._CD38._HLADR.
+## 3 AAGAB A1          5.54 L_T_TIM3._CD38._HLADR.
+## 4 AAK1  A1          6.40 L_T_TIM3._CD38._HLADR.
+## 5 AAMDC A1          4.57 L_T_TIM3._CD38._HLADR.
+## 6 AAMP  A1          5.54 L_T_TIM3._CD38._HLADR.
+

An average of these sample-level expression values per condition/group can be inspected by:

+
abundance_expression_info$celltype_info$pb_df_group %>% head()
+## # A tibble: 6 × 4
+## # Groups:   group, celltype [1]
+##   group celltype         gene  pb_group
+##   <chr> <chr>            <chr>    <dbl>
+## 1 A     L_NK_CD56._CD16. A1BG      3.91
+## 2 A     L_NK_CD56._CD16. AAAS      4.81
+## 3 A     L_NK_CD56._CD16. AAGAB     4.82
+## 4 A     L_NK_CD56._CD16. AAK1      7.03
+## 5 A     L_NK_CD56._CD16. AAMDC     3.66
+## 6 A     L_NK_CD56._CD16. AAMP      6.04
+

Inspecting these values for ligand-receptor interactions can be done by:

+
abundance_expression_info$sender_receiver_info$pb_df %>% head()
+## # A tibble: 6 × 8
+##   sample sender                 receiver   ligand receptor pb_ligand pb_receptor
+##   <chr>  <chr>                  <chr>      <chr>  <chr>        <dbl>       <dbl>
+## 1 M4     M_Monocyte_CD16        M_Monocyt… B2M    LILRB1        14.2        11.4
+## 2 M4     L_NK_CD56._CD16.       M_Monocyt… B2M    LILRB1        14.0        11.4
+## 3 M4     L_T_TIM3._CD38._HLADR. M_Monocyt… B2M    LILRB1        13.8        11.4
+## 4 M5     L_NK_CD56._CD16.       M_Monocyt… B2M    LILRB1        14.5        10.5
+## 5 M5     M_Monocyte_CD16        M_Monocyt… B2M    LILRB1        14.4        10.5
+## 6 M5     L_T_TIM3._CD38._HLADR. M_Monocyt… B2M    LILRB1        14.2        10.5
+## # ℹ 1 more variable: ligand_receptor_pb_prod <dbl>
+abundance_expression_info$sender_receiver_info$pb_df_group %>% head()
+## # A tibble: 6 × 8
+## # Groups:   group, sender [6]
+##   group sender        receiver ligand receptor pb_ligand_group pb_receptor_group
+##   <chr> <chr>         <chr>    <chr>  <chr>              <dbl>             <dbl>
+## 1 M     L_NK_CD56._C… M_Monoc… B2M    LILRB1              14.1             10.0 
+## 2 M     M_Monocyte_C… M_Monoc… B2M    LILRB1              14.1             10.0 
+## 3 M     L_T_TIM3._CD… M_Monoc… B2M    LILRB1              14.0             10.0 
+## 4 A     L_NK_CD56._C… L_NK_CD… B2M    KLRD1               14.3              9.71
+## 5 S     L_NK_CD56._C… L_NK_CD… B2M    KLRD1               14.4              9.46
+## 6 A     L_T_TIM3._CD… L_NK_CD… B2M    KLRD1               13.9              9.71
+## # ℹ 1 more variable: ligand_receptor_pb_prod_group <dbl>
+
+
+

2.4 Differential expression (DE) analysis: determine which genes are differentially expressed

+

In this step, we will perform genome-wide differential expression analysis of receiver and sender cell types to define DE genes between the conditions of interest (as formalized by the contrasts_oi). Based on this analysis, we later can define the levels of differential expression of ligands in senders and receptors in receivers, and define the set of affected target genes in the receiver cell types (which will be used for the ligand activity analysis).

+

We will apply pseudobulking followed by EdgeR to perform multi-condition multi-sample differential expression (DE) analysis (also called ‘differential state’ analysis by the developers of Muscat).

+
DE_info = get_DE_info(
+  sce = sce, 
+  sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, 
+  batches = batches, covariates = covariates, 
+  contrasts_oi = contrasts_oi, 
+  min_cells = min_cells, 
+  expressed_df = frq_list$expressed_df)
+## [1] "DE analysis is done:"
+## [1] "included cell types are:"
+## [1] "L_T_TIM3._CD38._HLADR." "L_NK_CD56._CD16."       "M_Monocyte_CD16"
+
+

2.4.1 Check DE results

+

Check DE output information in table with logFC and p-values for each gene-celltype-contrast:

+
DE_info$celltype_de$de_output_tidy %>% head()
+## # A tibble: 6 × 9
+##   gene  cluster_id         logFC logCPM       F   p_val p_adj.loc p_adj contrast
+##   <chr> <chr>              <dbl>  <dbl>   <dbl>   <dbl>     <dbl> <dbl> <chr>   
+## 1 A1BG  L_T_TIM3._CD38._…  0.127   4.46  0.175  0.68        0.984 0.984 M-(S+A)…
+## 2 AAAS  L_T_TIM3._CD38._… -0.072   4.78  0.0678 0.797       1     1     M-(S+A)…
+## 3 AAGAB L_T_TIM3._CD38._…  0.492   5.01  4.73   0.04        0.481 0.481 M-(S+A)…
+## 4 AAK1  L_T_TIM3._CD38._… -0.521   6.83 12.7    0.00163     0.138 0.138 M-(S+A)…
+## 5 AAMDC L_T_TIM3._CD38._… -0.44    3.95  2.15   0.156       0.724 0.724 M-(S+A)…
+## 6 AAMP  L_T_TIM3._CD38._…  0.224   6.04  2.95   0.235       0.806 0.806 M-(S+A)…
+

Evaluate the distributions of p-values:

+
DE_info$hist_pvals
+

+

These distributions look fine (uniform distribution, except peak at p-value <= 0.05), so we will continue using these regular p-values. In case these p-value distributions look irregular, you can estimate empirical p-values as we will demonstrate in another vignette.

+
empirical_pval = FALSE
+
if(empirical_pval == TRUE){
+  DE_info_emp = get_empirical_pvals(DE_info$celltype_de$de_output_tidy)
+  celltype_de = DE_info_emp$de_output_tidy_emp %>% select(-p_val, -p_adj) %>% 
+    rename(p_val = p_emp, p_adj = p_adj_emp)
+} else {
+  celltype_de = DE_info$celltype_de$de_output_tidy
+} 
+
+
+

2.4.2 Combine DE information for ligand-senders and receptors-receivers

+

To end this step, we will combine the DE information of senders and receivers by linking their ligands and receptors together based on the prior knowledge ligand-receptor network.

+
sender_receiver_de = combine_sender_receiver_de(
+  sender_de = celltype_de,
+  receiver_de = celltype_de,
+  senders_oi = senders_oi,
+  receivers_oi = receivers_oi,
+  lr_network = lr_network
+)
+
sender_receiver_de %>% head(20)
+## # A tibble: 20 × 12
+##    contrast  sender             receiver ligand receptor lfc_ligand lfc_receptor
+##    <chr>     <chr>              <chr>    <chr>  <chr>         <dbl>        <dbl>
+##  1 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… C3     VSIG4         2.96        5.76  
+##  2 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… IL10   IL10RB        5.61        0.715 
+##  3 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… IL18   IL18RAP       1.93        4.39  
+##  4 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… IL10   IL10RB        5.61        0.184 
+##  5 M-(S+A)/2 L_NK_CD56._CD16.   L_T_TIM… IL18   IL18RAP       1.29        4.39  
+##  6 M-(S+A)/2 M_Monocyte_CD16    L_NK_CD… IL10   IL10RA        5.61        0.0393
+##  7 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… IL10   IL10RA        5.61       -0.0612
+##  8 M-(S+A)/2 M_Monocyte_CD16    L_NK_CD… IL10   IL10RB        5.61       -0.0727
+##  9 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… IL10   IL10RA        5.61       -0.316 
+## 10 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… C3     ITGAM         2.96        2.2   
+## 11 A-(S+M)/2 L_T_TIM3._CD38._H… M_Monoc… SCGB3… MARCO         4.49        0.611 
+## 12 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… IL1B   IL1RAP        1.53        3.56  
+## 13 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… THBS1  CD47          4.58        0.446 
+## 14 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… TNFSF… CD163         0.484       4.54  
+## 15 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… THBS1  ITGA6         4.58        0.382 
+## 16 M-(S+A)/2 M_Monocyte_CD16    L_NK_CD… THBS1  ITGA6         4.58        0.352 
+## 17 M-(S+A)/2 M_Monocyte_CD16    L_NK_CD… THBS1  ITGA4         4.58        0.283 
+## 18 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… THBS1  ITGA6         4.58        0.263 
+## 19 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… HMGB1  CD163         0.226       4.54  
+## 20 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HMGB1  CD163         0.154       4.54  
+## # ℹ 5 more variables: ligand_receptor_lfc_avg <dbl>, p_val_ligand <dbl>,
+## #   p_adj_ligand <dbl>, p_val_receptor <dbl>, p_adj_receptor <dbl>
+
+
+
+

2.5 Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes

+

In this step, we will predict NicheNet ligand activities and NicheNet ligand-target links based on these differential expression results. We do this to prioritize interactions based on their predicted effect on a receiver cell type. We will assume that the most important group-specific interactions are those that lead to group-specific gene expression changes in a receiver cell type.

+

Similarly to base NicheNet (https://github.com/saeyslab/nichenetr), we use the DE output to create a “geneset of interest”: here we assume that DE genes within a cell type may be DE because of differential cell-cell communication processes. In the ligand activity prediction, we will assess the enrichment of target genes of ligands within this geneset of interest. In case high-probabiliy target genes of a ligand are enriched in this set compared to the background of expressed genes, we predict that this ligand may have a high activity.

+

Because the ligand activity analysis is an enrichment procedure, it is important that this geneset of interest should contain a sufficient but not too large number of genes. The ratio geneset_oi/background should ideally be between 1/200 and 1/10 (or close to these ratios).

+

To determine the genesets of interest based on DE output, we need to define some logFC and/or p-value thresholds per cell type/contrast combination. In general, we recommend inspecting the nr. of DE genes for all cell types based on the default thresholds and adapting accordingly. By default, we will apply the p-value cutoff on the normal p-values, and not on the p-values corrected for multiple testing. This choice was made because most multi-sample single-cell transcriptomics datasets have just a few samples per group and we might have a lack of statistical power due to pseudobulking. But, if the smallest group >= 20 samples, we typically recommend using p_val_adj = TRUE. When the biological difference between the conditions is very large, we typically recommend increasing the logFC_threshold and/or using p_val_adj = TRUE.

+
+

2.5.1 Assess geneset_oi-vs-background ratios for different DE output tresholds prior to the NicheNet ligand activity analysis

+

We will first inspect the geneset_oi-vs-background ratios for the default tresholds:

+
logFC_threshold = 0.50
+p_val_threshold = 0.05
+
p_val_adj = FALSE 
+
geneset_assessment = contrast_tbl$contrast %>% 
+  lapply(
+    process_geneset_data, 
+    celltype_de, logFC_threshold, p_val_adj, p_val_threshold
+  ) %>% 
+  bind_rows() 
+geneset_assessment
+## # A tibble: 9 × 12
+##   cluster_id            n_background n_geneset_up n_geneset_down prop_geneset_up
+##   <chr>                        <int>        <int>          <int>           <dbl>
+## 1 L_NK_CD56._CD16.              6621          162             82          0.0245
+## 2 L_T_TIM3._CD38._HLAD…         8461          401            194          0.0474
+## 3 M_Monocyte_CD16               8817          647            438          0.0734
+## 4 L_NK_CD56._CD16.              6621          150            219          0.0227
+## 5 L_T_TIM3._CD38._HLAD…         8461          201            320          0.0238
+## 6 M_Monocyte_CD16               8817          368            254          0.0417
+## 7 L_NK_CD56._CD16.              6621          118            110          0.0178
+## 8 L_T_TIM3._CD38._HLAD…         8461          225            150          0.0266
+## 9 M_Monocyte_CD16               8817          262            464          0.0297
+## # ℹ 7 more variables: prop_geneset_down <dbl>, in_range_up <lgl>,
+## #   in_range_down <lgl>, contrast <chr>, logFC_threshold <dbl>,
+## #   p_val_threshold <dbl>, adjusted <lgl>
+

We can see here that for all cell type / contrast combinations, all geneset/background ratio’s are within the recommended range (in_range_up and in_range_down columns). When these geneset/background ratio’s would not be within the recommended ranges, we should interpret ligand activity results for these cell types with more caution, or use different thresholds (for these or all cell types).

+

For the sake of demonstration, we will also calculate these ratio’s in case we would use the adjusted p-value as threshold.

+
geneset_assessment_adjustedPval = contrast_tbl$contrast %>% 
+  lapply(
+    process_geneset_data, 
+    celltype_de, logFC_threshold, p_val_adj = TRUE, p_val_threshold
+    ) %>% 
+  bind_rows() 
+geneset_assessment_adjustedPval
+## # A tibble: 9 × 12
+##   cluster_id            n_background n_geneset_up n_geneset_down prop_geneset_up
+##   <chr>                        <int>        <int>          <int>           <dbl>
+## 1 L_NK_CD56._CD16.              6621            7              0        0.00106 
+## 2 L_T_TIM3._CD38._HLAD…         8461           15              5        0.00177 
+## 3 M_Monocyte_CD16               8817           25             11        0.00284 
+## 4 L_NK_CD56._CD16.              6621           28             50        0.00423 
+## 5 L_T_TIM3._CD38._HLAD…         8461            2              5        0.000236
+## 6 M_Monocyte_CD16               8817           10             15        0.00113 
+## 7 L_NK_CD56._CD16.              6621           36             19        0.00544 
+## 8 L_T_TIM3._CD38._HLAD…         8461           13              1        0.00154 
+## 9 M_Monocyte_CD16               8817            4              3        0.000454
+## # ℹ 7 more variables: prop_geneset_down <dbl>, in_range_up <lgl>,
+## #   in_range_down <lgl>, contrast <chr>, logFC_threshold <dbl>,
+## #   p_val_threshold <dbl>, adjusted <lgl>
+

We can see here that for most cell type / contrast combinations, the geneset/background ratio’s are not within the recommended range. Therefore, we will proceed with the default tresholds for the ligand activity analysis

+
+
+

2.5.2 Perform the ligand activity analysis and ligand-target inference

+

After the ligand activity prediction, we will also infer the predicted target genes of these ligands in each contrast. For this ligand-target inference procedure, we also need to select which top n of the predicted target genes will be considered (here: top 250 targets per ligand). This parameter will not affect the ligand activity predictions. It will only affect ligand-target visualizations and construction of the intercellular regulatory network during the downstream analysis. We recommend users to test other settings in case they would be interested in exploring fewer, but more confident target genes, or vice versa.

+
top_n_target = 250
+

The NicheNet ligand activity analysis can be run in parallel for each receiver cell type, by changing the number of cores as defined here. Using more cores will speed up the analysis at the cost of needing more memory. This is only recommended if you have many receiver cell types of interest.

+
verbose = TRUE
+cores_system = 8
+n.cores = min(cores_system, celltype_de$cluster_id %>% unique() %>% length()) 
+

Running the ligand activity prediction will take some time (the more cell types and contrasts, the more time)

+
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings(
+  get_ligand_activities_targets_DEgenes(
+    receiver_de = celltype_de,
+    receivers_oi = intersect(receivers_oi, celltype_de$cluster_id %>% unique()),
+    ligand_target_matrix = ligand_target_matrix,
+    logFC_threshold = logFC_threshold,
+    p_val_threshold = p_val_threshold,
+    p_val_adj = p_val_adj,
+    top_n_target = top_n_target,
+    verbose = verbose, 
+    n.cores = n.cores
+  )
+))
+

You can check the output of the ligand activity and ligand-target inference here:

+
ligand_activities_targets_DEgenes$ligand_activities %>% head(20)
+## # A tibble: 20 × 8
+## # Groups:   receiver, contrast [1]
+##    ligand activity contrast  target   ligand_target_weight receiver        
+##    <chr>     <dbl> <chr>     <chr>                   <dbl> <chr>           
+##  1 A2M      0.0282 M-(S+A)/2 ACOT7                 0.00632 L_NK_CD56._CD16.
+##  2 A2M      0.0282 M-(S+A)/2 AREG                  0.00638 L_NK_CD56._CD16.
+##  3 A2M      0.0282 M-(S+A)/2 CD55                  0.00649 L_NK_CD56._CD16.
+##  4 A2M      0.0282 M-(S+A)/2 CD74                  0.00634 L_NK_CD56._CD16.
+##  5 A2M      0.0282 M-(S+A)/2 FKBP5                 0.00723 L_NK_CD56._CD16.
+##  6 A2M      0.0282 M-(S+A)/2 FOS                   0.0146  L_NK_CD56._CD16.
+##  7 A2M      0.0282 M-(S+A)/2 GADD45A               0.0110  L_NK_CD56._CD16.
+##  8 A2M      0.0282 M-(S+A)/2 H2AC6                 0.00747 L_NK_CD56._CD16.
+##  9 A2M      0.0282 M-(S+A)/2 H2BC12                0.00692 L_NK_CD56._CD16.
+## 10 A2M      0.0282 M-(S+A)/2 ISG20                 0.00738 L_NK_CD56._CD16.
+## 11 A2M      0.0282 M-(S+A)/2 LMNB1                 0.00699 L_NK_CD56._CD16.
+## 12 A2M      0.0282 M-(S+A)/2 MYC                   0.0199  L_NK_CD56._CD16.
+## 13 A2M      0.0282 M-(S+A)/2 NFKB1                 0.00895 L_NK_CD56._CD16.
+## 14 A2M      0.0282 M-(S+A)/2 NFKBIZ                0.00661 L_NK_CD56._CD16.
+## 15 A2M      0.0282 M-(S+A)/2 PPP1R15A              0.00776 L_NK_CD56._CD16.
+## 16 A2M      0.0282 M-(S+A)/2 SLC1A5                0.00710 L_NK_CD56._CD16.
+## 17 A2M      0.0282 M-(S+A)/2 SMAD3                 0.00776 L_NK_CD56._CD16.
+## 18 A2M      0.0282 M-(S+A)/2 SOCS3                 0.00968 L_NK_CD56._CD16.
+## 19 A2M      0.0282 M-(S+A)/2 TFRC                  0.00712 L_NK_CD56._CD16.
+## 20 A2M      0.0282 M-(S+A)/2 WARS1                 0.00743 L_NK_CD56._CD16.
+## # ℹ 2 more variables: direction_regulation <fct>, activity_scaled <dbl>
+
+
+
+

2.6 Prioritization: rank cell-cell communication patterns through multi-criteria prioritization

+

In the previous steps, we calculated expression, differential expression and NicheNet ligand activity. In the final step, we will now combine all calculated information to rank all sender-ligand—receiver-receptor pairs according to group/condition specificity. We will use the following criteria to prioritize ligand-receptor interactions:

    -
  • Upregulation of the ligand in a sender cell type and/or upregulation -of the receptor in a receiver cell type - in the condition of interest. -: de_ligand and de_receptor
  • -
  • Sufficiently high expression levels of ligand and receptor in many -samples of the same group (to mitigate the influence of outlier -samples). : frac_exprs_ligand_receptor
  • -
  • Cell-type and condition specific expression of the ligand in the -sender cell type and receptor in the receiver cell type (to mitigate the -influence of upregulated but still relatively weakly expressed -ligands/receptors) : exprs_ligand and -exprs_receptor
  • -
  • High NicheNet ligand activity, to further prioritize ligand-receptor -pairs based on their predicted effect of the ligand-receptor interaction -on the gene expression in the receiver cell type : -activity_scaled
  • -
  • High relative abundance of sender and/or receiver in the condition -of interest: abund_sender and abund_receiver -(experimental feature - not recommended to give non-zero weights for -default analyses)
  • +
  • Upregulation of the ligand in a sender cell type and/or upregulation of the receptor in a receiver cell type - in the condition of interest.
  • +
  • Cell-type specific expression of the ligand in the sender cell type and receptor in the receiver cell type in the condition of interest (to mitigate the influence of upregulated but still relatively weakly expressed ligands/receptors).
  • +
  • Sufficiently high expression levels of ligand and receptor in many samples of the same group.
  • +
  • High NicheNet ligand activity, to further prioritize ligand-receptor pairs based on their predicted effect of the ligand-receptor interaction on the gene expression in the receiver cell type.
-

The different properties of the sender-ligand—receiver-receptor pairs -can be weighted according to the user’s preference and insight in the -dataset at hand.

-

Define -the prioritization weights, and prepare grouping objects

-

We will set our preference for this dataset as follows - and -recommend the user to use the same weights by default:

-
prioritizing_weights_DE = c("de_ligand" = 1,
-                         "de_receptor" = 1)
-prioritizing_weights_activity = c("activity_scaled" = 2)
-
-prioritizing_weights_expression_specificity = c("exprs_ligand" = 2,
-                         "exprs_receptor" = 2)
-
-prioritizing_weights_expression_sufficiency = c("frac_exprs_ligand_receptor" = 1)
-
-prioritizing_weights_relative_abundance = c( "abund_sender" = 0,
-                         "abund_receiver" = 0)
-
prioritizing_weights = c(prioritizing_weights_DE, 
-                         prioritizing_weights_activity, 
-                         prioritizing_weights_expression_specificity,
-                         prioritizing_weights_expression_sufficiency, 
-                         prioritizing_weights_relative_abundance)
-

Make necessary grouping data frame

-
sender_receiver_tbl = sender_receiver_de %>% dplyr::distinct(sender, receiver)
-
-metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble()
-
-if(!is.na(batches)){
-  grouping_tbl = metadata_combined[,c(sample_id, group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct()
-  colnames(grouping_tbl) = c("sample","group",batches)
-} else {
-  grouping_tbl = metadata_combined[,c(sample_id, group_id)] %>% tibble::as_tibble() %>% dplyr::distinct()
-  colnames(grouping_tbl) = c("sample","group")
-}
-

Crucial note: grouping_tbl: group should be the same as in the -contrast_tbl, and as in the expression info tables! Rename accordingly -if this would not be the case. If you followed the guidelines of this -tutorial closely, there should be no problem.

-

Run the prioritization

-
prioritization_tables = suppressMessages(generate_prioritization_tables(
-  sender_receiver_info = abundance_expression_info$sender_receiver_info,
-  sender_receiver_de = sender_receiver_de,
-  ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
-  contrast_tbl = contrast_tbl,
-  sender_receiver_tbl = sender_receiver_tbl,
-  grouping_tbl = grouping_tbl,
-  prioritizing_weights = prioritizing_weights,
-  fraction_cutoff = fraction_cutoff, 
-  abundance_data_receiver = abundance_expression_info$abundance_data_receiver,
-  abundance_data_sender = abundance_expression_info$abundance_data_sender
-))
+

We will combine these prioritization criteria in a single aggregated prioritization score. In the default setting, we will weigh each of these criteria equally (scenario = "regular"). This setting is strongly recommended. However, we also provide some additional setting to accomodate different biological scenarios. The setting scenario = "lower_DE" halves the weight for DE criteria and doubles the weight for ligand activity. This is recommended in case your hypothesis is that the differential CCC patterns in your data are less likely to be driven by DE (eg in cases of differential migration into a niche). The setting scenario = "no_frac_LR_expr" ignores the criterion “Sufficiently high expression levels of ligand and receptor in many samples of the same group”. This may be interesting for users that have data with a limited number of samples and don’t want to penalize interactions if they are not sufficiently expressed in some samples.

+

Finally, we still need to make one choice. For NicheNet ligand activity we can choose to prioritize ligands that only induce upregulation of target genes (ligand_activity_down = FALSE) or can lead potentially lead to both up- and downregulation (ligand_activity_down = TRUE). The benefit of ligand_activity_down = FALSE is ease of interpretability: prioritized ligand-receptor pairs will be upregulated in the condition of interest, just like their target genes. ligand_activity_down = TRUE can be harder to interpret because target genes of some interactions may be upregulated in the other conditions compared to the condition of interest. This is harder to interpret, but may help to pick up interactions that can also repress gene expression.

+

Here we will choose for setting ligand_activity_down = FALSE and focus specifically on upregulating ligands.

+
ligand_activity_down = FALSE
+
sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver)
+
+metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble()
+
+if(!is.na(batches)){
+  grouping_tbl = metadata_combined[,c(sample_id, group_id, batches)] %>% 
+    tibble::as_tibble() %>% distinct()
+  colnames(grouping_tbl) = c("sample","group",batches)
+} else {
+  grouping_tbl = metadata_combined[,c(sample_id, group_id)] %>% 
+    tibble::as_tibble() %>% distinct()
+  colnames(grouping_tbl) = c("sample","group")
+}
+
+prioritization_tables = suppressMessages(generate_prioritization_tables(
+    sender_receiver_info = abundance_expression_info$sender_receiver_info,
+    sender_receiver_de = sender_receiver_de,
+    ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
+    contrast_tbl = contrast_tbl,
+    sender_receiver_tbl = sender_receiver_tbl,
+    grouping_tbl = grouping_tbl,
+    scenario = "regular", # all prioritization criteria will be weighted equally
+    fraction_cutoff = fraction_cutoff, 
+    abundance_data_receiver = abundance_expression_info$abundance_data_receiver,
+    abundance_data_sender = abundance_expression_info$abundance_data_sender,
+    ligand_activity_down = ligand_activity_down
+  ))

Check the output tables

First: group-based summary table

-
prioritization_tables$group_prioritization_tbl %>% head(20)
-## # A tibble: 20 x 60
-##    contrast  group sender     recei~1 ligand recep~2 lfc_l~3 lfc_r~4 ligan~5 p_val~6 p_adj~7 p_val~8 p_adj~9 activity direc~* activ~* lr_in~* id    avg_l~* avg_r~* ligan~* fract~* fract~* ligan~* rel_a~* rel_a~* sende~*
-##    <chr>     <chr> <chr>      <chr>   <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <fct>     <dbl> <chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
-##  1 A-(S+M)/2 A     M_Monocyt~ L_T_TI~ NAMPT  ITGB1     0.633   0.835   0.734 0.0525    0.584 0.00849   0.643  1.06e-2 up       0.0376 NAMPT_~ NAMP~   1.35    0.907  1.22     0.730   0.554  0.405    0.001   0.130  0.0656
-##  2 A-(S+M)/2 A     M_Monocyt~ L_T_TI~ NAMPT  ITGB1     0.633   0.835   0.734 0.0525    0.584 0.00849   0.643  2.13e-2 down     4.11   NAMPT_~ NAMP~   1.35    0.907  1.22     0.730   0.554  0.405    0.001   0.130  0.0656
-##  3 M-(S+A)/2 M     L_T_TIM3.~ M_Mono~ IFNG   IFNGR1    1.26    0.507   0.884 0.0233    0.376 0.101     0.396  5.17e-2 up       8.69   IFNG_I~ IFNG~   0.155   0.575  0.0889   0.114   0.498  0.0567   1.00    0.389  0.695 
-##  4 M-(S+A)/2 M     L_T_TIM3.~ M_Mono~ IFNG   IFNGR1    1.26    0.507   0.884 0.0233    0.376 0.101     0.396  5.15e-3 down    -2.51   IFNG_I~ IFNG~   0.155   0.575  0.0889   0.114   0.498  0.0567   1.00    0.389  0.695 
-##  5 A-(S+M)/2 A     M_Monocyt~ L_T_TI~ LGALS3 LAG3      0.915   1.71    1.31  0.00129   0.224 0.0012    0.334  1.75e-2 up       1.35   LGALS3~ LGAL~   1.80    0.978  1.76     0.835   0.499  0.417    0.001   0.130  0.0656
-##  6 A-(S+M)/2 A     M_Monocyt~ L_T_TI~ LGALS3 LAG3      0.915   1.71    1.31  0.00129   0.224 0.0012    0.334  1.27e-2 down     0.886  LGALS3~ LGAL~   1.80    0.978  1.76     0.835   0.499  0.417    0.001   0.130  0.0656
-##  7 S-(M+A)/2 S     M_Monocyt~ L_NK_C~ TGFB1  TGFBR3    0.347   0.278   0.312 0.0336    0.467 0.0157    0.24   9.26e-3 up       4.39   TGFB1_~ TGFB~   0.919   0.461  0.424    0.631   0.253  0.160    0.765   0.459  0.612 
-##  8 S-(M+A)/2 S     M_Monocyt~ L_NK_C~ TGFB1  TGFBR3    0.347   0.278   0.312 0.0336    0.467 0.0157    0.24   2.05e-2 down    -0.0993 TGFB1_~ TGFB~   0.919   0.461  0.424    0.631   0.253  0.160    0.765   0.459  0.612 
-##  9 S-(M+A)/2 S     L_NK_CD56~ L_NK_C~ TGFB1  TGFBR3    0.197   0.278   0.238 0.0778    0.459 0.0157    0.24   9.26e-3 up       4.39   TGFB1_~ TGFB~   0.897   0.461  0.414    0.463   0.253  0.117    0.459   0.459  0.459 
-## 10 S-(M+A)/2 S     L_NK_CD56~ L_NK_C~ TGFB1  TGFBR3    0.197   0.278   0.238 0.0778    0.459 0.0157    0.24   2.05e-2 down    -0.0993 TGFB1_~ TGFB~   0.897   0.461  0.414    0.463   0.253  0.117    0.459   0.459  0.459 
-## 11 S-(M+A)/2 S     M_Monocyt~ M_Mono~ TGFB1  ENG       0.347   1.36    0.854 0.0336    0.467 0.00936   0.299  1.17e-2 up       1.64   TGFB1_~ TGFB~   0.919   0.147  0.135    0.631   0.125  0.0788   0.765   0.765  0.765 
-## 12 S-(M+A)/2 S     M_Monocyt~ M_Mono~ TGFB1  ENG       0.347   1.36    0.854 0.0336    0.467 0.00936   0.299  3.28e-2 down     0.354  TGFB1_~ TGFB~   0.919   0.147  0.135    0.631   0.125  0.0788   0.765   0.765  0.765 
-## 13 A-(S+M)/2 A     M_Monocyt~ M_Mono~ ADAM10 NOTCH4    0.391   1.69    1.04  0.119     0.663 0.00183   0.236  9.22e-3 up      -1.16   ADAM10~ ADAM~   0.685   0.183  0.126    0.532   0.181  0.0960   0.001   0.001  0.001 
-## 14 A-(S+M)/2 A     M_Monocyt~ M_Mono~ ADAM10 NOTCH4    0.391   1.69    1.04  0.119     0.663 0.00183   0.236  1.03e-4 down     2.04   ADAM10~ ADAM~   0.685   0.183  0.126    0.532   0.181  0.0960   0.001   0.001  0.001 
-## 15 A-(S+M)/2 A     M_Monocyt~ M_Mono~ ADAM10 TSPAN14   0.391   0.372   0.382 0.119     0.663 0.0825    0.628  9.22e-3 up      -1.16   ADAM10~ ADAM~   0.685   1.10   0.751    0.532   0.710  0.378    0.001   0.001  0.001 
-## 16 A-(S+M)/2 A     M_Monocyt~ M_Mono~ ADAM10 TSPAN14   0.391   0.372   0.382 0.119     0.663 0.0825    0.628  1.03e-4 down     2.04   ADAM10~ ADAM~   0.685   1.10   0.751    0.532   0.710  0.378    0.001   0.001  0.001 
-## 17 S-(M+A)/2 S     L_NK_CD56~ M_Mono~ TGFB1  ENG       0.197   1.36    0.779 0.0778    0.459 0.00936   0.299  1.17e-2 up       1.64   TGFB1_~ TGFB~   0.897   0.147  0.132    0.463   0.125  0.0578   0.459   0.765  0.612 
-## 18 S-(M+A)/2 S     L_NK_CD56~ M_Mono~ TGFB1  ENG       0.197   1.36    0.779 0.0778    0.459 0.00936   0.299  3.28e-2 down     0.354  TGFB1_~ TGFB~   0.897   0.147  0.132    0.463   0.125  0.0578   0.459   0.765  0.612 
-## 19 A-(S+M)/2 A     M_Monocyt~ M_Mono~ S100A8 CD36      0.536   1.39    0.963 0.378     0.866 0.0907    0.633  1.68e-2 up       1.65   S100A8~ S100~   2.36    0.296  0.699    0.772   0.240  0.185    0.001   0.001  0.001 
-## 20 A-(S+M)/2 A     M_Monocyt~ M_Mono~ S100A8 CD36      0.536   1.39    0.963 0.378     0.866 0.0907    0.633 -3.57e-3 down    -0.159  S100A8~ S100~   2.36    0.296  0.699    0.772   0.240  0.185    0.001   0.001  0.001 
-## # ... with 33 more variables: lfc_pval_ligand <dbl>, p_val_ligand_adapted <dbl>, scaled_lfc_ligand <dbl>, scaled_p_val_ligand <dbl>, scaled_lfc_pval_ligand <dbl>, scaled_p_val_ligand_adapted <dbl>,
-## #   lfc_pval_receptor <dbl>, p_val_receptor_adapted <dbl>, scaled_lfc_receptor <dbl>, scaled_p_val_receptor <dbl>, scaled_lfc_pval_receptor <dbl>, scaled_p_val_receptor_adapted <dbl>, activity_up <dbl>,
-## #   activity_scaled_up <dbl>, scaled_activity_scaled_up <dbl>, scaled_activity_up <dbl>, activity_down <dbl>, activity_scaled_down <dbl>, scaled_activity_scaled_down <dbl>, scaled_activity_down <dbl>,
-## #   scaled_avg_exprs_ligand <dbl>, scaled_avg_frq_ligand <dbl>, pb_ligand_group <dbl>, scaled_pb_ligand <dbl>, scaled_avg_exprs_receptor <dbl>, scaled_avg_frq_receptor <dbl>, pb_receptor_group <dbl>,
-## #   scaled_pb_receptor <dbl>, fraction_expressing_ligand_receptor <dbl>, max_scaled_activity <dbl>, na.rm <lgl>, prioritization_score <dbl>, top_group <chr>, and abbreviated variable names 1: receiver, 2: receptor,
-## #   3: lfc_ligand, 4: lfc_receptor, 5: ligand_receptor_lfc_avg, 6: p_val_ligand, 7: p_adj_ligand, 8: p_val_receptor, 9: p_adj_receptor, *: direction_regulation, *: activity_scaled, *: lr_interaction,
-## #   *: avg_ligand_group, *: avg_receptor_group, *: ligand_receptor_prod_group, *: fraction_ligand_group, *: fraction_receptor_group, *: ligand_receptor_fraction_prod_group, *: rel_abundance_scaled_sender, ...
-

Step -5: Add information on prior knowledge and expression correlation between -LR and target expression.

-

In multi-sample datasets, we have the opportunity to look whether -expression of ligand-receptor across all samples is correlated with the -expression of their by NicheNet predicted target genes. This is what we -will do with the following line of code:

-
lr_target_prior_cor = lr_target_prior_cor_inference(prioritization_tables$group_prioritization_tbl$receiver %>% unique(), abundance_expression_info, celltype_de, grouping_tbl, prioritization_tables, ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, p_val_adj = p_val_adj)
-

Save all the output of -MultiNicheNet

-

To avoid needing to redo the analysis later. All the output written -down here is sufficient to make all in-built downstream -visualizations.

-
path = "./"
-
-multinichenet_output = list(
-    celltype_info = abundance_expression_info$celltype_info,
-    celltype_de = celltype_de,
-    sender_receiver_info = abundance_expression_info$sender_receiver_info,
-    sender_receiver_de =  sender_receiver_de,
-    ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
-    prioritization_tables = prioritization_tables,
-    grouping_tbl = grouping_tbl,
-    lr_target_prior_cor = lr_target_prior_cor
-  ) 
-multinichenet_output = make_lite_output(multinichenet_output)
-
-save = FALSE
-if(save == TRUE){
-  saveRDS(multinichenet_output, paste0(path, "multinichenet_output.rds"))
-
-}
-

Visualization -of the results of the cell-cell communication analysis

-

In a first instance, we will look at the broad overview of -prioritized interactions via condition-specific Circos plots.

- -

We will look here at the top 50 predictions across all contrasts, -senders, and receivers of interest.

-
prioritized_tbl_oi_all = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, rank_per_group = FALSE)
-
prioritized_tbl_oi = multinichenet_output$prioritization_tables$group_prioritization_tbl %>%
-  filter(id %in% prioritized_tbl_oi_all$id) %>%
-  distinct(id, sender, receiver, ligand, receptor, group) %>% left_join(prioritized_tbl_oi_all)
-prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0
-
-senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort()
-
-colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
-colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
-
-circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
-

-

Visualization -of scaled ligand-receptor pseudobulk products and ligand activity

-

Now we will visualize per sample the scaled product of ligand and -receptor expression. Samples that were left out of the DE analysis are -indicated with a smaller dot (this helps to indicate the samples that -did not contribute to the calculation of the logFC, and thus not -contributed to the final prioritization)

-

We will now check the top 50 interactions specific for the MIS-C -group

-
group_oi = "M"
-
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi)
-
-plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50)
-plot_oi
-

-

Typically, there are way more than 50 differentially expressed and -active ligand-receptor pairs per group across all sender-receiver -combinations. Therefore it might be useful to zoom in on specific cell -types as senders/receivers:

+
prioritization_tables$group_prioritization_tbl %>% head(20)
+## # A tibble: 20 × 18
+##    contrast  group sender          receiver ligand receptor lr_interaction id   
+##    <chr>     <chr> <chr>           <chr>    <chr>  <chr>    <chr>          <chr>
+##  1 M-(S+A)/2 M     M_Monocyte_CD16 L_NK_CD… HLA.E  KLRC1    HLA.E_KLRC1    HLA.…
+##  2 M-(S+A)/2 M     L_T_TIM3._CD38… M_Monoc… IFNG   IFNGR1   IFNG_IFNGR1    IFNG…
+##  3 A-(S+M)/2 A     M_Monocyte_CD16 L_T_TIM… LGALS3 LAG3     LGALS3_LAG3    LGAL…
+##  4 M-(S+A)/2 M     L_T_TIM3._CD38… M_Monoc… IFNG   IFNGR2   IFNG_IFNGR2    IFNG…
+##  5 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… CXCL16 CXCR6    CXCL16_CXCR6   CXCL…
+##  6 A-(S+M)/2 A     M_Monocyte_CD16 M_Monoc… VCAN   TLR2     VCAN_TLR2      VCAN…
+##  7 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.E  CD8A     HLA.E_CD8A     HLA.…
+##  8 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… S100A9 CD68     S100A9_CD68    S100…
+##  9 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.D… LAG3     HLA.DRA_LAG3   HLA.…
+## 10 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… S100A8 CD69     S100A8_CD69    S100…
+## 11 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… HLA.F  LILRB1   HLA.F_LILRB1   HLA.…
+## 12 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… TNF    LTBR     TNF_LTBR       TNF_…
+## 13 A-(S+M)/2 A     L_T_TIM3._CD38… L_T_TIM… LGALS… ITGB1    LGALS3BP_ITGB1 LGAL…
+## 14 A-(S+M)/2 A     M_Monocyte_CD16 M_Monoc… S100A8 CD36     S100A8_CD36    S100…
+## 15 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… HLA.F  LILRB2   HLA.F_LILRB2   HLA.…
+## 16 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.A  CD8A     HLA.A_CD8A     HLA.…
+## 17 M-(S+A)/2 M     M_Monocyte_CD16 L_NK_CD… HLA.C  KIR2DL1  HLA.C_KIR2DL1  HLA.…
+## 18 S-(M+A)/2 S     L_T_TIM3._CD38… L_T_TIM… CD99   CD81     CD99_CD81      CD99…
+## 19 S-(M+A)/2 S     L_NK_CD56._CD1… M_Monoc… TGFB1  ENG      TGFB1_ENG      TGFB…
+## 20 S-(M+A)/2 S     M_Monocyte_CD16 M_Monoc… TGFB1  ENG      TGFB1_ENG      TGFB…
+## # ℹ 10 more variables: scaled_lfc_ligand <dbl>,
+## #   scaled_p_val_ligand_adapted <dbl>, scaled_lfc_receptor <dbl>,
+## #   scaled_p_val_receptor_adapted <dbl>, max_scaled_activity <dbl>,
+## #   scaled_pb_ligand <dbl>, scaled_pb_receptor <dbl>,
+## #   fraction_expressing_ligand_receptor <dbl>, prioritization_score <dbl>,
+## #   top_group <chr>
+

This table gives the final prioritization score of each interaction, and the values of the individual prioritization criteria.

+

With this step, all required steps are finished. Now, we can optionally still run the following steps +* Calculate the across-samples expression correlation between ligand-receptor pairs and target genes +* Prioritize communication patterns involving condition-specific cell types through an alternative prioritization scheme

+

Here we will only focus on the expression correlation step:

+
+
+

2.7 Calculate the across-samples expression correlation between ligand-receptor pairs and target genes

+

In multi-sample datasets, we have the opportunity to look whether expression of ligand-receptor across all samples is correlated with the expression of their by NicheNet predicted target genes. This is what we will do with the following line of code:

+
lr_target_prior_cor = lr_target_prior_cor_inference(
+  receivers_oi = prioritization_tables$group_prioritization_tbl$receiver %>% unique(), 
+  abundance_expression_info = abundance_expression_info, 
+  celltype_de = celltype_de, 
+  grouping_tbl = grouping_tbl, 
+  prioritization_tables = prioritization_tables, 
+  ligand_target_matrix = ligand_target_matrix, 
+  logFC_threshold = logFC_threshold, 
+  p_val_threshold = p_val_threshold, 
+  p_val_adj = p_val_adj
+  )
+
+
+

2.8 Save all the output of MultiNicheNet

+

To avoid needing to redo the analysis later, we will here to save an output object that contains all information to perform all downstream analyses.

+
path = "./"
+
+multinichenet_output = list(
+    celltype_info = abundance_expression_info$celltype_info,
+    celltype_de = celltype_de,
+    sender_receiver_info = abundance_expression_info$sender_receiver_info,
+    sender_receiver_de =  sender_receiver_de,
+    ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
+    prioritization_tables = prioritization_tables,
+    grouping_tbl = grouping_tbl,
+    lr_target_prior_cor = lr_target_prior_cor
+  ) 
+multinichenet_output = make_lite_output(multinichenet_output)
+
+save = FALSE
+if(save == TRUE){
+  saveRDS(multinichenet_output, paste0(path, "multinichenet_output.rds"))
+
+}
+
+
+
+

3 Interpreting the MultiNicheNet analysis output

+
+

3.1 Visualization of differential cell-cell interactions

+
+

3.1.1 Summarizing ChordDiagram circos plots

+

In a first instance, we will look at the broad overview of prioritized interactions via condition-specific Chordiagram circos plots.

+

We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest.

+
prioritized_tbl_oi_all = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  top_n = 50, 
+  rank_per_group = FALSE
+  )
+
prioritized_tbl_oi = 
+  multinichenet_output$prioritization_tables$group_prioritization_tbl %>%
+  filter(id %in% prioritized_tbl_oi_all$id) %>%
+  distinct(id, sender, receiver, ligand, receptor, group) %>% 
+  left_join(prioritized_tbl_oi_all)
+prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0
+
+senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort()
+
+colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
+colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
+
+circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
+

+
+
+

3.1.2 Interpretable bubble plots

+

Whereas these ChordDiagrams show the most specific interactions per group, they don’t give insights into the data behind these predictions. Therefore we will now look at visualizations that indicate the different prioritization criteria used in MultiNicheNet.

+

In the next type of plots, we will 1) visualize the per-sample scaled product of normalized ligand and receptor pseudobulk expression, 2) visualize the scaled ligand activities, 3) cell-type specificity.

+

We will now check the top 50 interactions specific for the MIS-C group

+
group_oi = "M"
+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  top_n = 50, 
+  groups_oi = group_oi)
+
plot_oi = make_sample_lr_prod_activity_plots(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50)
+plot_oi
+

+Samples that were left out of the DE analysis are indicated with a smaller dot (this helps to indicate the samples that did not contribute to the calculation of the logFC, and thus not contributed to the final prioritization)

+

As a further help for further prioritization, we can assess the level of curation of these LR pairs as defined by the Intercellular Communication part of the Omnipath database

+
prioritized_tbl_oi_M_50_omnipath = prioritized_tbl_oi_M_50 %>% 
+  inner_join(lr_network_all)
+

Now we add this to the bubble plot visualization:

+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50_omnipath)
+plot_oi
+

+

As you can see, the HEBP1-FPR2 interaction has no Omnipath DB scores. This is because this LR pair was not documented by the Omnipath LR database. Instead it was documented by the original NicheNet LR network (source: Guide2Pharmacology) as can be seen in the table.

+

Further note: Typically, there are way more than 50 differentially expressed and active ligand-receptor pairs per group across all sender-receiver combinations. Therefore it might be useful to zoom in on specific cell types as senders/receivers:

Eg M_Monocyte_CD16 as receiver:

-
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi, receivers_oi = "M_Monocyte_CD16")
-
-plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50)
-plot_oi
-

+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi, 
+  receivers_oi = "M_Monocyte_CD16")
+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all))
+plot_oi
+

Eg M_Monocyte_CD16 as sender:

-
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi, senders_oi = "M_Monocyte_CD16")
-
-plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50)
-plot_oi
-

-

You can make these plots also for the other groups, like we will -illustrate now for the S group

-
group_oi = "S"
-
prioritized_tbl_oi_S_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi)
-
-plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_S_50)
-plot_oi
-

-

Note: Use -make_sample_lr_prod_activity_batch_plots if you have -batches and want to visualize them on this plot!

- -

In another type of plot, we can visualize the ligand activities for a -group-receiver combination, and show the predicted ligand-target links, -and also the expression of the predicted target genes across -samples.

-

For this, we now need to define a receiver cell type of interest. As -example, we will take L_T_TIM3._CD38._HLADR. cells as -receiver, and look at the top 20 senderLigand-receiverReceptor pairs -with these cells as receiver.

-
group_oi = "M"
-receiver_oi = "L_T_TIM3._CD38._HLADR."
-prioritized_tbl_oi_M_10 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 10, groups_oi = group_oi, receivers_oi = receiver_oi)
-
combined_plot = make_ligand_activity_target_plot(group_oi, receiver_oi, prioritized_tbl_oi_M_10, multinichenet_output$prioritization_tables, multinichenet_output$ligand_activities_targets_DEgenes, contrast_tbl, multinichenet_output$grouping_tbl, multinichenet_output$celltype_info, ligand_target_matrix, plot_legend = FALSE)
-combined_plot
-## $combined_plot
-

+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi, 
+  senders_oi = "M_Monocyte_CD16")
+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all))
+plot_oi
+

+

You can make these plots also for the other groups, like we will illustrate now for the S group

+
group_oi = "S"
+
prioritized_tbl_oi_S_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi)
+
+plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_S_50 %>% inner_join(lr_network_all))
+plot_oi
+

+

Note: Use make_sample_lr_prod_activity_batch_plots if you have batches and want to visualize them on this plot!

+
+
+ +
+

3.3 Intercellular regulatory network inference and visualization

+

In the plots before, we demonstrated that some DE genes have both expression correlation and prior knowledge support to be downstream of ligand-receptor pairs. Interestingly, some target genes can be ligands or receptors themselves. This illustrates that cells can send signals to other cells, who as a response to these signals produce signals themselves to feedback to the original sender cells, or who will effect other cell types.

+

As last plot, we can generate a ‘systems’ view of these intercellular feedback and cascade processes than can be occuring between the different cell populations involved. In this plot, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein interactions! We will infer this intercellular regulatory network here for the top250 interactions.

+
prioritized_tbl_oi = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  250, 
+  rank_per_group = FALSE)
+
+lr_target_prior_cor_filtered = 
+  multinichenet_output$prioritization_tables$group_prioritization_tbl$group %>% unique() %>% 
+  lapply(function(group_oi){
+    lr_target_prior_cor_filtered = multinichenet_output$lr_target_prior_cor %>%
+      inner_join(
+        multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>%
+          distinct(ligand, target, direction_regulation, contrast)
+        ) %>% 
+      inner_join(contrast_tbl) %>% filter(group == group_oi)
+    
+    lr_target_prior_cor_filtered_up = lr_target_prior_cor_filtered %>% 
+      filter(direction_regulation == "up") %>% 
+      filter( (rank_of_target < top_n_target) & (pearson > 0.50 | spearman > 0.50))
+    
+    lr_target_prior_cor_filtered_down = lr_target_prior_cor_filtered %>% 
+      filter(direction_regulation == "down") %>% 
+      filter( (rank_of_target < top_n_target) & (pearson < -0.50 | spearman < -0.50))
+    lr_target_prior_cor_filtered = bind_rows(
+      lr_target_prior_cor_filtered_up, 
+      lr_target_prior_cor_filtered_down
+      )
+}) %>% bind_rows()
+
+lr_target_df = lr_target_prior_cor_filtered %>% 
+  distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation) 
+
network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi)
+network$links %>% head()
+## # A tibble: 6 × 6
+##   sender_ligand          receiver_target direction_regulation group type  weight
+##   <chr>                  <chr>           <fct>                <chr> <chr>  <dbl>
+## 1 L_T_TIM3._CD38._HLADR… L_NK_CD56._CD1… up                   M     Liga…      1
+## 2 M_Monocyte_CD16_ICAM1  L_NK_CD56._CD1… up                   M     Liga…      1
+## 3 M_Monocyte_CD16_B2M    M_Monocyte_CD1… up                   M     Liga…      1
+## 4 M_Monocyte_CD16_B2M    M_Monocyte_CD1… up                   M     Liga…      1
+## 5 M_Monocyte_CD16_HLA.A  M_Monocyte_CD1… up                   M     Liga…      1
+## 6 M_Monocyte_CD16_HLA.A  M_Monocyte_CD1… up                   M     Liga…      1
+network$nodes %>% head()
+## # A tibble: 6 × 4
+##   node                         celltype               gene    type_gene      
+##   <chr>                        <chr>                  <chr>   <chr>          
+## 1 L_T_TIM3._CD38._HLADR._ICAM3 L_T_TIM3._CD38._HLADR. ICAM3   ligand/receptor
+## 2 M_Monocyte_CD16_LILRB2       M_Monocyte_CD16        LILRB2  ligand/receptor
+## 3 M_Monocyte_CD16_ICAM3        M_Monocyte_CD16        ICAM3   ligand/receptor
+## 4 M_Monocyte_CD16_ITGB2        M_Monocyte_CD16        ITGB2   ligand/receptor
+## 5 M_Monocyte_CD16_SIGLEC9      M_Monocyte_CD16        SIGLEC9 ligand/receptor
+## 6 M_Monocyte_CD16_ENG          M_Monocyte_CD16        ENG     ligand/receptor
+
colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the  original yellow background with white font is not very readable
+network_graph = visualize_network(network, colors_sender)
+network_graph$plot
+

+

Interestingly, we can also use this network to further prioritize differential CCC interactions. Here we will assume that the most important LR interactions are the ones that are involved in this intercellular regulatory network. We can get these interactions as follows:

+
network$prioritized_lr_interactions
+## # A tibble: 117 × 5
+##    group sender                 receiver         ligand  receptor
+##    <chr> <chr>                  <chr>            <chr>   <chr>   
+##  1 M     L_T_TIM3._CD38._HLADR. L_NK_CD56._CD16. CLEC2B  KLRB1   
+##  2 M     M_Monocyte_CD16        L_NK_CD56._CD16. ICAM1   ITGAX   
+##  3 M     M_Monocyte_CD16        M_Monocyte_CD16  B2M     LILRB1  
+##  4 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.A   LILRB1  
+##  5 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.B   LILRB1  
+##  6 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.C   LILRB1  
+##  7 M     L_T_TIM3._CD38._HLADR. M_Monocyte_CD16  HLA.A   LILRB1  
+##  8 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.A   LILRB2  
+##  9 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.C   LILRB2  
+## 10 M     M_Monocyte_CD16        M_Monocyte_CD16  HLA.DRA CD63    
+## # ℹ 107 more rows
+
prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join(
+  network$prioritized_lr_interactions)
+prioritized_tbl_oi_network
+## # A tibble: 117 × 8
+##    group sender              receiver ligand receptor id    prioritization_score
+##    <chr> <chr>               <chr>    <chr>  <chr>    <chr>                <dbl>
+##  1 M     M_Monocyte_CD16     L_NK_CD… HLA.E  KLRC1    HLA.…                0.956
+##  2 M     L_T_TIM3._CD38._HL… M_Monoc… IFNG   IFNGR1   IFNG…                0.948
+##  3 M     L_T_TIM3._CD38._HL… M_Monoc… IFNG   IFNGR2   IFNG…                0.928
+##  4 M     M_Monocyte_CD16     L_T_TIM… CXCL16 CXCR6    CXCL…                0.922
+##  5 M     M_Monocyte_CD16     L_T_TIM… HLA.E  CD8A     HLA.…                0.919
+##  6 M     M_Monocyte_CD16     L_T_TIM… HLA.D… LAG3     HLA.…                0.910
+##  7 M     M_Monocyte_CD16     L_T_TIM… S100A8 CD69     S100…                0.907
+##  8 M     M_Monocyte_CD16     M_Monoc… HLA.F  LILRB1   HLA.…                0.903
+##  9 M     M_Monocyte_CD16     M_Monoc… TNF    LTBR     TNF_…                0.902
+## 10 M     M_Monocyte_CD16     M_Monoc… HLA.F  LILRB2   HLA.…                0.898
+## # ℹ 107 more rows
+## # ℹ 1 more variable: prioritization_rank <dbl>
+

Visualize now the expression and activity of these interactions for the M group

+
group_oi = "M"
+
prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi)
+
+plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M %>% inner_join(lr_network_all)
+  )
+plot_oi
+

+
+
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/vignettes/basic_analysis_steps_MISC.md b/vignettes/basic_analysis_steps_MISC.md index ffb1996..f34f13b 100644 --- a/vignettes/basic_analysis_steps_MISC.md +++ b/vignettes/basic_analysis_steps_MISC.md @@ -1,80 +1,55 @@ -MultiNicheNet analysis: MIS-C threewise comparison - step-by-step -================ -Robin Browaeys -2023-06-06 - - - -In this vignette, you can learn how to perform an all-vs-all -MultiNicheNet analysis. In this vignette, we start from one -SingleCellExperiment object containing cells from both sender and -receiver cell types and from different patients. - -A MultiNicheNet analysis can be performed if you have multi-sample, -multi-group single-cell data. MultiNicheNet will look for cell-cell -communication between the cell types in your data for each sample, and -compare the cell-cell communication patterns between the groups of -interest. Therefore, the absolute minimum of meta data you need to have, -are following columns indicating for each cell: the **group**, -**sample** and **cell type**. + + +rmarkdown::render(“vignettes/basic\_analysis\_steps\_MISC.Rmd”,output\_format += “md\_document”) + +In this vignette, you can learn how to perform a MultiNicheNet analysis +to compare cell-cell communication between conditions of interest. A +MultiNicheNet analysis can be performed if you have multi-sample, +multi-condition/group single-cell data. We strongly recommend having at +least 4 samples in each of the groups/conditions you want to compare. +With less samples, the benefits of performing a pseudobulk-based DE +analysis are less clear. For those datasets, you can check and run our +alternative workflow that makes use of cell-level sample-agnostic +differential expression tools. + +As input you need a SingleCellExperiment object containing at least the +raw count matrix and metadata providing the following information for +each cell: the **group**, **sample** and **cell type**. As example expression data of interacting cells, we will here use scRNAseq data of immune cells in MIS-C patients and healthy siblings from this paper of Hoste et al.: [TIM3+ TRBV11-2 T cells and IFNγ signature in patrolling monocytes and CD16+ NK cells delineate MIS-C](https://rupress.org/jem/article/219/2/e20211381/212918/TIM3-TRBV11-2-T-cells-and-IFN-signature-in) -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6362434.svg)](https://doi.org/10.5281/zenodo.6362434) -. MIS-C (multisystem inflammatory syndrome in children) is a novel rare +[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6362434.svg)](https://doi.org/10.5281/zenodo.6362434). +MIS-C (multisystem inflammatory syndrome in children) is a novel rare immunodysregulation syndrome that can arise after SARS-CoV-2 infection -in children. We will use NicheNet to explore immune cell crosstalk -enriched in MIS-C compared to healthy siblings. - -In this vignette, we will prepare the data and analysis parameters, and -then perform the MultiNicheNet analysis. - -The different steps of the MultiNicheNet analysis are the following: - -- 0. Preparation of the analysis: load packages, NicheNet LR network & - ligand-target matrix, single-cell expression data, and define main - settings of the MultiNicheNet analysis - -- 1. Extract cell type abundance and expression information from - receiver and sender cell types, and link this expression - information for ligands of the sender cell types to the - corresponding receptors of the receiver cell types - -- 2. Perform genome-wide differential expression analysis of receiver - and sender cell types to define DE genes between the conditions of - interest. Based on this analysis, we can define the logFC/p-value - of ligands in senders and receptors in receivers, and define the - set of affected target genes in the receiver. - -- 3. Predict NicheNet ligand activities and NicheNet ligand-target - links based on these differential expression results +in children. We will use MultiNicheNet to explore immune cell crosstalk +enriched in MIS-C compared to healthy siblings and adult COVID-19 +patients. -- 4. Use the information collected above to prioritize all - sender-ligand—receiver-receptor pairs. +In this vignette, we will first prepare the MultiNicheNet core analysis, +then run the several steps in the MultiNicheNet core analysis, and +finally interpret the output. -- 5. Calculate correlation in expression between ligand-receptor pairs - and their predicted target genes +# Preparation of the MultiNicheNet core analysis -In this vignette, we will demonstrate all these steps one by one. + library(SingleCellExperiment) + library(dplyr) + library(ggplot2) + library(nichenetr) + library(multinichenetr) -After the MultiNicheNet analysis is done, we will explore the output of -the analysis with different ways of visualization. +## Load NicheNet’s ligand-receptor network and ligand-target matrix -# Step 0: Preparation of the analysis: load packages, NicheNet LR network & ligand-target matrix, single-cell expression data - -## Step 0.1: Load required packages and NicheNet ligand-receptor network and ligand-target matrix - -``` r -library(SingleCellExperiment) -library(dplyr) -library(ggplot2) -library(multinichenetr) -``` +MultiNicheNet builds upon the NicheNet framework and uses the same prior +knowledge networks (ligand-receptor network and ligand-target matrix, +currently v2 version). The Nichenet v2 networks and matrices for both mouse and human can be downloaded from Zenodo @@ -85,35 +60,77 @@ human patients. Gene names are here made syntactically valid via `make.names()` to avoid the loss of genes (eg H2-M3) in downstream visualizations. -``` r -organism = "human" -if(organism == "human"){ - lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_human_21122021.rds")) - lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor) %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor)) - ligand_target_matrix = readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds")) - colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% make.names() - rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% make.names() -} else if(organism == "mouse"){ - lr_network = readRDS(url("https://zenodo.org/record/7074291/files/lr_network_mouse_21122021.rds")) - lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor) %>% mutate(ligand = make.names(ligand), receptor = make.names(receptor)) - ligand_target_matrix = readRDS(url("https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds")) - colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% make.names() - rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% make.names() -} -``` - -## Step 0.2: Read in SingleCellExperiment Objects - -In this vignette, sender and receiver cell types are in the same -SingleCellExperiment object, which we will load here. In this vignette, -we will load in a subset of the scRNAseq data of the MIS-C + organism = "human" + + options(timeout = 120) + + if(organism == "human"){ + + lr_network_all = + readRDS(url( + "https://zenodo.org/record/10229222/files/lr_network_human_allInfo_30112033.rds" + )) %>% + mutate( + ligand = convert_alias_to_symbols(ligand, organism = organism), + receptor = convert_alias_to_symbols(receptor, organism = organism)) + + lr_network_all = lr_network_all %>% + mutate(ligand = make.names(ligand), receptor = make.names(receptor)) + + lr_network = lr_network_all %>% + distinct(ligand, receptor) + + ligand_target_matrix = readRDS(url( + "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds" + )) + + colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% + convert_alias_to_symbols(organism = organism) %>% make.names() + rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% + convert_alias_to_symbols(organism = organism) %>% make.names() + + lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix)) + ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()] + + } else if(organism == "mouse"){ + + lr_network_all = readRDS(url( + "https://zenodo.org/record/10229222/files/lr_network_mouse_allInfo_30112033.rds" + )) %>% + mutate( + ligand = convert_alias_to_symbols(ligand, organism = organism), + receptor = convert_alias_to_symbols(receptor, organism = organism)) + + lr_network_all = lr_network_all %>% + mutate(ligand = make.names(ligand), receptor = make.names(receptor)) + lr_network = lr_network_all %>% + distinct(ligand, receptor) + + ligand_target_matrix = readRDS(url( + "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds" + )) + + colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% + convert_alias_to_symbols(organism = organism) %>% make.names() + rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% + convert_alias_to_symbols(organism = organism) %>% make.names() + + lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix)) + ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()] + + } + +## Read in SingleCellExperiment Object + +In this vignette, we will load in a subset of the scRNAseq data of the +MIS-C [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8010790.svg)](https://doi.org/10.5281/zenodo.8010790). For the sake of demonstration, this subset only contains 3 cell types. These celltypes are some of the cell types that were found to be most interesting related to MIS-C according to Hoste et al.  If you start from a Seurat object, you can convert it easily to a -SingleCellExperiment via +SingleCellExperiment object via `sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA")`. Because the NicheNet 2.0. networks are in the most recent version of the @@ -122,541 +139,196 @@ the expression data are also updated (= converted from their “aliases” to official gene symbols). Afterwards, we will make them again syntactically valid. -``` r -sce = readRDS(url("https://zenodo.org/record/8010790/files/sce_subset_misc.rds")) -sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE() -## [1] "there are provided symbols that are not in the alias annotation table: " -## [1] "AL627309.1" "AL669831.5" "AL645608.8" "AL645608.2" "AL162741.1" "AL391244.3" "AL645728.1" "AL691432.2" "FO704657.1" "AL109917.1" "AL590822.2" "AL139246.5" "AL139246.3" "AL513320.1" "AL805961.2" -## [16] "AL365255.1" "AL031848.2" "Z98884.1" "AL034417.4" "AL034417.3" "AL096855.1" "AL139424.2" "AL354956.1" "AL139423.1" "AL109811.3" "AL109811.1" "AL109811.2" "AC243836.1" "AC254633.1" "AL031283.1" -## [31] "AL121992.3" "AL121992.1" "AL450998.2" "AL137802.2" "AL021920.3" "BX284668.2" "BX284668.5" "BX284668.6" "AL035413.1" "AL031727.2" "AL020998.1" "AL031005.1" "AL031728.1" "AL031281.2" "AL445686.2" -## [46] "AL445471.1" "AL445471.2" "AL031432.5" "AL031432.3" "AL606491.1" "AL031280.1" "AL020996.1" "AL033528.2" "AL391650.1" "AL512408.1" "BX293535.1" "AL512288.1" "AL353622.1" "AL353622.2" "AL360012.1" -## [61] "AC114488.2" "AC114488.3" "AL136115.2" "AL445248.1" "AL049795.1" "AL662907.3" "AC114490.3" "AC114490.2" "AC004865.2" "AL591845.1" "AL139260.1" "AL033527.3" "AL033527.5" "AL050341.2" "AL603839.3" -## [76] "AC098484.3" "AL512353.1" "AL139289.2" "AL139289.1" "AL451062.3" "AL357079.1" "AL139220.2" "AL592166.1" "AC104170.1" "AL050343.1" "AL445685.1" "AL606760.2" "AL606760.3" "AL606760.1" "AC119428.2" -## [91] "AL161644.1" "AC105277.1" "AC099791.2" "AL357078.1" "AL583808.1" "AC118549.1" "AC103591.3" "AL359504.2" "AL078459.1" "AL049597.2" "AC099063.4" "AC099063.1" "AC099568.2" "AC104836.1" "AC105942.1" -## [106] "AC092802.1" "AC118553.1" "AC118553.2" "AC104506.1" "AC093157.2" "AC093157.1" "AL109741.1" "AL390036.1" "AL359258.2" "AL356488.3" "AL356488.2" "SARS" "AC000032.1" "AL450468.2" "AL160006.1" -## [121] "AL355488.1" "AL365361.1" "AL360270.1" "AL360270.3" "AL355816.2" "AL354760.1" "AL603832.1" "AL357055.3" "AL137856.1" "AL390066.1" "AL445231.1" "AL157902.1" "AL359915.2" "AC244453.2" "AC244021.1" -## [136] "AL592494.3" "AC239800.2" "AC239800.3" "AC245595.1" "AC246785.3" "FP700111.1" "AC245014.3" "AC245014.1" "AC243547.3" "AC243547.2" "NOTCH2NL" "AC239799.2" "AC242426.2" "AC239809.3" "AC245297.3" -## [151] "AC245297.2" "AC239868.3" "AC239868.2" "AL590133.2" "AL391069.2" "AL391069.3" "AL391335.1" "AL589765.7" "AL450992.2" "AL162258.2" "AL358472.5" "AL358472.4" "AL358472.3" "AL358472.2" "AL451085.1" -## [166] "AL451085.2" "AC234582.1" "AC234582.2" "AL353807.2" "AL355388.2" "AL139412.1" "AL590560.1" "AL590560.2" "AL121987.2" "AL139011.1" "AL138930.1" "AL121985.1" "AL591806.3" "AL590714.1" "AL592295.4" -## [181] "AL359541.1" "AL596325.1" "AL356441.1" "AL451074.2" "AL451050.2" "AL359962.2" "AL031733.2" "AL021068.1" "AL031599.1" "AL645568.1" "AL121983.1" "AL121983.2" "Z99127.1" "AL513329.1" "AL359265.3" -## [196] "AL137796.1" "AL449106.1" "AL353708.3" "AL353708.1" "AL162431.2" "AL445228.2" "AL078644.2" "AL133383.1" "AL390957.1" "AL136987.1" "AL136454.1" "AL157402.2" "AC096677.1" "AC099676.1" "AC098934.4" -## [211] "AL512306.3" "AL512306.2" "AC244034.2" "AL137789.1" "AL031316.1" "AL606468.1" "AL451060.1" "AL360091.3" "AC092803.2" "AL590648.3" "AC096642.1" "AL513314.2" "AL592148.3" "AL392172.1" "AL359979.2" -## [226] "AC138393.3" "AC092809.4" "AC092811.2" "AL591895.1" "AL512343.2" "AL353593.1" "AL353593.2" "AL670729.1" "AL117350.1" "AL121990.1" "AL445524.1" "AL355472.3" "AL355472.4" "AL355472.1" "AL160408.2" -## [241] "AL391832.2" "AL391832.3" "AL359921.2" "AL359921.1" "BX323046.1" "AL356512.1" "AL591848.3" "AL390728.6" "AL390728.5" "AC138089.1" "AC098483.1" "AC092159.2" "AC093390.2" "AC231981.1" "AC108488.1" -## [256] "AC017076.1" "AC010904.2" "AC092580.2" "AC073195.1" "AC010969.2" "AC104794.2" "AC007240.1" "AC007249.2" "AC062028.1" "AC079148.1" "AC013400.1" "AC079145.1" "AC098828.2" "AC012065.3" "AC012065.4" -## [271] "AC018742.1" "AC012073.1" "AC012074.1" "AC104699.1" "AC013472.3" "AC013403.2" "AC074117.1" "AC093690.1" "AC104695.2" "AC104695.3" "AC092164.1" "AC016907.2" "AL121652.1" "AL121658.1" "AL133245.1" -## [286] "AC007378.1" "AC006369.1" "AC074366.1" "AC019171.1" "AC007388.1" "AC083949.1" "AC010883.1" "AC019129.2" "AC018682.1" "AC016722.3" "AC073283.1" "AC079807.1" "AC092650.1" "AC093635.1" "AC008280.3" -## [301] "AC093110.1" "AC012358.3" "AC015982.1" "AC007743.1" "AC016727.1" "AC107081.2" "AC009501.1" "AC012368.1" "AC008074.3" "AC007365.1" "AC017083.3" "AC017083.2" "AC017083.1" "AC022201.2" "AC007881.3" -## [316] "AC007878.1" "AC073046.1" "AC073263.1" "AC073263.2" "AC006030.1" "AC005041.1" "AC019069.1" "AC104135.1" "AC005034.3" "AC009309.1" "AC015971.1" "AC133644.2" "AC104134.1" "AC062029.1" "AC103563.7" -## [331] "AC092835.1" "AC009237.14" "AC021188.1" "AC092683.1" "AC109826.1" "AC079447.1" "AC092587.1" "AC018690.1" "AC092667.1" "AC016738.1" "AC104655.1" "AC012360.1" "AC012360.3" "AC010978.1" "AC013271.1" -## [346] "AC108463.3" "AC068491.3" "AC017002.3" "AC017002.1" "AC079922.2" "AC016683.1" "AC016745.2" "AC104653.1" "AC110769.2" "AC009312.1" "AC009404.1" "AC093901.1" "AC012447.1" "AC068282.1" "AC012306.2" -## [361] "AC073869.3" "AC011893.1" "AC114763.1" "AC092620.1" "AC007364.1" "AC091488.1" "AC009506.1" "AC009961.1" "AC009299.3" "AC008063.1" "AC009495.3" "AC007405.3" "AC078883.2" "AC078883.3" "AC078883.1" -## [376] "AC010894.2" "AC010894.4" "AC093459.1" "AC079305.1" "AC073834.1" "AC009948.1" "AC009948.4" "AC009948.3" "AC010680.3" "AC010680.2" "AC010680.1" "AC068196.1" "AC009962.1" "AC096667.1" "AC017071.1" -## [391] "AC013468.1" "AC108047.1" "AC005540.1" "AC064834.1" "AC114760.2" "AC020571.1" "AC013264.1" "AC010746.1" "AC011997.1" "AC007163.1" "AC005037.1" "AC064836.3" "AC007383.2" "AC007383.3" "AC008269.1" -## [406] "AC007879.2" "AC009226.1" "AC007038.2" "AC007038.1" "AC079610.2" "AC079610.1" "AC068051.1" "AC016708.1" "AC098820.2" "AC021016.3" "AC012510.1" "AC009974.1" "AC097468.3" "AC068946.1" "AC053503.2" -## [421] "AC079834.2" "AC097461.1" "AC097662.1" "AC009950.1" "AC012507.1" "AC012507.4" "AC073254.1" "AC105760.1" "AC105760.2" "AC112721.2" "AC104667.1" "AC012485.3" "AC016999.1" "AC062017.1" "SEPT2" -## [436] "AC114730.1" "AC131097.4" "AC093642.1" "AC024060.1" "AC018816.1" "AC026202.2" "AC026202.3" "AC026191.1" "AC018809.2" "AC034198.2" "AC093495.1" "AC090948.1" "AC090948.2" "AC090948.3" "AC116096.1" -## [451] "AC112220.4" "AC123023.1" "AC097359.3" "AC097359.2" "AC092053.3" "AC092053.2" "AC099541.1" "AC006059.1" "AC099329.3" "AC104184.1" "AC104187.1" "AC099669.1" "AC124045.1" "AC098613.1" "AC099778.1" -## [466] "AC134772.1" "AC137630.2" "AC137630.1" "AC137630.3" "QARS" "AC105935.1" "U73166.1" "AC096920.1" "AC006252.1" "AC006254.1" "AC096887.1" "AC012467.2" "AC135507.1" "AC012557.2" "AC109587.1" -## [481] "AC097634.3" "AC097634.1" "AC104435.2" "AC107204.1" "AC069222.1" "AC128688.2" "AC074044.1" "AC078785.1" "AC093010.2" "AC073352.1" "AC073352.2" "AC092910.3" "AC083798.2" "AC092903.2" "AC079848.1" -## [496] "AC112484.3" "AC112484.1" "AC108673.2" "AC108673.3" "AC107027.1" "AC117382.2" "AC096992.2" "AC097103.2" "AC026304.1" "AC020636.1" "AC108718.1" "AC084036.1" "AC106707.1" "AC080013.6" "AC080013.1" -## [511] "AC080013.5" "AC069224.1" "AC112491.1" "AC078802.1" "AC078795.1" "AC078795.2" "AC008040.1" "AC008040.5" "AC007823.1" "AC007620.2" "AC125618.1" "AC069431.1" "AC092953.2" "AC131235.3" "AC068631.1" -## [526] "AC112907.3" "AC069213.1" "AC233280.1" "AC055764.2" "AC055764.1" "AC107464.1" "AC107464.3" "AC139887.4" "AC139887.2" "AC139887.1" "AC092535.4" "AC147067.1" "AC016773.1" "AL158068.2" "AC105415.1" -## [541] "AC093323.1" "AC097382.2" "AC104825.1" "AC105345.1" "AC099550.1" "AC116651.1" "AC006160.1" "AC133961.1" "AC106047.1" "AC104078.1" "AC108471.2" "AC095057.2" "AC111006.1" "AC096734.2" "AC096586.1" -## [556] "AC107068.1" "AC027271.1" "AC110792.3" "AC068620.1" "AC068620.2" "AC104806.2" "AC053527.1" "AC093677.2" "AC098818.2" "AC124016.2" "AC124016.1" "AC093827.5" "AC093827.4" "AC083829.2" "AC019077.1" -## [571] "AC114811.2" "AC019131.2" "AP002026.1" "AC097460.1" "AP001816.1" "AC098487.1" "AF213884.3" "AC018797.2" "AC004069.1" "AC109361.1" "AC096564.1" "AC096564.2" "AC004067.1" "AC126283.1" "AC109347.1" -## [586] "AC106864.1" "AC110079.2" "AC108866.1" "AC108062.1" "AC096711.3" "AC097376.2" "AC112236.2" "AC096733.2" "AC097504.2" "AC139720.1" "AC104596.1" "AC093835.1" "AC095055.1" "AC097375.1" "AC106882.1" -## [601] "AC105285.1" "AC097534.1" "AC097534.2" "AC019163.1" "AC078881.1" "AC107214.1" "AC084871.2" "AC106897.1" "AC097652.1" "AC021087.1" "AC106772.1" "AC026740.1" "AC116351.1" "AC026412.3" "AC091965.1" -## [616] "AC091965.4" "AC034229.4" "AC012640.2" "AC010491.1" "AC022113.1" "AC022113.2" "AC026785.2" "AC025181.2" "AC034231.1" "AC026801.2" "AC137810.1" "AC008957.1" "AC026741.1" "AC008875.1" "AC025171.3" -## [631] "AC025171.5" "AC025171.2" "AC025171.4" "AC114947.2" "AC114956.1" "AC093297.2" "AC008966.1" "AC008914.1" "AC008892.1" "AC016596.1" "AC008937.1" "AC008937.3" "AC092343.1" "AC104113.1" "AC092354.2" -## [646] "AC025442.1" "AC024581.1" "AC010280.2" "AC146944.4" "AC035140.1" "AC008972.2" "AC099522.2" "AC010501.1" "AC116337.3" "AC008897.2" "AC010245.2" "AC026725.1" "AC012636.1" "AC010260.1" "AC008771.1" -## [661] "AC008434.1" "AC104118.1" "AC018754.1" "AC123595.1" "AC008840.1" "AC020900.1" "AC008906.1" "AC008906.2" "AC009126.1" "AC008522.1" "AC021086.1" "AC008467.1" "AC008494.3" "AC010226.1" "AC008549.2" -## [676] "AC034236.2" "AC008629.1" "AC093535.1" "AC011416.3" "AC116366.3" "AC116366.2" "AC116366.1" "AC004775.1" "AC010240.3" "AC008608.2" "AC104109.3" "AC104109.2" "AC109454.2" "AC026691.1" "AC106791.1" -## [691] "AC113382.1" "AC011405.1" "AC135457.1" "AC008667.1" "AC008438.1" "AC005618.1" "AC008781.1" "AC091959.3" "AC091948.1" "AC131025.1" "AC008641.1" "AC011337.1" "AC011374.2" "AC091982.3" "AC009185.1" -## [706] "AC008676.1" "AC134043.1" "AC008691.1" "AC113414.1" "AC034199.1" "AC022217.3" "AC008429.1" "AC008429.3" "AC008378.1" "AC139491.1" "AC139491.7" "AC139795.3" "AC139795.2" "AC106795.2" "AC136604.2" -## [721] "AC136604.3" "AC008393.1" "AC008610.1" "AC008443.5" "AC008443.6" "AC138035.1" "AL357054.4" "AL357054.2" "AL133351.1" "AL031963.3" "AL138831.1" "AL162718.1" "AL359643.3" "AL136307.1" "AL024498.1" -## [736] "AL157373.2" "AL008729.1" "AL008729.2" "AL441883.1" "AL109914.1" "AL138720.1" "AL136162.1" "AL137003.1" "AL137003.2" "AL138724.1" "AL031775.1" "U91328.2" "U91328.1" "HIST1H2BC" "AL353759.1" -## [751] "HIST1H2BE" "AL031777.3" "HIST1H2BF" "HIST1H2BG" "HIST1H2BI" "AL121936.1" "AL513548.3" "AL513548.1" "AL009179.1" "AL121944.1" "AL358933.1" "AL049543.1" "AL645929.2" "AL662797.1" "AL662844.4" -## [766] "AL645933.2" "AL662899.2" "AL662884.4" "AL662796.1" "AL669918.1" "AL645941.3" "AL645940.1" "AL451165.2" "Z84485.1" "AL353597.1" "AL121574.1" "AL365205.1" "AL513008.1" "AL035587.1" "AL133375.1" -## [781] "AL355802.2" "AL096865.1" "AL035701.1" "AL355353.1" "AL021368.1" "AL021368.2" "AL135905.1" "AL354719.2" "AL391807.1" "AL365226.2" "AL365226.1" "AC019205.1" "AL121972.1" "AL590428.1" "AL080250.1" -## [796] "AL359715.3" "AL359715.1" "AL359715.2" "AL139274.2" "AL049697.1" "AL353135.1" "AL096678.1" "AL121787.1" "AL132996.1" "AL589740.1" "AL513550.1" "AL137784.2" "AL133338.1" "AL133406.3" "AL133406.2" -## [811] "AL024507.2" "AL390208.1" "AL359711.2" "AL512303.1" "AC002464.1" "AL080317.1" "AL080317.2" "AL080317.3" "Z97989.1" "AL050331.2" "AL365275.1" "AL096711.2" "AL590006.1" "AL355581.1" "AL353596.1" -## [826] "AL024508.2" "AL356234.2" "AL357060.1" "AL590617.2" "AL031772.1" "AL158850.1" "AL035446.1" "AL023806.1" "AL356599.1" "AL031056.1" "AL358852.1" "AL355312.4" "AL080276.2" "AL590867.1" "AL596202.1" -## [841] "AL355297.3" "AL355297.4" "AL391863.1" "AL035634.1" "AL627422.2" "AL035530.2" "AL356417.3" "AL356417.1" "AL139393.2" "AL022069.1" "AL159163.1" "Z94721.1" "AL121935.1" "AL121935.2" "BX322234.1" -## [856] "AL109910.2" "AC093627.5" "AC093627.4" "AC147651.1" "AC147651.4" "AC073957.3" "AC073957.2" "AC091729.3" "AC102953.2" "AC104129.1" "AC004906.1" "AC024028.1" "AC092171.4" "AC092171.3" "AC073343.2" -## [871] "AC004982.2" "AC004948.1" "AC006042.4" "AC006042.2" "AC007029.1" "AC073333.1" "AC073332.1" "AC004130.1" "AC073072.1" "AC005082.1" "MPP6" "AC005165.1" "AC004520.1" "AC004540.1" "AC004593.3" -## [886] "AC007285.1" "AC007036.1" "AC005154.6" "GARS" "AC018647.2" "AC007349.3" "AC006033.2" "AC011290.1" "AC072061.1" "AC004951.1" "AC004854.2" "AC073115.2" "AC020743.1" "AC074351.1" "AC118758.3" -## [901] "AC092634.3" "AC073349.1" "AC068533.4" "AC008267.5" "AC027644.3" "AC073335.2" "AC006480.2" "AC211476.2" "AC004921.1" "AC005076.1" "AC003991.2" "AC002383.1" "AC007566.1" "AC002454.1" "AC002451.1" -## [916] "AC004834.1" "AC004922.1" "AC073842.2" "AC073842.1" "AC092849.1" "AC069281.1" "AC105446.1" "AC006329.1" "AC093668.3" "AC005064.1" "AC007384.1" "AC005070.3" "AC073073.2" "AC007032.1" "AC004492.1" -## [931] "AC004839.1" "AC002467.1" "AC005046.1" "AC006333.2" "AC073934.1" "AC090114.2" "AC018638.7" "AC078846.1" "AC073320.1" "AC087071.1" "AC007938.2" "AC007938.3" "AC016831.7" "AC016831.1" "AC016831.5" -## [946] "AC058791.1" "AC008264.2" "AC009275.1" "AC083862.2" "AC009542.1" "AC024084.1" "AC078845.1" "AC083880.1" "AC004918.1" "AC093673.1" "AC004889.1" "AC074386.1" "AC005229.4" "AC073314.1" "AC004877.1" -## [961] "AC092681.3" "AC005586.1" "AC073111.5" "AC021097.1" "AC006017.1" "AC093726.2" "AC144652.1" "AC009403.1" "AC005534.1" "AC005481.1" "AC011899.2" "AC011899.3" "AL732314.6" "AL732314.4" "AL672277.1" -## [976] "AL683807.1" "BX890604.1" "AC073529.1" "AC131011.1" "CXorf21" "AC108879.1" "AC234772.3" "AC234772.2" "AF196972.1" "AC115618.1" "AC231533.1" "AC232271.1" "AC233976.1" "AL445472.1" "AL354793.1" -## [991] "AL022157.1" "AL034397.2" "AL034397.3" "AL590764.1" "AL353804.1" "Z68871.1" "AL121601.1" "AL683813.1" "AL078639.1" "AC244197.3" -## [ reached getOption("max.print") -- omitted 2569 entries ] -## [1] "they are added to the alias annotation table, so they don't get lost" -## [1] "following are the official gene symbols of input aliases: " -## symbol alias -## 1 AARS1 AARS -## 2 ABITRAM FAM206A -## 3 ACP3 ACPP -## 4 ACTN1-DT ACTN1-AS1 -## 5 ADPRS ADPRHL2 -## 6 ADSS1 ADSSL1 -## 7 ADSS2 ADSS -## 8 ANKRD20A2P ANKRD20A2 -## 9 ANKRD20A3P ANKRD20A3 -## 10 ANKRD20A4P ANKRD20A4 -## 11 ANKRD40CL LINC00483 -## 12 ANTKMT FAM173A -## 13 AOPEP C9orf3 -## 14 ARLNC1 LINC02170 -## 15 ATP5MJ ATP5MPL -## 16 ATP5MK ATP5MD -## 17 ATPSCKMT FAM173B -## 18 B3GALT9 B3GNT10 -## 19 BBLN C9orf16 -## 20 BMERB1 C16orf45 -## 21 BPNT2 IMPAD1 -## 22 BRME1 C19orf57 -## 23 CARNMT1-AS1 C9orf41-AS1 -## 24 CARS1 CARS -## 25 CARS1-AS1 CARS-AS1 -## 26 CBY2 SPERT -## 27 CCDC200 LINC00854 -## 28 CCDC28A-AS1 GVQW2 -## 29 CCN2 CTGF -## 30 CCN3 NOV -## 31 CCNP CNTD2 -## 32 CDIN1 C15orf41 -## 33 CENATAC CCDC84 -## 34 CEP20 FOPNL -## 35 CEP43 FGFR1OP -## 36 CERT1 COL4A3BP -## 37 CFAP20DC C3orf67 -## 38 CFAP251 WDR66 -## 39 CFAP410 C21orf2 -## 40 CFAP91 MAATS1 -## 41 CFAP92 KIAA1257 -## 42 CIAO2A FAM96A -## 43 CIAO2B FAM96B -## 44 CIAO3 NARFL -## 45 CIBAR1 FAM92A -## 46 CIBAR2 FAM92B -## 47 CILK1 ICK -## 48 CLLU1-AS1 CLLU1OS -## 49 COA8 APOPT1 -## 50 CRACD KIAA1211 -## 51 CRACDL KIAA1211L -## 52 CRPPA ISPD -## 53 CYRIA FAM49A -## 54 CYRIB FAM49B -## 55 CZIB C1orf123 -## 56 DARS1 DARS -## 57 DARS1-AS1 DARS-AS1 -## 58 DENND10 FAM45A -## 59 DENND11 KIAA1147 -## 60 DENND2B ST5 -## 61 DIPK1A FAM69A -## 62 DIPK1B FAM69B -## 63 DIPK2A C3orf58 -## 64 DMAC2L ATP5S -## 65 DNAAF10 WDR92 -## 66 DNAAF11 LRRC6 -## 67 DNAAF8 C16orf71 -## 68 DNAAF9 C20orf194 -## 69 DNAI3 WDR63 -## 70 DNAI4 WDR78 -## 71 DNAI7 CASC1 -## 72 DOCK8-AS1 C9orf66 -## 73 DOP1A DOPEY1 -## 74 DOP1B DOPEY2 -## 75 DUSP29 DUPD1 -## 76 DYNC2I1 WDR60 -## 77 DYNC2I2 WDR34 -## 78 DYNLT2 TCTE3 -## 79 DYNLT2B TCTEX1D2 -## 80 DYNLT4 TCTEX1D4 -## 81 DYNLT5 TCTEX1D1 -## 82 ECRG4 C2orf40 -## 83 ELAPOR1 KIAA1324 -## 84 ELAPOR2 KIAA1324L -## 85 EOLA1 CXorf40A -## 86 EOLA2 CXorf40B -## 87 EPRS1 EPRS -## 88 EZHIP CXorf67 -## 89 FAM153CP FAM153C -## 90 FAM174C C19orf24 -## 91 FAM86C1P FAM86C1 -## 92 FCSK FUK -## 93 FHIP1A FAM160A1 -## 94 FHIP1A-DT FAM160A1-DT -## 95 FHIP1B FAM160A2 -## 96 FHIP2A FAM160B1 -## 97 FHIP2B FAM160B2 -## 98 FLACC1 ALS2CR12 -## 99 FMC1-LUC7L2 C7orf55-LUC7L2 -## 100 GARRE1 KIAA0355 -## 101 GASK1A FAM198A -## 102 GASK1B FAM198B -## 103 GASK1B-AS1 FAM198B-AS1 -## 104 GCNT4 LINC01336 -## 105 GDF5-AS1 GDF5OS -## 106 GET1 WRB -## 107 GET3 ASNA1 -## 108 GFUS TSTA3 -## 109 GOLM2 CASC4 -## 110 H1-0 H1F0 -## 111 H1-1 HIST1H1A -## 112 H1-10 H1FX -## 113 H1-10-AS1 H1FX-AS1 -## 114 H1-2 HIST1H1C -## 115 H1-3 HIST1H1D -## 116 H1-4 HIST1H1E -## 117 H1-5 HIST1H1B -## 118 H1-6 HIST1H1T -## 119 H2AB1 H2AFB1 -## 120 H2AC11 HIST1H2AG -## 121 H2AC12 HIST1H2AH -## 122 H2AC13 HIST1H2AI -## 123 H2AC14 HIST1H2AJ -## 124 H2AC15 HIST1H2AK -## 125 H2AC16 HIST1H2AL -## 126 H2AC17 HIST1H2AM -## 127 H2AC18 HIST2H2AA3 -## 128 H2AC19 HIST2H2AA4 -## 129 H2AC20 HIST2H2AC -## 130 H2AC21 HIST2H2AB -## 131 H2AC4 HIST1H2AB -## 132 H2AC6 HIST1H2AC -## 133 H2AC8 HIST1H2AE -## 134 H2AJ H2AFJ -## 135 H2AW HIST3H2A -## 136 H2AX H2AFX -## 137 H2AZ1 H2AFZ -## 138 H2AZ2 H2AFV -## 139 H2BC11 HIST1H2BJ -## 140 H2BC12 HIST1H2BK -## 141 H2BC13 HIST1H2BL -## 142 H2BC14 HIST1H2BM -## 143 H2BC15 HIST1H2BN -## 144 H2BC17 HIST1H2BO -## 145 H2BC18 HIST2H2BF -## 146 H2BC21 HIST2H2BE -## 147 H2BC3 HIST1H2BB -## 148 H2BC5 HIST1H2BD -## 149 H2BC9 HIST1H2BH -## 150 H2BU1 HIST3H2BB -## 151 H3-2 HIST2H3PS2 -## 152 H3-3A H3F3A -## 153 H3-3B H3F3B -## 154 H3-5 H3F3C -## 155 H3C1 HIST1H3A -## 156 H3C10 HIST1H3H -## 157 H3C11 HIST1H3I -## 158 H3C12 HIST1H3J -## 159 H3C13 HIST2H3D -## 160 H3C2 HIST1H3B -## 161 H3C3 HIST1H3C -## 162 H3C6 HIST1H3E -## 163 H3C7 HIST1H3F -## 164 H3C8 HIST1H3G -## 165 H4-16 HIST4H4 -## 166 H4C1 HIST1H4A -## 167 H4C11 HIST1H4J -## 168 H4C13 HIST1H4L -## 169 H4C14 HIST2H4A -## 170 H4C15 HIST2H4B -## 171 H4C2 HIST1H4B -## 172 H4C3 HIST1H4C -## 173 H4C4 HIST1H4D -## 174 H4C5 HIST1H4E -## 175 H4C6 HIST1H4F -## 176 H4C8 HIST1H4H -## 177 H4C9 HIST1H4I -## 178 HARS1 HARS -## 179 HEXD HEXDC -## 180 HROB C17orf53 -## 181 HSDL2-AS1 C9orf147 -## 182 IARS1 IARS -## 183 IFTAP C11orf74 -## 184 ILRUN C6orf106 -## 185 IRAG1 MRVI1 -## 186 IRAG1-AS1 MRVI1-AS1 -## 187 IRAG2 LRMP -## 188 ITPRID2 SSFA2 -## 189 KARS1 KARS -## 190 KASH5 CCDC155 -## 191 KATNIP KIAA0556 -## 192 KICS2 C12orf66 -## 193 KIFBP KIF1BP -## 194 KRT10-AS1 TMEM99 -## 195 LARS1 LARS -## 196 LDAF1 TMEM159 -## 197 LINC02481 LINC002481 -## 198 LINC02693 C17orf51 -## 199 LINC02694 C15orf53 -## 200 LINC02870 C10orf91 -## 201 LINC02875 C17orf82 -## 202 LINC02899 C5orf17 -## 203 LINC02901 C6orf99 -## 204 LINC02908 C9orf139 -## 205 LINC02909 C12orf77 -## 206 LINC02910 C20orf197 -## 207 LINC02913 C9orf106 -## 208 LNCTAM34A LINC01759 -## 209 LRATD2 FAM84B -## 210 LTO1 ORAOV1 -## 211 MACIR C5orf30 -## 212 MACROH2A1 H2AFY -## 213 MACROH2A2 H2AFY2 -## 214 MAILR AZIN1-AS1 -## 215 MARCHF1 MARCH1 -## 216 MARCHF10 MARCH10 -## 217 MARCHF2 MARCH2 -## 218 MARCHF3 MARCH3 -## 219 MARCHF4 MARCH4 -## 220 MARCHF5 MARCH5 -## 221 MARCHF6 MARCH6 -## 222 MARCHF7 MARCH7 -## 223 MARCHF8 MARCH8 -## 224 MARCHF9 MARCH9 -## 225 MEAK7 TLDC1 -## 226 METTL25B RRNAD1 -## 227 MICOS10 MINOS1 -## 228 MICOS13 C19orf70 -## 229 MIDEAS ELMSAN1 -## 230 MINAR1 KIAA1024 -## 231 MIR3667HG C22orf34 -## 232 MIR9-1HG C1orf61 -## 233 MIX23 CCDC58 -## 234 MMUT MUT -## 235 MROCKI LINC01268 -## 236 MRTFA MKL1 -## 237 MRTFB MKL2 -## 238 MTARC1 MARC1 -## 239 MTARC2 MARC2 -## 240 MTLN SMIM37 -## 241 MTRES1 C6orf203 -## 242 MTRFR C12orf65 -## 243 MTSS2 MTSS1L -## 244 MYG1 C12orf10 -## 245 NARS1 NARS -## 246 NCBP2AS2 NCBP2-AS2 -## 247 NDUFV1-DT C11orf72 -## 248 NIBAN1 FAM129A -## 249 NIBAN2 FAM129B -## 250 NIBAN3 FAM129C -## 251 NOPCHAP1 C12orf45 -## 252 NTAQ1 WDYHV1 -## 253 NUP42 NUPL2 -## 254 OBI1 RNF219 -## 255 OBI1-AS1 RNF219-AS1 -## 256 ODAD1 CCDC114 -## 257 ODAD2 ARMC4 -## 258 ODAD3 CCDC151 -## 259 ODAD4 TTC25 -## 260 PABIR1 FAM122A -## 261 PABIR2 FAM122B -## 262 PABIR3 FAM122C -## 263 PACC1 TMEM206 -## 264 PALM2AKAP2 AKAP2 -## 265 PALM2AKAP2 PALM2-AKAP2 -## 266 PALS1 MPP5 -## 267 PAXIP1-DT PAXIP1-AS1 -## 268 PEDS1 TMEM189 -## 269 PEDS1-UBE2V1 TMEM189-UBE2V1 -## 270 PELATON SMIM25 -## 271 PGAP4 TMEM246 -## 272 PGAP6 TMEM8A -## 273 PHAF1 C16orf70 -## 274 PIK3IP1-DT PIK3IP1-AS1 -## 275 PITX1-AS1 C5orf66 -## 276 PLA2G2C UBXN10-AS1 -## 277 PLAAT1 HRASLS -## 278 PLAAT2 HRASLS2 -## 279 PLAAT3 PLA2G16 -## 280 PLAAT4 RARRES3 -## 281 PLAAT5 HRASLS5 -## 282 PLEKHG7 C12orf74 -## 283 POGLUT2 KDELC1 -## 284 POGLUT3 KDELC2 -## 285 POLR1F TWISTNB -## 286 POLR1G CD3EAP -## 287 POLR1H ZNRD1 -## 288 PPP1R13B-DT LINC00637 -## 289 PRANCR LINC01481 -## 290 PRECSIT LINC00346 -## 291 PRORP KIAA0391 -## 292 PRSS42P PRSS42 -## 293 PRXL2A FAM213A -## 294 PRXL2B FAM213B -## 295 PRXL2C AAED1 -## 296 PSME3IP1 FAM192A -## 297 RAB30-DT RAB30-AS1 -## 298 RADX CXorf57 -## 299 RAMAC RAMMET -## 300 RARS1 RARS -## 301 RBIS C8orf59 -## 302 RELCH KIAA1468 -## 303 RESF1 KIAA1551 -## 304 RO60 TROVE2 -## 305 RPL34-DT RPL34-AS1 -## 306 RRM2 C2orf48 -## 307 RSKR SGK494 -## 308 RUSF1 C16orf58 -## 309 SANBR KIAA1841 -## 310 SCAT1 LINC02081 -## 311 SEPTIN1 SEPT1 -## 312 SEPTIN10 SEPT10 -## 313 SEPTIN11 SEPT11 -## 314 SEPTIN3 SEPT3 -## 315 SEPTIN4 SEPT4 -## 316 SEPTIN4-AS1 SEPT4-AS1 -## 317 SEPTIN5 SEPT5 -## 318 SEPTIN6 SEPT6 -## 319 SEPTIN7 SEPT7 -## 320 SEPTIN7-DT SEPT7-AS1 -## 321 SEPTIN8 SEPT8 -## 322 SEPTIN9 SEPT9 -## 323 SHFL C19orf66 -## 324 SHOC1 C9orf84 -## 325 SLC49A4 DIRC2 -## 326 SLC66A1 PQLC2 -## 327 SLC66A2 PQLC1 -## 328 SLC66A3 PQLC3 -## 329 SMC5-DT SMC5-AS1 -## 330 SMIM43 TMEM155 -## 331 SNHG30 LINC02001 -## 332 SNHG32 C6orf48 -## 333 SPRING1 C12orf49 -## 334 SSPOP SSPO -## 335 STAM-DT STAM-AS1 -## 336 STEEP1 CXorf56 -## 337 STIMATE-MUSTN1 TMEM110-MUSTN1 -## 338 STING1 TMEM173 -## 339 STX17-DT STX17-AS1 -## 340 TAFA1 FAM19A1 -## 341 TAFA2 FAM19A2 -## 342 TAMALIN GRASP -## 343 TARS1 TARS -## 344 TARS3 TARSL2 -## 345 TASOR FAM208A -## 346 TASOR2 FAM208B -## 347 TBC1D29P TBC1D29 -## 348 TIMM23B-AGAP6 LINC00843 -## 349 TLCD3A FAM57A -## 350 TLCD3B FAM57B -## 351 TLCD4 TMEM56 -## 352 TLCD5 TMEM136 -## 353 TLE5 AES -## 354 TM4SF19-DYNLT2B TM4SF19-TCTEX1D2 -## 355 TMCC1-DT TMCC1-AS1 -## 356 TOLLIP-DT TOLLIP-AS1 -## 357 TRAPPC14 C7orf43 -## 358 TUBB8B TUBB8P12 -## 359 USP27X-DT USP27X-AS1 -## 360 USP46-DT USP46-AS1 -## 361 VARS1 VARS -## 362 WARS1 WARS -## 363 YAE1 YAE1D1 -## 364 YARS1 YARS -## 365 YJU2B CCDC130 -## 366 YTHDF3-DT YTHDF3-AS1 -## 367 ZFTA C11orf95 -## 368 ZNF22-AS1 C10orf25 -## 369 ZNF407-AS1 LINC00909 -## 370 ZNF516-DT C18orf65 -## 371 ZNF582-DT ZNF582-AS1 -## 372 ZNF875 HKR1 -## 373 ZNRD2 SSSCA1 -``` - -## Step 0.3: Prepare settings of the MultiNicheNet cell-cell communication analysis + sce = readRDS(url( + "https://zenodo.org/record/8010790/files/sce_subset_misc.rds" + )) + sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE() + +## Prepare the settings of the MultiNicheNet cell-cell communication analysis + +In this step, we will formalize our research question into MultiNicheNet +input arguments. ### Define in which metadata columns we can find the **group**, **sample** and **cell type** IDs In this case study, we want to study differences in cell-cell communication patterns between MIS-C patients (M), their healthy siblings (S) and adult patients with severe covid (A). The meta data -columns that indicate this disease status is `MIS.C.AgeTier`. +columns that indicate this disease status(=group/condition of interest) +is `MIS.C.AgeTier`. Cell type annotations are indicated in the `Annotation_v2.0` column, and the sample is indicated by the `ShortID` column. If your cells are -annotated in multiple hierarchical levels, we recommend using a high -level in the hierarchy. This for 2 reasons: 1) MultiNicheNet focuses on -differential expression and not differential abundance, and 2) there -should be sufficient cells per sample-celltype combination. +annotated in multiple hierarchical levels, we recommend using a +relatively high level in the hierarchy. This for 2 reasons: 1) +MultiNicheNet focuses on differential expression and not differential +abundance, and 2) there should be sufficient cells per sample-celltype +combination (see later). + + sample_id = "ShortID" + group_id = "MIS.C.AgeTier" + celltype_id = "Annotation_v2.0" + +**Important**: It is required that each sample-id is uniquely assigned +to only one condition/group of interest. See the vignettes about paired +and multifactorial analysis to see how to define your analysis input +when you have multiple samples (and conditions) per patient. If you would have batch effects or covariates you can correct for, you -can define this here as well. +can define this here as well. However, this is not applicable to this +dataset. Therefore we will use the following NA settings: + + covariates = NA + batches = NA -Important: for categorical covariates and batches, there should be at -least one sample for every group-batch combination. If one of your +**Important**: for categorical covariates and batches, there should be +at least one sample for every group-batch combination. If one of your groups/conditions lacks a certain level of your batch, you won’t be able to correct for the batch effect because the model is then not able to distinguish batch from group/condition effects. -Important: the column names of group, sample, cell type, batches and +**Important**: The column names of group, sample, cell type, batches and covariates should be syntactically valid (`make.names`) -Important: All group, sample, cell type, batch and covariate names +**Important**: All group, sample, cell type, batch and covariate names should be syntactically valid as well (`make.names`) (eg through `SummarizedExperiment::colData(sce)$ShortID = SummarizedExperiment::colData(sce)$ShortID %>% make.names()`) -``` r -sample_id = "ShortID" -group_id = "MIS.C.AgeTier" -celltype_id = "Annotation_v2.0" -covariates = NA -batches = NA -``` +### Define the contrasts of interest. + +Here, we want to compare each patient group to the other groups, so the +MIS-C (M) group vs healthy control siblings (S) and adult COVID19 +patients (A) (= M vs S+A) and so on. We want to know which cell-cell +communication patterns are specific for the M vs A+S group, the A vs M+S +group and the S vs A+M group. + +To perform this comparison, we need to set the following contrasts: + + contrasts_oi = c("'M-(S+A)/2','S-(M+A)/2','A-(S+M)/2'") + +**Very Important** Note the format to indicate the contrasts! This +formatting should be adhered to very strictly, and white spaces are not +allowed! Check `?get_DE_info` for explanation about how to define this +well. The most important points are that: *each contrast is surrounded +by single quotation marks *contrasts are separated by a comma without +any white space \*all contrasts together are surrounded by double +quotation marks. + +If you compare against two groups, you should divide by 2 (as +demonstrated here), if you compare against three groups, you should +divide by 3 and so on. + +For downstream visualizations and linking contrasts to their main +condition, we also need to run the following: This is necessary because +we will also calculate cell-type+condition specificity of ligands and +receptors. + + contrast_tbl = tibble(contrast = + c("M-(S+A)/2","S-(M+A)/2", "A-(S+M)/2"), + group = c("M","S","A")) + +If you want to compare only two groups (eg M vs S), you can use the +following: `contrasts_oi = c("'M-S','S-M'")` +`contrast_tbl = tibble(contrast = c("M-S","S-M"), group = c("M","S"))` + +Other vignettes will demonstrate how to formalize different types of +research questions. + +### Define the sender and receiver cell types of interest. + +If you want to focus the analysis on specific cell types (e.g. because +you know which cell types reside in the same microenvironments based on +spatial data), you can define this here. If you have sufficient +computational resources and no specific idea of cell-type +colocalzations, we recommend to consider all cell types as potential +senders and receivers. Later on during analysis of the output it is +still possible to zoom in on the cell types that interest you most, but +your analysis is not biased to them. + +Here we will consider all cell types in the data: + + senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() + receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() + sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% + c(senders_oi, receivers_oi) + ] + +In case you would have samples in your data that do not belong to one of +the groups/conditions of interest, we recommend removing them and only +keeping conditions of interst: + + conditions_keep = c("M", "S", "A") + sce = sce[, SummarizedExperiment::colData(sce)[,group_id] %in% + conditions_keep + ] + +# Running the MultiNicheNet core analysis + +Now we will run the core of a MultiNicheNet analysis. This analysis +consists of the following steps: + +- 1. Cell-type filtering: determine which cell types are sufficiently + present + +- 1. Gene filtering: determine which genes are sufficiently expressed + in each present cell type -Sender and receiver cell types also need to be defined. Both are here -all cell types in the dataset because we are interested in an All-vs-All -analysis. +- 1. Pseudobulk expression calculation: determine and normalize + per-sample pseudobulk expression levels for each expressed gene + in each present cell type -``` r -senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() -receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique() -``` +- 1. Differential expression (DE) analysis: determine which genes are + differentially expressed -If the user wants it, it is possible to use only a subset of senders and -receivers. Senders and receivers can be entirely different, but also -overlapping, or the same. If you don’t use all the cell types in your -data, we recommend to continue with a subset of your data. +- 1. Ligand activity prediction: use the DE analysis output to + predict the activity of ligands in receiver cell types and infer + their potential target genes -``` r -sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% c(senders_oi, receivers_oi)] -``` +- 1. Prioritization: rank cell-cell communication patterns through + multi-criteria prioritization -Now we will go to the first real step of the MultiNicheNet analysis +Following these steps, one can optionally \* 7. Calculate the +across-samples expression correlation between ligand-receptor pairs and +target genes \* 8. Prioritize communication patterns involving +condition-specific cell types through an alternative prioritization +scheme -# Step 1: Extract cell type abundance and expression information from receiver and sender cell types, and link this expression information for ligands of the sender cell types to the corresponding receptors of the receiver cell types +After these steps, the output can be further explored as we will +demonstrate in the “Downstream analysis of the MultiNicheNet output” +section. + +In this vignette, we will demonstrate these steps one-by-one, which +offers the most flexibility to the user to assess intermediary results. +Other vignettes will demonstrate the use of the +`multi_nichenet_analysis` wrapper function. + +## Cell-type filtering: determine which cell types are sufficiently present + +In this step we will calculate and visualize cell type abundances. This +will give an indication about which cell types will be retained in the +analysis, and which cell types will be filtered out. Since MultiNicheNet will infer group differences at the sample level for each cell type (currently via Muscat - pseudobulking + EdgeR), we need -to have sufficient cells per sample of a cell type, and this for both +to have sufficient cells per sample of a cell type, and this for all groups. In the following analysis we will set this minimum number of -cells per cell type per sample at 10 (recommended minimum). - -``` r -min_cells = 10 -``` - -Now we will calculate abundance and expression information for each cell -type / sample / group combination with the following functions. In the -output of this function, you can also find some ‘Cell type abundance -diagnostic plots’ that will the users which celltype-sample combinations -will be left out later on for DE calculation because the nr of cells is -lower than de defined minimum defined here above. If too many -celltype-sample combinations don’t pass this threshold, we recommend to -define your cell types in a more general way (use one level higher of -the cell type ontology hierarchy) (eg TH17 CD4T cells –\> CD4T cells). - -``` r -abundance_expression_info = get_abundance_expression_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network, batches = batches) -``` +cells per cell type per sample at 10. Samples that have less than +`min_cells` cells will be excluded from the analysis for that specific +cell type. + + min_cells = 10 + +We recommend using `min_cells = 10`, except for datasets with several +lowly abundant cell types of interest. For those datasets, we recommend +using `min_cells = 5`. + + abundance_info = get_abundance_info( + sce = sce, + sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, + min_cells = min_cells, + senders_oi = senders_oi, receivers_oi = receivers_oi, + batches = batches + ) First, we will check the cell type abundance diagnostic plots. @@ -667,531 +339,828 @@ combination, and indicates which combinations are removed during the DE analysis because there are less than `min_cells` in the celltype-sample combination. -``` r -abundance_expression_info$abund_plot_sample -``` + abundance_info$abund_plot_sample + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-12-1.png) -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-9-1.png) The red dotted line indicates the required minimum of cells as defined above in `min_cells`. We can see here that some sample-celltype combinations are left out. For the DE analysis in the next step, only cell types will be considered if there are at least two samples per -group with a sufficient number of cells. +group with a sufficient number of cells. But as we can see here: all +cell types will be considered for the analysis and there are no +condition-specific cell types. **Important**: Based on the cell type abundance diagnostics, we recommend users to change their analysis settings if required (eg changing cell type annotation level, batches, …), before proceeding with -the rest of the analysis. - -# Step 2: Perform genome-wide differential expression analysis of receiver and sender cell types to define DE genes between the conditions of interest. Based on this analysis, we can define the logFC/p-value of ligands in senders and receptors in receivers, and define the set of affected target genes in the receiver. - -Now we will go over to the multi-group, multi-sample differential -expression (DE) analysis (also called ‘differential state’ analysis by -the developers of Muscat). - -### Define the contrasts and covariates of interest for the DE analysis. - -Here, we want to compare each patient group to the other groups, so the -MIS-C (M) group vs healthy control siblins (S) and adult COVID19 -patients (A) etcetera. To do this comparison, we need to set the -following contrasts: - -``` r -contrasts_oi = c("'M-(S+A)/2','S-(M+A)/2','A-(S+M)/2'") -``` - -**Very Important** Note the format to indicate the contrasts! This -formatting should be adhered to very strictly, and white spaces are not -allowed! Check `?get_DE_info` for explanation about how to define this -well. The most important things are that: each contrast is surrounded by -single quotation marks, contrasts are separated by a comma without any -whitespace, and alle contrasts together are surrounded by double -quotation marks. If you compare against two groups, you should divide by -2, if you compare against three groups, you should divide by 3 etcetera. - -For downstream visualizations and linking contrasts to their main group, -you need to run the following: - -``` r -contrast_tbl = tibble(contrast = - c("M-(S+A)/2","S-(M+A)/2", "A-(S+M)/2"), - group = c("M","S","A")) -``` - -If you want to compare only two groups (eg M vs S), you can do like -this: `contrasts_oi = c("'M-S','S-M'")` -`contrast_tbl = tibble(contrast = c("M-S","S-M"), group = c("M","S"))` - -### Perform the DE analysis for each cell type. - -``` r -DE_info = get_DE_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, batches = batches, covariates = covariates, contrasts_oi = contrasts_oi, min_cells = min_cells) -``` +the rest of the analysis. If too many celltype-sample combinations don’t +pass this threshold, we recommend to define your cell types in a more +general way (use one level higher of the cell type ontology hierarchy) +(eg TH17 CD4T cells –> CD4T cells) or use `min_cells = 5` if this +would not be possible. + +### Cell type filtering based on cell type abundance information + +Running the following block of code can help you determine which cell +types are condition-specific and which cell types are absent. + + sample_group_celltype_df = abundance_info$abundance_data %>% + filter(n > min_cells) %>% + ungroup() %>% + distinct(sample_id, group_id) %>% + cross_join( + abundance_info$abundance_data %>% + ungroup() %>% + distinct(celltype_id) + ) %>% + arrange(sample_id) + + abundance_df = sample_group_celltype_df %>% left_join( + abundance_info$abundance_data %>% ungroup() + ) + + abundance_df$n[is.na(abundance_df$n)] = 0 + abundance_df$keep[is.na(abundance_df$keep)] = FALSE + abundance_df_summarized = abundance_df %>% + mutate(keep = as.logical(keep)) %>% + group_by(group_id, celltype_id) %>% + summarise(samples_present = sum((keep))) + + celltypes_absent_one_condition = abundance_df_summarized %>% + filter(samples_present == 0) %>% pull(celltype_id) %>% unique() + # find truly condition-specific cell types by searching for cell types + # truely absent in at least one condition + + celltypes_present_one_condition = abundance_df_summarized %>% + filter(samples_present >= 2) %>% pull(celltype_id) %>% unique() + # require presence in at least 2 samples of one group so + # it is really present in at least one condition + + condition_specific_celltypes = intersect( + celltypes_absent_one_condition, + celltypes_present_one_condition) + + total_nr_conditions = SummarizedExperiment::colData(sce)[,group_id] %>% + unique() %>% length() + + absent_celltypes = abundance_df_summarized %>% + filter(samples_present < 2) %>% + group_by(celltype_id) %>% + count() %>% + filter(n == total_nr_conditions) %>% + pull(celltype_id) + + print("condition-specific celltypes:") + ## [1] "condition-specific celltypes:" + print(condition_specific_celltypes) + ## character(0) + + print("absent celltypes:") + ## [1] "absent celltypes:" + print(absent_celltypes) + ## character(0) + +Absent cell types will be filtered out, condition-specific cell types +can be filtered out if you as a user do not want to run the alternative +workflow for condition-specific cell types in the optional step 8 of the +core MultiNicheNet analysis. + + analyse_condition_specific_celltypes = FALSE + + if(analyse_condition_specific_celltypes == TRUE){ + senders_oi = senders_oi %>% setdiff(absent_celltypes) + receivers_oi = receivers_oi %>% setdiff(absent_celltypes) + } else { + senders_oi = senders_oi %>% + setdiff(union(absent_celltypes, condition_specific_celltypes)) + receivers_oi = receivers_oi %>% + setdiff(union(absent_celltypes, condition_specific_celltypes)) + } + + sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% + c(senders_oi, receivers_oi) + ] + +## Gene filtering: determine which genes are sufficiently expressed in each present cell type + +Before running the DE analysis, we will determine which genes are not +sufficiently expressed and should be filtered out. We will perform gene +filtering based on a similar procedure as used in `edgeR::filterByExpr`. +However, we adapted this procedure to be more interpretable for +single-cell datasets. + +For each cell type, we will consider genes expressed if they are +expressed in at least a `min_sample_prop` fraction of samples in the +condition with the lowest number of samples. By default, we set +`min_sample_prop = 0.50`, which means that genes should be expressed in +at least 2 samples if the group with lowest nr. of samples has 4 samples +like this dataset. + + min_sample_prop = 0.50 + +But how do we define which genes are expressed in a sample? For this we +will consider genes as expressed if they have non-zero expression values +in a `fraction_cutoff` fraction of cells of that cell type in that +sample. By default, we set `fraction_cutoff = 0.05`, which means that +genes should show non-zero expression values in at least 5% of cells in +a sample. + + fraction_cutoff = 0.05 + +We recommend using these default values unless there is specific +interest in prioritizing (very) weakly expressed interactions. In that +case, you could lower the value of `fraction_cutoff`. We explicitly +recommend against using `fraction_cutoff > 0.10`. + +Now we will calculate the information required for gene filtering with +the following command: + + frq_list = get_frac_exprs( + sce = sce, + sample_id = sample_id, celltype_id = celltype_id, group_id = group_id, + batches = batches, + min_cells = min_cells, + fraction_cutoff = fraction_cutoff, min_sample_prop = min_sample_prop) + ## [1] "Samples are considered if they have more than 10 cells of the cell type of interest" + ## [1] "Genes with non-zero counts in at least 5% of cells of a cell type of interest in a particular sample will be considered as expressed in that sample." + ## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: L_NK_CD56._CD16." + ## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: L_T_TIM3._CD38._HLADR." + ## [1] "Genes expressed in at least 2 samples will considered as expressed in the cell type: M_Monocyte_CD16" + ## [1] "6621 genes are considered as expressed in the cell type: L_NK_CD56._CD16." + ## [1] "8461 genes are considered as expressed in the cell type: L_T_TIM3._CD38._HLADR." + ## [1] "8817 genes are considered as expressed in the cell type: M_Monocyte_CD16" + +Now only keep genes that are expressed by at least one cell type: + + genes_oi = frq_list$expressed_df %>% + filter(expressed == TRUE) %>% pull(gene) %>% unique() + sce = sce[genes_oi, ] + +## Pseudobulk expression calculation: determine and normalize per-sample pseudobulk expression levels for each expressed gene in each present cell type + +After filtering out absent cell types and genes, we will continue the +analysis by calculating the different prioritization criteria that we +will use to prioritize cell-cell communication patterns. + +First, we will determine and normalize per-sample pseudobulk expression +levels for each expressed gene in each present cell type. The function +`process_abundance_expression_info` will link this expression +information for ligands of the sender cell types to the corresponding +receptors of the receiver cell types. This will later on allow us to +define the cell-type specicificy criteria for ligands and receptors. + + abundance_expression_info = process_abundance_expression_info( + sce = sce, + sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, + min_cells = min_cells, + senders_oi = senders_oi, receivers_oi = receivers_oi, + lr_network = lr_network, + batches = batches, + frq_list = frq_list, + abundance_info = abundance_info) + +Normalized pseudobulk expression values per gene/celltype/sample can be +inspected by: + + abundance_expression_info$celltype_info$pb_df %>% head() + ## # A tibble: 6 × 4 + ## gene sample pb_sample celltype + ## + ## 1 A1BG A1 3.10 L_T_TIM3._CD38._HLADR. + ## 2 AAAS A1 4.01 L_T_TIM3._CD38._HLADR. + ## 3 AAGAB A1 5.54 L_T_TIM3._CD38._HLADR. + ## 4 AAK1 A1 6.40 L_T_TIM3._CD38._HLADR. + ## 5 AAMDC A1 4.57 L_T_TIM3._CD38._HLADR. + ## 6 AAMP A1 5.54 L_T_TIM3._CD38._HLADR. + +An average of these sample-level expression values per condition/group +can be inspected by: + + abundance_expression_info$celltype_info$pb_df_group %>% head() + ## # A tibble: 6 × 4 + ## # Groups: group, celltype [1] + ## group celltype gene pb_group + ## + ## 1 A L_NK_CD56._CD16. A1BG 3.91 + ## 2 A L_NK_CD56._CD16. AAAS 4.81 + ## 3 A L_NK_CD56._CD16. AAGAB 4.82 + ## 4 A L_NK_CD56._CD16. AAK1 7.03 + ## 5 A L_NK_CD56._CD16. AAMDC 3.66 + ## 6 A L_NK_CD56._CD16. AAMP 6.04 + +Inspecting these values for ligand-receptor interactions can be done by: + + abundance_expression_info$sender_receiver_info$pb_df %>% head() + ## # A tibble: 6 × 8 + ## sample sender receiver ligand receptor pb_ligand pb_receptor ligand_receptor_pb_prod + ## + ## 1 M4 M_Monocyte_CD16 M_Monocyte_CD16 B2M LILRB1 14.2 11.4 162. + ## 2 M4 L_NK_CD56._CD16. M_Monocyte_CD16 B2M LILRB1 14.0 11.4 160. + ## 3 M4 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 B2M LILRB1 13.8 11.4 158. + ## 4 M5 L_NK_CD56._CD16. M_Monocyte_CD16 B2M LILRB1 14.5 10.5 153. + ## 5 M5 M_Monocyte_CD16 M_Monocyte_CD16 B2M LILRB1 14.4 10.5 152. + ## 6 M5 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 B2M LILRB1 14.2 10.5 149. + abundance_expression_info$sender_receiver_info$pb_df_group %>% head() + ## # A tibble: 6 × 8 + ## # Groups: group, sender [6] + ## group sender receiver ligand receptor pb_ligand_group pb_receptor_group ligand_receptor_pb_prod_group + ## + ## 1 M L_NK_CD56._CD16. M_Monocyte_CD16 B2M LILRB1 14.1 10.0 141. + ## 2 M M_Monocyte_CD16 M_Monocyte_CD16 B2M LILRB1 14.1 10.0 141. + ## 3 M L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 B2M LILRB1 14.0 10.0 140. + ## 4 A L_NK_CD56._CD16. L_NK_CD56._CD16. B2M KLRD1 14.3 9.71 139. + ## 5 S L_NK_CD56._CD16. L_NK_CD56._CD16. B2M KLRD1 14.4 9.46 136. + ## 6 A L_T_TIM3._CD38._HLADR. L_NK_CD56._CD16. B2M KLRD1 13.9 9.71 135. + +## Differential expression (DE) analysis: determine which genes are differentially expressed + +In this step, we will perform genome-wide differential expression +analysis of receiver and sender cell types to define DE genes between +the conditions of interest (as formalized by the `contrasts_oi`). Based +on this analysis, we later can define the levels of differential +expression of ligands in senders and receptors in receivers, and define +the set of affected target genes in the receiver cell types (which will +be used for the ligand activity analysis). + +We will apply pseudobulking followed by EdgeR to perform multi-condition +multi-sample differential expression (DE) analysis (also called +‘differential state’ analysis by the developers of Muscat). + + DE_info = get_DE_info( + sce = sce, + sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, + batches = batches, covariates = covariates, + contrasts_oi = contrasts_oi, + min_cells = min_cells, + expressed_df = frq_list$expressed_df) + ## [1] "DE analysis is done:" + ## [1] "included cell types are:" + ## [1] "L_T_TIM3._CD38._HLADR." "L_NK_CD56._CD16." "M_Monocyte_CD16" ### Check DE results -Table with logFC and p-values for each gene-celltype-contrast: - -``` r -DE_info$celltype_de$de_output_tidy %>% arrange(p_adj) %>% head() -## # A tibble: 6 x 9 -## gene cluster_id logFC logCPM F p_val p_adj.loc p_adj contrast -## -## 1 TRBV11.2 L_T_TIM3._CD38._HLADR. 4.73 9.84 246 4.33e-13 0.00000000283 0.00000000283 M-(S+A)/2 -## 2 IGLV3.25 L_NK_CD56._CD16. 7.54 5 218 6.04e-13 0.00000000516 0.00000000516 A-(S+M)/2 -## 3 IGHV3.11 L_NK_CD56._CD16. 5.72 3.09 182 6.56e-11 0.00000028 0.00000028 A-(S+M)/2 -## 4 IGHV1.46 L_NK_CD56._CD16. 4.58 4.12 125 1.44e-10 0.000000411 0.000000411 A-(S+M)/2 -## 5 IGLV6.57 L_NK_CD56._CD16. 4.76 3.83 112 3.83e-10 0.000000817 0.000000817 A-(S+M)/2 -## 6 IGKV1.27 L_NK_CD56._CD16. 4.13 3.35 108 5.5 e-10 0.000000939 0.000000939 A-(S+M)/2 -``` - -``` r -celltype_de = DE_info$celltype_de$de_output_tidy -``` - -In the next step, we will combine the DE information of senders and -receivers by linking their ligands and receptors together: - -### Combine DE information for ligand-senders and receptors-receivers (similar to step1 - `abundance_expression_info$sender_receiver_info`) - -``` r -sender_receiver_de = combine_sender_receiver_de( - sender_de = celltype_de, - receiver_de = celltype_de, - senders_oi = senders_oi, - receivers_oi = receivers_oi, - lr_network = lr_network -) -``` - -``` r -sender_receiver_de %>% head(20) -## # A tibble: 20 x 12 -## contrast sender receiver ligand receptor lfc_ligand lfc_receptor ligand_receptor_lfc_avg p_val_ligand p_adj_ligand p_val_receptor p_adj_receptor -## -## 1 M-(S+A)/2 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 CCL3 CCR1 2.89 0.8 1.85 0.00053 0.0881 0.0568 0.308 -## 2 M-(S+A)/2 L_T_TIM3._CD38._HLADR. L_T_TIM3._CD38._HLADR. CCL3 CCR5 2.89 0.4 1.64 0.00053 0.0881 0.267 0.821 -## 3 M-(S+A)/2 L_NK_CD56._CD16. M_Monocyte_CD16 S100A8 CD69 1.68 1.53 1.60 0.031 0.428 0.00275 0.0893 -## 4 A-(S+M)/2 M_Monocyte_CD16 M_Monocyte_CD16 HLA.G KLRD1 1.46 1.6 1.53 0.00266 0.268 0.0991 0.644 -## 5 M-(S+A)/2 M_Monocyte_CD16 M_Monocyte_CD16 C1QB CD33 2.99 0.0287 1.51 0.00000638 0.0131 0.926 0.992 -## 6 A-(S+M)/2 M_Monocyte_CD16 M_Monocyte_CD16 C1QB LRP1 2.21 0.578 1.39 0.000377 0.152 0.0223 0.518 -## 7 M-(S+A)/2 M_Monocyte_CD16 M_Monocyte_CD16 SLAMF7 SLAMF7 1.38 1.38 1.38 0.0318 0.239 0.0318 0.239 -## 8 M-(S+A)/2 L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 GZMB IGF2R 2.28 0.423 1.35 0.0000633 0.0414 0.304 0.64 -## 9 M-(S+A)/2 M_Monocyte_CD16 M_Monocyte_CD16 C1QB LRP1 2.99 -0.326 1.33 0.00000638 0.0131 0.0727 0.346 -## 10 M-(S+A)/2 L_T_TIM3._CD38._HLADR. L_T_TIM3._CD38._HLADR. GZMB IGF2R 2.28 0.376 1.33 0.0000633 0.0414 0.198 0.755 -## 11 M-(S+A)/2 M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. IL18 IL18R1 1.9 0.726 1.31 0.0637 0.327 0.0692 0.55 -## 12 A-(S+M)/2 M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. LGALS3 LAG3 0.915 1.71 1.31 0.00129 0.224 0.0012 0.334 -## 13 M-(S+A)/2 L_NK_CD56._CD16. L_T_TIM3._CD38._HLADR. FAM3C ADGRG5 0.101 2.52 1.31 0.576 0.932 0.00294 0.173 -## 14 M-(S+A)/2 M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. HLA.A KLRD1 0.799 1.78 1.29 0.0000086 0.0133 0.000741 0.0969 -## 15 M-(S+A)/2 M_Monocyte_CD16 L_NK_CD56._CD16. IL18 IL18R1 1.9 0.673 1.29 0.0637 0.327 0.0275 0.414 -## 16 S-(M+A)/2 M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. RETN GPR25 -0.209 2.75 1.27 0.666 0.962 0.00000532 0.0171 -## 17 A-(S+M)/2 L_NK_CD56._CD16. M_Monocyte_CD16 SPON2 ITGAM 0.442 2.09 1.27 0.0424 0.569 0.0111 0.416 -## 18 M-(S+A)/2 M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. TNF TNFRSF1A 1.78 0.727 1.25 0.021 0.205 0.0745 0.561 -## 19 A-(S+M)/2 L_NK_CD56._CD16. M_Monocyte_CD16 ICAM2 ITGAM 0.397 2.09 1.24 0.0228 0.446 0.0111 0.416 -## 20 A-(S+M)/2 M_Monocyte_CD16 L_NK_CD56._CD16. HLA.G LILRB1 1.46 0.996 1.23 0.00266 0.268 0.00673 0.282 -``` - -# Step 3: Predict NicheNet ligand activities and NicheNet ligand-target links based on these differential expression results - -## Define the parameters for the NicheNet ligand activity analysis - -Here, we need to define the thresholds that will be used to consider -genes as differentially expressed or not (logFC, p-value, decision -whether to use adjusted or normal p-value, minimum fraction of cells -that should express a gene in at least one sample in a group, whether to -use the normal p-values or empirical p-values). - -NicheNet ligand activity will then be calculated as the enrichment of -predicted target genes of ligands in this set of DE genes compared to -the genomic background. Here we choose for a minimum logFC of 0.50, -maximum p-value of 0.05, and minimum fraction of expression of 0.05. - -``` r -logFC_threshold = 0.50 -p_val_threshold = 0.05 -fraction_cutoff = 0.05 -``` - -We will here choose for applying the p-value cutoff on the normal -p-values, and not on the p-values corrected for multiple testing. This -choice was made here because this dataset has only a few samples per -group and we might have a lack of statistical power due to -pseudobulking. In case of more samples per group, and a sufficient high -number of DE genes per group-celltype (\> 50), we would recommend using -the adjusted p-values. - -``` r -# p_val_adj = TRUE -p_val_adj = FALSE -``` - -For the NicheNet ligand-target inference, we also need to select which -top n of the predicted target genes will be considered (here: top 250 -targets per ligand). - -``` r -top_n_target = 250 -``` +Check DE output information in table with logFC and p-values for each +gene-celltype-contrast: + + DE_info$celltype_de$de_output_tidy %>% head() + ## # A tibble: 6 × 9 + ## gene cluster_id logFC logCPM F p_val p_adj.loc p_adj contrast + ## + ## 1 A1BG L_T_TIM3._CD38._HLADR. 0.127 4.46 0.175 0.68 0.984 0.984 M-(S+A)/2 + ## 2 AAAS L_T_TIM3._CD38._HLADR. -0.072 4.78 0.0678 0.797 1 1 M-(S+A)/2 + ## 3 AAGAB L_T_TIM3._CD38._HLADR. 0.492 5.01 4.73 0.04 0.481 0.481 M-(S+A)/2 + ## 4 AAK1 L_T_TIM3._CD38._HLADR. -0.521 6.83 12.7 0.00163 0.138 0.138 M-(S+A)/2 + ## 5 AAMDC L_T_TIM3._CD38._HLADR. -0.44 3.95 2.15 0.156 0.724 0.724 M-(S+A)/2 + ## 6 AAMP L_T_TIM3._CD38._HLADR. 0.224 6.04 2.95 0.235 0.806 0.806 M-(S+A)/2 + +Evaluate the distributions of p-values: + + DE_info$hist_pvals + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-26-1.png) + +These distributions look fine (uniform distribution, except peak at +p-value <= 0.05), so we will continue using these regular p-values. +In case these p-value distributions look irregular, you can estimate +empirical p-values as we will demonstrate in another vignette. + + empirical_pval = FALSE + + if(empirical_pval == TRUE){ + DE_info_emp = get_empirical_pvals(DE_info$celltype_de$de_output_tidy) + celltype_de = DE_info_emp$de_output_tidy_emp %>% select(-p_val, -p_adj) %>% + rename(p_val = p_emp, p_adj = p_adj_emp) + } else { + celltype_de = DE_info$celltype_de$de_output_tidy + } + +### Combine DE information for ligand-senders and receptors-receivers + +To end this step, we will combine the DE information of senders and +receivers by linking their ligands and receptors together based on the +prior knowledge ligand-receptor network. + + sender_receiver_de = combine_sender_receiver_de( + sender_de = celltype_de, + receiver_de = celltype_de, + senders_oi = senders_oi, + receivers_oi = receivers_oi, + lr_network = lr_network + ) + + sender_receiver_de %>% head(20) + ## # A tibble: 20 × 12 + ## contrast sender receiver ligand receptor lfc_ligand lfc_receptor ligand_receptor_lfc_…¹ p_val_ligand p_adj_ligand p_val_receptor p_adj_receptor + ## + ## 1 M-(S+A)/2 M_Mon… M_Monoc… C3 VSIG4 2.96 5.76 4.36 0.0964 0.439 0.00154 0.0923 + ## 2 M-(S+A)/2 M_Mon… M_Monoc… IL10 IL10RB 5.61 0.715 3.16 0.0052 0.142 0.0123 0.202 + ## 3 M-(S+A)/2 M_Mon… L_T_TIM… IL18 IL18RAP 1.93 4.39 3.16 0.0445 0.321 0.000199 0.056 + ## 4 M-(S+A)/2 M_Mon… L_T_TIM… IL10 IL10RB 5.61 0.184 2.90 0.0052 0.142 0.557 0.95 + ## 5 M-(S+A)/2 L_NK_… L_T_TIM… IL18 IL18RAP 1.29 4.39 2.84 0.00758 0.231 0.000199 0.056 + ## 6 M-(S+A)/2 M_Mon… L_NK_CD… IL10 IL10RA 5.61 0.0393 2.82 0.0052 0.142 0.814 0.996 + ## 7 M-(S+A)/2 M_Mon… L_T_TIM… IL10 IL10RA 5.61 -0.0612 2.77 0.0052 0.142 0.728 0.997 + ## 8 M-(S+A)/2 M_Mon… L_NK_CD… IL10 IL10RB 5.61 -0.0727 2.77 0.0052 0.142 0.78 0.987 + ## 9 M-(S+A)/2 M_Mon… M_Monoc… IL10 IL10RA 5.61 -0.316 2.65 0.0052 0.142 0.233 0.621 + ## 10 M-(S+A)/2 M_Mon… L_T_TIM… C3 ITGAM 2.96 2.2 2.58 0.0964 0.439 0.00108 0.124 + ## 11 A-(S+M)/2 L_T_T… M_Monoc… SCGB3… MARCO 4.49 0.611 2.55 0.0492 0.932 0.577 0.937 + ## 12 M-(S+A)/2 M_Mon… M_Monoc… IL1B IL1RAP 1.53 3.56 2.54 0.0145 0.216 0.00294 0.114 + ## 13 M-(S+A)/2 M_Mon… M_Monoc… THBS1 CD47 4.58 0.446 2.51 0.00784 0.166 0.0682 0.387 + ## 14 M-(S+A)/2 L_T_T… M_Monoc… TNFSF… CD163 0.484 4.54 2.51 0.282 0.84 0.00403 0.127 + ## 15 M-(S+A)/2 M_Mon… M_Monoc… THBS1 ITGA6 4.58 0.382 2.48 0.00784 0.166 0.527 0.83 + ## 16 M-(S+A)/2 M_Mon… L_NK_CD… THBS1 ITGA6 4.58 0.352 2.47 0.00784 0.166 0.0844 0.524 + ## 17 M-(S+A)/2 M_Mon… L_NK_CD… THBS1 ITGA4 4.58 0.283 2.43 0.00784 0.166 0.0206 0.341 + ## 18 M-(S+A)/2 M_Mon… L_T_TIM… THBS1 ITGA6 4.58 0.263 2.42 0.00784 0.166 0.321 0.869 + ## 19 M-(S+A)/2 L_T_T… M_Monoc… HMGB1 CD163 0.226 4.54 2.38 0.111 0.649 0.00403 0.127 + ## 20 M-(S+A)/2 M_Mon… M_Monoc… HMGB1 CD163 0.154 4.54 2.35 0.287 0.673 0.00403 0.127 + ## # ℹ abbreviated name: ¹​ligand_receptor_lfc_avg + +## Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes + +In this step, we will predict NicheNet ligand activities and NicheNet +ligand-target links based on these differential expression results. We +do this to prioritize interactions based on their predicted effect on a +receiver cell type. We will assume that the most important +group-specific interactions are those that lead to group-specific gene +expression changes in a receiver cell type. + +Similarly to base NicheNet (), we +use the DE output to create a “geneset of interest”: here we assume that +DE genes within a cell type may be DE because of differential cell-cell +communication processes. In the ligand activity prediction, we will +assess the enrichment of target genes of ligands within this geneset of +interest. In case high-probabiliy target genes of a ligand are enriched +in this set compared to the background of expressed genes, we predict +that this ligand may have a high activity. + +Because the ligand activity analysis is an enrichment procedure, it is +important that this geneset of interest should contain a sufficient but +not too large number of genes. The ratio geneset\_oi/background should +ideally be between 1/200 and 1/10 (or close to these ratios). + +To determine the genesets of interest based on DE output, we need to +define some logFC and/or p-value thresholds per cell type/contrast +combination. In general, we recommend inspecting the nr. of DE genes for +all cell types based on the default thresholds and adapting accordingly. +By default, we will apply the p-value cutoff on the normal p-values, and +not on the p-values corrected for multiple testing. This choice was made +because most multi-sample single-cell transcriptomics datasets have just +a few samples per group and we might have a lack of statistical power +due to pseudobulking. But, if the smallest group >= 20 samples, we +typically recommend using p\_val\_adj = TRUE. When the biological +difference between the conditions is very large, we typically recommend +increasing the logFC\_threshold and/or using p\_val\_adj = TRUE. + +### Assess geneset\_oi-vs-background ratios for different DE output tresholds prior to the NicheNet ligand activity analysis + +We will first inspect the geneset\_oi-vs-background ratios for the +default tresholds: + + logFC_threshold = 0.50 + p_val_threshold = 0.05 + + p_val_adj = FALSE + + geneset_assessment = contrast_tbl$contrast %>% + lapply( + process_geneset_data, + celltype_de, logFC_threshold, p_val_adj, p_val_threshold + ) %>% + bind_rows() + geneset_assessment + ## # A tibble: 9 × 12 + ## cluster_id n_background n_geneset_up n_geneset_down prop_geneset_up prop_geneset_down in_range_up in_range_down contrast logFC_threshold + ## + ## 1 L_NK_CD56._CD16. 6621 162 82 0.0245 0.0124 TRUE TRUE M-(S+A)… 0.5 + ## 2 L_T_TIM3._CD38._HL… 8461 401 194 0.0474 0.0229 TRUE TRUE M-(S+A)… 0.5 + ## 3 M_Monocyte_CD16 8817 647 438 0.0734 0.0497 TRUE TRUE M-(S+A)… 0.5 + ## 4 L_NK_CD56._CD16. 6621 150 219 0.0227 0.0331 TRUE TRUE S-(M+A)… 0.5 + ## 5 L_T_TIM3._CD38._HL… 8461 201 320 0.0238 0.0378 TRUE TRUE S-(M+A)… 0.5 + ## 6 M_Monocyte_CD16 8817 368 254 0.0417 0.0288 TRUE TRUE S-(M+A)… 0.5 + ## 7 L_NK_CD56._CD16. 6621 118 110 0.0178 0.0166 TRUE TRUE A-(S+M)… 0.5 + ## 8 L_T_TIM3._CD38._HL… 8461 225 150 0.0266 0.0177 TRUE TRUE A-(S+M)… 0.5 + ## 9 M_Monocyte_CD16 8817 262 464 0.0297 0.0526 TRUE TRUE A-(S+M)… 0.5 + ## # ℹ 2 more variables: p_val_threshold , adjusted + +We can see here that for all cell type / contrast combinations, all +geneset/background ratio’s are within the recommended range +(`in_range_up` and `in_range_down` columns). When these +geneset/background ratio’s would not be within the recommended ranges, +we should interpret ligand activity results for these cell types with +more caution, or use different thresholds (for these or all cell types). + +For the sake of demonstration, we will also calculate these ratio’s in +case we would use the adjusted p-value as threshold. + + geneset_assessment_adjustedPval = contrast_tbl$contrast %>% + lapply( + process_geneset_data, + celltype_de, logFC_threshold, p_val_adj = TRUE, p_val_threshold + ) %>% + bind_rows() + geneset_assessment_adjustedPval + ## # A tibble: 9 × 12 + ## cluster_id n_background n_geneset_up n_geneset_down prop_geneset_up prop_geneset_down in_range_up in_range_down contrast logFC_threshold + ## + ## 1 L_NK_CD56._CD16. 6621 7 0 0.00106 0 FALSE FALSE M-(S+A)… 0.5 + ## 2 L_T_TIM3._CD38._HL… 8461 15 5 0.00177 0.000591 FALSE FALSE M-(S+A)… 0.5 + ## 3 M_Monocyte_CD16 8817 25 11 0.00284 0.00125 FALSE FALSE M-(S+A)… 0.5 + ## 4 L_NK_CD56._CD16. 6621 28 50 0.00423 0.00755 FALSE TRUE S-(M+A)… 0.5 + ## 5 L_T_TIM3._CD38._HL… 8461 2 5 0.000236 0.000591 FALSE FALSE S-(M+A)… 0.5 + ## 6 M_Monocyte_CD16 8817 10 15 0.00113 0.00170 FALSE FALSE S-(M+A)… 0.5 + ## 7 L_NK_CD56._CD16. 6621 36 19 0.00544 0.00287 TRUE FALSE A-(S+M)… 0.5 + ## 8 L_T_TIM3._CD38._HL… 8461 13 1 0.00154 0.000118 FALSE FALSE A-(S+M)… 0.5 + ## 9 M_Monocyte_CD16 8817 4 3 0.000454 0.000340 FALSE FALSE A-(S+M)… 0.5 + ## # ℹ 2 more variables: p_val_threshold , adjusted + +We can see here that for most cell type / contrast combinations, the +geneset/background ratio’s are not within the recommended range. +Therefore, we will proceed with the default tresholds for the ligand +activity analysis + +### Perform the ligand activity analysis and ligand-target inference + +After the ligand activity prediction, we will also infer the predicted +target genes of these ligands in each contrast. For this ligand-target +inference procedure, we also need to select which top n of the predicted +target genes will be considered (here: top 250 targets per ligand). This +parameter will not affect the ligand activity predictions. It will only +affect ligand-target visualizations and construction of the +intercellular regulatory network during the downstream analysis. We +recommend users to test other settings in case they would be interested +in exploring fewer, but more confident target genes, or vice versa. + + top_n_target = 250 The NicheNet ligand activity analysis can be run in parallel for each receiver cell type, by changing the number of cores as defined here. -This is only recommended if you have many receiver cell type. - -``` r -verbose = TRUE -cores_system = 8 -n.cores = min(cores_system, sender_receiver_de$receiver %>% unique() %>% length()) # use one core per receiver cell type -``` - -## Run the NicheNet ligand activity analysis - -(this might take some time) - -``` r -ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings(get_ligand_activities_targets_DEgenes( - receiver_de = celltype_de, - receivers_oi = receivers_oi, - ligand_target_matrix = ligand_target_matrix, - logFC_threshold = logFC_threshold, - p_val_threshold = p_val_threshold, - p_val_adj = p_val_adj, - top_n_target = top_n_target, - verbose = verbose, - n.cores = n.cores -))) -``` - -Check the DE genes used for the activity analysis - -``` r -ligand_activities_targets_DEgenes$de_genes_df %>% head(20) -## # A tibble: 20 x 6 -## gene receiver logFC p_val p_adj contrast -## -## 1 TNFRSF14.AS1 L_NK_CD56._CD16. 0.957 0.00935 0.276 M-(S+A)/2 -## 2 ACOT7 L_NK_CD56._CD16. 0.711 0.00449 0.218 M-(S+A)/2 -## 3 KDM1A L_NK_CD56._CD16. 0.632 0.0268 0.408 M-(S+A)/2 -## 4 SMAP2 L_NK_CD56._CD16. 0.56 0.00332 0.198 M-(S+A)/2 -## 5 P3H1 L_NK_CD56._CD16. 0.58 0.00937 0.276 M-(S+A)/2 -## 6 CPT2 L_NK_CD56._CD16. 0.632 0.0256 0.401 M-(S+A)/2 -## 7 DHCR24 L_NK_CD56._CD16. 1.15 0.00182 0.165 M-(S+A)/2 -## 8 GADD45A L_NK_CD56._CD16. 0.538 0.0224 0.382 M-(S+A)/2 -## 9 S100A9 L_NK_CD56._CD16. 1.57 0.0315 0.431 M-(S+A)/2 -## 10 S100A8 L_NK_CD56._CD16. 1.68 0.031 0.428 M-(S+A)/2 -## 11 XCL1 L_NK_CD56._CD16. 0.896 0.0467 0.473 M-(S+A)/2 -## 12 SELL L_NK_CD56._CD16. 0.576 0.00131 0.15 M-(S+A)/2 -## 13 IER5 L_NK_CD56._CD16. 0.736 0.0152 0.338 M-(S+A)/2 -## 14 MIR181A1HG L_NK_CD56._CD16. 0.885 0.0153 0.339 M-(S+A)/2 -## 15 CD55 L_NK_CD56._CD16. 0.58 0.0251 0.4 M-(S+A)/2 -## 16 UTP25 L_NK_CD56._CD16. 0.697 0.0479 0.475 M-(S+A)/2 -## 17 H2AW L_NK_CD56._CD16. 0.677 0.000416 0.109 M-(S+A)/2 -## 18 RNF187 L_NK_CD56._CD16. 0.661 0.0032 0.198 M-(S+A)/2 -## 19 LPIN1 L_NK_CD56._CD16. 0.555 0.000124 0.0778 M-(S+A)/2 -## 20 POMC L_NK_CD56._CD16. 0.59 0.015 0.335 M-(S+A)/2 -``` - -Check the output of the activity analysis - -``` r -ligand_activities_targets_DEgenes$ligand_activities %>% head(20) -## # A tibble: 20 x 8 -## # Groups: receiver, contrast [1] -## ligand activity contrast target ligand_target_weight receiver direction_regulation activity_scaled -## -## 1 A2M 0.0224 M-(S+A)/2 CD55 0.00649 L_NK_CD56._CD16. up 0.823 -## 2 A2M 0.0224 M-(S+A)/2 FKBP5 0.00723 L_NK_CD56._CD16. up 0.823 -## 3 A2M 0.0224 M-(S+A)/2 FOS 0.0146 L_NK_CD56._CD16. up 0.823 -## 4 A2M 0.0224 M-(S+A)/2 GADD45A 0.0110 L_NK_CD56._CD16. up 0.823 -## 5 A2M 0.0224 M-(S+A)/2 H2AC6 0.00747 L_NK_CD56._CD16. up 0.823 -## 6 A2M 0.0224 M-(S+A)/2 H2BC12 0.00692 L_NK_CD56._CD16. up 0.823 -## 7 A2M 0.0224 M-(S+A)/2 ISG20 0.00738 L_NK_CD56._CD16. up 0.823 -## 8 A2M 0.0224 M-(S+A)/2 LMNB1 0.00699 L_NK_CD56._CD16. up 0.823 -## 9 A2M 0.0224 M-(S+A)/2 MYC 0.0199 L_NK_CD56._CD16. up 0.823 -## 10 A2M 0.0224 M-(S+A)/2 NFKB1 0.00895 L_NK_CD56._CD16. up 0.823 -## 11 A2M 0.0224 M-(S+A)/2 NFKBIZ 0.00661 L_NK_CD56._CD16. up 0.823 -## 12 A2M 0.0224 M-(S+A)/2 PPP1R15A 0.00776 L_NK_CD56._CD16. up 0.823 -## 13 A2M 0.0224 M-(S+A)/2 SLC1A5 0.00710 L_NK_CD56._CD16. up 0.823 -## 14 A2M 0.0224 M-(S+A)/2 SMAD3 0.00776 L_NK_CD56._CD16. up 0.823 -## 15 A2M 0.0224 M-(S+A)/2 SOCS3 0.00968 L_NK_CD56._CD16. up 0.823 -## 16 A2M 0.0224 M-(S+A)/2 TFRC 0.00712 L_NK_CD56._CD16. up 0.823 -## 17 A2M 0.0224 M-(S+A)/2 WARS1 0.00743 L_NK_CD56._CD16. up 0.823 -## 18 A2M 0.0224 M-(S+A)/2 ZFP36 0.00732 L_NK_CD56._CD16. up 0.823 -## 19 AANAT 0.0215 M-(S+A)/2 FKBP5 0.00514 L_NK_CD56._CD16. up 0.652 -## 20 AANAT 0.0215 M-(S+A)/2 FOS 0.00863 L_NK_CD56._CD16. up 0.652 -``` - -# Step 4: Use the information collected above to prioritize all sender-ligand—receiver-receptor pairs. - -In the 3 previous steps, we calculated expression, differential -expression and NicheNet activity information. Now we will combine these -different types of information in one prioritization scheme. - -MultiNicheNet allows the user to define the weights of the following +Using more cores will speed up the analysis at the cost of needing more +memory. This is only recommended if you have many receiver cell types of +interest. + + verbose = TRUE + cores_system = 8 + n.cores = min(cores_system, celltype_de$cluster_id %>% unique() %>% length()) + +Running the ligand activity prediction will take some time (the more +cell types and contrasts, the more time) + + ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings( + get_ligand_activities_targets_DEgenes( + receiver_de = celltype_de, + receivers_oi = intersect(receivers_oi, celltype_de$cluster_id %>% unique()), + ligand_target_matrix = ligand_target_matrix, + logFC_threshold = logFC_threshold, + p_val_threshold = p_val_threshold, + p_val_adj = p_val_adj, + top_n_target = top_n_target, + verbose = verbose, + n.cores = n.cores + ) + )) + +You can check the output of the ligand activity and ligand-target +inference here: + + ligand_activities_targets_DEgenes$ligand_activities %>% head(20) + ## # A tibble: 20 × 8 + ## # Groups: receiver, contrast [1] + ## ligand activity contrast target ligand_target_weight receiver direction_regulation activity_scaled + ## + ## 1 A2M 0.0282 M-(S+A)/2 ACOT7 0.00632 L_NK_CD56._CD16. up 0.794 + ## 2 A2M 0.0282 M-(S+A)/2 AREG 0.00638 L_NK_CD56._CD16. up 0.794 + ## 3 A2M 0.0282 M-(S+A)/2 CD55 0.00649 L_NK_CD56._CD16. up 0.794 + ## 4 A2M 0.0282 M-(S+A)/2 CD74 0.00634 L_NK_CD56._CD16. up 0.794 + ## 5 A2M 0.0282 M-(S+A)/2 FKBP5 0.00723 L_NK_CD56._CD16. up 0.794 + ## 6 A2M 0.0282 M-(S+A)/2 FOS 0.0146 L_NK_CD56._CD16. up 0.794 + ## 7 A2M 0.0282 M-(S+A)/2 GADD45A 0.0110 L_NK_CD56._CD16. up 0.794 + ## 8 A2M 0.0282 M-(S+A)/2 H2AC6 0.00747 L_NK_CD56._CD16. up 0.794 + ## 9 A2M 0.0282 M-(S+A)/2 H2BC12 0.00692 L_NK_CD56._CD16. up 0.794 + ## 10 A2M 0.0282 M-(S+A)/2 ISG20 0.00738 L_NK_CD56._CD16. up 0.794 + ## 11 A2M 0.0282 M-(S+A)/2 LMNB1 0.00699 L_NK_CD56._CD16. up 0.794 + ## 12 A2M 0.0282 M-(S+A)/2 MYC 0.0199 L_NK_CD56._CD16. up 0.794 + ## 13 A2M 0.0282 M-(S+A)/2 NFKB1 0.00895 L_NK_CD56._CD16. up 0.794 + ## 14 A2M 0.0282 M-(S+A)/2 NFKBIZ 0.00661 L_NK_CD56._CD16. up 0.794 + ## 15 A2M 0.0282 M-(S+A)/2 PPP1R15A 0.00776 L_NK_CD56._CD16. up 0.794 + ## 16 A2M 0.0282 M-(S+A)/2 SLC1A5 0.00710 L_NK_CD56._CD16. up 0.794 + ## 17 A2M 0.0282 M-(S+A)/2 SMAD3 0.00776 L_NK_CD56._CD16. up 0.794 + ## 18 A2M 0.0282 M-(S+A)/2 SOCS3 0.00968 L_NK_CD56._CD16. up 0.794 + ## 19 A2M 0.0282 M-(S+A)/2 TFRC 0.00712 L_NK_CD56._CD16. up 0.794 + ## 20 A2M 0.0282 M-(S+A)/2 WARS1 0.00743 L_NK_CD56._CD16. up 0.794 + +## Prioritization: rank cell-cell communication patterns through multi-criteria prioritization + +In the previous steps, we calculated expression, differential expression +and NicheNet ligand activity. In the final step, we will now combine all +calculated information to rank all sender-ligand—receiver-receptor pairs +according to group/condition specificity. We will use the following criteria to prioritize ligand-receptor interactions: -- Upregulation of the ligand in a sender cell type and/or upregulation - of the receptor in a receiver cell type - in the condition of - interest. : `de_ligand` and `de_receptor` -- Sufficiently high expression levels of ligand and receptor in many - samples of the same group (to mitigate the influence of outlier - samples). : `frac_exprs_ligand_receptor` -- Cell-type and condition specific expression of the ligand in the - sender cell type and receptor in the receiver cell type (to mitigate - the influence of upregulated but still relatively weakly expressed - ligands/receptors) : `exprs_ligand` and `exprs_receptor` -- High NicheNet ligand activity, to further prioritize ligand-receptor - pairs based on their predicted effect of the ligand-receptor - interaction on the gene expression in the receiver cell type : - `activity_scaled` -- High relative abundance of sender and/or receiver in the condition of - interest: `abund_sender` and `abund_receiver` (experimental feature - - not recommended to give non-zero weights for default analyses) - -The different properties of the sender-ligand—receiver-receptor pairs -can be weighted according to the user’s preference and insight in the -dataset at hand. - -## Define the prioritization weights, and prepare grouping objects - -We will set our preference for this dataset as follows - and recommend -the user to use the same weights by default: - -``` r -prioritizing_weights_DE = c("de_ligand" = 1, - "de_receptor" = 1) -prioritizing_weights_activity = c("activity_scaled" = 2) - -prioritizing_weights_expression_specificity = c("exprs_ligand" = 2, - "exprs_receptor" = 2) - -prioritizing_weights_expression_sufficiency = c("frac_exprs_ligand_receptor" = 1) - -prioritizing_weights_relative_abundance = c( "abund_sender" = 0, - "abund_receiver" = 0) -``` - -``` r -prioritizing_weights = c(prioritizing_weights_DE, - prioritizing_weights_activity, - prioritizing_weights_expression_specificity, - prioritizing_weights_expression_sufficiency, - prioritizing_weights_relative_abundance) -``` - -Make necessary grouping data frame - -``` r -sender_receiver_tbl = sender_receiver_de %>% dplyr::distinct(sender, receiver) - -metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() - -if(!is.na(batches)){ - grouping_tbl = metadata_combined[,c(sample_id, group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct() - colnames(grouping_tbl) = c("sample","group",batches) -} else { - grouping_tbl = metadata_combined[,c(sample_id, group_id)] %>% tibble::as_tibble() %>% dplyr::distinct() - colnames(grouping_tbl) = c("sample","group") -} -``` - -Crucial note: grouping_tbl: group should be the same as in the -contrast_tbl, and as in the expression info tables! Rename accordingly -if this would not be the case. If you followed the guidelines of this -tutorial closely, there should be no problem. - -## Run the prioritization - -``` r -prioritization_tables = suppressMessages(generate_prioritization_tables( - sender_receiver_info = abundance_expression_info$sender_receiver_info, - sender_receiver_de = sender_receiver_de, - ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, - contrast_tbl = contrast_tbl, - sender_receiver_tbl = sender_receiver_tbl, - grouping_tbl = grouping_tbl, - prioritizing_weights = prioritizing_weights, - fraction_cutoff = fraction_cutoff, - abundance_data_receiver = abundance_expression_info$abundance_data_receiver, - abundance_data_sender = abundance_expression_info$abundance_data_sender -)) -``` +- Upregulation of the ligand in a sender cell type and/or upregulation + of the receptor in a receiver cell type - in the condition of + interest. +- Cell-type specific expression of the ligand in the sender cell type + and receptor in the receiver cell type in the condition of interest + (to mitigate the influence of upregulated but still relatively + weakly expressed ligands/receptors). +- Sufficiently high expression levels of ligand and receptor in many + samples of the same group. +- High NicheNet ligand activity, to further prioritize ligand-receptor + pairs based on their predicted effect of the ligand-receptor + interaction on the gene expression in the receiver cell type. + +We will combine these prioritization criteria in a single aggregated +prioritization score. In the default setting, we will weigh each of +these criteria equally (`scenario = "regular"`). This setting is +strongly recommended. However, we also provide some additional setting +to accomodate different biological scenarios. The setting +`scenario = "lower_DE"` halves the weight for DE criteria and doubles +the weight for ligand activity. This is recommended in case your +hypothesis is that the differential CCC patterns in your data are less +likely to be driven by DE (eg in cases of differential migration into a +niche). The setting `scenario = "no_frac_LR_expr"` ignores the criterion +“Sufficiently high expression levels of ligand and receptor in many +samples of the same group”. This may be interesting for users that have +data with a limited number of samples and don’t want to penalize +interactions if they are not sufficiently expressed in some samples. + +Finally, we still need to make one choice. For NicheNet ligand activity +we can choose to prioritize ligands that only induce upregulation of +target genes (`ligand_activity_down = FALSE`) or can lead potentially +lead to both up- and downregulation (`ligand_activity_down = TRUE`). The +benefit of `ligand_activity_down = FALSE` is ease of interpretability: +prioritized ligand-receptor pairs will be upregulated in the condition +of interest, just like their target genes. `ligand_activity_down = TRUE` +can be harder to interpret because target genes of some interactions may +be upregulated in the other conditions compared to the condition of +interest. This is harder to interpret, but may help to pick up +interactions that can also repress gene expression. + +Here we will choose for setting `ligand_activity_down = FALSE` and focus +specifically on upregulating ligands. + + ligand_activity_down = FALSE + + sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver) + + metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble() + + if(!is.na(batches)){ + grouping_tbl = metadata_combined[,c(sample_id, group_id, batches)] %>% + tibble::as_tibble() %>% distinct() + colnames(grouping_tbl) = c("sample","group",batches) + } else { + grouping_tbl = metadata_combined[,c(sample_id, group_id)] %>% + tibble::as_tibble() %>% distinct() + colnames(grouping_tbl) = c("sample","group") + } + + prioritization_tables = suppressMessages(generate_prioritization_tables( + sender_receiver_info = abundance_expression_info$sender_receiver_info, + sender_receiver_de = sender_receiver_de, + ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, + contrast_tbl = contrast_tbl, + sender_receiver_tbl = sender_receiver_tbl, + grouping_tbl = grouping_tbl, + scenario = "regular", # all prioritization criteria will be weighted equally + fraction_cutoff = fraction_cutoff, + abundance_data_receiver = abundance_expression_info$abundance_data_receiver, + abundance_data_sender = abundance_expression_info$abundance_data_sender, + ligand_activity_down = ligand_activity_down + )) Check the output tables First: group-based summary table -``` r -prioritization_tables$group_prioritization_tbl %>% head(20) -## # A tibble: 20 x 60 -## contrast group sender recei~1 ligand recep~2 lfc_l~3 lfc_r~4 ligan~5 p_val~6 p_adj~7 p_val~8 p_adj~9 activity direc~* activ~* lr_in~* id avg_l~* avg_r~* ligan~* fract~* fract~* ligan~* rel_a~* rel_a~* sende~* -## -## 1 A-(S+M)/2 A M_Monocyt~ L_T_TI~ NAMPT ITGB1 0.633 0.835 0.734 0.0525 0.584 0.00849 0.643 1.06e-2 up 0.0376 NAMPT_~ NAMP~ 1.35 0.907 1.22 0.730 0.554 0.405 0.001 0.130 0.0656 -## 2 A-(S+M)/2 A M_Monocyt~ L_T_TI~ NAMPT ITGB1 0.633 0.835 0.734 0.0525 0.584 0.00849 0.643 2.13e-2 down 4.11 NAMPT_~ NAMP~ 1.35 0.907 1.22 0.730 0.554 0.405 0.001 0.130 0.0656 -## 3 M-(S+A)/2 M L_T_TIM3.~ M_Mono~ IFNG IFNGR1 1.26 0.507 0.884 0.0233 0.376 0.101 0.396 5.17e-2 up 8.69 IFNG_I~ IFNG~ 0.155 0.575 0.0889 0.114 0.498 0.0567 1.00 0.389 0.695 -## 4 M-(S+A)/2 M L_T_TIM3.~ M_Mono~ IFNG IFNGR1 1.26 0.507 0.884 0.0233 0.376 0.101 0.396 5.15e-3 down -2.51 IFNG_I~ IFNG~ 0.155 0.575 0.0889 0.114 0.498 0.0567 1.00 0.389 0.695 -## 5 A-(S+M)/2 A M_Monocyt~ L_T_TI~ LGALS3 LAG3 0.915 1.71 1.31 0.00129 0.224 0.0012 0.334 1.75e-2 up 1.35 LGALS3~ LGAL~ 1.80 0.978 1.76 0.835 0.499 0.417 0.001 0.130 0.0656 -## 6 A-(S+M)/2 A M_Monocyt~ L_T_TI~ LGALS3 LAG3 0.915 1.71 1.31 0.00129 0.224 0.0012 0.334 1.27e-2 down 0.886 LGALS3~ LGAL~ 1.80 0.978 1.76 0.835 0.499 0.417 0.001 0.130 0.0656 -## 7 S-(M+A)/2 S M_Monocyt~ L_NK_C~ TGFB1 TGFBR3 0.347 0.278 0.312 0.0336 0.467 0.0157 0.24 9.26e-3 up 4.39 TGFB1_~ TGFB~ 0.919 0.461 0.424 0.631 0.253 0.160 0.765 0.459 0.612 -## 8 S-(M+A)/2 S M_Monocyt~ L_NK_C~ TGFB1 TGFBR3 0.347 0.278 0.312 0.0336 0.467 0.0157 0.24 2.05e-2 down -0.0993 TGFB1_~ TGFB~ 0.919 0.461 0.424 0.631 0.253 0.160 0.765 0.459 0.612 -## 9 S-(M+A)/2 S L_NK_CD56~ L_NK_C~ TGFB1 TGFBR3 0.197 0.278 0.238 0.0778 0.459 0.0157 0.24 9.26e-3 up 4.39 TGFB1_~ TGFB~ 0.897 0.461 0.414 0.463 0.253 0.117 0.459 0.459 0.459 -## 10 S-(M+A)/2 S L_NK_CD56~ L_NK_C~ TGFB1 TGFBR3 0.197 0.278 0.238 0.0778 0.459 0.0157 0.24 2.05e-2 down -0.0993 TGFB1_~ TGFB~ 0.897 0.461 0.414 0.463 0.253 0.117 0.459 0.459 0.459 -## 11 S-(M+A)/2 S M_Monocyt~ M_Mono~ TGFB1 ENG 0.347 1.36 0.854 0.0336 0.467 0.00936 0.299 1.17e-2 up 1.64 TGFB1_~ TGFB~ 0.919 0.147 0.135 0.631 0.125 0.0788 0.765 0.765 0.765 -## 12 S-(M+A)/2 S M_Monocyt~ M_Mono~ TGFB1 ENG 0.347 1.36 0.854 0.0336 0.467 0.00936 0.299 3.28e-2 down 0.354 TGFB1_~ TGFB~ 0.919 0.147 0.135 0.631 0.125 0.0788 0.765 0.765 0.765 -## 13 A-(S+M)/2 A M_Monocyt~ M_Mono~ ADAM10 NOTCH4 0.391 1.69 1.04 0.119 0.663 0.00183 0.236 9.22e-3 up -1.16 ADAM10~ ADAM~ 0.685 0.183 0.126 0.532 0.181 0.0960 0.001 0.001 0.001 -## 14 A-(S+M)/2 A M_Monocyt~ M_Mono~ ADAM10 NOTCH4 0.391 1.69 1.04 0.119 0.663 0.00183 0.236 1.03e-4 down 2.04 ADAM10~ ADAM~ 0.685 0.183 0.126 0.532 0.181 0.0960 0.001 0.001 0.001 -## 15 A-(S+M)/2 A M_Monocyt~ M_Mono~ ADAM10 TSPAN14 0.391 0.372 0.382 0.119 0.663 0.0825 0.628 9.22e-3 up -1.16 ADAM10~ ADAM~ 0.685 1.10 0.751 0.532 0.710 0.378 0.001 0.001 0.001 -## 16 A-(S+M)/2 A M_Monocyt~ M_Mono~ ADAM10 TSPAN14 0.391 0.372 0.382 0.119 0.663 0.0825 0.628 1.03e-4 down 2.04 ADAM10~ ADAM~ 0.685 1.10 0.751 0.532 0.710 0.378 0.001 0.001 0.001 -## 17 S-(M+A)/2 S L_NK_CD56~ M_Mono~ TGFB1 ENG 0.197 1.36 0.779 0.0778 0.459 0.00936 0.299 1.17e-2 up 1.64 TGFB1_~ TGFB~ 0.897 0.147 0.132 0.463 0.125 0.0578 0.459 0.765 0.612 -## 18 S-(M+A)/2 S L_NK_CD56~ M_Mono~ TGFB1 ENG 0.197 1.36 0.779 0.0778 0.459 0.00936 0.299 3.28e-2 down 0.354 TGFB1_~ TGFB~ 0.897 0.147 0.132 0.463 0.125 0.0578 0.459 0.765 0.612 -## 19 A-(S+M)/2 A M_Monocyt~ M_Mono~ S100A8 CD36 0.536 1.39 0.963 0.378 0.866 0.0907 0.633 1.68e-2 up 1.65 S100A8~ S100~ 2.36 0.296 0.699 0.772 0.240 0.185 0.001 0.001 0.001 -## 20 A-(S+M)/2 A M_Monocyt~ M_Mono~ S100A8 CD36 0.536 1.39 0.963 0.378 0.866 0.0907 0.633 -3.57e-3 down -0.159 S100A8~ S100~ 2.36 0.296 0.699 0.772 0.240 0.185 0.001 0.001 0.001 -## # ... with 33 more variables: lfc_pval_ligand , p_val_ligand_adapted , scaled_lfc_ligand , scaled_p_val_ligand , scaled_lfc_pval_ligand , scaled_p_val_ligand_adapted , -## # lfc_pval_receptor , p_val_receptor_adapted , scaled_lfc_receptor , scaled_p_val_receptor , scaled_lfc_pval_receptor , scaled_p_val_receptor_adapted , activity_up , -## # activity_scaled_up , scaled_activity_scaled_up , scaled_activity_up , activity_down , activity_scaled_down , scaled_activity_scaled_down , scaled_activity_down , -## # scaled_avg_exprs_ligand , scaled_avg_frq_ligand , pb_ligand_group , scaled_pb_ligand , scaled_avg_exprs_receptor , scaled_avg_frq_receptor , pb_receptor_group , -## # scaled_pb_receptor , fraction_expressing_ligand_receptor , max_scaled_activity , na.rm , prioritization_score , top_group , and abbreviated variable names 1: receiver, 2: receptor, -## # 3: lfc_ligand, 4: lfc_receptor, 5: ligand_receptor_lfc_avg, 6: p_val_ligand, 7: p_adj_ligand, 8: p_val_receptor, 9: p_adj_receptor, *: direction_regulation, *: activity_scaled, *: lr_interaction, -## # *: avg_ligand_group, *: avg_receptor_group, *: ligand_receptor_prod_group, *: fraction_ligand_group, *: fraction_receptor_group, *: ligand_receptor_fraction_prod_group, *: rel_abundance_scaled_sender, ... -``` - -# Step 5: Add information on prior knowledge and expression correlation between LR and target expression. + prioritization_tables$group_prioritization_tbl %>% head(20) + ## # A tibble: 20 × 18 + ## contrast group sender receiver ligand receptor lr_interaction id scaled_lfc_ligand scaled_p_val_ligand_…¹ scaled_lfc_receptor + ## + ## 1 M-(S+A)/2 M M_Monocyte_CD16 L_NK_CD… HLA.E KLRC1 HLA.E_KLRC1 HLA.… 0.816 0.982 0.954 + ## 2 M-(S+A)/2 M L_T_TIM3._CD38._HLAD… M_Monoc… IFNG IFNGR1 IFNG_IFNGR1 IFNG… 0.958 0.946 0.863 + ## 3 A-(S+M)/2 A M_Monocyte_CD16 L_T_TIM… LGALS3 LAG3 LGALS3_LAG3 LGAL… 0.918 0.980 0.985 + ## 4 M-(S+A)/2 M L_T_TIM3._CD38._HLAD… M_Monoc… IFNG IFNGR2 IFNG_IFNGR2 IFNG… 0.958 0.946 0.691 + ## 5 M-(S+A)/2 M M_Monocyte_CD16 L_T_TIM… CXCL16 CXCR6 CXCL16_CXCR6 CXCL… 0.861 0.936 0.966 + ## 6 A-(S+M)/2 A M_Monocyte_CD16 M_Monoc… VCAN TLR2 VCAN_TLR2 VCAN… 0.942 0.812 0.948 + ## 7 M-(S+A)/2 M M_Monocyte_CD16 L_T_TIM… HLA.E CD8A HLA.E_CD8A HLA.… 0.816 0.982 0.834 + ## 8 M-(S+A)/2 M M_Monocyte_CD16 M_Monoc… S100A9 CD68 S100A9_CD68 S100… 0.891 0.880 0.857 + ## 9 M-(S+A)/2 M M_Monocyte_CD16 L_T_TIM… HLA.D… LAG3 HLA.DRA_LAG3 HLA.… 0.886 0.978 0.784 + ## 10 M-(S+A)/2 M M_Monocyte_CD16 L_T_TIM… S100A8 CD69 S100A8_CD69 S100… 0.871 0.791 0.920 + ## 11 M-(S+A)/2 M M_Monocyte_CD16 M_Monoc… HLA.F LILRB1 HLA.F_LILRB1 HLA.… 0.890 0.985 0.982 + ## 12 M-(S+A)/2 M M_Monocyte_CD16 M_Monoc… TNF LTBR TNF_LTBR TNF_… 0.984 0.953 0.902 + ## 13 A-(S+M)/2 A L_T_TIM3._CD38._HLAD… L_T_TIM… LGALS… ITGB1 LGALS3BP_ITGB1 LGAL… 0.947 0.995 0.937 + ## 14 A-(S+M)/2 A M_Monocyte_CD16 M_Monoc… S100A8 CD36 S100A8_CD36 S100… 0.830 0.697 0.969 + ## 15 M-(S+A)/2 M M_Monocyte_CD16 M_Monoc… HLA.F LILRB2 HLA.F_LILRB2 HLA.… 0.890 0.985 0.942 + ## 16 M-(S+A)/2 M M_Monocyte_CD16 L_T_TIM… HLA.A CD8A HLA.A_CD8A HLA.… 0.925 0.999 0.834 + ## 17 M-(S+A)/2 M M_Monocyte_CD16 L_NK_CD… HLA.C KIR2DL1 HLA.C_KIR2DL1 HLA.… 0.897 0.972 0.790 + ## 18 S-(M+A)/2 S L_T_TIM3._CD38._HLAD… L_T_TIM… CD99 CD81 CD99_CD81 CD99… 0.656 0.780 0.962 + ## 19 S-(M+A)/2 S L_NK_CD56._CD16. M_Monoc… TGFB1 ENG TGFB1_ENG TGFB… 0.672 0.873 0.974 + ## 20 S-(M+A)/2 S M_Monocyte_CD16 M_Monoc… TGFB1 ENG TGFB1_ENG TGFB… 0.791 0.930 0.974 + ## # ℹ abbreviated name: ¹​scaled_p_val_ligand_adapted + ## # ℹ 7 more variables: scaled_p_val_receptor_adapted , max_scaled_activity , scaled_pb_ligand , scaled_pb_receptor , + ## # fraction_expressing_ligand_receptor , prioritization_score , top_group + +This table gives the final prioritization score of each interaction, and +the values of the individual prioritization criteria. + +With this step, all required steps are finished. Now, we can optionally +still run the following steps \* Calculate the across-samples expression +correlation between ligand-receptor pairs and target genes \* Prioritize +communication patterns involving condition-specific cell types through +an alternative prioritization scheme + +Here we will only focus on the expression correlation step: + +## Calculate the across-samples expression correlation between ligand-receptor pairs and target genes In multi-sample datasets, we have the opportunity to look whether expression of ligand-receptor across all samples is correlated with the expression of their by NicheNet predicted target genes. This is what we will do with the following line of code: -``` r -lr_target_prior_cor = lr_target_prior_cor_inference(prioritization_tables$group_prioritization_tbl$receiver %>% unique(), abundance_expression_info, celltype_de, grouping_tbl, prioritization_tables, ligand_target_matrix, logFC_threshold = logFC_threshold, p_val_threshold = p_val_threshold, p_val_adj = p_val_adj) -``` + lr_target_prior_cor = lr_target_prior_cor_inference( + receivers_oi = prioritization_tables$group_prioritization_tbl$receiver %>% unique(), + abundance_expression_info = abundance_expression_info, + celltype_de = celltype_de, + grouping_tbl = grouping_tbl, + prioritization_tables = prioritization_tables, + ligand_target_matrix = ligand_target_matrix, + logFC_threshold = logFC_threshold, + p_val_threshold = p_val_threshold, + p_val_adj = p_val_adj + ) -# Save all the output of MultiNicheNet +## Save all the output of MultiNicheNet -To avoid needing to redo the analysis later. All the output written down -here is sufficient to make all in-built downstream visualizations. +To avoid needing to redo the analysis later, we will here to save an +output object that contains all information to perform all downstream +analyses. -``` r -path = "./" + path = "./" -multinichenet_output = list( - celltype_info = abundance_expression_info$celltype_info, - celltype_de = celltype_de, - sender_receiver_info = abundance_expression_info$sender_receiver_info, - sender_receiver_de = sender_receiver_de, - ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, - prioritization_tables = prioritization_tables, - grouping_tbl = grouping_tbl, - lr_target_prior_cor = lr_target_prior_cor - ) -multinichenet_output = make_lite_output(multinichenet_output) + multinichenet_output = list( + celltype_info = abundance_expression_info$celltype_info, + celltype_de = celltype_de, + sender_receiver_info = abundance_expression_info$sender_receiver_info, + sender_receiver_de = sender_receiver_de, + ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes, + prioritization_tables = prioritization_tables, + grouping_tbl = grouping_tbl, + lr_target_prior_cor = lr_target_prior_cor + ) + multinichenet_output = make_lite_output(multinichenet_output) -save = FALSE -if(save == TRUE){ - saveRDS(multinichenet_output, paste0(path, "multinichenet_output.rds")) + save = FALSE + if(save == TRUE){ + saveRDS(multinichenet_output, paste0(path, "multinichenet_output.rds")) -} -``` + } -# Visualization of the results of the cell-cell communication analysis +# Interpreting the MultiNicheNet analysis output -In a first instance, we will look at the broad overview of prioritized -interactions via condition-specific Circos plots. +## Visualization of differential cell-cell interactions -## Circos plot of top-prioritized links +### Summarizing ChordDiagram circos plots + +In a first instance, we will look at the broad overview of prioritized +interactions via condition-specific Chordiagram circos plots. We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest. -``` r -prioritized_tbl_oi_all = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, rank_per_group = FALSE) -``` + prioritized_tbl_oi_all = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + top_n = 50, + rank_per_group = FALSE + ) -``` r -prioritized_tbl_oi = multinichenet_output$prioritization_tables$group_prioritization_tbl %>% - filter(id %in% prioritized_tbl_oi_all$id) %>% - distinct(id, sender, receiver, ligand, receptor, group) %>% left_join(prioritized_tbl_oi_all) -prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0 + prioritized_tbl_oi = + multinichenet_output$prioritization_tables$group_prioritization_tbl %>% + filter(id %in% prioritized_tbl_oi_all$id) %>% + distinct(id, sender, receiver, ligand, receptor, group) %>% + left_join(prioritized_tbl_oi_all) + prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0 -senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort() + senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort() -colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) -colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) + colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) + colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers) -circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver) -``` + circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver) -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-32-1.png)![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-32-2.png)![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-32-3.png)![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-32-4.png) +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-1.png)![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-2.png)![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-3.png)![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-4.png) -## Visualization of scaled ligand-receptor pseudobulk products and ligand activity +### Interpretable bubble plots -Now we will visualize per sample the scaled product of ligand and -receptor expression. Samples that were left out of the DE analysis are -indicated with a smaller dot (this helps to indicate the samples that -did not contribute to the calculation of the logFC, and thus not -contributed to the final prioritization) +Whereas these ChordDiagrams show the most specific interactions per +group, they don’t give insights into the data behind these predictions. +Therefore we will now look at visualizations that indicate the different +prioritization criteria used in MultiNicheNet. + +In the next type of plots, we will 1) visualize the per-sample scaled +product of normalized ligand and receptor pseudobulk expression, 2) +visualize the scaled ligand activities, 3) cell-type specificity. We will now check the top 50 interactions specific for the MIS-C group -``` r -group_oi = "M" -``` + group_oi = "M" + + prioritized_tbl_oi_M_50 = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + top_n = 50, + groups_oi = group_oi) + + plot_oi = make_sample_lr_prod_activity_plots( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M_50) + plot_oi + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-48-1.png) +Samples that were left out of the DE analysis are indicated with a +smaller dot (this helps to indicate the samples that did not contribute +to the calculation of the logFC, and thus not contributed to the final +prioritization) + +As a further help for further prioritization, we can assess the level of +curation of these LR pairs as defined by the Intercellular Communication +part of the Omnipath database -``` r -prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi) + prioritized_tbl_oi_M_50_omnipath = prioritized_tbl_oi_M_50 %>% + inner_join(lr_network_all) -plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50) -plot_oi -``` +Now we add this to the bubble plot visualization: -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-34-1.png) + plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M_50_omnipath) + plot_oi -Typically, there are way more than 50 differentially expressed and -active ligand-receptor pairs per group across all sender-receiver -combinations. Therefore it might be useful to zoom in on specific cell -types as senders/receivers: +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-50-1.png) -Eg M_Monocyte_CD16 as receiver: +As you can see, the HEBP1-FPR2 interaction has no Omnipath DB scores. +This is because this LR pair was not documented by the Omnipath LR +database. Instead it was documented by the original NicheNet LR network +(source: Guide2Pharmacology) as can be seen in the table. -``` r -prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi, receivers_oi = "M_Monocyte_CD16") +Further note: Typically, there are way more than 50 differentially +expressed and active ligand-receptor pairs per group across all +sender-receiver combinations. Therefore it might be useful to zoom in on +specific cell types as senders/receivers: -plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50) -plot_oi -``` +Eg M\_Monocyte\_CD16 as receiver: -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-35-1.png) + prioritized_tbl_oi_M_50 = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + groups_oi = group_oi, + receivers_oi = "M_Monocyte_CD16") -Eg M_Monocyte_CD16 as sender: + plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all)) + plot_oi -``` r -prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi, senders_oi = "M_Monocyte_CD16") +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-52-1.png) -plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_M_50) -plot_oi -``` +Eg M\_Monocyte\_CD16 as sender: -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-36-1.png) + prioritized_tbl_oi_M_50 = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + groups_oi = group_oi, + senders_oi = "M_Monocyte_CD16") + + plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all)) + plot_oi + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-54-1.png) You can make these plots also for the other groups, like we will illustrate now for the S group -``` r -group_oi = "S" -``` + group_oi = "S" -``` r -prioritized_tbl_oi_S_50 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 50, groups_oi = group_oi) + prioritized_tbl_oi_S_50 = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + groups_oi = group_oi) -plot_oi = make_sample_lr_prod_activity_plots(multinichenet_output$prioritization_tables, prioritized_tbl_oi_S_50) -plot_oi -``` + plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_S_50 %>% inner_join(lr_network_all)) + plot_oi -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-38-1.png) +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-56-1.png) **Note**: Use `make_sample_lr_prod_activity_batch_plots` if you have batches and want to visualize them on this plot! -## Visualization of ligand-activity, ligand-target links, and target gene expression +## Visualization of differential ligand-target links + +### Without filtering of target genes based on LR-target expression correlation In another type of plot, we can visualize the ligand activities for a group-receiver combination, and show the predicted ligand-target links, @@ -1199,78 +1168,141 @@ and also the expression of the predicted target genes across samples. For this, we now need to define a receiver cell type of interest. As example, we will take `L_T_TIM3._CD38._HLADR.` cells as receiver, and -look at the top 20 senderLigand-receiverReceptor pairs with these cells +look at the top 10 senderLigand-receiverReceptor pairs with these cells as receiver. -``` r -group_oi = "M" -receiver_oi = "L_T_TIM3._CD38._HLADR." -prioritized_tbl_oi_M_10 = get_top_n_lr_pairs(multinichenet_output$prioritization_tables, 10, groups_oi = group_oi, receivers_oi = receiver_oi) -``` - -``` r -combined_plot = make_ligand_activity_target_plot(group_oi, receiver_oi, prioritized_tbl_oi_M_10, multinichenet_output$prioritization_tables, multinichenet_output$ligand_activities_targets_DEgenes, contrast_tbl, multinichenet_output$grouping_tbl, multinichenet_output$celltype_info, ligand_target_matrix, plot_legend = FALSE) -combined_plot -## $combined_plot -``` - -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-40-1.png) + group_oi = "M" + receiver_oi = "M_Monocyte_CD16" + prioritized_tbl_oi_M_10 = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 10, + groups_oi = group_oi, + receivers_oi = receiver_oi) + + combined_plot = make_ligand_activity_target_plot( + group_oi, + receiver_oi, + prioritized_tbl_oi_M_10, + multinichenet_output$prioritization_tables, + multinichenet_output$ligand_activities_targets_DEgenes, contrast_tbl, + multinichenet_output$grouping_tbl, + multinichenet_output$celltype_info, + ligand_target_matrix, + plot_legend = FALSE) + combined_plot + ## $combined_plot + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-1.png) ## ## $legends -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-40-2.png) +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-2.png) + +What if there is a specific ligand you are interested in? + + group_oi = "M" + receiver_oi = "M_Monocyte_CD16" + ligands_oi = c("IFNG","IL15") + prioritized_tbl_ligands_oi = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 10000, + groups_oi = group_oi, + receivers_oi = receiver_oi + ) %>% filter(ligand %in% ligands_oi) # ligands should still be in the output tables of course + + combined_plot = make_ligand_activity_target_plot( + group_oi, + receiver_oi, + prioritized_tbl_ligands_oi, + multinichenet_output$prioritization_tables, + multinichenet_output$ligand_activities_targets_DEgenes, + contrast_tbl, + multinichenet_output$grouping_tbl, + multinichenet_output$celltype_info, + ligand_target_matrix, + plot_legend = FALSE) + combined_plot + ## $combined_plot + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-1.png) -**Note** Use `make_DEgene_dotplot_pseudobulk_batch` if you want to -indicate the batch of each sample to the plot - -## Visualization of expression-correlated target genes of ligand-receptor pairs - -Before, we had calculated the correlation in expression between -ligand-receptor pairs and DE genes. Now we will filter out correlated -ligand-receptor –\> target links that both show high expression -correlation (spearman or activity \> 0.50 in this example) and have some -prior knowledge to support their link. + ## + ## $legends -``` r -group_oi = "M" -receiver_oi = "M_Monocyte_CD16" -lr_target_prior_cor_filtered = multinichenet_output$lr_target_prior_cor %>% inner_join(ligand_activities_targets_DEgenes$ligand_activities %>% distinct(ligand, target, direction_regulation, contrast)) %>% inner_join(contrast_tbl) %>% filter(group == group_oi, receiver == receiver_oi) -lr_target_prior_cor_filtered_up = lr_target_prior_cor_filtered %>% filter(direction_regulation == "up") %>% filter( (rank_of_target < top_n_target) & (pearson > 0.50 | spearman > 0.50)) -lr_target_prior_cor_filtered_down = lr_target_prior_cor_filtered %>% filter(direction_regulation == "down") %>% filter( (rank_of_target < top_n_target) & (pearson < -0.50 | spearman < -0.50)) # downregulation -- negative correlation -lr_target_prior_cor_filtered = bind_rows(lr_target_prior_cor_filtered_up, lr_target_prior_cor_filtered_down) -``` +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-2.png) + +### With filtering of target genes based on LR-target expression correlation + +In the previous plots, target genes were shown that are predicted as +target gene of ligands based on prior knowledge. However, we can use the +multi-sample nature of this data to filter target genes based on +expression correlation between the upstream ligand-receptor pair and the +downstream target gene. We will filter out correlated ligand-receptor +–> target links that both show high expression correlation (spearman +or pearson correlation > 0.50 in this example) and have some prior +knowledge to support their link. Note that you can only make these +visualization if you ran step 7 of the core MultiNicheNet analysis. + + group_oi = "M" + receiver_oi = "M_Monocyte_CD16" + lr_target_prior_cor_filtered = multinichenet_output$lr_target_prior_cor %>% + inner_join( + multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>% + distinct(ligand, target, direction_regulation, contrast) + ) %>% + inner_join(contrast_tbl) %>% filter(group == group_oi, receiver == receiver_oi) + + lr_target_prior_cor_filtered_up = lr_target_prior_cor_filtered %>% + filter(direction_regulation == "up") %>% + filter( (rank_of_target < top_n_target) & (pearson > 0.50 | spearman > 0.50)) + lr_target_prior_cor_filtered_down = lr_target_prior_cor_filtered %>% + filter(direction_regulation == "down") %>% + filter( (rank_of_target < top_n_target) & (pearson < -0.50 | spearman < -0.50)) # downregulation -- negative correlation + lr_target_prior_cor_filtered = bind_rows( + lr_target_prior_cor_filtered_up, + lr_target_prior_cor_filtered_down) Now we will visualize the top correlated target genes for the LR pairs that are also in the top 50 LR pairs discriminating the groups from each other: -``` r -prioritized_tbl_oi = get_top_n_lr_pairs(prioritization_tables, 50, groups_oi = group_oi, receivers_oi = receiver_oi) -``` + prioritized_tbl_oi = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + groups_oi = group_oi, + receivers_oi = receiver_oi) -``` r -lr_target_correlation_plot = make_lr_target_correlation_plot(multinichenet_output$prioritization_tables, prioritized_tbl_oi, lr_target_prior_cor_filtered , multinichenet_output$grouping_tbl, multinichenet_output$celltype_info, receiver_oi,plot_legend = FALSE) -lr_target_correlation_plot$combined_plot -``` + lr_target_correlation_plot = make_lr_target_correlation_plot( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi, + lr_target_prior_cor_filtered , + multinichenet_output$grouping_tbl, + multinichenet_output$celltype_info, + receiver_oi, + plot_legend = FALSE) + lr_target_correlation_plot$combined_plot -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-43-1.png) +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-63-1.png) You can also visualize the expression correlation in the following way for a selected LR pair and their targets: -``` r -ligand_oi = "IFNG" -receptor_oi = "IFNGR2" -sender_oi = "L_T_TIM3._CD38._HLADR." -receiver_oi = "M_Monocyte_CD16" -lr_target_scatter_plot = make_lr_target_scatter_plot(multinichenet_output$prioritization_tables, ligand_oi, receptor_oi, sender_oi, receiver_oi, multinichenet_output$celltype_info, multinichenet_output$grouping_tbl, lr_target_prior_cor_filtered) -lr_target_scatter_plot -``` + ligand_oi = "IFNG" + receptor_oi = "IFNGR2" + sender_oi = "L_T_TIM3._CD38._HLADR." + receiver_oi = "M_Monocyte_CD16" + lr_target_scatter_plot = make_lr_target_scatter_plot( + multinichenet_output$prioritization_tables, + ligand_oi, receptor_oi, sender_oi, receiver_oi, + multinichenet_output$celltype_info, + multinichenet_output$grouping_tbl, + lr_target_prior_cor_filtered) + lr_target_scatter_plot -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-44-1.png) +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-64-1.png) -## Intercellular regulatory network systems view +## Intercellular regulatory network inference and visualization In the plots before, we demonstrated that some DE genes have both expression correlation and prior knowledge support to be downstream of @@ -1286,44 +1318,119 @@ different cell populations involved. In this plot, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein -interactions! - -``` r -prioritized_tbl_oi = get_top_n_lr_pairs(prioritization_tables, 150, rank_per_group = FALSE) - -lr_target_prior_cor_filtered = prioritization_tables$group_prioritization_tbl$group %>% unique() %>% lapply(function(group_oi){ - lr_target_prior_cor_filtered = multinichenet_output$lr_target_prior_cor %>% inner_join(ligand_activities_targets_DEgenes$ligand_activities %>% distinct(ligand, target, direction_regulation, contrast)) %>% inner_join(contrast_tbl) %>% filter(group == group_oi) - lr_target_prior_cor_filtered_up = lr_target_prior_cor_filtered %>% filter(direction_regulation == "up") %>% filter( (rank_of_target < top_n_target) & (pearson > 0.50 | spearman > 0.50)) - lr_target_prior_cor_filtered_down = lr_target_prior_cor_filtered %>% filter(direction_regulation == "down") %>% filter( (rank_of_target < top_n_target) & (pearson < -0.50 | spearman < -0.50)) - lr_target_prior_cor_filtered = bind_rows(lr_target_prior_cor_filtered_up, lr_target_prior_cor_filtered_down) -}) %>% bind_rows() -``` - -``` r -colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the original yellow background color with white font is not very readable -graph_plot = make_ggraph_ligand_target_links(lr_target_prior_cor_filtered = lr_target_prior_cor_filtered, prioritized_tbl_oi = prioritized_tbl_oi, colors = colors_sender) -graph_plot$plot -``` - -![](basic_analysis_steps_MISC_files/figure-gfm/unnamed-chunk-46-1.png) - -``` r -graph_plot$source_df_lt %>% head() -## # A tibble: 6 x 6 -## sender receiver direction_regulation group type weight -## -## 1 M_Monocyte_CD16_S100A8 L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -## 2 M_Monocyte_CD16_LGALS3 L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -## 3 L_T_TIM3._CD38._HLADR._LGALS3BP L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -## 4 L_T_TIM3._CD38._HLADR._CD40LG L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -## 5 M_Monocyte_CD16_ADAM10 L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -## 6 L_NK_CD56._CD16._ADAM10 L_T_TIM3._CD38._HLADR._TNFSF10 up A Ligand-Target 1 -graph_plot$nodes_df %>% head() -## node celltype gene type_gene -## M_Monocyte_CD16_S100A8 M_Monocyte_CD16_S100A8 M_Monocyte_CD16 S100A8 ligand -## M_Monocyte_CD16_LGALS3 M_Monocyte_CD16_LGALS3 M_Monocyte_CD16 LGALS3 ligand -## L_T_TIM3._CD38._HLADR._LGALS3BP L_T_TIM3._CD38._HLADR._LGALS3BP L_T_TIM3._CD38._HLADR. LGALS3BP ligand -## L_T_TIM3._CD38._HLADR._CD40LG L_T_TIM3._CD38._HLADR._CD40LG L_T_TIM3._CD38._HLADR. CD40LG ligand -## M_Monocyte_CD16_ADAM10 M_Monocyte_CD16_ADAM10 M_Monocyte_CD16 ADAM10 ligand -## L_NK_CD56._CD16._ADAM10 L_NK_CD56._CD16._ADAM10 L_NK_CD56._CD16. ADAM10 ligand -``` +interactions! We will infer this intercellular regulatory network here +for the top250 interactions. + + prioritized_tbl_oi = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 250, + rank_per_group = FALSE) + + lr_target_prior_cor_filtered = + multinichenet_output$prioritization_tables$group_prioritization_tbl$group %>% unique() %>% + lapply(function(group_oi){ + lr_target_prior_cor_filtered = multinichenet_output$lr_target_prior_cor %>% + inner_join( + multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>% + distinct(ligand, target, direction_regulation, contrast) + ) %>% + inner_join(contrast_tbl) %>% filter(group == group_oi) + + lr_target_prior_cor_filtered_up = lr_target_prior_cor_filtered %>% + filter(direction_regulation == "up") %>% + filter( (rank_of_target < top_n_target) & (pearson > 0.50 | spearman > 0.50)) + + lr_target_prior_cor_filtered_down = lr_target_prior_cor_filtered %>% + filter(direction_regulation == "down") %>% + filter( (rank_of_target < top_n_target) & (pearson < -0.50 | spearman < -0.50)) + lr_target_prior_cor_filtered = bind_rows( + lr_target_prior_cor_filtered_up, + lr_target_prior_cor_filtered_down + ) + }) %>% bind_rows() + + lr_target_df = lr_target_prior_cor_filtered %>% + distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation) + + network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi) + network$links %>% head() + ## # A tibble: 6 × 6 + ## sender_ligand receiver_target direction_regulation group type weight + ## + ## 1 L_T_TIM3._CD38._HLADR._CLEC2B L_NK_CD56._CD16._CCL3 up M Ligand-Target 1 + ## 2 M_Monocyte_CD16_ICAM1 L_NK_CD56._CD16._CCL3 up M Ligand-Target 1 + ## 3 M_Monocyte_CD16_B2M M_Monocyte_CD16_IL1B up M Ligand-Target 1 + ## 4 M_Monocyte_CD16_B2M M_Monocyte_CD16_TIMP1 up M Ligand-Target 1 + ## 5 M_Monocyte_CD16_HLA.A M_Monocyte_CD16_IL1B up M Ligand-Target 1 + ## 6 M_Monocyte_CD16_HLA.A M_Monocyte_CD16_TIMP1 up M Ligand-Target 1 + network$nodes %>% head() + ## # A tibble: 6 × 4 + ## node celltype gene type_gene + ## + ## 1 L_T_TIM3._CD38._HLADR._ICAM3 L_T_TIM3._CD38._HLADR. ICAM3 ligand/receptor + ## 2 M_Monocyte_CD16_LILRB2 M_Monocyte_CD16 LILRB2 ligand/receptor + ## 3 M_Monocyte_CD16_ICAM3 M_Monocyte_CD16 ICAM3 ligand/receptor + ## 4 M_Monocyte_CD16_ITGB2 M_Monocyte_CD16 ITGB2 ligand/receptor + ## 5 M_Monocyte_CD16_SIGLEC9 M_Monocyte_CD16 SIGLEC9 ligand/receptor + ## 6 M_Monocyte_CD16_ENG M_Monocyte_CD16 ENG ligand/receptor + + colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the original yellow background with white font is not very readable + network_graph = visualize_network(network, colors_sender) + network_graph$plot + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-67-1.png) + +Interestingly, we can also use this network to further prioritize +differential CCC interactions. Here we will assume that the most +important LR interactions are the ones that are involved in this +intercellular regulatory network. We can get these interactions as +follows: + + network$prioritized_lr_interactions + ## # A tibble: 117 × 5 + ## group sender receiver ligand receptor + ## + ## 1 M L_T_TIM3._CD38._HLADR. L_NK_CD56._CD16. CLEC2B KLRB1 + ## 2 M M_Monocyte_CD16 L_NK_CD56._CD16. ICAM1 ITGAX + ## 3 M M_Monocyte_CD16 M_Monocyte_CD16 B2M LILRB1 + ## 4 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.A LILRB1 + ## 5 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.B LILRB1 + ## 6 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.C LILRB1 + ## 7 M L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 HLA.A LILRB1 + ## 8 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.A LILRB2 + ## 9 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.C LILRB2 + ## 10 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.DRA CD63 + ## # ℹ 107 more rows + + prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join( + network$prioritized_lr_interactions) + prioritized_tbl_oi_network + ## # A tibble: 117 × 8 + ## group sender receiver ligand receptor id prioritization_score prioritization_rank + ## + ## 1 M M_Monocyte_CD16 L_NK_CD56._CD16. HLA.E KLRC1 HLA.E_KLRC1_M_Monocyte_CD16_L_NK_… 0.956 1 + ## 2 M L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 IFNG IFNGR1 IFNG_IFNGR1_L_T_TIM3._CD38._HLADR… 0.948 2 + ## 3 M L_T_TIM3._CD38._HLADR. M_Monocyte_CD16 IFNG IFNGR2 IFNG_IFNGR2_L_T_TIM3._CD38._HLADR… 0.928 4 + ## 4 M M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. CXCL16 CXCR6 CXCL16_CXCR6_M_Monocyte_CD16_L_T_… 0.922 5 + ## 5 M M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. HLA.E CD8A HLA.E_CD8A_M_Monocyte_CD16_L_T_TI… 0.919 7 + ## 6 M M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. HLA.DRA LAG3 HLA.DRA_LAG3_M_Monocyte_CD16_L_T_… 0.910 9 + ## 7 M M_Monocyte_CD16 L_T_TIM3._CD38._HLADR. S100A8 CD69 S100A8_CD69_M_Monocyte_CD16_L_T_T… 0.907 10 + ## 8 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.F LILRB1 HLA.F_LILRB1_M_Monocyte_CD16_M_Mo… 0.903 11 + ## 9 M M_Monocyte_CD16 M_Monocyte_CD16 TNF LTBR TNF_LTBR_M_Monocyte_CD16_M_Monocy… 0.902 12 + ## 10 M M_Monocyte_CD16 M_Monocyte_CD16 HLA.F LILRB2 HLA.F_LILRB2_M_Monocyte_CD16_M_Mo… 0.898 15 + ## # ℹ 107 more rows + +Visualize now the expression and activity of these interactions for the +M group + + group_oi = "M" + + prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi) + + plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M %>% inner_join(lr_network_all) + ) + plot_oi + +![](basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-71-1.png) diff --git a/vignettes/basic_analysis_steps_MISC_SACL.Rmd b/vignettes/basic_analysis_steps_MISC_SACL.Rmd index e3f9bd0..c1ac35f 100644 --- a/vignettes/basic_analysis_steps_MISC_SACL.Rmd +++ b/vignettes/basic_analysis_steps_MISC_SACL.Rmd @@ -425,12 +425,10 @@ Because we don't have several samples per condition, we cannot apply pseudobulki ```{r} DE_info = get_DE_info_sampleAgnostic( sce = sce, - sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, - batches = batches, covariates = covariates, + group_id = group_id, celltype_id = celltype_id, contrasts_oi = contrasts_oi, - min_cells = min_cells, expressed_df = frq_list$expressed_df, - findMarkers = TRUE, + min_cells = min_cells, contrast_tbl = contrast_tbl) ``` @@ -792,3 +790,75 @@ combined_plot = make_ligand_activity_target_plot( plot_legend = FALSE) combined_plot ``` + +## Intercellular regulatory network inference and visualization + +In the plots before, we visualized target genes that are supported by prior knowledge to be downstream of ligand-receptor pairs. Interestingly, some target genes can be ligands or receptors themselves. This illustrates that cells can send signals to other cells, who as a response to these signals produce signals themselves to feedback to the original sender cells, or who will effect other cell types. + +As last plot, we can generate a 'systems' view of these intercellular feedback and cascade processes than can be occuring between the different cell populations involved. In this plot, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein interactions! We will infer this intercellular regulatory network here for the top50 interactions. + +__Important__ In the default MultiNicheNet workflow for datasets with multiple samples per condition, we further filter out target genes based on expression correlation before generating this plot. However, this would not be meaningful here since we don't have multiple samples. As a consequence, we will have many ligand-target links here in these plots, making the plots less readable if we would consider more than the top 50 or 100 interactions. + +```{r} +prioritized_tbl_oi = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + rank_per_group = FALSE) +``` + +```{r} +lr_target_prior = prioritized_tbl_oi %>% inner_join( + multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>% + distinct(ligand, target, direction_regulation, contrast) %>% inner_join(contrast_tbl) %>% ungroup() + ) + + +lr_target_df = lr_target_prior %>% distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation) +``` + +```{r} +network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi) +network$links %>% head() +network$nodes %>% head() +``` + +```{r, fig.width=30, fig.height=12} +colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the original yellow background with white font is not very readable +network_graph = visualize_network(network, colors_sender) +network_graph$plot +``` + +If you want to know the exact set of top-ranked ligands that are potentially regulated by one ligand of interest, you can run following code: + +```{r} +ligand_oi = "IFNG" +group_oi = "M" +lr_target_df %>% filter(ligand == ligand_oi & group == group_oi) %>% pull(target) %>% unique() %>% intersect(lr_network$ligand) +``` + +Interestingly, we can also use this network to further prioritize differential CCC interactions. Here we will assume that the most important LR interactions are the ones that are involved in this intercellular regulatory network. We can get these interactions as follows: + +```{r} +network$prioritized_lr_interactions +``` + +```{r, fig.width=30, fig.height=12} +prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join( + network$prioritized_lr_interactions) +prioritized_tbl_oi_network +``` + +Visualize now the expression and activity of these interactions for the M group +```{r} +group_oi = "M" +``` + +```{r, fig.height=11, fig.width=16} +prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi) + +plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M %>% inner_join(lr_network_all) + ) +plot_oi +``` diff --git a/vignettes/basic_analysis_steps_MISC_SACL.html b/vignettes/basic_analysis_steps_MISC_SACL.html new file mode 100644 index 0000000..6040bff --- /dev/null +++ b/vignettes/basic_analysis_steps_MISC_SACL.html @@ -0,0 +1,1692 @@ + + + + + + + + + + + + + + + +MultiNicheNet analysis: MIS-C threewise comparison - sample-agnostic/cell-level alternative + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +

Contents

+
+ +
+ + +

In this vignette, you can learn how to perform a sample-agnostic/cell-level MultiNicheNet analysis to compare cell-cell communication between conditions of interest. In this workflow, cells from the same condition will be pooled together similar to regular differential cell-cell communication workflows. We only recommend running this pipeline if you have less than 3 samples in each of the groups/conditions you want to compare. Do not run this workflow if you have more samples per condition.

+

As input you need a SingleCellExperiment object containing at least the raw count matrix and metadata providing the following information for each cell: the group, sample and cell type.

+

As example expression data of interacting cells, we will here use scRNAseq data of immune cells in MIS-C patients and healthy siblings from this paper of Hoste et al.: TIM3+ TRBV11-2 T cells and IFNγ signature in patrolling monocytes and CD16+ NK cells delineate MIS-C DOI. +MIS-C (multisystem inflammatory syndrome in children) is a novel rare immunodysregulation syndrome that can arise after SARS-CoV-2 infection in children. We will use MultiNicheNet to explore immune cell crosstalk enriched in MIS-C compared to healthy siblings and adult COVID-19 patients.

+

In this vignette, we will first prepare the MultiNicheNet core analysis, then run the several steps in the MultiNicheNet core analysis, and finally interpret the output.

+
+

1 Preparation of the MultiNicheNet core analysis

+
library(SingleCellExperiment)
+library(dplyr)
+library(ggplot2)
+library(nichenetr)
+library(multinichenetr)
+
+

1.1 Load NicheNet’s ligand-receptor network and ligand-target matrix

+

MultiNicheNet builds upon the NicheNet framework and uses the same prior knowledge networks (ligand-receptor network and ligand-target matrix, currently v2 version).

+

The Nichenet v2 networks and matrices for both mouse and human can be downloaded from Zenodo DOI.

+

We will read these object in for human because our expression data is of human patients. +Gene names are here made syntactically valid via make.names() to avoid the loss of genes (eg H2-M3) in downstream visualizations.

+
organism = "human"
+
options(timeout = 120)
+
+if(organism == "human"){
+  
+  lr_network_all = 
+    readRDS(url(
+      "https://zenodo.org/record/10229222/files/lr_network_human_allInfo_30112033.rds"
+      )) %>% 
+    mutate(
+      ligand = convert_alias_to_symbols(ligand, organism = organism), 
+      receptor = convert_alias_to_symbols(receptor, organism = organism))
+  
+  lr_network_all = lr_network_all  %>% 
+    mutate(ligand = make.names(ligand), receptor = make.names(receptor)) 
+  
+  lr_network = lr_network_all %>% 
+    distinct(ligand, receptor)
+  
+  ligand_target_matrix = readRDS(url(
+    "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final.rds"
+    ))
+  
+  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  
+  lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix))
+  ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()]
+  
+} else if(organism == "mouse"){
+  
+  lr_network_all = readRDS(url(
+    "https://zenodo.org/record/10229222/files/lr_network_mouse_allInfo_30112033.rds"
+    )) %>% 
+    mutate(
+      ligand = convert_alias_to_symbols(ligand, organism = organism), 
+      receptor = convert_alias_to_symbols(receptor, organism = organism))
+  
+  lr_network_all = lr_network_all  %>% 
+    mutate(ligand = make.names(ligand), receptor = make.names(receptor)) 
+  lr_network = lr_network_all %>% 
+    distinct(ligand, receptor)
+  
+  ligand_target_matrix = readRDS(url(
+    "https://zenodo.org/record/7074291/files/ligand_target_matrix_nsga2r_final_mouse.rds"
+    ))
+  
+  colnames(ligand_target_matrix) = colnames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  rownames(ligand_target_matrix) = rownames(ligand_target_matrix) %>% 
+    convert_alias_to_symbols(organism = organism) %>% make.names()
+  
+  lr_network = lr_network %>% filter(ligand %in% colnames(ligand_target_matrix))
+  ligand_target_matrix = ligand_target_matrix[, lr_network$ligand %>% unique()]
+  
+}
+
+
+

1.2 Read in SingleCellExperiment Object

+

In this vignette, we will load in a subset of the scRNAseq data of the MIS-C DOI. For the sake of demonstration, this subset only contains 3 cell types. These celltypes are some of the cell types that were found to be most interesting related to MIS-C according to Hoste et al. 

+

If you start from a Seurat object, you can convert it easily to a SingleCellExperiment object via sce = Seurat::as.SingleCellExperiment(seurat_obj, assay = "RNA").

+

Because the NicheNet 2.0. networks are in the most recent version of the official gene symbols, we will make sure that the gene symbols used in the expression data are also updated (= converted from their “aliases” to official gene symbols). Afterwards, we will make them again syntactically valid.

+
sce = readRDS(url(
+  "https://zenodo.org/record/8010790/files/sce_subset_misc.rds"
+  ))
+sce = alias_to_symbol_SCE(sce, "human") %>% makenames_SCE()
+

This dataset does not yet contain normalized counts:

+
sce = scuttle::logNormCounts(sce)
+
+
+

1.3 Prepare the settings of the MultiNicheNet cell-cell communication analysis

+

In this step, we will formalize our research question into MultiNicheNet input arguments.

+
+

1.3.1 Define in which metadata columns we can find the group, sample and cell type IDs

+

In this case study, we want to study differences in cell-cell communication patterns between MIS-C patients (M), their healthy siblings (S) and adult patients with severe covid (A). The meta data columns that indicate this disease status(=group/condition of interest) is MIS.C.AgeTier.

+

Cell type annotations are indicated in the Annotation_v2.0 column, and the sample is indicated by the ShortID column. +If your cells are annotated in multiple hierarchical levels, we recommend using a relatively high level in the hierarchy because MultiNicheNet focuses on differential expression and not differential abundance

+
group_id = "MIS.C.AgeTier"
+sample_id = "ShortID"
+celltype_id = "Annotation_v2.0"
+

In the sammple-agnostic / cell-level worklow, it is not possible to correct for batch effects or covariates. Therefore, you here have to use the following NA settings:

+
covariates = NA
+batches = NA
+

Important: The column names of group, sample, and cell type should be syntactically valid (make.names)

+

Important: All group, sample, and cell type names should be syntactically valid as well (make.names) (eg through SummarizedExperiment::colData(sce)$ShortID = SummarizedExperiment::colData(sce)$ShortID %>% make.names())

+
+
+

1.3.2 Define the contrasts of interest.

+

Here, we want to compare each patient group to the other groups, so the MIS-C (M) group vs healthy control siblings (S) and adult COVID19 patients (A) (= M vs S+A) and so on. We want to know which cell-cell communication patterns are specific for the M vs A+S group, the A vs M+S group and the S vs A+M group.

+

To perform this comparison, we need to set the following contrasts:

+
contrasts_oi = c("'M-(S+A)/2','S-(M+A)/2','A-(S+M)/2'")
+

Very Important Note the format to indicate the contrasts! This formatting should be adhered to very strictly, and white spaces are not allowed! Check ?get_DE_info for explanation about how to define this well. The most important points are that: +each contrast is surrounded by single quotation marks +contrasts are separated by a comma without any white space +*all contrasts together are surrounded by double quotation marks.

+

If you compare against two groups, you should divide by 2 (as demonstrated here), if you compare against three groups, you should divide by 3 and so on.

+

For downstream visualizations and linking contrasts to their main condition, we also need to run the following: +This is necessary because we will also calculate cell-type+condition specificity of ligands and receptors.

+
contrast_tbl = tibble(contrast = 
+                        c("M-(S+A)/2","S-(M+A)/2", "A-(S+M)/2"), 
+                      group = c("M","S","A"))
+

If you want to compare only two groups (eg M vs S), you can use the following: +contrasts_oi = c("'M-S','S-M'") +contrast_tbl = tibble(contrast = c("M-S","S-M"), group = c("M","S"))

+
+
+

1.3.3 Define the sender and receiver cell types of interest.

+

If you want to focus the analysis on specific cell types (e.g. because you know which cell types reside in the same microenvironments based on spatial data), you can define this here. If you have sufficient computational resources and no specific idea of cell-type colocalzations, we recommend to consider all cell types as potential senders and receivers. Later on during analysis of the output it is still possible to zoom in on the cell types that interest you most, but your analysis is not biased to them.

+

Here we will consider all cell types in the data:

+
senders_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
+receivers_oi = SummarizedExperiment::colData(sce)[,celltype_id] %>% unique()
+sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% 
+            c(senders_oi, receivers_oi)
+          ]
+

In case you would have samples in your data that do not belong to one of the groups/conditions of interest, we recommend removing them and only keeping conditions of interst:

+
conditions_keep = c("M", "S", "A")
+sce = sce[, SummarizedExperiment::colData(sce)[,group_id] %in% 
+            conditions_keep
+          ]
+
+
+
+
+

2 Running the MultiNicheNet core analysis

+

Now we will run the core of a MultiNicheNet analysis. This analysis consists of the following steps:

+
    +
    1. +
    2. Cell-type filtering: determine which cell types are sufficiently present
    3. +
  • +
    1. +
    2. Gene filtering: determine which genes are sufficiently expressed in each present cell type
    3. +
  • +
    1. +
    2. Pseudobulk expression calculation: determine and normalize per-sample pseudobulk expression levels for each expressed gene in each present cell type
    3. +
  • +
    1. +
    2. Differential expression (DE) analysis: determine which genes are differentially expressed
    3. +
  • +
    1. +
    2. Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes
    3. +
  • +
    1. +
    2. Prioritization: rank cell-cell communication patterns through multi-criteria prioritization
    3. +
  • +
+

Following these steps, one can optionally +* 7. Calculate the across-samples expression correlation between ligand-receptor pairs and target genes +* 8. Prioritize communication patterns involving condition-specific cell types through an alternative prioritization scheme

+

After these steps, the output can be further explored as we will demonstrate in the “Downstream analysis of the MultiNicheNet output” section.

+

In this vignette, we will demonstrate these steps one-by-one, which offers the most flexibility to the user to assess intermediary results. Other vignettes will demonstrate the use of the multi_nichenet_analysis wrapper function.

+
+

2.1 Cell-type filtering: determine which cell types are sufficiently present

+

In this step we will calculate and visualize cell type abundances. This will give an indication about which cell types will be retained in the analysis, and which cell types will be filtered out.

+

In the following analysis we will a required minimum number of cells per cell type per condition at 10. Conditions that have less than min_cells cells will be excluded from the analysis for that specific cell type.

+
min_cells = 10
+
abundance_info = get_abundance_info(
+  sce = sce, 
+  sample_id = group_id, group_id = group_id, celltype_id = celltype_id, 
+  min_cells = min_cells, 
+  senders_oi = senders_oi, receivers_oi = receivers_oi, 
+  batches = batches
+  )
+

First, we will check the cell type abundance diagnostic plots.

+
+

2.1.1 Interpretation of cell type abundance information

+

The first plot visualizes the number of cells per celltype-condition combination, and indicates which combinations are removed during the DE analysis because there are less than min_cells in the celltype-condition combination.

+
abundance_info$abund_plot_sample
+

+

The red dotted line indicates the required minimum of cells as defined above in min_cells. We can see here that all cell types are present in all conditions.

+
+
+

2.1.2 Cell type filtering based on cell type abundance information

+

In case this plot would indicate that not all cell types are present in all conditions: +running the following block of code can help you determine which cell types are condition-specific and which cell types are absent.

+
sample_group_celltype_df = abundance_info$abundance_data %>% 
+  filter(n > min_cells) %>% 
+  ungroup() %>% 
+  distinct(sample_id, group_id) %>% 
+  cross_join(
+    abundance_info$abundance_data %>% 
+      ungroup() %>% 
+      distinct(celltype_id)
+    ) %>% 
+  arrange(sample_id)
+
+abundance_df = sample_group_celltype_df %>% left_join(
+  abundance_info$abundance_data %>% ungroup()
+  )
+
+abundance_df$n[is.na(abundance_df$n)] = 0
+abundance_df$keep[is.na(abundance_df$keep)] = FALSE
+abundance_df_summarized = abundance_df %>% 
+  mutate(keep = as.logical(keep)) %>% 
+  group_by(group_id, celltype_id) %>% 
+  summarise(samples_present = sum((keep)))
+
+celltypes_absent_one_condition = abundance_df_summarized %>% 
+  filter(samples_present == 0) %>% pull(celltype_id) %>% unique() 
+# find truly condition-specific cell types by searching for cell types 
+# truely absent in at least one condition
+
+celltypes_present_one_condition = abundance_df_summarized %>% 
+  filter(samples_present >= 1) %>% pull(celltype_id) %>% unique() 
+# require presence in at least 1 samples  of one group so 
+# it is really present in at least one condition
+
+condition_specific_celltypes = intersect(
+  celltypes_absent_one_condition, 
+  celltypes_present_one_condition)
+
+total_nr_conditions = SummarizedExperiment::colData(sce)[,group_id] %>% 
+  unique() %>% length() 
+
+absent_celltypes = abundance_df_summarized %>% 
+  filter(samples_present < 1) %>% 
+  group_by(celltype_id) %>% 
+  count() %>% 
+  filter(n == total_nr_conditions) %>% 
+  pull(celltype_id)
+  
+print("condition-specific celltypes:")
+## [1] "condition-specific celltypes:"
+print(condition_specific_celltypes)
+## character(0)
+  
+print("absent celltypes:")
+## [1] "absent celltypes:"
+print(absent_celltypes)
+## character(0)
+

Absent cell types will be filtered out, condition-specific cell types can be filtered out if you as a user do not want to run the alternative workflow for condition-specific cell types at the end of the core MultiNicheNet analysis.

+
analyse_condition_specific_celltypes = FALSE
+
if(analyse_condition_specific_celltypes == TRUE){
+  senders_oi = senders_oi %>% setdiff(absent_celltypes)
+  receivers_oi = receivers_oi %>% setdiff(absent_celltypes)
+} else {
+  senders_oi = senders_oi %>% 
+    setdiff(union(absent_celltypes, condition_specific_celltypes))
+  receivers_oi = receivers_oi %>% 
+    setdiff(union(absent_celltypes, condition_specific_celltypes))
+}
+
+sce = sce[, SummarizedExperiment::colData(sce)[,celltype_id] %in% 
+            c(senders_oi, receivers_oi)
+          ]
+
+
+
+

2.2 Gene filtering: determine which genes are sufficiently expressed in each present cell type

+

Before running the DE analysis, we will determine which genes are not sufficiently expressed and should be filtered out. +We will perform gene filtering based on a similar procedure as used in edgeR::filterByExpr. However, we adapted this procedure to be more interpretable for single-cell datasets.

+

For each cell type, we will consider genes expressed if they are expressed in at least one condition. To do this, we need to set min_sample_prop = 1.

+
min_sample_prop = 1
+

But how do we define which genes are expressed in a condition? For this we will consider genes as expressed if they have non-zero expression values in a fraction_cutoff fraction of cells of that cell type in that condition By default, we set fraction_cutoff = 0.05, which means that genes should show non-zero expression values in at least 5% of cells in a condition

+
fraction_cutoff = 0.05
+

We recommend using these default values unless there is specific interest in prioritizing (very) weakly expressed interactions. In that case, you could lower the value of fraction_cutoff. We explicitly recommend against using fraction_cutoff > 0.10.

+

Now we will calculate the information required for gene filtering with the following command:

+
frq_list = get_frac_exprs_sampleAgnostic(
+  sce = sce, 
+  sample_id = sample_id, celltype_id = celltype_id, group_id = group_id, 
+  batches = batches, 
+  min_cells = min_cells, 
+  fraction_cutoff = fraction_cutoff, min_sample_prop = min_sample_prop)
+## # A tibble: 6 × 2
+##   MIS.C.AgeTier Annotation_v2.0       
+##   <chr>         <chr>                 
+## 1 M             L_T_TIM3._CD38._HLADR.
+## 2 M             L_T_TIM3._CD38._HLADR.
+## 3 M             L_T_TIM3._CD38._HLADR.
+## 4 M             L_T_TIM3._CD38._HLADR.
+## 5 M             L_T_TIM3._CD38._HLADR.
+## 6 M             L_T_TIM3._CD38._HLADR.
+##   MIS.C.AgeTier MIS.C.AgeTier        Annotation_v2.0
+## 1             M             M L_T_TIM3._CD38._HLADR.
+## 2             M             M L_T_TIM3._CD38._HLADR.
+## 3             M             M L_T_TIM3._CD38._HLADR.
+## 4             M             M L_T_TIM3._CD38._HLADR.
+## 5             M             M L_T_TIM3._CD38._HLADR.
+## 6             M             M L_T_TIM3._CD38._HLADR.
+##   sample group               celltype
+## 1      M     M L_T_TIM3._CD38._HLADR.
+## 2      M     M L_T_TIM3._CD38._HLADR.
+## 3      M     M L_T_TIM3._CD38._HLADR.
+## 4      M     M L_T_TIM3._CD38._HLADR.
+## 5      M     M L_T_TIM3._CD38._HLADR.
+## 6      M     M L_T_TIM3._CD38._HLADR.
+## # A tibble: 6 × 3
+##   sample group celltype              
+##   <chr>  <chr> <chr>                 
+## 1 M      M     L_T_TIM3._CD38._HLADR.
+## 2 M      M     L_T_TIM3._CD38._HLADR.
+## 3 M      M     L_T_TIM3._CD38._HLADR.
+## 4 M      M     L_T_TIM3._CD38._HLADR.
+## 5 M      M     L_T_TIM3._CD38._HLADR.
+## 6 M      M     L_T_TIM3._CD38._HLADR.
+## [1] "Groups are considered if they have more than 10 cells of the cell type of interest"
+## [1] "Genes with non-zero counts in at least 5% of cells of a cell type of interest in a particular group/condition will be considered as expressed in that group/condition"
+## [1] "Genes expressed in at least 1 group will considered as expressed in the cell type: L_NK_CD56._CD16."
+## [1] "Genes expressed in at least 1 group will considered as expressed in the cell type: L_T_TIM3._CD38._HLADR."
+## [1] "Genes expressed in at least 1 group will considered as expressed in the cell type: M_Monocyte_CD16"
+## [1] "5589 genes are considered as expressed in the cell type: L_NK_CD56._CD16."
+## [1] "7150 genes are considered as expressed in the cell type: L_T_TIM3._CD38._HLADR."
+## [1] "7794 genes are considered as expressed in the cell type: M_Monocyte_CD16"
+

Now only keep genes that are expressed by at least one cell type:

+
genes_oi = frq_list$expressed_df %>% 
+  filter(expressed == TRUE) %>% pull(gene) %>% unique() 
+sce = sce[genes_oi, ]
+
+
+

2.3 Pseudobulk expression calculation: determine and normalize per-sample pseudobulk expression levels for each expressed gene in each present cell type

+

After filtering out absent cell types and genes, we will continue the analysis by calculating the different prioritization criteria that we will use to prioritize cell-cell communication patterns.

+

First, we will determine and normalize per-condition pseudobulk expression levels for each expressed gene in each present cell type. The function process_abundance_expression_info will link this expression information for ligands of the sender cell types to the corresponding receptors of the receiver cell types. This will later on allow us to define the cell-type specicificy criteria for ligands and receptors.

+
abundance_expression_info = process_abundance_expression_info(
+  sce = sce, 
+  sample_id = group_id, group_id = group_id, celltype_id = celltype_id, 
+  min_cells = min_cells, 
+  senders_oi = senders_oi, receivers_oi = receivers_oi, 
+  lr_network = lr_network, 
+  batches = batches, 
+  frq_list = frq_list, 
+  abundance_info = abundance_info)
+

Normalized pseudobulk expression values per gene/celltype/condition can be inspected by:

+
abundance_expression_info$celltype_info$pb_df %>% head()
+## # A tibble: 6 × 4
+##   gene  sample pb_sample celltype              
+##   <chr> <chr>      <dbl> <fct>                 
+## 1 A1BG  A           3.97 L_T_TIM3._CD38._HLADR.
+## 2 AAAS  A           5.00 L_T_TIM3._CD38._HLADR.
+## 3 AAGAB A           4.77 L_T_TIM3._CD38._HLADR.
+## 4 AAK1  A           6.78 L_T_TIM3._CD38._HLADR.
+## 5 AAMDC A           4.11 L_T_TIM3._CD38._HLADR.
+## 6 AAMP  A           5.84 L_T_TIM3._CD38._HLADR.
+

Inspecting these values for ligand-receptor interactions can be done by:

+
abundance_expression_info$sender_receiver_info$pb_df %>% head()
+## # A tibble: 6 × 8
+##   sample sender                 receiver   ligand receptor pb_ligand pb_receptor
+##   <chr>  <chr>                  <chr>      <chr>  <chr>        <dbl>       <dbl>
+## 1 M      L_NK_CD56._CD16.       M_Monocyt… B2M    LILRB1        14.1       10.3 
+## 2 M      L_T_TIM3._CD38._HLADR. M_Monocyt… B2M    LILRB1        13.9       10.3 
+## 3 M      M_Monocyte_CD16        M_Monocyt… B2M    LILRB1        13.8       10.3 
+## 4 A      L_NK_CD56._CD16.       L_NK_CD56… B2M    KLRD1         14.4        9.73
+## 5 S      L_NK_CD56._CD16.       L_NK_CD56… B2M    KLRD1         14.4        9.55
+## 6 M      L_NK_CD56._CD16.       L_NK_CD56… B2M    KLRD1         14.1        9.62
+## # ℹ 1 more variable: ligand_receptor_pb_prod <dbl>
+
+
+

2.4 Differential expression (DE) analysis: determine which genes are differentially expressed

+

In this step, we will perform genome-wide differential expression analysis of receiver and sender cell types to define DE genes between the conditions of interest (as formalized by the contrasts_oi). Based on this analysis, we later can define the levels of differential expression of ligands in senders and receptors in receivers, and define the set of affected target genes in the receiver cell types (which will be used for the ligand activity analysis).

+

Because we don’t have several samples per condition, we cannot apply pseudobulking followed by EdgeR as done in the regular MultiNicheNet workflow. Instead, we will here perform a classic FindMarkers approach. This has as consequence that you cannot perform DE on multifactorial experimental designs. You can only compare one group vs other group(s).

+
DE_info = get_DE_info_sampleAgnostic(
+  sce = sce, 
+  group_id = group_id, celltype_id = celltype_id, 
+  contrasts_oi = contrasts_oi, 
+  expressed_df = frq_list$expressed_df, 
+  min_cells = min_cells, 
+  contrast_tbl = contrast_tbl)
+
+

2.4.1 Check DE results

+

Check DE output information in table with logFC and p-values for each gene-celltype-contrast:

+
celltype_de = DE_info$celltype_de_findmarkers
+
celltype_de %>% arrange(-logFC) %>% head()
+## # A tibble: 6 × 6
+##   gene     cluster_id             logFC     p_val     p_adj contrast 
+##   <chr>    <chr>                  <dbl>     <dbl>     <dbl> <chr>    
+## 1 TRBV11.2 L_T_TIM3._CD38._HLADR.  2.63 0         0         M-(S+A)/2
+## 2 MTRNR2L8 M_Monocyte_CD16         2.22 1.24e- 34 1.16e- 32 S-(M+A)/2
+## 3 IFI27    M_Monocyte_CD16         2.12 5.86e- 23 5.07e- 20 A-(S+M)/2
+## 4 NKG7     L_T_TIM3._CD38._HLADR.  2.02 3.16e- 87 2.51e- 84 M-(S+A)/2
+## 5 GZMB     L_T_TIM3._CD38._HLADR.  1.88 1.59e-122 3.80e-119 M-(S+A)/2
+## 6 GNLY     L_T_TIM3._CD38._HLADR.  1.67 6.38e- 50 2.07e- 47 M-(S+A)/2
+
+
+

2.4.2 Combine DE information for ligand-senders and receptors-receivers

+

To end this step, we will combine the DE information of senders and receivers by linking their ligands and receptors together based on the prior knowledge ligand-receptor network.

+
sender_receiver_de = combine_sender_receiver_de(
+  sender_de = celltype_de,
+  receiver_de = celltype_de,
+  senders_oi = senders_oi,
+  receivers_oi = receivers_oi,
+  lr_network = lr_network
+)
+
sender_receiver_de %>% head(20)
+## # A tibble: 20 × 12
+##    contrast  sender             receiver ligand receptor lfc_ligand lfc_receptor
+##    <chr>     <chr>              <chr>    <chr>  <chr>         <dbl>        <dbl>
+##  1 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.A  LILRB1        0.880       1.32  
+##  2 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… GZMB   MCL1          1.88        0.184 
+##  3 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… TIMP1  CD63          1.02        1.03  
+##  4 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… B2M    LILRB1        0.706       1.32  
+##  5 A-(S+M)/2 M_Monocyte_CD16    M_Monoc… TIMP1  CD63          1.05        0.958 
+##  6 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… HLA.C  LILRB1        0.673       1.32  
+##  7 M-(S+A)/2 L_T_TIM3._CD38._H… L_T_TIM… GZMB   MCL1          1.88        0.0908
+##  8 M-(S+A)/2 L_T_TIM3._CD38._H… L_NK_CD… GZMB   IGF2R         1.88        0.0769
+##  9 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… GZMB   IGF2R         1.88        0.0723
+## 10 M-(S+A)/2 L_T_TIM3._CD38._H… M_Monoc… HLA.A  LILRB1        0.628       1.32  
+## 11 M-(S+A)/2 L_T_TIM3._CD38._H… L_NK_CD… GZMB   MCL1          1.88        0.0618
+## 12 M-(S+A)/2 L_T_TIM3._CD38._H… L_T_TIM… GZMB   IGF2R         1.88        0.0384
+## 13 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.C  LILRB1        0.520       1.32  
+## 14 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.B  LILRB1        0.470       1.32  
+## 15 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… S100A9 CD68          1.11        0.675 
+## 16 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.A  LILRB2        0.880       0.897 
+## 17 M-(S+A)/2 M_Monocyte_CD16    L_T_TIM… S100A9 ITGB2         1.11        0.594 
+## 18 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.F  LILRB1        0.347       1.32  
+## 19 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… S100A8 CD68          0.986       0.675 
+## 20 M-(S+A)/2 M_Monocyte_CD16    M_Monoc… HLA.D… CD63          0.569       1.03  
+## # ℹ 5 more variables: ligand_receptor_lfc_avg <dbl>, p_val_ligand <dbl>,
+## #   p_adj_ligand <dbl>, p_val_receptor <dbl>, p_adj_receptor <dbl>
+
+
+
+

2.5 Ligand activity prediction: use the DE analysis output to predict the activity of ligands in receiver cell types and infer their potential target genes

+

In this step, we will predict NicheNet ligand activities and NicheNet ligand-target links based on these differential expression results. We do this to prioritize interactions based on their predicted effect on a receiver cell type. We will assume that the most important group-specific interactions are those that lead to group-specific gene expression changes in a receiver cell type.

+

Similarly to base NicheNet (https://github.com/saeyslab/nichenetr), we use the DE output to create a “geneset of interest”: here we assume that DE genes within a cell type may be DE because of differential cell-cell communication processes. In the ligand activity prediction, we will assess the enrichment of target genes of ligands within this geneset of interest. In case high-probabiliy target genes of a ligand are enriched in this set compared to the background of expressed genes, we predict that this ligand may have a high activity.

+

Because the ligand activity analysis is an enrichment procedure, it is important that this geneset of interest should contain a sufficient but not too large number of genes. The ratio geneset_oi/background should ideally be between 1/200 and 1/10 (or close to these ratios).

+

To determine the genesets of interest based on DE output, we need to define some logFC and/or p-value thresholds per cell type/contrast combination. In general, we recommend inspecting the nr. of DE genes for all cell types based on the default thresholds and adapting accordingly.

+
+

2.5.1 Assess geneset_oi-vs-background ratios for different DE output tresholds prior to the NicheNet ligand activity analysis

+

We will first inspect the geneset_oi-vs-background ratios for the default tresholds:

+
logFC_threshold = 0.25 # lower here for FindMarkers than for Pseudobulk-EdgeR
+p_val_threshold = 0.05
+
p_val_adj = TRUE 
+
geneset_assessment = contrast_tbl$contrast %>% 
+  lapply(
+    process_geneset_data, 
+    celltype_de, logFC_threshold, p_val_adj, p_val_threshold
+  ) %>% 
+  bind_rows() 
+geneset_assessment
+## # A tibble: 9 × 12
+##   cluster_id            n_background n_geneset_up n_geneset_down prop_geneset_up
+##   <chr>                        <int>        <int>          <int>           <dbl>
+## 1 L_NK_CD56._CD16.              5589          214             23         0.0383 
+## 2 L_T_TIM3._CD38._HLAD…         7150          151             30         0.0211 
+## 3 M_Monocyte_CD16               7794          435             36         0.0558 
+## 4 L_NK_CD56._CD16.              5589           75            145         0.0134 
+## 5 L_T_TIM3._CD38._HLAD…         7150           33            141         0.00462
+## 6 M_Monocyte_CD16               7794           81            352         0.0104 
+## 7 L_NK_CD56._CD16.              5589           49            144         0.00877
+## 8 L_T_TIM3._CD38._HLAD…         7150           40            125         0.00559
+## 9 M_Monocyte_CD16               7794           85            441         0.0109 
+## # ℹ 7 more variables: prop_geneset_down <dbl>, in_range_up <lgl>,
+## #   in_range_down <lgl>, contrast <chr>, logFC_threshold <dbl>,
+## #   p_val_threshold <dbl>, adjusted <lgl>
+

We can see here that for most cell type / contrast combinations, all geneset/background ratio’s are within the recommended range (in_range_up and in_range_down columns). When these geneset/background ratio’s would not be within the recommended ranges, we should interpret ligand activity results for these cell types with more caution, or use different thresholds (for these or all cell types). Here, a few celltype-contrast combination are not in the recommended range for up- and or-down genes but they are close (recommend ranges between 0.005 and 0.1).

+
+
+

2.5.2 Perform the ligand activity analysis and ligand-target inference

+

After the ligand activity prediction, we will also infer the predicted target genes of these ligands in each contrast. For this ligand-target inference procedure, we also need to select which top n of the predicted target genes will be considered (here: top 250 targets per ligand). This parameter will not affect the ligand activity predictions. It will only affect ligand-target visualizations and construction of the intercellular regulatory network during the downstream analysis. We recommend users to test other settings in case they would be interested in exploring fewer, but more confident target genes, or vice versa.

+
top_n_target = 250
+

The NicheNet ligand activity analysis can be run in parallel for each receiver cell type, by changing the number of cores as defined here. Using more cores will speed up the analysis at the cost of needing more memory. This is only recommended if you have many receiver cell types of interest.

+
verbose = TRUE
+cores_system = 8
+n.cores = min(cores_system, celltype_de$cluster_id %>% unique() %>% length()) 
+

Running the ligand activity prediction will take some time (the more cell types and contrasts, the more time)

+
ligand_activities_targets_DEgenes = suppressMessages(suppressWarnings(
+  get_ligand_activities_targets_DEgenes(
+    receiver_de = celltype_de,
+    receivers_oi = intersect(receivers_oi, celltype_de$cluster_id %>% unique()),
+    ligand_target_matrix = ligand_target_matrix,
+    logFC_threshold = logFC_threshold,
+    p_val_threshold = p_val_threshold,
+    p_val_adj = p_val_adj,
+    top_n_target = top_n_target,
+    verbose = verbose, 
+    n.cores = n.cores
+  )
+))
+

You can check the output of the ligand activity and ligand-target inference here:

+
ligand_activities_targets_DEgenes$ligand_activities %>% head(20)
+## # A tibble: 20 × 8
+## # Groups:   receiver, contrast [1]
+##    ligand activity contrast  target  ligand_target_weight receiver        
+##    <chr>     <dbl> <chr>     <chr>                  <dbl> <chr>           
+##  1 A2M      0.0279 A-(S+M)/2 ALOX5AP              0.00727 L_NK_CD56._CD16.
+##  2 A2M      0.0279 A-(S+M)/2 ANXA2                0.00643 L_NK_CD56._CD16.
+##  3 A2M      0.0279 A-(S+M)/2 AREG                 0.00638 L_NK_CD56._CD16.
+##  4 A2M      0.0279 A-(S+M)/2 BST2                 0.00662 L_NK_CD56._CD16.
+##  5 A2M      0.0279 A-(S+M)/2 CXCR4                0.00967 L_NK_CD56._CD16.
+##  6 A2M      0.0279 A-(S+M)/2 DDIT4                0.0114  L_NK_CD56._CD16.
+##  7 A2M      0.0279 A-(S+M)/2 IRF7                 0.00777 L_NK_CD56._CD16.
+##  8 A2M      0.0279 A-(S+M)/2 ISG20                0.00738 L_NK_CD56._CD16.
+##  9 A2M      0.0279 A-(S+M)/2 MT2A                 0.00639 L_NK_CD56._CD16.
+## 10 A2M      0.0279 A-(S+M)/2 TSC22D3              0.00661 L_NK_CD56._CD16.
+## 11 A2M      0.0279 A-(S+M)/2 TXNIP                0.00939 L_NK_CD56._CD16.
+## 12 A2M      0.0279 A-(S+M)/2 VIM                  0.00857 L_NK_CD56._CD16.
+## 13 A2M      0.0279 A-(S+M)/2 ZFP36                0.00732 L_NK_CD56._CD16.
+## 14 A2M      0.0279 A-(S+M)/2 ZFP36L2              0.00660 L_NK_CD56._CD16.
+## 15 AANAT    0.0239 A-(S+M)/2 ALOX5AP              0.00384 L_NK_CD56._CD16.
+## 16 AANAT    0.0239 A-(S+M)/2 AREG                 0.00379 L_NK_CD56._CD16.
+## 17 AANAT    0.0239 A-(S+M)/2 CXCR4                0.00589 L_NK_CD56._CD16.
+## 18 AANAT    0.0239 A-(S+M)/2 DDIT4                0.00659 L_NK_CD56._CD16.
+## 19 AANAT    0.0239 A-(S+M)/2 ISG20                0.00446 L_NK_CD56._CD16.
+## 20 AANAT    0.0239 A-(S+M)/2 TSC22D3              0.00370 L_NK_CD56._CD16.
+## # ℹ 2 more variables: direction_regulation <fct>, activity_scaled <dbl>
+
+
+
+

2.6 Prioritization: rank cell-cell communication patterns through multi-criteria prioritization

+

In the previous steps, we calculated expression, differential expression and NicheNet ligand activity. In the final step, we will now combine all calculated information to rank all sender-ligand—receiver-receptor pairs according to group/condition specificity. We will use the following criteria to prioritize ligand-receptor interactions:

+
    +
  • Upregulation of the ligand in a sender cell type and/or upregulation of the receptor in a receiver cell type - in the condition of interest.
  • +
  • Cell-type specific expression of the ligand in the sender cell type and receptor in the receiver cell type in the condition of interest (to mitigate the influence of upregulated but still relatively weakly expressed ligands/receptors).
  • +
  • Sufficiently high expression levels of ligand and receptor in many samples of the same group.
  • +
  • High NicheNet ligand activity, to further prioritize ligand-receptor pairs based on their predicted effect of the ligand-receptor interaction on the gene expression in the receiver cell type.
  • +
+

We will combine these prioritization criteria in a single aggregated prioritization score. In the default setting, we will weigh each of these criteria equally (scenario = "regular"). This setting is strongly recommended. However, we also provide some additional setting to accomodate different biological scenarios. The setting scenario = "lower_DE" halves the weight for DE criteria and doubles the weight for ligand activity. This is recommended in case your hypothesis is that the differential CCC patterns in your data are less likely to be driven by DE (eg in cases of differential migration into a niche). The setting scenario = "no_frac_LR_expr" ignores the criterion “Sufficiently high expression levels of ligand and receptor in many samples of the same group”. This may be interesting for users that have data with a limited number of samples and don’t want to penalize interactions if they are not sufficiently expressed in some samples.

+

Finally, we still need to make one choice. For NicheNet ligand activity we can choose to prioritize ligands that only induce upregulation of target genes (ligand_activity_down = FALSE) or can lead potentially lead to both up- and downregulation (ligand_activity_down = TRUE). The benefit of ligand_activity_down = FALSE is ease of interpretability: prioritized ligand-receptor pairs will be upregulated in the condition of interest, just like their target genes. ligand_activity_down = TRUE can be harder to interpret because target genes of some interactions may be upregulated in the other conditions compared to the condition of interest. This is harder to interpret, but may help to pick up interactions that can also repress gene expression.

+

Here we will choose for setting ligand_activity_down = FALSE and focus specifically on upregulating ligands.

+
ligand_activity_down = FALSE
+
sender_receiver_tbl = sender_receiver_de %>% distinct(sender, receiver)
+
+metadata_combined = SummarizedExperiment::colData(sce) %>% tibble::as_tibble()
+
+  if(!is.na(batches)){
+    grouping_tbl = metadata_combined[,c(group_id, batches)] %>% tibble::as_tibble() %>% dplyr::distinct()
+    colnames(grouping_tbl) = c("group",batches)
+    grouping_tbl = grouping_tbl %>% mutate(sample = group)
+    grouping_tbl = grouping_tbl %>% tibble::as_tibble()
+  } else {
+    grouping_tbl = metadata_combined[,c(group_id)] %>% tibble::as_tibble() %>% dplyr::distinct()
+    colnames(grouping_tbl) = c("group")
+    grouping_tbl = grouping_tbl %>% mutate(sample = group) %>% select(sample, group)
+    
+  }
+
+prioritization_tables = suppressMessages(generate_prioritization_tables(
+    sender_receiver_info = abundance_expression_info$sender_receiver_info,
+    sender_receiver_de = sender_receiver_de,
+    ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
+    contrast_tbl = contrast_tbl,
+    sender_receiver_tbl = sender_receiver_tbl,
+    grouping_tbl = grouping_tbl,
+    scenario = "regular", # all prioritization criteria will be weighted equally
+    fraction_cutoff = fraction_cutoff, 
+    abundance_data_receiver = abundance_expression_info$abundance_data_receiver,
+    abundance_data_sender = abundance_expression_info$abundance_data_sender,
+    ligand_activity_down = ligand_activity_down
+  ))
+

Check the output tables

+

First: group-based summary table

+
prioritization_tables$group_prioritization_tbl %>% head(20)
+## # A tibble: 20 × 18
+##    contrast  group sender          receiver ligand receptor lr_interaction id   
+##    <chr>     <chr> <chr>           <chr>    <chr>  <chr>    <chr>          <chr>
+##  1 A-(S+M)/2 A     L_NK_CD56._CD1… L_NK_CD… HLA.A  KIR3DL1  HLA.A_KIR3DL1  HLA.…
+##  2 A-(S+M)/2 A     L_NK_CD56._CD1… L_NK_CD… HLA.A  KLRD1    HLA.A_KLRD1    HLA.…
+##  3 M-(S+A)/2 M     L_T_TIM3._CD38… M_Monoc… IFNG   IFNGR1   IFNG_IFNGR1    IFNG…
+##  4 M-(S+A)/2 M     L_T_TIM3._CD38… M_Monoc… IFNG   IFNGR2   IFNG_IFNGR2    IFNG…
+##  5 M-(S+A)/2 M     L_NK_CD56._CD1… L_T_TIM… CCL4   CCR5     CCL4_CCR5      CCL4…
+##  6 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… TNF    LTBR     TNF_LTBR       TNF_…
+##  7 A-(S+M)/2 A     L_NK_CD56._CD1… L_NK_CD… HLA.A  KIR3DL2  HLA.A_KIR3DL2  HLA.…
+##  8 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.D… LAG3     HLA.DRB5_LAG3  HLA.…
+##  9 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.D… LAG3     HLA.DPB1_LAG3  HLA.…
+## 10 M-(S+A)/2 M     M_Monocyte_CD16 L_T_TIM… HLA.D… LAG3     HLA.DRA_LAG3   HLA.…
+## 11 M-(S+A)/2 M     M_Monocyte_CD16 M_Monoc… TIMP1  CD63     TIMP1_CD63     TIMP…
+## 12 A-(S+M)/2 A     L_NK_CD56._CD1… L_NK_CD… HLA.B  KIR3DL1  HLA.B_KIR3DL1  HLA.…
+## 13 A-(S+M)/2 A     L_NK_CD56._CD1… L_NK_CD… HLA.B  KLRD1    HLA.B_KLRD1    HLA.…
+## 14 M-(S+A)/2 M     L_NK_CD56._CD1… M_Monoc… CCL3   CCR1     CCL3_CCR1      CCL3…
+## 15 M-(S+A)/2 M     L_NK_CD56._CD1… M_Monoc… IFNG   IFNGR1   IFNG_IFNGR1    IFNG…
+## 16 M-(S+A)/2 M     L_NK_CD56._CD1… M_Monoc… IFNG   IFNGR2   IFNG_IFNGR2    IFNG…
+## 17 M-(S+A)/2 M     M_Monocyte_CD16 L_NK_CD… TYROBP KLRD1    TYROBP_KLRD1   TYRO…
+## 18 M-(S+A)/2 M     L_NK_CD56._CD1… M_Monoc… CD99   PILRA    CD99_PILRA     CD99…
+## 19 M-(S+A)/2 M     L_NK_CD56._CD1… L_T_TIM… CCL3   CCR5     CCL3_CCR5      CCL3…
+## 20 M-(S+A)/2 M     L_T_TIM3._CD38… M_Monoc… CD99   PILRA    CD99_PILRA     CD99…
+## # ℹ 10 more variables: scaled_lfc_ligand <dbl>,
+## #   scaled_p_val_ligand_adapted <dbl>, scaled_lfc_receptor <dbl>,
+## #   scaled_p_val_receptor_adapted <dbl>, max_scaled_activity <dbl>,
+## #   scaled_pb_ligand <dbl>, scaled_pb_receptor <dbl>,
+## #   fraction_expressing_ligand_receptor <dbl>, prioritization_score <dbl>,
+## #   top_group <chr>
+

This table gives the final prioritization score of each interaction, and the values of the individual prioritization criteria.

+

With this step, all required steps are finished. Now, we can optionally still run the following steps +* Prioritize communication patterns involving condition-specific cell types through an alternative prioritization scheme +However, this is not relevant for this dataset since there are no condition-specific cell types here.

+
+
+

2.7 Save all the output of MultiNicheNet

+

To avoid needing to redo the analysis later, we will here to save an output object that contains all information to perform all downstream analyses.

+
path = "./"
+
+multinichenet_output = list(
+    celltype_info = abundance_expression_info$celltype_info,
+    celltype_de = celltype_de,
+    sender_receiver_info = abundance_expression_info$sender_receiver_info,
+    sender_receiver_de =  sender_receiver_de,
+    ligand_activities_targets_DEgenes = ligand_activities_targets_DEgenes,
+    prioritization_tables = prioritization_tables,
+    grouping_tbl = grouping_tbl,
+    lr_target_prior_cor = tibble()
+  ) 
+multinichenet_output = make_lite_output(multinichenet_output)
+
+save = FALSE
+if(save == TRUE){
+  saveRDS(multinichenet_output, paste0(path, "multinichenet_output.rds"))
+
+}
+
+
+
+

3 Interpreting the MultiNicheNet analysis output

+
+

3.1 Visualization of differential cell-cell interactions

+
+

3.1.1 Summarizing ChordDiagram circos plots

+

In a first instance, we will look at the broad overview of prioritized interactions via condition-specific Chordiagram circos plots.

+

We will look here at the top 50 predictions across all contrasts, senders, and receivers of interest.

+
prioritized_tbl_oi_all = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  top_n = 50, 
+  rank_per_group = FALSE
+  )
+
prioritized_tbl_oi = 
+  multinichenet_output$prioritization_tables$group_prioritization_tbl %>%
+  filter(id %in% prioritized_tbl_oi_all$id) %>%
+  distinct(id, sender, receiver, ligand, receptor, group) %>% 
+  left_join(prioritized_tbl_oi_all)
+prioritized_tbl_oi$prioritization_score[is.na(prioritized_tbl_oi$prioritization_score)] = 0
+
+senders_receivers = union(prioritized_tbl_oi$sender %>% unique(), prioritized_tbl_oi$receiver %>% unique()) %>% sort()
+
+colors_sender = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
+colors_receiver = RColorBrewer::brewer.pal(n = length(senders_receivers), name = 'Spectral') %>% magrittr::set_names(senders_receivers)
+
+circos_list = make_circos_group_comparison(prioritized_tbl_oi, colors_sender, colors_receiver)
+

+
+
+

3.1.2 Interpretable bubble plots

+

Whereas these ChordDiagrams show the most specific interactions per group, they don’t give insights into the data behind these predictions. Therefore we will now look at visualizations that indicate the different prioritization criteria used in MultiNicheNet.

+

In the next type of plots, we will 1) visualize the per-sample scaled product of normalized ligand and receptor pseudobulk expression, 2) visualize the scaled ligand activities, 3) cell-type specificity.

+

We will now check the top 50 interactions specific for the MIS-C group

+
group_oi = "M"
+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  top_n = 50, 
+  groups_oi = group_oi)
+
plot_oi = make_sample_lr_prod_activity_plots(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50)
+plot_oi
+

+Samples that were left out of the DE analysis are indicated with a smaller dot (this helps to indicate the samples that did not contribute to the calculation of the logFC, and thus not contributed to the final prioritization)

+

As a further help for further prioritization, we can assess the level of curation of these LR pairs as defined by the Intercellular Communication part of the Omnipath database

+
prioritized_tbl_oi_M_50_omnipath = prioritized_tbl_oi_M_50 %>% 
+  inner_join(lr_network_all)
+

Now we add this to the bubble plot visualization:

+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50_omnipath)
+plot_oi
+

+

As you can see, the HEBP1-FPR2 interaction has no Omnipath DB scores. This is because this LR pair was not documented by the Omnipath LR database. Instead it was documented by the original NicheNet LR network (source: Guide2Pharmacology) as can be seen in the table.

+

Further note: Typically, there are way more than 50 differentially expressed and active ligand-receptor pairs per group across all sender-receiver combinations. Therefore it might be useful to zoom in on specific cell types as senders/receivers:

+

Eg M_Monocyte_CD16 as receiver:

+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi, 
+  receivers_oi = "M_Monocyte_CD16")
+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all))
+plot_oi
+

+

Eg M_Monocyte_CD16 as sender:

+
prioritized_tbl_oi_M_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi, 
+  senders_oi = "M_Monocyte_CD16")
+
plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_50 %>% inner_join(lr_network_all))
+plot_oi
+

+

You can make these plots also for the other groups, like we will illustrate now for the S group

+
group_oi = "S"
+
prioritized_tbl_oi_S_50 = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  groups_oi = group_oi)
+
+plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_S_50 %>% inner_join(lr_network_all))
+plot_oi
+

+
+
+ +
+

3.3 Intercellular regulatory network inference and visualization

+

In the plots before, we visualized target genes that are supported by prior knowledge to be downstream of ligand-receptor pairs. Interestingly, some target genes can be ligands or receptors themselves. This illustrates that cells can send signals to other cells, who as a response to these signals produce signals themselves to feedback to the original sender cells, or who will effect other cell types.

+

As last plot, we can generate a ‘systems’ view of these intercellular feedback and cascade processes than can be occuring between the different cell populations involved. In this plot, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein interactions! We will infer this intercellular regulatory network here for the top50 interactions.

+

Important In the default MultiNicheNet workflow for datasets with multiple samples per condition, we further filter out target genes based on expression correlation before generating this plot. However, this would not be meaningful here since we don’t have multiple samples. As a consequence, we will have many ligand-target links here in these plots, making the plots less readable if we would consider more than the top 50 or 100 interactions.

+
prioritized_tbl_oi = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  50, 
+  rank_per_group = FALSE)
+
lr_target_prior = prioritized_tbl_oi %>% inner_join(
+        multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>%
+          distinct(ligand, target, direction_regulation, contrast) %>% inner_join(contrast_tbl) %>% ungroup() 
+        ) 
+      
+
+lr_target_df = lr_target_prior %>% distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation) 
+
network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi)
+network$links %>% head()
+## # A tibble: 6 × 6
+##   sender_ligand          receiver_target direction_regulation group type  weight
+##   <chr>                  <chr>           <fct>                <chr> <chr>  <dbl>
+## 1 L_NK_CD56._CD16._HLA.A L_NK_CD56._CD1… up                   A     Liga…      1
+## 2 L_T_TIM3._CD38._HLADR… M_Monocyte_CD1… up                   M     Liga…      1
+## 3 L_T_TIM3._CD38._HLADR… M_Monocyte_CD1… up                   M     Liga…      1
+## 4 L_T_TIM3._CD38._HLADR… M_Monocyte_CD1… up                   M     Liga…      1
+## 5 L_T_TIM3._CD38._HLADR… M_Monocyte_CD1… up                   M     Liga…      1
+## 6 L_T_TIM3._CD38._HLADR… M_Monocyte_CD1… up                   M     Liga…      1
+network$nodes %>% head()
+## # A tibble: 6 × 4
+##   node                        celltype               gene  type_gene      
+##   <chr>                       <chr>                  <chr> <chr>          
+## 1 L_NK_CD56._CD16._CD99       L_NK_CD56._CD16.       CD99  ligand/receptor
+## 2 L_T_TIM3._CD38._HLADR._CD99 L_T_TIM3._CD38._HLADR. CD99  ligand/receptor
+## 3 L_NK_CD56._CD16._ITGB2      L_NK_CD56._CD16.       ITGB2 ligand/receptor
+## 4 L_NK_CD56._CD16._HLA.A      L_NK_CD56._CD16.       HLA.A ligand         
+## 5 L_T_TIM3._CD38._HLADR._IFNG L_T_TIM3._CD38._HLADR. IFNG  ligand         
+## 6 L_NK_CD56._CD16._CCL4       L_NK_CD56._CD16.       CCL4  ligand
+
colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the  original yellow background with white font is not very readable
+network_graph = visualize_network(network, colors_sender)
+network_graph$plot
+

+

If you want to know the exact set of top-ranked ligands that are potentially regulated by one ligand of interest, you can run following code:

+
ligand_oi = "IFNG"
+group_oi = "M"
+lr_target_df %>% filter(ligand == ligand_oi & group == group_oi) %>% pull(target) %>% unique() %>% intersect(lr_network$ligand)
+##  [1] "B2M"      "C1QB"     "CD55"     "CXCL16"   "HLA.A"    "HLA.B"   
+##  [7] "HLA.C"    "HLA.DPB1" "HLA.DQB1" "HLA.DRA"  "HLA.DRB1" "HLA.E"   
+## [13] "HLA.F"    "IFITM1"   "IL32"     "NAMPT"    "S100A4"   "S100A8"  
+## [19] "TNF"      "TNFSF10"
+

Interestingly, we can also use this network to further prioritize differential CCC interactions. Here we will assume that the most important LR interactions are the ones that are involved in this intercellular regulatory network. We can get these interactions as follows:

+
network$prioritized_lr_interactions
+## # A tibble: 50 × 5
+##    group sender                 receiver               ligand   receptor
+##    <chr> <chr>                  <chr>                  <chr>    <chr>   
+##  1 A     L_NK_CD56._CD16.       L_NK_CD56._CD16.       HLA.A    KIR3DL1 
+##  2 A     L_NK_CD56._CD16.       L_NK_CD56._CD16.       HLA.A    KLRD1   
+##  3 M     L_T_TIM3._CD38._HLADR. M_Monocyte_CD16        IFNG     IFNGR1  
+##  4 M     L_T_TIM3._CD38._HLADR. M_Monocyte_CD16        IFNG     IFNGR2  
+##  5 M     L_NK_CD56._CD16.       L_T_TIM3._CD38._HLADR. CCL4     CCR5    
+##  6 M     M_Monocyte_CD16        M_Monocyte_CD16        TNF      LTBR    
+##  7 A     L_NK_CD56._CD16.       L_NK_CD56._CD16.       HLA.A    KIR3DL2 
+##  8 M     M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. HLA.DRB5 LAG3    
+##  9 M     M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. HLA.DPB1 LAG3    
+## 10 M     M_Monocyte_CD16        L_T_TIM3._CD38._HLADR. HLA.DRA  LAG3    
+## # ℹ 40 more rows
+
prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join(
+  network$prioritized_lr_interactions)
+prioritized_tbl_oi_network
+## # A tibble: 50 × 8
+##    group sender              receiver ligand receptor id    prioritization_score
+##    <chr> <chr>               <chr>    <chr>  <chr>    <chr>                <dbl>
+##  1 A     L_NK_CD56._CD16.    L_NK_CD… HLA.A  KIR3DL1  HLA.…                0.958
+##  2 A     L_NK_CD56._CD16.    L_NK_CD… HLA.A  KLRD1    HLA.…                0.956
+##  3 M     L_T_TIM3._CD38._HL… M_Monoc… IFNG   IFNGR1   IFNG…                0.950
+##  4 M     L_T_TIM3._CD38._HL… M_Monoc… IFNG   IFNGR2   IFNG…                0.948
+##  5 M     L_NK_CD56._CD16.    L_T_TIM… CCL4   CCR5     CCL4…                0.940
+##  6 M     M_Monocyte_CD16     M_Monoc… TNF    LTBR     TNF_…                0.940
+##  7 A     L_NK_CD56._CD16.    L_NK_CD… HLA.A  KIR3DL2  HLA.…                0.936
+##  8 M     M_Monocyte_CD16     L_T_TIM… HLA.D… LAG3     HLA.…                0.936
+##  9 M     M_Monocyte_CD16     L_T_TIM… HLA.D… LAG3     HLA.…                0.935
+## 10 M     M_Monocyte_CD16     L_T_TIM… HLA.D… LAG3     HLA.…                0.935
+## # ℹ 40 more rows
+## # ℹ 1 more variable: prioritization_rank <dbl>
+

Visualize now the expression and activity of these interactions for the M group

+
group_oi = "M"
+
prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi)
+
+plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M %>% inner_join(lr_network_all)
+  )
+plot_oi
+

+
+
+ + + + +
+ + + + + + + + + + + + + + + + + + diff --git a/vignettes/basic_analysis_steps_MISC_files/.DS_Store b/vignettes/basic_analysis_steps_MISC_files/.DS_Store new file mode 100644 index 0000000..153a7d5 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/.DS_Store differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-12-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-12-1.png new file mode 100644 index 0000000..0022906 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-12-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-26-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-26-1.png new file mode 100644 index 0000000..e221761 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-26-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-1.png new file mode 100644 index 0000000..9454cb5 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-2.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-2.png new file mode 100644 index 0000000..c77a731 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-2.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-3.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-3.png new file mode 100644 index 0000000..1370fbf Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-3.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-4.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-4.png new file mode 100644 index 0000000..e69ef97 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-45-4.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-48-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-48-1.png new file mode 100644 index 0000000..520a8f6 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-48-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-50-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-50-1.png new file mode 100644 index 0000000..b397e95 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-50-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-52-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-52-1.png new file mode 100644 index 0000000..dce0371 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-52-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-54-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-54-1.png new file mode 100644 index 0000000..ea7cffc Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-54-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-56-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-56-1.png new file mode 100644 index 0000000..205ebcf Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-56-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-1.png new file mode 100644 index 0000000..ad79ff0 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-2.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-2.png new file mode 100644 index 0000000..c73054c Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-58-2.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-1.png new file mode 100644 index 0000000..5b8e182 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-2.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-2.png new file mode 100644 index 0000000..531c590 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-60-2.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-63-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-63-1.png new file mode 100644 index 0000000..15d3a9a Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-63-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-64-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-64-1.png new file mode 100644 index 0000000..4c57c2c Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-64-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-67-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-67-1.png new file mode 100644 index 0000000..de71d9e Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-67-1.png differ diff --git a/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-71-1.png b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-71-1.png new file mode 100644 index 0000000..b6f4694 Binary files /dev/null and b/vignettes/basic_analysis_steps_MISC_files/figure-markdown_strict/unnamed-chunk-71-1.png differ diff --git a/vignettes/multifactorial_analysis_BreastCancer.Rmd b/vignettes/multifactorial_analysis_BreastCancer.Rmd index 76fcafd..b54a4cb 100644 --- a/vignettes/multifactorial_analysis_BreastCancer.Rmd +++ b/vignettes/multifactorial_analysis_BreastCancer.Rmd @@ -1,17 +1,25 @@ --- title: "MultiNicheNet analysis: anti-PD1 Breast cancer multifactorial comparison - wrapper function" author: "Robin Browaeys" -date: "2023-06-06" -output: rmarkdown::html_vignette +package: "`r BiocStyle::pkg_ver('multinichenetr')`" +output: + BiocStyle::html_document + keep_md: + true + self_contained: + true +output_dir: "/Users/robinb/Work/multinichenetr/vignettes" vignette: > %\VignetteIndexEntry{MultiNicheNet analysis: anti-PD1 Breast cancer multifactorial comparison - wrapper function} %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} + %\VignetteEncoding{UTF-8} --- - + ```{r setup, include = FALSE} knitr::opts_chunk$set( @@ -20,6 +28,7 @@ knitr::opts_chunk$set( warning = FALSE, message = FALSE ) +library(BiocStyle) ``` In this vignette, you can learn how to perform a MultiNicheNet analysis comparing cell-cell communication between multiple groups of patients/conditions, from data with complex multifcatorial experimental designs. In this vignette, we start from one SingleCellExperiment object containing cells from both sender and receiver cell types and from different patients. @@ -147,22 +156,28 @@ min_cells = 10 For this analysis, we want to compare how cell-cell communication changes On-vs-Pre anti-PD1 therapy are different between responder/expander patients vs non-responder/expander patients. In other words, we want to study how both patient groups react differently to the therapy. -To do this comparison, we need to set the following contrasts: +To perform this comparison, we need to set the following contrasts: ```{r} -contrasts_oi = c("'(OnE-PreE)-(OnNE-PreNE)','(OnNE-PreNE)-(OnE-PreE)'") +contrasts_oi = c("'(OnE-PreE)-(OnNE-PreNE)','(OnNE-PreNE)-(OnE-PreE)','(PreE-OnE)-(PreNE-OnNE)','(PreNE-OnNE)-(PreE-OnE)'") ``` To understand this, let's take a look at the first contrasts of interest: `(OnE-PreE)-(OnNE-PreNE)`. As you can see, the first part of the expression: `(OnE-PreE)` will cover differences on-vs-pre therapy in the E group, the second part `(OnNE-PreNE)` in the NE group. By adding the minus sign, we can compare these differences between the E and NE group. -__Very Important__ Note the format to indicate the contrasts! This formatting should be adhered to very strictly, and white spaces are not allowed! Check `?get_DE_info` for explanation about how to define this well. The most important things are that: each contrast is surrounded by single quotation marks, contrasts are separated by a comma without any whitespace, and alle contrasts together are surrounded by double quotation marks. If you compare against two groups, you should divide by 2, if you compare against three groups, you should divide by 3 etcetera. See some of the other MultiNicheNet tutorials that demonstrate how to do this. +__Very Important__ Note the format to indicate the contrasts! This formatting should be adhered to very strictly, and white spaces are not allowed! Check `?get_DE_info` for explanation about how to define this well. The most important points are that: +*each contrast is surrounded by single quotation marks +*contrasts are separated by a comma without any white space +*all contrasts together are surrounded by double quotation marks. + +If you compare against two groups, you should divide by 2 (as demonstrated here), if you compare against three groups, you should divide by 3 and so on. -For downstream visualizations and linking contrasts to their main group, you need to run the following: +For downstream visualizations and linking contrasts to their main condition, we also need to run the following: +This is necessary because we will also calculate cell-type+condition specificity of ligands and receptors. ```{r} contrast_tbl = tibble(contrast = - c("(OnE-PreE)-(OnNE-PreNE)", "(OnNE-PreNE)-(OnE-PreE)"), - group = c("OnE","OnNE")) + c("(OnE-PreE)-(OnNE-PreNE)", "(OnNE-PreNE)-(OnE-PreE)", "(PreE-OnE)-(PreNE-OnNE)", "(PreNE-OnNE)-(PreE-OnE)"), + group = c("OnE","OnNE","PreE","PreNE")) ``` ### Define the parameters for the NicheNet ligand activity analysis diff --git a/vignettes/multifactorial_analysis_BreastCancer_SACL.Rmd b/vignettes/multifactorial_analysis_BreastCancer_SACL.Rmd index 65da377..c7aef62 100644 --- a/vignettes/multifactorial_analysis_BreastCancer_SACL.Rmd +++ b/vignettes/multifactorial_analysis_BreastCancer_SACL.Rmd @@ -904,4 +904,75 @@ combined_plot = make_ligand_activity_target_plot( combined_plot ``` +## Intercellular regulatory network inference and visualization + +In the plots before, we visualized target genes that are supported by prior knowledge to be downstream of ligand-receptor pairs. Interestingly, some target genes can be ligands or receptors themselves. This illustrates that cells can send signals to other cells, who as a response to these signals produce signals themselves to feedback to the original sender cells, or who will effect other cell types. + +As last plot, we can generate a 'systems' view of these intercellular feedback and cascade processes than can be occuring between the different cell populations involved. In this plot, we will draw links between ligands of sender cell types their ligand/receptor-annotated target genes in receiver cell types. So links are ligand-target links (= gene regulatory links) and not ligand-receptor protein-protein interactions! We will infer this intercellular regulatory network here for the top50 interactions. + +__Important__ In the default MultiNicheNet workflow for datasets with multiple samples per condition, we further filter out target genes based on expression correlation before generating this plot. However, this would not be meaningful here since we don't have multiple samples. As a consequence, we will have many ligand-target links here in these plots, making the plots less readable if we would consider more than the top 50 or 100 interactions. + +```{r} +prioritized_tbl_oi = get_top_n_lr_pairs( + multinichenet_output$prioritization_tables, + 50, + rank_per_group = FALSE) +``` + +```{r} +lr_target_prior = prioritized_tbl_oi %>% inner_join( + multinichenet_output$ligand_activities_targets_DEgenes$ligand_activities %>% + distinct(ligand, target, direction_regulation, contrast) %>% inner_join(contrast_tbl) %>% ungroup() + ) + + +lr_target_df = lr_target_prior %>% distinct(group, sender, receiver, ligand, receptor, id, target, direction_regulation) +``` + +```{r} +network = infer_intercellular_regulatory_network(lr_target_df, prioritized_tbl_oi) +network$links %>% head() +network$nodes %>% head() +``` + +```{r, fig.width=30, fig.height=12} +colors_sender["L_T_TIM3._CD38._HLADR."] = "pink" # the original yellow background with white font is not very readable +network_graph = visualize_network(network, colors_sender) +network_graph$plot +``` + +If you want to know the exact set of top-ranked ligands that are potentially regulated by one ligand of interest, you can run following code: + +```{r} +ligand_oi = "IFNG" +group_oi = "M" +lr_target_df %>% filter(ligand == ligand_oi & group == group_oi) %>% pull(target) %>% unique() %>% intersect(lr_network$ligand) +``` + +Interestingly, we can also use this network to further prioritize differential CCC interactions. Here we will assume that the most important LR interactions are the ones that are involved in this intercellular regulatory network. We can get these interactions as follows: + +```{r} +network$prioritized_lr_interactions +``` + +```{r, fig.width=30, fig.height=12} +prioritized_tbl_oi_network = prioritized_tbl_oi %>% inner_join( + network$prioritized_lr_interactions) +prioritized_tbl_oi_network +``` + +Visualize now the expression and activity of these interactions for the M group +```{r} +group_oi = "M" +``` + +```{r, fig.height=11, fig.width=16} +prioritized_tbl_oi_M = prioritized_tbl_oi_network %>% filter(group == group_oi) + +plot_oi = make_sample_lr_prod_activity_plots_Omnipath( + multinichenet_output$prioritization_tables, + prioritized_tbl_oi_M %>% inner_join(lr_network_all) + ) +plot_oi +```