Initiate "guide" notebook for validating cell type assignments #1001
Merged: allyhawkins merged 9 commits into AlexsLemonade:main from allyhawkins:allyhawkins/initiate-final-annotation-template on Jan 27, 2025
Commits (9, all by allyhawkins):
- 792e3b3 setup function for data frame prep
- 4b6581e initiate "guide" notebook
- 65d4bdd add to readme
- a5154a5 Apply suggestions from code review
- ae48004 remove instructions from readme
- 2fe5827 Merge remote-tracking branch 'AlexsLemonade/main' into allyhawkins/in…
- 44f5692 note about using leiden
- c789033 some minor review responses
- 5fec540 Update instructions
analyses/cell-type-ewings/template_notebooks/celltype-exploration.Rmd (198 changes: 198 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
--- | ||
title: "Template notebook for validating cell type assignments for an individual library in SCPCP000015" | ||
author: Ally Hawkins | ||
date: "`r Sys.Date()`" | ||
output: | ||
html_document: | ||
toc: true | ||
toc_depth: 3 | ||
code_folding: "hide" | ||
params: | ||
sample_id: "SCPCS000490" | ||
library_id: "SCPCL000822" | ||
cluster_nn: 20 | ||
cluster_res: 0.5 | ||
--- | ||
|
||
This notebook is meant to be a guide for compiling "final" cell type annotations for an individual library in `SCPCP000015`. | ||
Results from `aucell-singler-annotation.sh`, `evaluate-clusters.sh`, and `run-aucell-ews-signatures.sh` are all combined and used to assign cell type annotations on a case-by-case basis. | ||
|
||
Instructions for using this guide: | ||
|
||
1. Ensure that you have a local copy of the results from `aucell-singler-annotation.sh`, `evaluate-clusters.sh` and `run-aucell-ews-signatures.sh` saved to `results`. | ||
2. Copy the contents of this notebook to a new notebook titled `<library_id>_celltype-exploration.Rmd` and save in `exploratory_analysis/final_annotation_notebooks`. | ||
3. Update the `title` in the `yaml` section and replace the `sample_id` and `library_id` with the correct IDs in the `params` list. | ||
4. Optionally, you may choose to update the choices for clustering based on the results from `evaluate-clusters.sh`. | ||
All clusters used will be calculated with the Leiden algorithm and the modularity objective function. | ||
To modify the nearest neighbors (default: 20) and resolution (default: 0.5), use the `cluster_nn` and `cluster_res` params. | ||
5. Run through the notebook and update any sections of the notebook marked with the `{.manual-exploration}` tag. | ||
6. Render the completed notebook, which will produce the rendered `html` file and a TSV with cell type annotations for that library. | ||
|
||
## Setup | ||
|
||
```{r packages} | ||
suppressPackageStartupMessages({ | ||
# load required packages | ||
library(SingleCellExperiment) | ||
library(ggplot2) | ||
}) | ||
|
||
# Set default ggplot theme | ||
theme_set( | ||
theme_classic() | ||
) | ||
|
||
# set seed | ||
set.seed(2024) | ||
``` | ||
|
||
|
||
```{r base paths} | ||
# The base path for the OpenScPCA repository, found by its (hidden) .git directory | ||
repository_base <- rprojroot::find_root(rprojroot::is_git_root) | ||
|
||
# The current data directory, found within the repository base directory | ||
data_dir <- file.path(repository_base, "data", "current", "SCPCP000015") | ||
|
||
# The path to this module | ||
module_base <- file.path(repository_base, "analyses", "cell-type-ewings") | ||
``` | ||
|
||
```{r} | ||
# path to sce | ||
sce_file <- file.path(data_dir, params$sample_id, glue::glue("{params$library_id}_processed.rds")) | ||
|
||
# path to workflow results | ||
workflow_results_dir <- file.path(module_base, "results") | ||
|
||
singler_results_dir <- file.path(workflow_results_dir, "aucell_singler_annotation", params$sample_id) | ||
singler_results_file <- file.path(singler_results_dir, | ||
glue::glue("{params$library_id}_singler-classifications.tsv")) | ||
|
||
cluster_results_dir <- file.path(workflow_results_dir, "clustering", params$sample_id) | ||
cluster_results_file <- file.path(cluster_results_dir, | ||
glue::glue("{params$library_id}_cluster-results.tsv")) | ||
|
||
aucell_results_dir <- file.path(workflow_results_dir, "aucell-ews-signatures", params$sample_id) | ||
aucell_results_file <- file.path(aucell_results_dir, | ||
glue::glue("{params$library_id}_auc-ews-gene-signatures.tsv")) | ||
|
||
# small gene sets | ||
visser_marker_genes_file <- file.path(module_base, "references", "visser-all-marker-genes.tsv") | ||
cell_state_genes_file <- file.path(module_base, "references", "tumor-cell-state-markers.tsv") | ||
``` | ||
|
||
```{r} | ||
# output file to save final annotations | ||
results_dir <- file.path(module_base, "results", "final-annotations") | ||
output_file <- file.path(results_dir, glue::glue("{params$library_id}_celltype-annotations.tsv")) | ||
``` | ||
|
||
|
||
```{r} | ||
# source setup functions, including prep_results() | ||
setup_functions <- file.path(module_base, "template_notebooks", "utils", "setup-functions.R") | ||
source(setup_functions) | ||
|
||
# source validation functions, including calculate_mean_markers() | ||
validation_functions <- file.path(module_base, "scripts", "utils", "tumor-validation-helpers.R") | ||
source(validation_functions) | ||
``` | ||
|
||
```{r} | ||
stopifnot( | ||
"sce file does not exist" = file.exists(sce_file), | ||
"singler results file does not exist" = file.exists(singler_results_file), | ||
"cluster results file does not exist" = file.exists(cluster_results_file), | ||
"aucell results file does not exist" = file.exists(aucell_results_file) | ||
) | ||
``` | ||
|
||
|
||
```{r, message=FALSE} | ||
# read in sce | ||
sce <- readr::read_rds(sce_file) | ||
|
||
# read in workflow results | ||
singler_df <- readr::read_tsv(singler_results_file) | ||
cluster_df <- readr::read_tsv(cluster_results_file) | ||
aucell_df <- readr::read_tsv(aucell_results_file) | ||
|
||
# read in marker genes and combine into one list | ||
visser_markers_df <- readr::read_tsv(visser_marker_genes_file) |> | ||
dplyr::select(cell_type, ensembl_gene_id) |> | ||
unique() | ||
|
||
cell_state_markers_df <- readr::read_tsv(cell_state_genes_file) |> | ||
dplyr::select(cell_type = cell_state, ensembl_gene_id) | ||
|
||
all_markers_df <- dplyr::bind_rows(list(visser_markers_df, cell_state_markers_df)) | ||
``` | ||
|
||
## Prepare data for plotting | ||
|
||
```{r} | ||
all_results_df <- prep_results( | ||
sce, | ||
singler_df, | ||
cluster_df, | ||
aucell_df, | ||
cluster_nn = params$cluster_nn, | ||
cluster_res = params$cluster_res | ||
) | ||
|
||
cell_types <- unique(all_markers_df$cell_type) | ||
|
||
# get the mean expression of all genes for each cell state | ||
gene_exp_df <- cell_types |> | ||
purrr::map(\(type){ | ||
calculate_mean_markers(all_markers_df, sce, type, cell_type) | ||
}) |> | ||
purrr::reduce(dplyr::inner_join, by = "barcodes") | ||
|
||
all_info_df <- all_results_df |> | ||
dplyr::left_join(gene_exp_df, by = "barcodes") | ||
``` | ||
|
||
## Summary of workflow results | ||
|
||
TODO: Insert plots that will summarize findings from each of the workflows | ||
- UMAPs of SingleR, clusters, AUC values and custom gene set means | ||
- Density plots by cluster of AUC values and custom gene set means | ||
- Maybe heatmaps with cluster annotation of AUC scores and custom gene set means | ||
|
||
## Re-cluster tumor cells {.manual-exploration} | ||
|
||
<!-- Use this section to label tumor cells based on the above findings. | ||
Any cells that are labeled as tumor will then be re-clustered and plots showing only tumor cells | ||
can be created to identify tumor cell states --> | ||
|
||
TODO: Functions for re-clustering tumor cells | ||
Show the same plots across the tumor only clusters and assign tumor cell states to each cluster | ||
|
||
## Additional exploration {.manual-exploration} | ||
|
||
<!--This section is for any additional exploration that may be needed to finalize annotations for this library. | ||
If not using, please delete this section. | ||
For example, here you may want to dive into the normal cell types and make adjustments as needed | ||
--> | ||
|
||
## Validate final tumor and normal annotation {.manual-exploration} | ||
|
||
<!-- This section should be used to update the assignments --> | ||
|
||
TODO: Insert plots that will be useful for validation (UMAPs, heatmaps, density plots) | ||
|
||
## Prepare annotations {.manual-exploration} | ||
|
||
<!-- This section should be used to create the final table with cell type annotations for export --> | ||
|
||
TODO: Code and instructions for exporting such as what columns should be named. | ||
|
||
## Session info | ||
|
||
```{r session info} | ||
# record the versions of the packages used in this analysis and other environment information | ||
sessionInfo() | ||
``` | ||
|
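The "Prepare data for plotting" chunk in the notebook above builds one data frame per cell type and then joins them on `barcodes`. The sketch below illustrates that map-then-reduce pattern on toy data. It does not use the module's real `calculate_mean_markers()`; the stand-in `toy_mean_markers()`, the `<type>_mean` column naming, and all gene IDs, barcodes, and values are assumptions made up for illustration.

```r
library(dplyr)
library(purrr)

# Toy marker table: two cell types with made-up gene IDs
markers_df <- tibble::tibble(
  cell_type = c("tumor", "tumor", "immune"),
  ensembl_gene_id = c("gene1", "gene2", "gene3")
)

# Toy expression matrix (genes x cells); values are invented
expr_mat <- matrix(
  c(
    1, 3, 0,
    2, 4, 6,
    0, 0, 9
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"), c("AAA", "CCC", "GGG"))
)

# Hypothetical stand-in for calculate_mean_markers():
# returns one row per barcode and one column named `<type>_mean`
toy_mean_markers <- function(markers_df, expr_mat, type) {
  genes <- markers_df$ensembl_gene_id[markers_df$cell_type == type]
  means <- colMeans(expr_mat[genes, , drop = FALSE])
  out <- tibble::tibble(barcodes = colnames(expr_mat), value = unname(means))
  names(out)[2] <- paste0(type, "_mean")
  out
}

# One data frame per cell type, then joined pairwise on barcodes
gene_exp_df <- unique(markers_df$cell_type) |>
  purrr::map(\(type) toy_mean_markers(markers_df, expr_mat, type)) |>
  purrr::reduce(\(x, y) dplyr::inner_join(x, y, by = "barcodes"))

print(gene_exp_df)
```

Because every per-type data frame is expected to contain the same set of barcodes, an inner join here keeps all cells; if a helper ever dropped cells for one cell type, the inner join would silently drop them from the combined result as well.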
analyses/cell-type-ewings/template_notebooks/utils/setup-functions.R (80 changes: 80 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# These functions are used in `celltype-exploration.Rmd` | ||
# They are used for reading in and setting up the cell type results | ||
|
||
#' Combine workflow results into a single data frame | ||
#' | ||
#' Note that this function will only include clustering results from Leiden with modularity in the output | ||
#' | ||
#' @param sce Processed SingleCellExperiment object with UMAP embeddings | ||
#' @param singler_df Data frame with results from `aucell-singler-annotation.sh` workflow | ||
#' @param cluster_df Data frame with results from `evaluate-clusters.sh` workflow | ||
#' @param aucell_df Data frame with results from `run-aucell-ews-signatures.sh` workflow | ||
#' @param cluster_nn Value of nearest neighbors to use for cluster results. Default is 20. | ||
#' @param cluster_res Value of resolution to use for cluster results. Default is 0.5. | ||
#' | ||
prep_results <- function( | ||
sce, | ||
singler_df, | ||
cluster_df, | ||
aucell_df, | ||
cluster_nn = 20, | ||
cluster_res = 0.5 | ||
) { | ||
|
||
## grab UMAP | ||
umap_df <- sce |> | ||
scuttle::makePerCellDF(use.dimred = "UMAP") |> | ||
# replace UMAP.1 with UMAP1 and get rid of excess columns | ||
dplyr::select(barcodes, UMAP1 = UMAP.1, UMAP2 = UMAP.2) | ||
|
||
## prep singler data | ||
singler_df <- singler_df |> | ||
dplyr::mutate( | ||
# first grab anything that is tumor and label it tumor | ||
# NA should be unknown | ||
singler_annotation = dplyr::case_when( | ||
stringr::str_detect(singler_annotation, "tumor") ~ "tumor", | ||
is.na(singler_annotation) ~ "unknown", # make sure to separate out unknown labels | ||
.default = singler_annotation | ||
) |> | ||
forcats::fct_relevel("tumor", after = 0), | ||
# get the top cell types for plotting later | ||
singler_lumped = singler_annotation |> | ||
forcats::fct_lump_n(7, other_level = "All remaining cell types", ties.method = "first") |> | ||
forcats::fct_infreq() |> | ||
forcats::fct_relevel("All remaining cell types", after = Inf) | ||
) | ||
|
||
## prep cluster data | ||
cluster_df <- cluster_df |> | ||
# filter to the clustering results we want to use | ||
dplyr::filter( | ||
cluster_method == "leiden_mod", | ||
nn == cluster_nn, | ||
resolution == cluster_res | ||
) |> | ||
dplyr::select( | ||
barcodes = cell_id, | ||
cluster | ||
) | ||
|
||
## prep aucell | ||
aucell_wide_df <- aucell_df |> | ||
dplyr::mutate( | ||
assignment = auc > auc_threshold | ||
) |> | ||
tidyr::pivot_wider( | ||
id_cols = "barcodes", | ||
names_from = "gene_set", | ||
values_from = c(auc, assignment) | ||
) | ||
|
||
## combine into one data frame | ||
all_results_df <- umap_df |> | ||
dplyr::left_join(singler_df, by = c("barcodes")) |> | ||
dplyr::left_join(cluster_df, by = c("barcodes")) |> | ||
dplyr::left_join(aucell_wide_df, by = c("barcodes")) | ||
|
||
Review comment: I think a check would be worthwhile here before returning. Maybe check that the column names are as expected? |
||
return(all_results_df) | ||
|
||
} |
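The review comment inside `prep_results()` suggests checking column names before returning. One possible shape for that check is sketched below in base R. The helper `check_expected_columns()` is hypothetical and not part of this PR. The expected column names are taken from the `select()` and `mutate()` calls in the function; the AUCell columns are left out because their names depend on the gene sets present in the input.

```r
# Hypothetical helper (not part of the PR): stop with an informative
# message if any expected column is absent from the data frame
check_expected_columns <- function(df, expected_cols) {
  missing_cols <- setdiff(expected_cols, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(df)
}

# Columns that prep_results() appears to create before the AUCell join
expected_cols <- c(
  "barcodes", "UMAP1", "UMAP2",
  "singler_annotation", "singler_lumped", "cluster"
)

# Toy data frame with all expected columns; values are invented
toy_df <- data.frame(
  barcodes = "AAA",
  UMAP1 = 0.1,
  UMAP2 = -0.2,
  singler_annotation = "tumor",
  singler_lumped = "tumor",
  cluster = 1
)

# Passes silently when everything is present
check_expected_columns(toy_df, expected_cols)

# Capture the error message when columns are missing
result <- tryCatch(
  check_expected_columns(toy_df[, c("barcodes", "UMAP1")], expected_cols),
  error = function(e) conditionMessage(e)
)
print(result)
```

Inside `prep_results()`, a call like this could sit just before `return(all_results_df)`, so that an upstream change in workflow output fails loudly rather than surfacing later in a plot.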
Review comment: The notebook documents that only Leiden clustering with modularity is considered, but the function's own docs do not. I'd add that to the function docs somewhere.
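To make the behavior discussed in this comment concrete, the sketch below applies the same filter used in `prep_results()` to a toy clustering table. The column names and the `"leiden_mod"` value come from the function; the second method name (`"louvain"`), the barcodes, and the cluster values are assumptions invented for illustration. The final guard is an optional addition, not something the PR does.

```r
library(dplyr)

# Toy clustering results: three parameter combinations for two cells
cluster_df <- tibble::tibble(
  cell_id = rep(c("AAA", "CCC"), times = 3),
  cluster_method = rep(c("leiden_mod", "leiden_mod", "louvain"), each = 2),
  nn = rep(c(20, 10, 20), each = 2),
  resolution = rep(c(0.5, 0.5, 0.5), each = 2),
  cluster = c(1, 2, 1, 1, 2, 2)
)

cluster_nn <- 20
cluster_res <- 0.5

# Same filter and select as in prep_results(): rows from any other
# method or parameter combination are dropped without a message
selected_df <- cluster_df |>
  dplyr::filter(
    cluster_method == "leiden_mod",
    nn == cluster_nn,
    resolution == cluster_res
  ) |>
  dplyr::select(barcodes = cell_id, cluster)

# Optional guard (an assumption, not in the PR): fail early if the
# requested parameters match no rows, since the later left join would
# otherwise fill `cluster` with NA for every cell
stopifnot(
  "no clustering results match the requested parameters" = nrow(selected_df) > 0
)

print(selected_df)
```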