Add functionality to merge altExps #243

Merged
merged 35 commits into from
Dec 13, 2023
Changes from 34 commits
Commits
04bdfcd
add comment for future us that this merge script is not about _mergin…
sjspielman Dec 5, 2023
b9b7625
lay the groundwork for new include_altexp argument, currently set to …
sjspielman Dec 6, 2023
abaa73c
implement functions for updating altExp with NA features where needed
sjspielman Dec 6, 2023
0e9e1c4
Update some variable names and comments, ensure altExp is prepared fo…
sjspielman Dec 6, 2023
a33229d
fix typo
sjspielman Dec 6, 2023
76d5ec2
lay some groundwork for forthcoming altexp merging tests, but no test…
sjspielman Dec 6, 2023
cf0cf82
For now, turn off altexp merging in the tests to get them passing
sjspielman Dec 6, 2023
382da05
clean up a comment
sjspielman Dec 6, 2023
4a44d65
batch ->sce_name argument
sjspielman Dec 6, 2023
ed2cb9a
no need to re data.frame this, since we're not adding rownames here
sjspielman Dec 6, 2023
5a60129
one more test spot to turn off altexps for now
sjspielman Dec 6, 2023
73cc8b4
Apply suggestions from code review
sjspielman Dec 7, 2023
bd21cec
One purrr syntax bump, and changes in response to review: update new …
sjspielman Dec 7, 2023
bfa6846
WIP towards updated altexps strategy
sjspielman Dec 7, 2023
8149ed6
Revert "WIP towards updated altexps strategy"
sjspielman Dec 11, 2023
3d70229
Fundamental merge altExp code overhaul: Build an empty matrix first a…
sjspielman Dec 11, 2023
c0e9da5
rename function merged_altexps -> create_merged_altexps to avoid over…
sjspielman Dec 11, 2023
f63c268
but make it singular
sjspielman Dec 11, 2023
5553543
run devtools::document
sjspielman Dec 11, 2023
66eff6b
Remove testing definition, and actually add the merged altExp into th…
sjspielman Dec 11, 2023
4689d94
Apply suggestions from code review
sjspielman Dec 11, 2023
bdaa9cc
Apply suggestions from code review
sjspielman Dec 11, 2023
87c7647
merged_colnames -> all_merged_barcodes
sjspielman Dec 11, 2023
5220754
union and comment
sjspielman Dec 11, 2023
4506626
respond to variable naming reviews and add missing pipe
sjspielman Dec 11, 2023
288d61b
run document
sjspielman Dec 11, 2023
4eea2bf
use regular map to avoid failures if there are no altexps; behavior e…
sjspielman Dec 13, 2023
6090faf
Use an actual number, no altexps in sce_list, and remove all include_…
sjspielman Dec 13, 2023
921b0bc
Update test names, no altexps. Add some comment structure for navigat…
sjspielman Dec 13, 2023
70655dc
Add a test for merging with altexps
sjspielman Dec 13, 2023
bc23a88
a bit of code rearrangement
sjspielman Dec 13, 2023
265f3a8
add test for include_altexp=FALSE when altexps are present
sjspielman Dec 13, 2023
c6afdeb
Apply suggestions from code review
sjspielman Dec 13, 2023
ce257ef
Update tests in response to review
sjspielman Dec 13, 2023
b16805d
Update tests/testthat/test-merge_sce_list.R
sjspielman Dec 13, 2023
1 change: 1 addition & 0 deletions R/merge_altexp.R
@@ -4,6 +4,7 @@
#' If the two experiments do not have the same set of cells, this will use the first
#' SCE as the base set, adding columns (cells) to the AltExp so that the matrices match on that dimension.
#' Columns (cells) that are not found in the base experiment are discarded.
#' Note that this function is not related to merging multiple SCE objects.
#'
#' @param sce a SingleCellExperiment object
#' @param alt_exp a second SummarizedExperiment to add as an alternative experiment
233 changes: 197 additions & 36 deletions R/merge_sce_list.R
@@ -1,4 +1,4 @@
#' Merge a list of SCEs as preparation for formal integration
#' Merge a list of SCEs into one SCE object
#'
#' This function takes an optionally-named (if named, ideally by a form of
#' library ID) list of SingleCellExperiment (SCE) objects and merges them into
@@ -36,27 +36,32 @@
#' @param cell_id_column A character value giving the resulting colData column name
#' to hold unique cell IDs formatted as their original row name. Default
#' value is `cell_id`.
#' @param include_altexp Boolean for whether or not any present alternative experiments
#' should be included in the final merged object. Default is TRUE.
#'
#' @return A SingleCellExperiment object containing all SingleCellExperiment objects
#' present in the inputted list
#' @export
#'
#' @import SingleCellExperiment
merge_sce_list <- function(sce_list = list(),
batch_column = "library_id",
retain_coldata_cols = c(
"sum",
"detected",
"total",
"subsets_mito_sum",
"subsets_mito_detected",
"subsets_mito_percent",
"miQC_pass",
"prob_compromised",
"barcode"
),
preserve_rowdata_cols = NULL,
cell_id_column = "cell_id") {
merge_sce_list <- function(
sce_list = list(),
batch_column = "library_id",
retain_coldata_cols = c(
"sum",
"detected",
"total",
"subsets_mito_sum",
"subsets_mito_detected",
"subsets_mito_percent",
"miQC_pass",
"prob_compromised",
"barcodes"
),
preserve_rowdata_cols = NULL,
cell_id_column = "cell_id",
include_altexp = TRUE) {

# Check `sce_list`----------------------
if (is.null(names(sce_list))) {
warning(
@@ -94,9 +99,11 @@ merge_sce_list <- function(sce_list = list(),

# Second, determine all the column names that are present in any SCE so it can
# be created in any missing SCEs with `NA` values
all_colnames <- purrr::map(sce_list, ~ names(colData(.))) |>
all_colnames <- sce_list |>
purrr::map(
\(sce) names(colData(sce))
) |>
unlist() |>
unname() |>
unique()

# Check that the `retain_coldata_cols` are present in at least one SCE, and
@@ -116,16 +123,19 @@ merge_sce_list <- function(sce_list = list(),
stop("The metadata for each SCE object must contain `library_id` and `sample_id`.")
}

# Prepare SCEs
# Prepare main experiment of SCEs for merging --------------------
sce_list <- sce_list |>
purrr::imap(prepare_sce_for_merge,
purrr::imap(
prepare_sce_for_merge,
batch_column = batch_column,
cell_id_column = cell_id_column,
shared_features = shared_features,
retain_coldata_cols = retain_coldata_cols,
preserve_rowdata_cols = preserve_rowdata_cols
)


## Handle metadata ---------------------------------------------
# get a list of metadata from the list of sce objects
# each library becomes an element within the metadata components
metadata_list <- sce_list |>
@@ -156,12 +166,67 @@ merge_sce_list <- function(sce_list = list(),
metadata_list$sample_metadata <- sample_metadata
}

# Create the merged SCE from the processed list ------------------

## Handle altExps ------------------------------------------------------

# If we are including altExps, process them and save to list to add to merged SCE
merged_altexps <- list()
if (include_altexp) {

# First we need to determine the final column names of the merged_sce (not yet made)
# for use in altExp code. Later we'll apply this order to the merged_sce itself.
# These values are cell ids: `{sce_name}-{barcode}`
all_merged_barcodes <- sce_list |>
purrr::map(colnames) |>
purrr::reduce(union)

# Find all altExp names present in the SCE objects.
# We will prepare a merged altExp for each of these.
altexp_names <- sce_list |>
purrr::map(
\(sce) altExpNames(sce)
) |>
purrr::reduce(union)

for (altexp_name in altexp_names) {
# Determine which SCEs contain this altExp, and create list of those altExps
altexp_list <- sce_list |>
purrr::keep(\(sce) altexp_name %in% altExpNames(sce)) |>
purrr::map(altExp)

# Create and save the merged altExp for this altexp_name
merged_altexps[[altexp_name]] <- create_merged_altexp(
altexp_list,
all_merged_barcodes
)

}
}

# Remove altExps from SCEs prior to main experiment merge
# If none are present, this code has no effect.
sce_list <- sce_list |>
purrr::map(removeAltExps)

# Create the merged SCE from the processed list
merged_sce <- do.call(cbind, sce_list)

# replace existing metadata list with merged metadata
# Replace existing metadata list with merged metadata
metadata(merged_sce) <- metadata_list

# Add the merged altExps into the main merged_sce
if (include_altexp) {

# Ensure compatible column names
# (this is probably not necessary but doesn't hurt...)
merged_sce <- merged_sce[, all_merged_barcodes]

# Add the merged altexps into the merged sce
for (altexp_name in names(merged_altexps)) {
altExp(merged_sce, altexp_name) <- merged_altexps[[altexp_name]]
}
}

return(merged_sce)
}
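
The altExp bookkeeping in the hunk above rests on two small purrr idioms: taking the union of names across a list, and keeping only the list elements that carry a given altExp. The following is a minimal sketch of those idioms only, using plain named lists as stand-ins for SCE objects. The `toy_sces` structure and its `cells` and `altexps` fields are hypothetical and are not part of the package.

```r
library(purrr)

# Hypothetical stand-ins for SCEs: each has cell ids and a set of altExp names
toy_sces <- list(
  lib1 = list(cells = c("lib1-AAA", "lib1-CCC"), altexps = c("adt")),
  lib2 = list(cells = c("lib2-AAA"), altexps = character(0)),
  lib3 = list(cells = c("lib3-GGG"), altexps = c("adt", "cellhash"))
)

# Union of all cell ids, in list order (mirrors `all_merged_barcodes`)
all_cells <- toy_sces |>
  purrr::map(\(x) x$cells) |>
  purrr::reduce(union)

# Union of all altExp names; elements with none contribute character(0)
altexp_names <- toy_sces |>
  purrr::map(\(x) x$altexps) |>
  purrr::reduce(union)

# For each altExp name, record which inputs actually contain it
has_altexp <- altexp_names |>
  purrr::set_names() |>
  purrr::map(\(name) names(purrr::keep(toy_sces, \(x) name %in% x$altexps)))
```

In this sketch `lib2` has no altExps at all, yet its cell still appears in `all_cells`. That is the situation the all-NA columns in the merged altExp assay are meant to cover.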

@@ -185,15 +250,14 @@ merge_sce_list <- function(sce_list = list(),
#' renamed
#'
#' @return An updated SCE that is prepared for merging
prepare_sce_for_merge <- function(sce,
sce_name,
batch_column,
cell_id_column,
shared_features,
retain_coldata_cols,
preserve_rowdata_cols) {
# Current functionality does not retain any present altExps
sce <- removeAltExps(sce)
prepare_sce_for_merge <- function(
sce,
sce_name,
batch_column,
cell_id_column,
shared_features,
retain_coldata_cols,
preserve_rowdata_cols) {

# Subset to shared features
sce <- sce[shared_features, ]
@@ -216,7 +280,10 @@ prepare_sce_for_merge <- function(sce,
observed_coldata_names <- names(colData(sce))

# Ensure all columns are present in all SCEs by adding `NA` columns as needed
missing_columns <- setdiff(retain_coldata_cols, observed_coldata_names)
missing_columns <- setdiff(
retain_coldata_cols,
observed_coldata_names
)
for (missing_col in missing_columns) {
# Create the missing column only if it should be retained
if (missing_col %in% retain_coldata_cols) {
@@ -237,24 +304,25 @@ prepare_sce_for_merge <- function(sce,
# Add `sce_name` to colnames so cell ids can be mapped to originating SCE
colnames(sce) <- glue::glue("{sce_name}-{colnames(sce)}")

# get metadata list
# get metadata list for updating it
metadata_list <- metadata(sce)

# first check that this library hasn't already been merged
if ("library_metadata" %in% names(metadata_list)) {
stop("This SCE object appears to be a merged object. We do not support merging objects with objects that have already been merged.")
}

# create library and sample metadata
# create library and sample metadata.
# library metadata will hold all the previous metadata fields, to avoid conflicts
library_metadata <- metadata_list[names(metadata_list) != "sample_metadata"]
sample_metadata <- metadata_list$sample_metadata

# combine into one list
metadata_list <- list(
library_id = metadata(sce)[["library_id"]],
sample_id = metadata(sce)[["sample_id"]],
library_metadata = library_metadata,
sample_metadata = sample_metadata
library_metadata = library_metadata, # this will be all previous metadata for the given library
sample_metadata = sample_metadata # this will be the same as the previous sample_metadata
)

# replace existing metadata
@@ -263,3 +331,96 @@ prepare_sce_for_merge <- function(sce,
# return the processed SCE
return(sce)
}



#' Create a merged altExp from a list of altExps that share an altExp name
#'
#'
#' @param altexp_list List of altexps to merge
#' @param all_merged_barcodes Vector of column names (`{sce_name}-{barcode}`) to include
#' in the final merged altExp. This vector includes _all_ SCEs, not only those
#' with this altExp name.
#'
#' @return A merged altExp (a SingleCellExperiment object) to include in the final merged SCE object
create_merged_altexp <- function(
altexp_list,
all_merged_barcodes) {

# Create vector of all features
# this order will be used for the final assay matrices
altexp_features <- altexp_list |>
purrr::map(rownames) |>
purrr::reduce(union)

# Determine which assays are present for this altexp_name. We'll need a matrix
# for each of these in the final merged object
altexp_assay_names <- altexp_list |>
purrr::map(assayNames) |>
purrr::reduce(union)

# Create new merged assay matrices
new_assays <- altexp_assay_names |>
purrr::map(
build_new_altexp_assay,
altexp_list,
altexp_features,
all_merged_barcodes
)
names(new_assays) <- altexp_assay_names


# Create merged altExp
merged_altexp <- SingleCellExperiment(assays = new_assays)

# TODO: Add rowData and colData to merged_altexp

return(merged_altexp)

}




#' Build a new sparse matrix for merging altExps
#'
#' @param assay_name Name of assay of interest (e.g., "counts")
#' @param altexp_list List of altExps which should be included in the new matrix
#' @param all_merged_features Vector of matrix row names, corresponding to the full
#' set of features for this altExp
#' @param all_merged_barcodes Vector of matrix column names, corresponding to all cells
#' which will be in the final merged altExp
#'
#' @return Sparse matrix
build_new_altexp_assay <- function(
assay_name,
altexp_list,
all_merged_features,
all_merged_barcodes) {

# Establish new matrix with all NA values
new_matrix <- matrix(
data = NA,
nrow = length(all_merged_features),
ncol = length(all_merged_barcodes),
dimnames = list(
all_merged_features,
all_merged_barcodes
)
)

# Substitute existing assays into the matrix, if they exist
for (altexp in altexp_list) {
# Note that column names were already formatted as `{sce_name}-{barcode}` by
# the main SCE merging code
if (assay_name %in% assayNames(altexp)) {
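# as.matrix() is needed here: the assay may be stored as a sparse matrix,
# and it is being assigned into a block of a base (dense) matrix
new_matrix[rownames(altexp), colnames(altexp)] <- as.matrix(assay(altexp, assay_name))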
}

}
# sparsify
new_matrix <- as(new_matrix, "CsparseMatrix")

return(new_matrix)
}
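
The matrix-building strategy in `build_new_altexp_assay()` (start from an all-NA matrix over every feature and every cell, overwrite the blocks that exist, then sparsify) can be tried on toy data without any SCE objects. This is a minimal sketch under stated assumptions, not the package's code: `counts_a`, `counts_b`, and the cell ids are invented for illustration, and base `Reduce()`/`lapply()` stand in for the purrr calls used in the diff.

```r
library(Matrix)

# Hypothetical toy inputs: two small count matrices with partly different features
counts_a <- matrix(
  c(1, 0, 2, 3),
  nrow = 2,
  dimnames = list(c("CD3", "CD4"), c("lib1-AAA", "lib1-CCC"))
)
counts_b <- matrix(
  c(5, 6),
  nrow = 2,
  dimnames = list(c("CD4", "CD8"), c("lib3-GGG"))
)
toy_assays <- list(counts_a, counts_b)

# Union of features across inputs; this fixes the row order of the result
all_features <- Reduce(union, lapply(toy_assays, rownames))

# "lib2-AAA" stands for a cell from a library that lacks this altExp entirely
all_cells <- c("lib1-AAA", "lib1-CCC", "lib2-AAA", "lib3-GGG")

# Start from an all-NA matrix covering every feature and every cell
merged <- matrix(
  NA_real_,
  nrow = length(all_features),
  ncol = length(all_cells),
  dimnames = list(all_features, all_cells)
)

# Overwrite the blocks for which data exists, matching by row and column names
for (m in toy_assays) {
  merged[rownames(m), colnames(m)] <- as.matrix(m)
}

# Convert to a column-compressed sparse matrix
merged_sparse <- as(merged, "CsparseMatrix")
```

Note that `NA` is not zero, so the NA entries are stored explicitly in the sparse result. How much memory the sparse representation saves therefore depends on how many feature and cell combinations are missing.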
30 changes: 30 additions & 0 deletions man/build_new_altexp_assay.Rd
21 changes: 21 additions & 0 deletions man/create_merged_altexp.Rd
1 change: 1 addition & 0 deletions man/merge_altexp.Rd