Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update altexp merging #251

Merged
merged 32 commits into from
Dec 21, 2023
Merged

Update altexp merging #251

merged 32 commits into from
Dec 21, 2023

Conversation

sjspielman
Copy link
Member

@sjspielman sjspielman commented Dec 19, 2023

Closes #150

We're getting close I think! Noting that I've set this to close out #150, since I think we should open further issues to tackle potential (likely?) next steps (e.g. #250 and/or doing more things with altExp rowdata/coldata/metadata).

First, some good news: It appears that we do not need to create dummy rowData or colData for any new altExps that get created, which is excellent. Notably, I've reached a point where I seriously need to refresh my eyeballs from this code, but I am 100% sure we would benefit from additional tests. Please let me know what test low-hanging-fruit I've missed! I know we still need a test for SCEs with multiple altexps, but wanted to get this out for review now since, as mentioned, my eyes need a break before writing more things!

  • Note one recreational revision where I moved some metadata code into its own function because the script was getting a tad unwieldy.
  • I added a function check_altexps() to check for altExp compatibility and die otherwise. This function also returns a list which contains the expected features and assays that need to go into any new altExps that get created, and gets used down the line. Seemed prudent to just save this information while I was checking altExps anyways.
    • I left a TODO in there for a future-PR next step of subsetting assays rather than dying if any are missing.
  • The function prepare_altexp_for_merge() does the altExp preparation, and it calls two other functions -
    • build_na_matrix() (called as needed for any missing altExps)
    • prepare_sce_for_merge(). I added an argument to this function is_altexp with a default of FALSE. When we process altExps, this gets set to TRUE. This argument controls whether sce colnames should get a name prefix added; we don't want to do this if altExp is true since we'd end up with duplicate name prefixes in the end. I made this argument more general in case there are other aspects of this function we want to skip for altExps.

Edit - one additional note for review. You probably want to use split (not unified) view, since git's detection of diffs is a yet another 🎢 .

@sjspielman sjspielman requested a review from jashapiro December 19, 2023 17:40
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good!

I had some not huge suggestions, including a few extra tests.

The one bigger thought is about how we should handle the altexp_{name}_{stat} columns in the merge. I think we want to retain them, but we need to add some kind of logic there to get the correct list. Building from altexp_attributes this should not be too bad... something like:

tidyr::expand_grid(
  name = names(altexp_attributes), 
  stat = c("sum", "detected", "percent")
) |> 
  glue::glue_data("altexp_{name}_{stat}")

I reserve the right to have more thoughts later, but I think we are on the right track.

R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
R/merge_sce_list.R Outdated Show resolved Hide resolved
Comment on lines 406 to 408
expect_true(
all(stringr::str_starts(colnames(rowData(prepared_altexp)), "test-"))
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not check the actual values here?

Suggested change
expect_true(
all(stringr::str_starts(colnames(rowData(prepared_altexp)), "test-"))
)
expect_equal(
colnames(rowData(prepared_altexp))
paste0("test-", colnames(rowData(test_altexp))
)

But this does not check whether we are correctly handling preserve_rowdata_cols = "target_type", which we probably should... But this input doesn't have that rowdata column, so other things would have to change to check that too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this needs a test! I'll add target_type to the test object.

tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
@@ -417,39 +498,61 @@ test_that("merging SCEs with different altExp features works as expected, with a
preserve_rowdata_cols = c("gene_names")
)

merged_altexp <- altExp(merged_sce)
expect_true(altExpNames(merged_sce) == "adt")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would also add a few more tests here that the altExp matrices are correct: Are the values from sce4 all NA? Are the values from other sces the same as what went in?
Maybe also check that the dimensions are what is expected?

tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

I reserve the right to have more thoughts later, but I think we are on the right track.

🎉 I'll take it!

@sjspielman
Copy link
Member Author

sjspielman commented Dec 20, 2023

I am totally stuck on a failing test that I cannot figure out after bc6f82e -

── Error (test-merge_sce_list.R:563:3): merging SCEs where 1 altExp is missing works as expected, with altexps ──
Error in `value[[3L]](cond)`: failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
  failed to rbind column 'altExps' across DataFrame objects: failed to
  rbind column 'adt' across DataFrame objects: subscript contains invalid

I've tracked down the specific error back to this part of the combine code -

[tryCatch](https://rdrr.io/r/base/conditions.html)({
        int_cd <- [do.call](https://rdrr.io/r/base/do.call.html)([rbind](https://rdrr.io/r/base/cbind.html), [lapply](https://rdrr.io/r/base/lapply.html)([args](https://rdrr.io/r/base/args.html), [int_colData](https://rdrr.io/bioc/SingleCellExperiment/man/internals.html)))
    }, error=[function](https://rdrr.io/r/base/function.html)(err) {
        [stop](https://rdrr.io/r/base/stop.html)(
            "failed to combine 'int_colData' in 'cbind(<", [class](https://rdrr.io/r/base/class.html)([args](https://rdrr.io/r/base/args.html)[[](https://rdrr.io/r/base/Extract.html)[1]]), ">)':\n  ",
            [conditionMessage](https://rdrr.io/r/base/conditions.html)(err))
    })

But I can't figure out why this error is being thrown. I am able to run rbind() directly on the colData slots, and there aren't any columns here called "adt" or "altExps"... Time for me to take a little break from this code and circle back fresh. Any help spelunking to track down why this is being thrown is appreciated too.

EDIT 😭 make it make sense...
(sce_list[[4]] is the one created with NAs)

> do.call(cbind, list(sce_list[[1]], sce_list[[4]]))
class: SingleCellExperiment 
dim: 12 16 
metadata(8): library_id sample_id ... library_metadata sample_metadata
assays(2): counts logcounts
rownames(12): GENE0001 GENE0002 ... GENE0011 GENE0012
rowData names(3): gene_names sce1-other_column sce4-other_column
colnames(16): sce1-TCGACTTTTTTT sce1-GTTATTGAGCCG ... sce4-GTTATGGTCACT sce4-GGGAACCCTCCA
colData names(7): sum detected ... batch cell_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(1): adt
> do.call(cbind, list(sce_list[[1]], sce_list[[2]], sce_list[[4]]))
Error in value[[3L]](cond) : 
  failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
  failed to rbind column 'altExps' across DataFrame objects: failed to rbind column 'adt' across DataFrame objects:
  subscript contains invalid names

@jashapiro
Copy link
Member

Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break.

Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.

@sjspielman
Copy link
Member Author

Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break.
Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.

💯 all around. I tried more and the "edit" I added in that comment above blew my mind, so I do need to actually take that break today...!

reformatting from precommit
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, assuming this is passing tests, because I am confident, I think we still want a few more tests for the missing altExp case, basically to match the complete altExp case.

We can go one step further to check that the filled in values are what we expected (NA) too.

tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
Comment on lines 517 to 519
# from cbind docs:
# The colnames in colData(SummarizedExperiment) must match or an error is thrown.
# Duplicate columns of rowData(SummarizedExperiment) must contain the same data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments should probably be updated.

tests/testthat/test-merge_sce_list.R Show resolved Hide resolved
@sjspielman
Copy link
Member Author

Tests added here! 2f12c6c

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM... just a few suggestions on the added tests.

tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
tests/testthat/test-merge_sce_list.R Outdated Show resolved Hide resolved
@sjspielman
Copy link
Member Author

Noting e4a7399 is passing locally!

@sjspielman sjspielman merged commit dbe3e82 into main Dec 21, 2023
2 checks passed
@sjspielman sjspielman deleted the sjspielman/150-next-steps branch December 21, 2023 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Account for altExps when merging SCEs
2 participants