-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update altexp merging #251
Conversation
… function checks for feature and assay compatibility and currently dies if either condition is not met. Function also returns a list of features & assays for each altexp_name for later use when preparing altexps
…mproved script readability/modularity, as the script is getting long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good!
I had some not huge suggestions, including a few extra tests.
The one bigger thought is about how we should handle the altexp_{name}_{stat}
columns in the merge. I think we want to retain them, but we need to add some kind of logic there to get the correct list. Building from altexp_attributes
this should not be too bad... something like:
tidyr::expand_grid(
name = names(altexp_attributes),
stat = c("sum", "detected", "percent")
) |>
glue::glue_data("altexp_{name}_{stat}")
I reserve the right to have more thoughts later, but I think we are on the right track.
tests/testthat/test-merge_sce_list.R
Outdated
expect_true( | ||
all(stringr::str_starts(colnames(rowData(prepared_altexp)), "test-")) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check the actual values here?
expect_true( | |
all(stringr::str_starts(colnames(rowData(prepared_altexp)), "test-")) | |
) | |
expect_equal( | |
colnames(rowData(prepared_altexp)) | |
paste0("test-", colnames(rowData(test_altexp)) | |
) |
But this does not check whether we are correctly handling preserve_rowdata_cols = "target_type"
, which we probably should... But this input doesn't have that rowdata column, so other things would have to change to check that too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this needs a test! I'll add target_type
to the test object.
tests/testthat/test-merge_sce_list.R
Outdated
@@ -417,39 +498,61 @@ test_that("merging SCEs with different altExp features works as expected, with a | |||
preserve_rowdata_cols = c("gene_names") | |||
) | |||
|
|||
merged_altexp <- altExp(merged_sce) | |||
expect_true(altExpNames(merged_sce) == "adt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would also add a few more tests here that the altExp matrices are correct: Are the values from sce4
all NA? Are the values from other sces the same as what went in?
Maybe also check that the dimensions are what is expected?
🎉 I'll take it! |
…pdate sparse matrix test" This reverts commit d661d12.
Co-authored-by: Joshua Shapiro <[email protected]>
…ix. One test is now failing
I am totally stuck on a failing test that I cannot figure out after bc6f82e -
I've tracked down the specific error back to this part of the
But I can't figure out why this error is being thrown. I am able to run EDIT 😭 make it make sense...
|
Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break. Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right. |
💯 all around. I tried more and the "edit" I added in that comment above blew my mind, so I do need to actually take that break today...! |
reformatting from precommit
Use combineCols for merging SCEs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, assuming this is passing tests, because I am confident, I think we still want a few more tests for the missing altExp case, basically to match the complete altExp case.
We can go one step further to check that the filled in values are what we expected (NA) too.
tests/testthat/test-merge_sce_list.R
Outdated
# from cbind docs: | ||
# The colnames in colData(SummarizedExperiment) must match or an error is thrown. | ||
# Duplicate columns of rowData(SummarizedExperiment) must contain the same data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments should probably be updated.
Tests added here! 2f12c6c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM... just a few suggestions on the added tests.
Co-authored-by: Joshua Shapiro <[email protected]>
Noting e4a7399 is passing locally! |
Closes #150
We're getting close I think! Noting that I've set this to close out #150, since I think we should open further issues to tackle potential (likely?) next steps (e.g. #250 and/or doing more things with altExp rowdata/coldata/metadata).
First, some good news: It appears that we do not need to create dummy rowData or colData for any new altExps that get created, which is excellent. Notably, I've reached a point where I seriously need to refresh my eyeballs from this code, but I am 100% sure we would benefit from additional tests. Please let me know what test low-hanging-fruit I've missed! I know we still need a test for SCEs with multiple altexps, but wanted to get this out for review now since, as mentioned, my eyes need a break before writing more things!
check_altexps()
to check for altExp compatibility and die otherwise. This function also returns a list which contains the expected features and assays that need to go into any new altExps that get created, and gets used down the line. Seemed prudent to just save this information while I was checking altExps anyways.TODO
in there for a future-PR next step of subsetting assays rather than dying if any are missing.prepare_altexp_for_merge()
does the altExp preparation, and it calls two other functions -build_na_matrix()
(called as needed for any missing altExps)prepare_sce_for_merge()
. I added an argument to this functionis_altexp
with a default ofFALSE
. When we process altExps, this gets set toTRUE
. This argument controls whether sce colnames should get a name prefix added; we don't want to do this if altExp is true since we'd end up with duplicate name prefixes in the end. I made this argument more general in case there are other aspects of this function we want to skip for altExps.Edit - one additional note for review. You probably want to use
split
(notunified
) view, since git's detection of diffs is a yet another 🎢 .