Add calculate_cell_cluster_metrics() function #23

cansavvy · 2024-12-18T14:58:19Z

Background

This PR is for #10. I tried to follow the context there, but since this is my first PR on this project I might be missing some insights, so I'm posting this as a draft first so that someone can check that I'm headed in generally the right direction.

Summary

This function takes the output from sweep_clusters() and runs evaluations on it. It can run calculate_silhouette() and/or calculate_purity() on every element of the list of data frames that sweep_clusters() outputs.

One additional, very nitpicky thing I have in here: I was very minorly thrown off by the examples in sweep_clusters() being named cluster_df when they are actually lists of data frames and not data frames outright. If you don't like this change, no worries. The documentation itself is very clear, but I'm just a person who kinda goes straight for the examples first.

Requested feedback

Am I understanding this right? It doesn't seem like this function will be that useful, but I trust y'all have more context and knowledge about the needs of the project than I do, since I just started looking at this stuff last week lol.

A side question that is also very minor: can we name the data frames in the list according to the combination of their parameters, or would that be too clunky? I like names in my lists, but this is maybe a cansavvy quirk we don't need to subject everyone else to.
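A minimal sketch of what naming the list elements by parameter combination could look like. The parameter names (nn, resolution), the name format, and the placeholder data frames are assumptions for illustration only; they are not taken from sweep_clusters() itself.

```r
# Hypothetical parameter grid; `nn` and `resolution` are illustrative names only.
param_grid <- expand.grid(
  nn = c(10, 20),
  resolution = c(0.5, 1),
  stringsAsFactors = FALSE
)

# Build one label per parameter combination, e.g. "nn10_res0.5".
sweep_names <- paste0("nn", param_grid$nn, "_res", param_grid$resolution)

# Placeholder list with one (toy) data frame per parameter combination.
sweep_list <- lapply(seq_len(nrow(param_grid)), function(i) {
  data.frame(cell_id = c("cell1", "cell2"), cluster = c(1, 2))
})

# Attach the labels so that elements can be pulled out by name.
names(sweep_list) <- sweep_names
```

With names attached, a single result can be retrieved with something like sweep_list[["nn10_res0.5"]], at the cost of longer names when many parameters vary.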

for more information, see https://pre-commit.ci

cansavvy · 2024-12-18T14:59:38Z

R/evaluate-clusters.R

+#'   The cell id column's values should match either the PC matrix row names, or the
+#'   SingleCellExperiment/Seurat object cell ids. Typically this data frame will be
+#'   output from the `rOpenScPCA::calculate_clusters()` function.
+#' @param ... Additional argument are passed on to the respective `calculate_purity()` and


I recognize that I didn't look too closely into these arguments, and if someone wanted to specify a different argument for purity versus silhouette they would not be able to do so this way, because all arguments get passed to both functions. If we think this will be a common use case I can go back and adjust.

Since these arguments are passed to different functions with different expectations, these are likely to conflict. I think it is probably better for this convenience function not to use .... But I don't know that we need to pass further options; if something more complex is needed, the user can run purrr::map on their own.
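To illustrate the conflict described above, here is a small sketch using toy stand-in functions. toy_purity(), toy_silhouette(), and their arguments are hypothetical and are not the package's real functions or signatures; the point is only how base R treats an argument forwarded to a function that does not accept it.

```r
# Toy stand-ins (hypothetical); each accepts a different optional argument.
toy_purity <- function(cluster_df, n_neighbors = 2) {
  rep(n_neighbors, nrow(cluster_df))
}
toy_silhouette <- function(cluster_df, center = TRUE) {
  rep(as.numeric(center), nrow(cluster_df))
}

# A wrapper that forwards `...` to both functions.
forward_to_both <- function(cluster_df, ...) {
  list(
    purity = toy_purity(cluster_df, ...),
    silhouette = toy_silhouette(cluster_df, ...)
  )
}

cluster_df <- data.frame(cell_id = c("a", "b", "c"), cluster = c(1, 1, 2))

# An argument meant only for toy_purity() also reaches toy_silhouette(),
# which rejects it, so the whole call fails.
conflict_message <- tryCatch(
  forward_to_both(cluster_df, n_neighbors = 5),
  error = function(e) conditionMessage(e)
)

# Calling each function explicitly keeps the arguments separate.
explicit_result <- list(
  purity = toy_purity(cluster_df, n_neighbors = 5),
  silhouette = toy_silhouette(cluster_df, center = FALSE)
)
```

Dropping `...` from the wrapper avoids this failure mode, and users who need per-function options can still call each function themselves as in the last step.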

cansavvy · 2024-12-18T15:00:10Z

R/evaluate-clusters.R

+#'
+#' set.seed(2024)
+#'
+#' sce_object <- splatter::simpleSimulate(nGenes = 1000, verbose = FALSE) |>


I added these steps just because I wanted to illustrate how I was testing this, but if this is too much detail for this example we can trim it down.

I think it's probably too much detail just in the sense that a novice might look at this and say, "oh no, do i need splatter?"

I would simplify by assuming an sce_object variable is already known/exists. Consistent with other evaluation function examples, you don't need to pull out the PCA either; just pass in the object directly. Let's have the example therefore just "run" (i.e., keep the \dontrun{} construct!) sweep_clusters() and calculate_cell_cluster_metrics().

Is there an example sce_object that already exists I can pull from? If so how do I call it?

Since these examples are not run, it is fine to just "assume" an sce_object exists for this section, and you can start from the sweep_clusters() step (skipping the prep).

cansavvy · 2024-12-18T15:18:33Z

Still left to do here:

Polishing docs (I copied and pasted and edited but IDK) -- did this recently
Writing tests - This is in Tests for calculate_cell_cluster_metrics() function #29
Incorporating any feedback - Did this in my most recent commits

jashapiro

Thanks for this contribution! My main comment here (aside from my earlier misread) is that I still wonder if we want this function to work on a list, rather than just on a single clustering data frame. The convenience of calculating all the metrics at once makes sense to me, but if we have a function that evaluates a single data frame, then turning that into evaluating the list is a very simple addition of a purrr wrapper, and I think that seems more transparent. But others may disagree!
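A rough sketch of the alternative described above: a function that evaluates one clustering data frame, with the list case handled by a thin mapping step. evaluate_one() and its toy cluster_size column are hypothetical stand-ins for the real metric calculations.

```r
# Hypothetical single-data-frame evaluator: adds a toy per-cell column
# (the size of each cell's cluster) in place of a real metric.
evaluate_one <- function(cluster_df) {
  cluster_sizes <- table(cluster_df$cluster)
  cluster_df$cluster_size <- as.integer(
    cluster_sizes[as.character(cluster_df$cluster)]
  )
  cluster_df
}

# Toy stand-in for a list of clustering results.
sweep_list <- list(
  data.frame(cell_id = paste0("cell", 1:4), cluster = c(1, 1, 2, 2)),
  data.frame(cell_id = paste0("cell", 1:4), cluster = c(1, 1, 1, 2))
)

# Evaluating the whole list is a one-line wrapper around the single-df function.
# purrr::map(sweep_list, evaluate_one) would do the same thing.
evaluated_list <- lapply(sweep_list, evaluate_one)
```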

jashapiro · 2024-12-18T15:16:01Z

R/evaluate-clusters.R

+#'   The cell id column's values should match either the PC matrix row names, or the
+#'   SingleCellExperiment/Seurat object cell ids. Typically this data frame will be
+#'   output from the `rOpenScPCA::calculate_clusters()` function.
+#' @param ... Additional argument are passed on to the respective `calculate_purity()` and


Since these arguments are passed to different functions with different expectations, these are likely to conflict. I think it is probably better for this convenience function not to use .... But I don't know that we need to pass further options; if something more complex is needed, the user can run purrr::map on their own.

sjspielman

Thanks for starting this!! I left some initial comments here, but before I look more, I have a thought about the use cases for this function: currently it's written to only run on a sweep list, but it makes sense to me to also make this flexible enough to run on a single data frame (i.e., not a list of data frames). This means updates related to the input argument sweep_list:

First, give it a more flexible name... Maybe something like cluster_results? I don't love it, but I'm not sure of how else to communicate that it might be either a list of dfs or df, so really I don't hate it either!
Second, add a check if it's a data frame and if so, make it a list of length 1 with the given data frame in it. So the code might look like:

if (data frame) { listify it}
else { run the existing stopifnot checks}

I suppose we'd need another check to determine whether to return the list or the first index from the eval'd df, too, so you might actually structure this code by first defining an is_df variable or so, and using that for both checks (the opening sanity check to transform it into a list to play nicely with purrr, and the final check for what to return).
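The is_df idea above could be sketched roughly as follows. evaluate_flexible() and evaluate_one() are hypothetical names, the evaluation step is a placeholder, and base lapply() is used here in place of purrr to keep the sketch dependency-free.

```r
# Hypothetical single-data-frame evaluator standing in for the real metric calculations.
evaluate_one <- function(cluster_df) {
  cluster_df$n_cells <- nrow(cluster_df)
  cluster_df
}

evaluate_flexible <- function(cluster_results) {
  # Check for a data frame first: a data frame is itself a list in R.
  is_df <- is.data.frame(cluster_results)
  if (is_df) {
    # Wrap a single data frame so the mapping step below works unchanged.
    cluster_results <- list(cluster_results)
  } else {
    stopifnot(
      "cluster_results must be a data frame or a list of data frames" =
        is.list(cluster_results) &&
          all(vapply(cluster_results, is.data.frame, logical(1)))
    )
  }

  evaluated <- lapply(cluster_results, evaluate_one)

  # Return the same shape that was passed in.
  if (is_df) evaluated[[1]] else evaluated
}

single_df <- data.frame(cell_id = c("a", "b"), cluster = c(1, 2))
single_result <- evaluate_flexible(single_df)
list_result <- evaluate_flexible(list(single_df, single_df))
```

Reusing the one is_df flag for both the input wrapping and the return shape keeps the two checks from drifting apart.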

sjspielman · 2024-12-18T15:40:12Z

R/evaluate-clusters.R

+#'
+#' This wrapper function can be used to evaluate clusters calculated using `sweep_clusters()` function.
+#' Input should be be a list of data frames with the resulting clusters from all parameter combinations provided to
+#' the `sweep_clusters()` function. Output


"Output"...? (missing rest of sentence)

sjspielman · 2024-12-18T15:43:22Z

R/evaluate-clusters.R

+#'
+calculate_cell_cluster_metrics <- function(x,
+                                           sweep_list,
+                                           evals = c("purity", "silhouette"),


To match the function title more closely (and you'll want to replace this variable name when used in the function, too):

Suggested change

-evals = c("purity", "silhouette"),
+metric = c("purity", "silhouette"),
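For context, a character-vector default like this is commonly validated with base R's match.arg(). The sketch below assumes that pattern; select_metrics() is a hypothetical helper, and whether the PR validates the argument this way is not shown in the thread.

```r
# Hypothetical helper showing how a `metric` argument with a vector default
# can be validated; several.ok = TRUE allows one or both metrics.
select_metrics <- function(metric = c("purity", "silhouette")) {
  match.arg(metric, several.ok = TRUE)
}

# With no argument supplied, both metrics are selected.
default_metrics <- select_metrics()

# A single valid choice is returned as-is.
one_metric <- select_metrics("silhouette")

# A value that matches none of the choices raises an error.
bad_metric <- tryCatch(
  select_metrics("modularity"),
  error = function(e) "error"
)
```

One caveat with several.ok = TRUE: when a vector mixes valid and invalid values, match.arg() silently keeps only the valid ones, so a stricter check may be preferable if that matters.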

sjspielman · 2024-12-18T15:56:09Z

R/evaluate-clusters.R

+#'
+#' set.seed(2024)
+#'
+#' sce_object <- splatter::simpleSimulate(nGenes = 1000, verbose = FALSE) |>


I think it's probably too much detail just in the sense that a novice might look at this and say, "oh no, do i need splatter?"

I would simplify by assuming an sce_object variable is already known/exists. Consistent with other evaluation function examples, you don't need to pull out the PCA either; just pass in the object directly. Let's have the example therefore just "run" (i.e., keep the \dontrun{} construct!) sweep_clusters() and calculate_cell_cluster_metrics().

cansavvy and others added 2 commits December 18, 2024 09:48

multi eval function

12e072e

[pre-commit.ci] auto fixes from pre-commit.com hooks

52bcb38

for more information, see https://pre-commit.ci

cansavvy commented Dec 18, 2024

Merge branch 'main' into cansavvy/multi_sweep

22aa816

cansavvy requested a review from sjspielman December 18, 2024 15:00

jashapiro reviewed Dec 18, 2024

jashapiro reviewed Dec 18, 2024

sjspielman reviewed Dec 18, 2024

cansavvy and others added 11 commits January 24, 2025 13:24

Update R/evaluate-clusters.R

b120d4a

Co-authored-by: Joshua Shapiro <[email protected]>

Update R/evaluate-clusters.R

5b54da8

Co-authored-by: Stephanie Spielman <[email protected]>

Update R/evaluate-clusters.R

51ea688

Co-authored-by: Stephanie Spielman <[email protected]>

Merge branch 'main' into cansavvy/multi_sweep

04237ec

[pre-commit.ci] auto fixes from pre-commit.com hooks

c717d79

for more information, see https://pre-commit.ci

Updates based on reviews

0985f59

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into can…

b18e960

…savvy/multi_sweep

Throw in some tests

fcd96d2

[pre-commit.ci] auto fixes from pre-commit.com hooks

d1c7274

for more information, see https://pre-commit.ci

Update example

e68286f

Merge remote-tracking branch 'cansavvy/cansavvy/multi_sweep' into can…

b9677fa

…savvy/multi_sweep

cansavvy marked this pull request as ready for review January 24, 2025 18:58

cansavvy and others added 8 commits January 24, 2025 14:12

Add tests!

bc7955c

Merge branch 'cansavvy/tests' into cansavvy/multi_sweep

e1206a2

Put tests in another branch

9e355d5

[pre-commit.ci] auto fixes from pre-commit.com hooks

ccd2b3b

for more information, see https://pre-commit.ci

Update docs

81f3e07

[pre-commit.ci] auto fixes from pre-commit.com hooks

e91eaf3

for more information, see https://pre-commit.ci

Remove artifact test

49351b8

Oh no don't get rid of that file.

d16127e

cansavvy mentioned this pull request Jan 24, 2025

Tests for calculate_cell_cluster_metrics() function #29

Draft

cansavvy and others added 3 commits January 24, 2025 14:38

devtools::document() to please the testthat gods

78b0bd9

Update R/evaluate-clusters.R

ad3e57c

Co-authored-by: Stephanie Spielman <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

e19561a

for more information, see https://pre-commit.ci

cansavvy changed the title ~~DRAFT: calculate_cell_cluster_metrics() function~~ Add calculate_cell_cluster_metrics() function Jan 24, 2025

cansavvy added 2 commits January 24, 2025 14:47

Appease linter

c7870df

Forgot to remove ...

6539ac9
