Precompute clustering #18

rcannood · 2024-12-10T14:23:57Z

Describe your changes

Add .obsm["clustering"] to the solution so it can be used by the metrics

Example:

input_solution$obsm[["clustering"]] |> head()
                            leiden_r0.2 leiden_r0.4 leiden_r0.3 leiden_r0.7 leiden_r0.6 leiden_r0.8 leiden_r0.5
Sample_787                            1           1           1           1           1           1           1
D74_82                                0           3           0           3           3           3           3
D1713_42                              2           2           2           2           2           2           2
human4_lib3.final_cell_0092           0           0           0           0           0           0           0
Sample_716                            1           1           1           1           1           1           1
Sample_834                            1           1           1           1           1           1           1
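The columns in the example output are not ordered by resolution. A minimal sketch of how they could be sorted, assuming the `leiden_r<resolution>` naming shown above (a later commit in this PR renames the cluster keys, so the merged code may use a different pattern). The helper `resolution_of` is hypothetical and not part of the PR:

```python
import re

# Column names copied from the example output above, in their original order.
columns = [
    "leiden_r0.2", "leiden_r0.4", "leiden_r0.3", "leiden_r0.7",
    "leiden_r0.6", "leiden_r0.8", "leiden_r0.5",
]


def resolution_of(column: str) -> float:
    """Extract the resolution from a name such as 'leiden_r0.4' (assumed pattern)."""
    match = re.fullmatch(r"leiden_r(\d+(?:\.\d+)?)", column)
    if match is None:
        raise ValueError(f"unexpected clustering column: {column!r}")
    return float(match.group(1))


# Sort by the numeric resolution rather than by the raw string.
sorted_columns = sorted(columns, key=resolution_of)
print(sorted_columns)
```

Sorting on the parsed number avoids surprises if resolutions of 1.0 or above are added later, where plain string ordering would no longer match numeric ordering.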

Running the process_dataset workflow results in:

$ scripts/create_resources/test_resources.sh
Nextflow 24.10.2 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `target/nextflow/workflows/process_datasets/main.nf` [agitated_lamarr] DSL2 - revision: 9ea5522fa8
executor >  local (11)
[af/ae7c7b] process > process_datasets:run_wf:check_dataset_with_schema:processWf:check_dataset_with_schema_process (run)      [100%] 1 of 1 ✔
[52/3c6e9b] process > process_datasets:run_wf:precompute_clustering_run:processWf:precompute_clustering_run_process (run_r0.7) [100%] 7 of 7 ✔
[87/3fe18f] process > process_datasets:run_wf:precompute_clustering_merge:processWf:precompute_clustering_merge_process (run)  [100%] 1 of 1 ✔
[0a/db4c96] process > process_datasets:run_wf:process_dataset:processWf:process_dataset_process (run)                          [100%] 1 of 1 ✔
[d1/df9244] process > process_datasets:publishStatesSimpleWf:publishStatesProc (run)                                           [100%] 1 of 1 ✔
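The log shows `precompute_clustering_run` executing once per resolution (7 of 7), followed by a single `precompute_clustering_merge`. The merge step can be pictured roughly as below. This is only an illustration using plain dictionaries in place of the AnnData objects; `merge_clusterings` is a hypothetical name, and the column pattern is assumed from the example output:

```python
from typing import Dict

Labels = Dict[str, int]  # cell id -> cluster label for one resolution


def merge_clusterings(per_resolution: Dict[float, Labels]) -> Dict[str, Dict[str, int]]:
    """Combine per-resolution label maps into one table: cell id -> {column: label}."""
    cell_sets = [set(labels) for labels in per_resolution.values()]
    if cell_sets and any(cells != cell_sets[0] for cells in cell_sets):
        raise ValueError("all clustering runs must cover the same cells")

    table: Dict[str, Dict[str, int]] = {}
    for resolution in sorted(per_resolution):
        column = f"leiden_r{resolution}"  # assumed naming, as in the example output
        for cell, label in per_resolution[resolution].items():
            table.setdefault(cell, {})[column] = label
    return table


# Two of the seven runs, with labels taken from the example output.
runs = {
    0.4: {"Sample_787": 1, "D74_82": 3, "D1713_42": 2},
    0.2: {"Sample_787": 1, "D74_82": 0, "D1713_42": 2},
}
merged = merge_clusterings(runs)
print(merged["D74_82"])
```

Checking that every run covers the same cells guards against silently producing a table with missing entries.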

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!


mumichae

Clustering code looks good, with just minor requests. For the metrics to recognise the precomputed clusters, I updated the scib version and adjusted the metrics to look for the precomputed clusters, assuming they are stored in .obs
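A rough sketch of the lookup idea, not the actual scib or metric code: a metric first checks for a stored clustering at the requested resolution and only clusters from scratch when none is found. The fallback behaviour, the function name `get_clustering`, and the key pattern are all assumptions made for illustration:

```python
from typing import Callable, Dict, List


def get_clustering(
    precomputed: Dict[str, List[int]],
    resolution: float,
    compute: Callable[[float], List[int]],
) -> List[int]:
    """Return stored labels for a resolution, computing them only if absent."""
    key = f"leiden_r{resolution}"  # assumed key pattern
    if key in precomputed:
        return precomputed[key]
    return compute(resolution)


calls: List[float] = []


def expensive_clustering(resolution: float) -> List[int]:
    """Stand-in for a real clustering call; records that it was invoked."""
    calls.append(resolution)
    return [0, 0, 0]


stored = {"leiden_r0.2": [1, 0, 2], "leiden_r0.4": [1, 3, 2]}

reused = get_clustering(stored, 0.4, expensive_clustering)    # found, nothing computed
computed = get_clustering(stored, 1.0, expensive_clustering)  # missing, falls back
print(reused, computed, calls)
```

Recording the fallback calls, as `calls` does here, is also one way a test could confirm that a metric really used the precomputed values.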

src/data_processors/precompute_clustering_run/script.py

src/data_processors/process_dataset/script.py

src/data_processors/precompute_clustering_run/script.py

src/data_processors/precompute_clustering_merge/script.py

src/workflows/process_datasets/config.vsh.yaml

src/data_processors/precompute_clustering_merge/script.py

mumichae · 2024-12-20T14:01:15Z

I addressed most of my comments. The only things still missing are tests for the clustering component and a check that the metrics actually use the precomputed values

lazappi · 2025-01-07T14:01:00Z

Hi @mumichae. Thanks for your help with this! We would like to try to merge it this week. I can help with the code, but I wanted to make sure I understood everything first. At the moment the clustering is being pre-computed on the raw, unintegrated data. Is this correct, or should we be doing it on the embedding output?

mumichae · 2025-01-09T10:59:54Z

Hi @lazappi, thanks for catching that detail, I didn't notice it when I was going through the code. The clustering should of course be done for each integration, i.e. after the transformation step, when the kNN graph has been computed.

lazappi · 2025-01-09T12:22:18Z

Thanks! I thought that would be the case, so I've made some changes already. Should be able to push them soon.

lazappi · 2025-01-09T13:57:18Z

@rcannood @mumichae I think this should be ready now (for review at least). The CI is failing because I haven't synced the new test resources yet (didn't want to do that without checking in case it messed up something else).


rcannood added 2 commits December 10, 2024 15:23

add clustering data frame to the solution

026e765

update script

9392ab6

rcannood requested a review from mumichae December 10, 2024 14:37

rcannood and others added 5 commits December 10, 2024 15:39

add comments

013b54c

add clustering key prefix for cluster-based metrics

b075b3c

add resolutions parameters to metrics to make use of precomputed clusters

0f99a8d

fix clustering key for nmi and ari

a4404ca

set correct version of scib to make using precomputed clusters possible

54c0fd9

mumichae requested changes Dec 13, 2024


mumichae added 6 commits December 20, 2024 13:54

add resolutions argument to cluster-based metrics

f80d939

use igraph for clustering on CPU

5234d3c

use partial reading for clustering

17d436c

rename cluster keys to be consistent with scib metrics

e77ad55

fix import and reading missing slot

810507f

get clustering from obsm

391e4b2

mumichae self-requested a review December 20, 2024 14:00

lazappi added 3 commits January 8, 2025 14:09

Add config to create test resources script

4b95c18

Add clustering to benchmark workflow

6c56070

Remove clustering from process dataset workflow

f3dc116

lazappi added 5 commits January 9, 2025 13:23

Move output processing to subworkflow

b285dbe

Update API with processing subworkflow

81f9649

Re-enable all methods/metrics

3ff4797

Remove clustering from fil_solution.yaml API file

5ee87fd

Add processing to test resources script

19b6e52

rcannood commented Jan 10, 2025


README.md (outdated, resolved)

update readme

38552af

rcannood merged commit 4b67f90 into main Jan 10, 2025
7 checks passed

rcannood deleted the precompute_clustering branch January 10, 2025 11:55
