Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute clustering #18

Merged
merged 22 commits into from
Jan 10, 2025
Merged

Precompute clustering #18

merged 22 commits into from
Jan 10, 2025

Conversation

rcannood
Copy link
Member

@rcannood rcannood commented Dec 10, 2024

Describe your changes

  • Add .obsm["clustering"] to the solution so it can be used by the metrics

Example:

input_solution$obsm[["clustering"]] |> head()
                            leiden_r0.2 leiden_r0.4 leiden_r0.3 leiden_r0.7 leiden_r0.6 leiden_r0.8 leiden_r0.5
Sample_787                            1           1           1           1           1           1           1
D74_82                                0           3           0           3           3           3           3
D1713_42                              2           2           2           2           2           2           2
human4_lib3.final_cell_0092           0           0           0           0           0           0           0
Sample_716                            1           1           1           1           1           1           1
Sample_834                            1           1           1           1           1           1           1

Running the process_dataset workflow results in:

$ scripts/create_resources/test_resources.sh
Nextflow 24.10.2 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `target/nextflow/workflows/process_datasets/main.nf` [agitated_lamarr] DSL2 - revision: 9ea5522fa8
executor >  local (11)
[af/ae7c7b] process > process_datasets:run_wf:check_dataset_with_schema:processWf:check_dataset_with_schema_process (run)      [100%] 1 of 1 ✔
[52/3c6e9b] process > process_datasets:run_wf:precompute_clustering_run:processWf:precompute_clustering_run_process (run_r0.7) [100%] 7 of 7 ✔
[87/3fe18f] process > process_datasets:run_wf:precompute_clustering_merge:processWf:precompute_clustering_merge_process (run)  [100%] 1 of 1 ✔
[0a/db4c96] process > process_datasets:run_wf:process_dataset:processWf:process_dataset_process (run)                          [100%] 1 of 1 ✔
[d1/df9244] process > process_datasets:publishStatesSimpleWf:publishStatesProc (run)                                           [100%] 1 of 1 ✔

Checklist before requesting a review

  • I have performed a self-review of my code

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI Tests succeed and look good!

@rcannood rcannood requested a review from mumichae December 10, 2024 14:37
Copy link
Contributor

@mumichae mumichae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clustering code looks good, with just minor requests. In order for the metrics to recognise the precomputed clusters, I updated the scib version and adjusted the metrics to look for the precomputed clusters, asuming they are stored in .obs

@mumichae mumichae self-requested a review December 20, 2024 14:00
@mumichae
Copy link
Contributor

I addressed most of my comments. The only thing that is missing are tests for the clustering component and for checking that the metrics actually use the precomputed values

@lazappi
Copy link
Contributor

lazappi commented Jan 7, 2025

Hi @mumichae. Thanks for you help with this! We would like to try and merge it this week. I can help with the code but I wanted to make sure I understood everything first. At the moment the clustering is being pre-computed on the raw, unintegrated data. Is this correct or should we be doing it on the embedding output?

@mumichae
Copy link
Contributor

mumichae commented Jan 9, 2025

Hi @lazappi , thanks for catching that detail, I didn't notice it when I was going through the code. The clustering should of course be done for each integration, i.e. after the transformation step, when the kNN graph has been computed.

@lazappi
Copy link
Contributor

lazappi commented Jan 9, 2025

Thanks! I thought that would be the cases so I've made some changes already. Should be able to push them soon.

@lazappi
Copy link
Contributor

lazappi commented Jan 9, 2025

@rcannood @mumichae I think this should be ready now (for review at least). The CI is failing because I haven't synced the new test resources yet (didn't want to do that without checking in case it messed up something else).

README.md Outdated Show resolved Hide resolved
@rcannood rcannood merged commit 4b67f90 into main Jan 10, 2025
7 checks passed
@rcannood rcannood deleted the precompute_clustering branch January 10, 2025 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants