Initiate "guide" notebook for validating cell type assignments #1001

allyhawkins
Member

@allyhawkins allyhawkins commented Jan 23, 2025

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Starts on the journey of #993

What is the goal of this pull request?

Here I'm initiating a template, or "guide", notebook that we plan to use to evaluate the results from three workflows and "finalize" cell type annotations for each library. We want to produce a notebook for each library that compiles the results from SingleR, clustering, and AUCell and uses those results to assign cell type annotations. Since going through each library will involve some manual work, we don't want to just render the same notebook for all libraries. However, we will use many of the same plots and generally the same flow, so my thought is that this guide can do most of the legwork code-wise, and we can then fill in the biology that's specific to each library.

This involves first creating the "guide" notebook and any associated functions, and then using that notebook to create the actual notebook for each library.
If you need any clarification on the plan proposed here, please let me know!

Briefly describe the general approach you took to achieve this goal.

This PR includes the initiation of the notebook and all the code needed for setting up the data.

  • At the top, I wrote some instructions on how I envision this notebook being used. I also outlined what I think it should look like and filled in some info that will get updated in subsequent PRs.
  • I added all the setup code which includes defining paths to all input files (output from SingleR, clustering, and AUCell) and then creating a data frame with all information that will be used for plotting. To simplify this, I wrote a function to combine all the results and save that in a separate file. The function probably isn't totally necessary since it's only used here, but I think it will help the notebook be less crowded in the long run.
  • I added some information to the README in the template_notebooks folder, but most of this is copied from the intro in the notebook. I figured that as we finish the notebook and actually start using it, the instructions here might get expanded.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes, next up will be filling out the next section with summary plots.

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@allyhawkins allyhawkins requested review from sjspielman and removed request for jaclyn-taroni January 23, 2025 22:44
Member

@sjspielman sjspielman left a comment

Looks good to me, left a couple comments but nothing too big!

The function probably isn't totally necessary since it's only used here, but I think it will help the notebook be less crowded in the long run.

💯

Yes, next up will be filling out the next section with summary plots.

To clarify, is this the spots marked TODO?

analyses/cell-type-ewings/template_notebooks/README.md (outdated)

1. Ensure that you have a local copy of the results from `aucell-singler-annotation.sh`, `evaluate-clusters.sh` and `run-aucell-ews-signatures.sh` saved to `results`.
2. Copy the contents of this notebook to a new notebook titled `<library_id>_celltype-exploration.Rmd` and save in `exploratory_analysis/final_annotation_notebooks`.
3. Replace the `sample_id` and `library_id` with the correct IDs in the `params` list.

and update the notebook title in the YAML header too
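Steps 2 and 3, together with the title update, could also be scripted. The sketch below is hypothetical: `create_library_notebook()` and the generated title text are assumptions, and it assumes the template's YAML header contains `title:`, `sample_id:`, and `library_id:` lines. It edits only lines inside the YAML header, so any body text that happens to mention `sample_id:` is left alone.

```r
# Hypothetical helper (not part of the module): copy the template notebook
# for one library, filling in its IDs and title in the YAML header.
create_library_notebook <- function(template_file, output_dir, sample_id, library_id) {
  lines <- readLines(template_file)

  # locate the YAML header, delimited by the first two "---" lines
  yaml_end <- which(lines == "---")[2]
  if (is.na(yaml_end)) stop("No YAML header found in ", template_file)
  header <- seq_len(yaml_end)

  # replace the params values, keeping each line's original indentation
  lines[header] <- sub("^([[:space:]]*sample_id:).*$",
                       paste0('\\1 "', sample_id, '"'), lines[header])
  lines[header] <- sub("^([[:space:]]*library_id:).*$",
                       paste0('\\1 "', library_id, '"'), lines[header])
  # assumed title format; adjust to match the template
  lines[header] <- sub("^title:.*$",
                       paste0('title: "', library_id, ' cell type exploration"'),
                       lines[header])

  # save as <library_id>_celltype-exploration.Rmd in the output directory
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  output_file <- file.path(output_dir, paste0(library_id, "_celltype-exploration.Rmd"))
  writeLines(lines, output_file)
  output_file
}
```

The copy still needs the manual, library-specific exploration afterwards; the helper only removes the risk of forgetting to update one of the IDs or the title.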

Comment on lines 11 to 12
The `celltype-exploration.Rmd` notebook is meant to be used as a guide for assigning and evaluating the final cell type annotations for each library in `SCPCP000015`.
Instructions for using this guide:

I think it would also be fine to say here "instructions are in the template notebook" instead of duplicating instructions, but that's up to you!

- Density plots by cluster of AUC values and custom gene set means
- Maybe heatmaps with cluster annotation of AUC scores and custom gene set means

## Re-cluster tumor cells **Manual exploration**

I had a thought that you might use section tags for these indicators instead of literal text in the heading, like:

Suggested change
## Re-cluster tumor cells **Manual exploration**
## Re-cluster tumor cells {.manual-exploration}

Those tags don't appear in the output (which may or may not be what you want?)

cluster_df <- cluster_df |>
# filter to the clustering results we want to use
dplyr::filter(
cluster_method == "leiden_mod",

This is documented in the notebook, but the function's docs don't mention that it only considers Leiden clustering with modularity. I'd add that to the function docs somewhere.
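As a minimal sketch of that suggestion, the roxygen-style header below states the Leiden/modularity restriction directly in the function docs. The function name `combine_results()`, its arguments, and the long AUCell layout (`barcodes`, `gene_set`, `auc` columns) are hypothetical. It uses base R, with `merge(..., all.x = TRUE)` standing in for `dplyr::left_join()` and `reshape()` pivoting the AUCell results wide, so it runs without extra packages.

```r
#' Combine SingleR, clustering, and AUCell results for one library
#'
#' Hypothetical sketch. Only clustering results with
#' cluster_method == "leiden_mod" (Leiden, modularity objective) are kept;
#' results from all other clustering methods are dropped before joining.
#'
#' @param singler_df data frame of SingleR results, one row per barcode
#' @param cluster_df data frame of clustering results with a cluster_method column
#' @param aucell_df long data frame with columns barcodes, gene_set, auc
#'
#' @return data frame with one row per barcode and all results as columns
combine_results <- function(singler_df, cluster_df, aucell_df) {
  # keep only Leiden clustering with modularity (%in% drops NA methods)
  cluster_df <- cluster_df[cluster_df$cluster_method %in% "leiden_mod", , drop = FALSE]

  # pivot AUCell results from one row per barcode/gene set to one row per
  # barcode; new columns are named auc.<gene_set>
  aucell_wide_df <- reshape(
    aucell_df,
    idvar = "barcodes",
    timevar = "gene_set",
    direction = "wide"
  )

  # left joins on barcodes, mirroring dplyr::left_join(..., by = "barcodes")
  combined_df <- merge(singler_df, cluster_df, by = "barcodes", all.x = TRUE)
  merge(combined_df, aucell_wide_df, by = "barcodes", all.x = TRUE)
}
```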

dplyr::left_join(singler_df, by = c("barcodes")) |>
dplyr::left_join(cluster_df, by = c("barcodes")) |>
dplyr::left_join(aucell_wide_df, by = c("barcodes"))

I think a check before returning would be worthwhile here. Maybe check that the column names are as expected?
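One way to keep such a check cheap when there are many AUCell columns is to verify a small set of core columns plus the presence of at least one AUCell column, rather than listing every gene set. This is a hypothetical sketch: `check_combined_columns()`, the default `required_cols`, and the `"auc."` column prefix are assumptions, not names taken from the module.

```r
# Hypothetical check: fail early if core columns are missing, without
# enumerating every AUCell gene set column.
check_combined_columns <- function(combined_df,
                                   required_cols = c("barcodes", "cluster"), # assumed names
                                   aucell_prefix = "auc.") {                 # assumed prefix
  missing_cols <- setdiff(required_cols, colnames(combined_df))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  # require at least one AUCell column rather than naming each gene set
  if (!any(startsWith(colnames(combined_df), aucell_prefix))) {
    stop("No AUCell columns found with prefix '", aucell_prefix, "'")
  }
  invisible(TRUE)
}
```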

analyses/cell-type-ewings/template_notebooks/README.md (outdated)
@allyhawkins
Member Author

@sjspielman I incorporated most of your review comments, including removing the instructions from the README and adding a note to check the template for the full instructions. The one thing I did not do is add the check for the columns. There are quite a lot of columns (multiple annotations and many AUCell results), so I didn't think it was necessary at this point.

Yes, next up will be filling out the next section with summary plots.

To clarify, is this the spots marked TODO?

Yes, that's correct! I'm planning on filling out the TODOs before proceeding with making the notebooks for each sample.

Member

@sjspielman sjspielman left a comment

lgtm!

Co-authored-by: Stephanie Spielman <[email protected]>
@allyhawkins allyhawkins merged commit e7da87c into AlexsLemonade:main Jan 27, 2025
3 checks passed
@allyhawkins allyhawkins deleted the allyhawkins/initiate-final-annotation-template branch January 27, 2025 15:25