Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add celltyping to external instructions #502

Merged
merged 47 commits into from
Nov 15, 2023

Conversation

sjspielman
Copy link
Member

@sjspielman sjspielman commented Oct 11, 2023

Closes #499

This PR adds external cell type documentation. There is a decent amount here, so I expect >1 rounds of review! (reminder that I am out this Thursday+Friday, which may affect when reviewers want to schedule their time to review this!).

I have added documentation throughout for adding submitter annotations and for performing cell typing, including an example celltype metadata file. Any organizational or wording feedback let me know! There is also one TODO in there about whether we want to say "want a different organ for cellassign? let us know," so please make sure to comment there.

Worth noting that the internal docs are still under review (#500) and this PR is not stacked; we may want to add more relative links between these docs files, but I'd like to wait until #500 is all set first. Any more relative links can be added later in this PR or in a subsequent PR.

While working on these docs, I had some thoughts about how we have set up the cell type annotation metadata file - specifically, do actually want this file to include the columns singler_ref_file and cellassign_ref_file? You'll see the docs are a bit weird (although I think fairly clear) for the content of these columns. We may wish to alter the cell type process to only accept singler & cell assign reference names, and we would create the meta variables for the reference paths themselves in the celltype annotation subworkflow. If we want to take that approach, I'd update docs here accordingly and then in a separate PR we can update how we handle the metadata in the workflow.

@sjspielman sjspielman requested review from allyhawkins and jashapiro and removed request for allyhawkins October 11, 2023 15:00
@jashapiro
Copy link
Member

I wonder if we want to wait a bit on this PR until we have implemented #470? The reason is that we will be able to better model "good" file names and version tracking at that point.

@sjspielman
Copy link
Member Author

I wonder if we want to wait a bit on this PR until we have implemented #470? The reason is that we will be able to better model "good" file names and version tracking at that point.

Yeah, that seems fine! I think we can leave this PR open and I can come back to it later. I don't foresee many, if any, file conflicts.

external-instructions.md Outdated Show resolved Hide resolved
@sjspielman sjspielman marked this pull request as draft October 27, 2023 12:11
@sjspielman sjspielman marked this pull request as ready for review November 13, 2023 16:48
@sjspielman
Copy link
Member Author

Time to revive this PR! I struggled a little bit with the level of detail to provide here vs provide in #555 (updates to internal instructions; please have a peek there for context!).
So, as part of leaving a first fresh review, please share any thoughts about how into the weeds we want to get with these docs. I am also not convinced that all of my phrasing is clear, so please flag any spots that make no/little sense!

Related to what level of detail we want to achieve here - this comment from before remains unaddressed, in part because we changed PanglaoDB references to be <thing>-compartment (I resolved the comment to reduce clutter, but it's up there!).

We define these from the organ groupings in celltype-reference-metadata.tsv, but they could be built differently. I think we want to be sure to include the information about that file and how we created it/how it might be used to generate alternative references.

@sjspielman sjspielman requested review from allyhawkins and removed request for allyhawkins November 13, 2023 17:10
@sjspielman
Copy link
Member Author

Ready for another look! Again, let's aim to get this in soon since this PR is getting too big, and we can follow up with a subsequent PR where necessary. Note that I left some TODO text here for where we can point users to available references.
It also seems that I uhh accidentally tackled #440 as well; can revert some of those change here or just push forward!

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

The `feature_barcode_file` itself is a tab separated file with one line per barcode and no header.
The first column will contain the barcode or antibody ID and the second column the barcode nucleotide sequence.
For example:
By default, `SingleR` annotation uses references from the [`celldex` package](https://bioconductor.org/packages/release/data/experiment/html/celldex.html).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file an issue to update this section on cell type references, after #565 goes in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi im on this, have a note waiting for it :)

--celltype_project_metafile examples/example_project_celltype_metadata.tsv
```

### Providing existing cell type labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I did mean to move it down here, but also that we could tackle it separately. I think what you have now is fine though!

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external instructions look good, but I do want https://github.com/AlexsLemonade/scpca-nf/pull/502/files#r1394368472 completed before we merge this in (or an issue about those changes, but it seems small enough to sneak in here).

@sjspielman
Copy link
Member Author

sjspielman commented Nov 15, 2023

Actually updated the README.md in e8b857d, but I'm not sure about how I've written it. The other way I might have gone was, for example -

- `example_run_metadata.tsv`: An example [run metadata file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.

What do you think?

@sjspielman sjspielman requested a review from jashapiro November 15, 2023 21:41
Comment on lines 7 to 12
- An example [Nextflow configuration file (`user_template.config`)](../external-instructions.md#configuration-files) for the `scpca-nf` workflow.
- An example [`run_metadata.tsv` file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.
- An example [`sample_metadata.tsv` file](../external-instructions.md#prepare-the-sample-metadata-file) for the `scpca-nf` workflow.
- An example [`multiplex_pools.tsv` file](../external-instructions.md#multiplexed-cellhash-libraries) for the `scpca-nf` workflow.
- An example [`project_celltype_metadata.tsv` file](../external-instructions.md#preparing-the-cell-type-project-metadata-file) for performing optional cell type annotation in the `scpca-nf` workflow.
- An example [submitter cell type annotation file](../external-instructions.md#providing-existing-cell-type-labels) for optionally providing previously-obtained cell type annotations to the `scpca-nf` workflow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I said maybe the file names, but if we do that, we should use the actual file names in this directory. And because of that, I'm rethinking that idea... maybe just the description is sufficient for people to infer the name of the file?

@jashapiro
Copy link
Member

The other way I might have gone was, for example -

- `example_run_metadata.tsv`: An example [run metadata file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.

What do you think?

Oh, I missed that comment. I like this version!

@sjspielman
Copy link
Member Author

Updated, but in table form which gives better spacing, now that we're using colons.

I also removed some changes that went in but in retrospect I think were overkill, and also we decided not to do them - #469 (comment)

There is no longer an example submitter file. I think if we circle back later and decide we want it included in the example output, it should be a separate issue/PR discussion. But, there is still a section that says you could provide a submitter file if you want.

@sjspielman
Copy link
Member Author

@allyhawkins tagging you back in too if you have more thoughts on submitter example files..sorry for all the back and forth on that!!!

@sjspielman sjspielman requested a review from jashapiro November 15, 2023 22:00
@allyhawkins
Copy link
Member

@allyhawkins tagging you back in too if you have more thoughts on submitter example files..sorry for all the back and forth on that!!!

Look good to me, I don't have any other suggestions on that front.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@sjspielman
Copy link
Member Author

👍

@sjspielman sjspielman merged commit 74b744b into development Nov 15, 2023
3 checks passed
@sjspielman sjspielman deleted the sjspielman/499-external-celltype-docs branch November 15, 2023 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants