Add celltyping to external instructions #502

Merged
merged 47 commits into from
Nov 15, 2023
Changes from 30 commits
4d8c4b3
submitter docs and started annotation docs
sjspielman Oct 11, 2023
2407746
add example celltype metadata file
sjspielman Oct 11, 2023
1f2a543
full paths and add rel link to submitter file prep instructions
sjspielman Oct 11, 2023
c414d94
section on repeating
sjspielman Oct 11, 2023
3228b4f
add some links
sjspielman Oct 11, 2023
390d6cc
doctoc
sjspielman Oct 11, 2023
10a510a
caps for table descriptions
sjspielman Oct 11, 2023
6298807
A couple cleanups
sjspielman Oct 11, 2023
3b2aff9
spelling
sjspielman Oct 11, 2023
d7d7ea1
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Oct 27, 2023
69de0f2
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Nov 7, 2023
638c761
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Nov 13, 2023
92a9927
Update external instructions with current naming scheme, with a littl…
sjspielman Nov 13, 2023
6c9f182
update example file
sjspielman Nov 13, 2023
2467ef7
spelling
sjspielman Nov 13, 2023
fb42f6f
reorg and flesh some reference bullets out
sjspielman Nov 13, 2023
2cf5384
doctoc
sjspielman Nov 13, 2023
1072ec9
delete duplicate text that was rewritten here but original remained. …
sjspielman Nov 13, 2023
ffdc7b8
Update some comments with more contextual information about overall w…
sjspielman Nov 14, 2023
057df40
merge in development and fix conflict, and precommit hook tweaked ext…
sjspielman Nov 14, 2023
920e6ae
bullet text and a little rephrasing
sjspielman Nov 14, 2023
2326a58
third bullet
sjspielman Nov 14, 2023
24df4d9
Update a bunch of relative links
sjspielman Nov 14, 2023
84802a6
fix weird underscore
sjspielman Nov 14, 2023
856ba33
submitter file is no longer required; remove it
sjspielman Nov 14, 2023
7361ab9
catch some small fixes from review, and move section to be above spec…
sjspielman Nov 15, 2023
f6ec451
Updates and rearrangements based on review comments
sjspielman Nov 15, 2023
3acec7f
need internet for default cell typing files, and reframe the cell typ…
sjspielman Nov 15, 2023
209b6e5
rewording and remove some essentially duplicated text
sjspielman Nov 15, 2023
5e8aafd
Changed my mind, better indeed to have references first even in this …
sjspielman Nov 15, 2023
fd26234
Apply suggestions from code review
sjspielman Nov 15, 2023
95e5d17
Clean up tables, add toc title, move output section up, and fix some …
sjspielman Nov 15, 2023
891a1e5
now actually add toc title post styling
sjspielman Nov 15, 2023
55627ba
Add button hack for celltype file, and update other buttons to use fi…
sjspielman Nov 15, 2023
38dba37
Add example submitter cell types file and link with button in externa…
sjspielman Nov 15, 2023
94ac08a
remove s3 reference paths
sjspielman Nov 15, 2023
83bec1a
move repeat mapping into a new section for additional settings, and d…
sjspielman Nov 15, 2023
49e210a
remove colon after a verb where it shouldnt be, and leave TODOs to ci…
sjspielman Nov 15, 2023
ffa8edf
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Nov 15, 2023
447f0c9
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Nov 15, 2023
2df686b
the triumphant return of the toc title
sjspielman Nov 15, 2023
e8b857d
update README bullets with all example files
sjspielman Nov 15, 2023
864a1df
rephrase and make it a table for cleaner spacing
sjspielman Nov 15, 2023
4246e78
we did not mean to have this, as discussed in #469
sjspielman Nov 15, 2023
df95988
Merge branch 'development' into sjspielman/499-external-celltype-docs
sjspielman Nov 15, 2023
8c9c99b
We do not provide an example submitter file. Also harmonize some tabl…
sjspielman Nov 15, 2023
f946280
Merge branch 'sjspielman/499-external-celltype-docs' of github.com:Al…
sjspielman Nov 15, 2023
11 changes: 7 additions & 4 deletions config/reference_paths.config
@@ -12,13 +12,16 @@ celltype_organism = "Homo_sapiens.GRCh38.104"

// cell type references directories
celltype_ref_dir = "${params.ref_rootdir}/celltype"
// output from save_singler_refs() process

// populated by `build-celltype-ref.nf`, this stores the unaltered reference datasets
// from `celldex`. This is not used in `main.nf`.
singler_references_dir = "${params.celltype_ref_dir}/singler_references"
// output from train_singler_models() process, and input to classify_singler()

// These directories hold the default SingleR and CellAssign reference files to use during cell typing
// they are populated by `build-celltype-ref.nf` and consumed by `main.nf` during cell typing
singler_models_dir = "${params.celltype_ref_dir}/singler_models"
// output from generating cell assign reference matrices, and input to classify_cellassign()
cellassign_ref_dir = "${params.celltype_ref_dir}/cellassign_references"

// cell type metadata for building references
// cell type metadata for building references in `build-celltype-ref.nf`; not used by main workflow
celltype_ref_metadata = "${projectDir}/references/celltype-reference-metadata.tsv"
panglao_marker_genes_file = "${projectDir}/references/PanglaoDB_markers_2020-03-27.tsv"
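
To make the directory layout defined by these parameters concrete, here is a minimal Python sketch of how the nested paths resolve. It only mirrors the string interpolation in the config. The function name `resolve_celltype_dirs` and the root `/data/scpca-references` are hypothetical, since the value of `params.ref_rootdir` is not shown in this hunk.

```python
def resolve_celltype_dirs(ref_rootdir: str) -> dict[str, str]:
    """Mirror the "${params...}" interpolation used for the cell type directories."""
    celltype_ref_dir = f"{ref_rootdir}/celltype"
    return {
        "celltype_ref_dir": celltype_ref_dir,
        # unaltered celldex datasets; per the config comment, not used by main.nf
        "singler_references_dir": f"{celltype_ref_dir}/singler_references",
        # default SingleR and CellAssign reference files used during cell typing
        "singler_models_dir": f"{celltype_ref_dir}/singler_models",
        "cellassign_ref_dir": f"{celltype_ref_dir}/cellassign_references",
    }


if __name__ == "__main__":
    # "/data/scpca-references" is an assumed root, used only for illustration
    for name, path in resolve_celltype_dirs("/data/scpca-references").items():
        print(f"{name} = {path}")
```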
34 changes: 19 additions & 15 deletions examples/README.md
@@ -2,37 +2,41 @@

## Example files

This directory contains an example [metadata file](../external-data-instructions.md#prepare-the-metadata-file) and [configuration file](../external-data-instructions.md#configuration-files) for the `scpca-nf` workflow.
These files should be used as an example of formats and content, but note that the values in these files may not be applicable or sufficient to allow running `scpca-nf` to be used directly on your system.
This directory contains the following example files:

- An example [metadata file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.
Member commented:

Looking at the directory, we also have a sample metadata file and multiplex pools file. So we probably want to note all of these. And maybe we want to give the file names themselves?

Suggested change
- An example [metadata file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.
- An example [run metadata file](../external-instructions.md#prepare-the-run-metadata-file) for the `scpca-nf` workflow.

Member replied:

Not sure if you accidentally marked this as resolved?

Member Author replied:

Ahh, no I intentionally resolved it... but for some reason my brain read your comment and, naturally, made this change (and similar) instead 😱

| [View example `run_metadata.tsv` file](examples/example_run_metadata.tsv) |
| ------------------------------------------------------------------------- |

Member Author replied:

I'm not mad about the changes I made, but they are just not at all what your comment was....

- An example [configuration file](../external-instructions.md#configuration-files) for the `scpca-nf` workflow.
- An example [cell type annotation metadata file](../external-instructions.md#performing-cell-type-annotation) for performing optional cell type annotation in the `scpca-nf` workflow.

These files provide examples of expected formatting and content, but note that the specific values in these files may not be applicable or sufficient for running `scpca-nf` directly on your system.

## Testing your setup with example data

:warning: These instructions are intended only to test that a configuration file is set up correctly.
Before following these instructions, please ensure that you have already set up your own [configuration file](../external-data-instructions.md#configuration-files) and have [created and named a profile to use](../external-data-instructions.md#setting-up-a-profile-in-the-configuration-file).
Before following these instructions, please ensure that you have already set up your own [configuration file](../external-instructions.md#configuration-files) and have [created and named a profile to use](../external-instructions.md#setting-up-a-profile-in-the-configuration-file).

You can test your configuration setup by performing a test run with the example data that we have provided.

We recommend using the example 10X dataset from a [human glioblastoma donor that was processed using the 10X Genomics' Next GEM Single Cell 3' Reagent Kits v3.1](https://www.10xgenomics.com/resources/datasets/2-k-sorted-cells-from-human-glioblastoma-multiforme-3-v-3-1-3-1-standard-6-0-0) (note: you may be prompted to provide an email and register upon navigating to the 10X downloads site).
The fastq files for this example data can be downloaded from the following link (**note:** These files will take approximately 10 GB of disk space upon download and expanding the tar file): [Brain_Tumor_3p_fastqs.tar](https://cf.10xgenomics.com/samples/cell-exp/6.0.0/Brain_Tumor_3p/Brain_Tumor_3p_fastqs.tar).


After downloading and unpacking the fastq files, you will need to create a tab-separated values **run** metadata file that looks like the following:

| scpca_run_id | scpca_library_id | scpca_sample_id | scpca_project_id | technology | assay_ontology_term_id | seq_unit | sample_reference | files_directory | submitter_cell_types_file |
| ------------ | ---------------- | --------------- | ---------------- | ---------- | ---------------------- | -------- | ---------------- | --------------- | ------------------------ |
| run01 | library01 | sample01 | project01 | 10Xv3.1 | EFO:XXX | cell | Homo_sapiens.GRCh38.104 | /path/to/example_fastq_files | /path/to/annotated_cell_types_file
| scpca_run_id | scpca_library_id | scpca_sample_id | scpca_project_id | technology | assay_ontology_term_id | seq_unit | sample_reference | files_directory |
| ------------ | ---------------- | --------------- | ---------------- | ---------- | ---------------------- | -------- | ----------------------- | ---------------------------- |
| run01 | library01 | sample01 | project01 | 10Xv3.1 | EFO:XXX | cell | Homo_sapiens.GRCh38.104 | /path/to/example_fastq_files |

Be sure to enter the **full** path to the directory containing the fastq files in the `files_directory` column, and similarly the full path for the `submitter_cell_types_file` column.
Be sure to enter the **full** path to the directory containing the fastq files in the `files_directory` column.
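
As a rough illustration of the run metadata file described above, the following Python sketch writes a one-row TSV with the nine columns from the updated table and refuses a `files_directory` that is not a full path. The helper name `write_run_metadata` and the output file name are assumptions for this example, and `EFO:XXX` is the placeholder from the table, not a real ontology term.

```python
import csv
import os

RUN_COLUMNS = [
    "scpca_run_id", "scpca_library_id", "scpca_sample_id", "scpca_project_id",
    "technology", "assay_ontology_term_id", "seq_unit", "sample_reference",
    "files_directory",
]


def write_run_metadata(rows, out_path):
    """Write run metadata rows as a TSV, requiring a full path to the fastq files."""
    for row in rows:
        missing = [col for col in RUN_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        if not os.path.isabs(row["files_directory"]):
            raise ValueError(f"files_directory must be a full path: {row['files_directory']}")
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=RUN_COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# Values copied from the example table; replace the path with your own directory.
example_row = {
    "scpca_run_id": "run01",
    "scpca_library_id": "library01",
    "scpca_sample_id": "sample01",
    "scpca_project_id": "project01",
    "technology": "10Xv3.1",
    "assay_ontology_term_id": "EFO:XXX",  # placeholder, as in the table above
    "seq_unit": "cell",
    "sample_reference": "Homo_sapiens.GRCh38.104",
    "files_directory": "/path/to/example_fastq_files",
}

if __name__ == "__main__":
    write_run_metadata([example_row], "example_run_metadata.tsv")
```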

You will also need to create a tab-separated values **sample** metadata file.
At a minimum, the sample metadata file must contain a column with `scpca_sample_id` as the header_.
At a minimum, the sample metadata file must contain a column with `scpca_sample_id` as the header.
The contents of this column should contain all unique sample ids that are present in the `scpca_sample_id` column of the run metadata file.

Below is an example of a sample metadata file:

| scpca_sample_id | diagnosis | age |
| --------------- | --------- | --- |
| sample01 | glioblastoma | 71 |
| scpca_sample_id | diagnosis | age |
| --------------- | ------------ | --- |
| sample01 | glioblastoma | 71 |

**Note that the `diagnosis` and `age` columns are shown as example sample metadata one might include in the sample metadata file.
The metadata file that you create does not need to match this exactly, but it must contain the required `scpca_sample_id` column.**
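
Before launching the workflow, it can help to confirm that every sample id in the run metadata also appears in the sample metadata. The Python sketch below is one way to do that check. The function names are made up for this example, and it assumes each row holds a single sample id in `scpca_sample_id`.

```python
import csv


def read_column(tsv_path, column):
    """Return the set of values found in one column of a TSV file."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise ValueError(f"{tsv_path} has no '{column}' column")
        return {row[column] for row in reader}


def missing_sample_ids(run_metafile, sample_metafile):
    """List sample ids used in the run metadata but absent from the sample metadata."""
    run_ids = read_column(run_metafile, "scpca_sample_id")
    sample_ids = read_column(sample_metafile, "scpca_sample_id")
    return sorted(run_ids - sample_ids)
```

An empty list from `missing_sample_ids()` means the two files agree on sample ids. Extra columns such as `diagnosis` and `age` are ignored by this check.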
@@ -47,11 +51,11 @@ nextflow run AlexsLemonade/scpca-nf \
--sample_metafile <path to sample metadata file>
```

Where `<path to config file>` is the **relative** path to the configuration file that you have setup after following the instructions on [creating a configuration file](../external-data-instructions.md#configuration-files), `<name of profile>` is the name of the profile that you chose when creating a profile, `<path to run metadata file>` is the **full** path to the run metadata TSV you created, and `<path to sample metadata file>` is the **full** path to the sample metadata TSV you created.
Where `<path to config file>` is the **relative** path to the configuration file that you have set up after following the instructions on [creating a configuration file](../external-instructions.md#configuration-files), `<name of profile>` is the name of the profile that you chose when creating a profile, `<path to run metadata file>` is the **full** path to the run metadata TSV you created, and `<path to sample metadata file>` is the **full** path to the sample metadata TSV you created.
For the [example configuration file that we provided](./user_template.config), we used the profile name `cluster` and would indicate that we would like to use that profile at the command line with `-profile cluster`.
For more detailed information on setting up the metadata file for your own data, see instructions on [preparing the run metadata file](../external-data-instructions.md#prepare-the-run-metadata-file) and [preparing the sample metadata file](../external-instructions.md/#prepare-the-sample-metadata-file).
For more detailed information on setting up the metadata file for your own data, see instructions on [preparing the run metadata file](../external-instructions.md#prepare-the-run-metadata-file) and [preparing the sample metadata file](../external-instructions.md#prepare-the-sample-metadata-file).

## Example output

You can download an example of the expected output files here: [`scpca_out.zip`](https://s3.amazonaws.com/scpca-references/example-data/scpca_out.zip).
For more information on the file structure and what to expect see the description of the [output files](../external-data-instructions.md#output-files).
For more information on the file structure and what to expect see the description of the [output files](../external-instructions.md#output-files).
4 changes: 4 additions & 0 deletions examples/example_project_celltype_metadata.tsv
@@ -0,0 +1,4 @@
scpca_project_id singler_ref_name singler_ref_file cellassign_ref_name cellassign_ref_file
project01 BlueprintEncodeData BlueprintEncodeData_celldex_1-10-1_model.rds blood-compartment blood-compartment_PanglaoDB_2020-03-27.tsv
project02 HumanPrimaryCellAtlasData HumanPrimaryCellAtlasData_celldex_1-10-1_model.rds NA NA
project03 NA NA blood-compartment blood-compartment_PanglaoDB_2020-03-27.tsv
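
To show how a file with this shape might be read, the Python sketch below parses the three example rows and lists, per project, which annotation methods have a reference file. Treating `NA` as "no reference for this method" is an assumption inferred from the example rows, not confirmed workflow behavior, and the function name is made up for this example.

```python
import csv
import io

# The example rows above, with tabs restored between fields.
EXAMPLE_TSV = (
    "scpca_project_id\tsingler_ref_name\tsingler_ref_file\t"
    "cellassign_ref_name\tcellassign_ref_file\n"
    "project01\tBlueprintEncodeData\tBlueprintEncodeData_celldex_1-10-1_model.rds\t"
    "blood-compartment\tblood-compartment_PanglaoDB_2020-03-27.tsv\n"
    "project02\tHumanPrimaryCellAtlasData\t"
    "HumanPrimaryCellAtlasData_celldex_1-10-1_model.rds\tNA\tNA\n"
    "project03\tNA\tNA\tblood-compartment\t"
    "blood-compartment_PanglaoDB_2020-03-27.tsv\n"
)


def methods_by_project(handle):
    """Map each project id to the methods that have a reference file listed."""
    plan = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        methods = []
        # Assumption: the literal string "NA" marks a method with no reference.
        if row["singler_ref_file"] != "NA":
            methods.append("SingleR")
        if row["cellassign_ref_file"] != "NA":
            methods.append("CellAssign")
        plan[row["scpca_project_id"]] = methods
    return plan


if __name__ == "__main__":
    for project, methods in methods_by_project(io.StringIO(EXAMPLE_TSV)).items():
        print(project, methods)
```

Under that assumption, `project01` lists references for both methods, `project02` for SingleR only, and `project03` for CellAssign only.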