From 668627f49725252b85c1369b1e6d65ca761984f3 Mon Sep 17 00:00:00 2001
From: Jaclyn Taroni <19534205+jaclyn-taroni@users.noreply.github.com>
Date: Tue, 13 Aug 2024 13:36:10 -0400
Subject: [PATCH 1/8] Uncomment portal-wide metadata lines

---
 docs/download_files.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/download_files.md b/docs/download_files.md
index fa76f02..5895db7 100644
--- a/docs/download_files.md
+++ b/docs/download_files.md
@@ -164,10 +164,8 @@ Metadata for all samples on the Portal is available to download separately from
 Each project page has an option to download metadata for all of its samples as a single zip file containing the `metadata.tsv` file and a `README.md` file.
 Project-specific metadata will contain all columns listed in [the above table](#metadata) and any additional project-specific columns, such as treatment or outcome.
-
 ## Multiplexed sample libraries

From dc8bc1d0ae041c3dd432211da539688aea26eae6 Mon Sep 17 00:00:00 2001
From: Jaclyn Taroni <19534205+jaclyn-taroni@users.noreply.github.com>
Date: Tue, 13 Aug 2024 13:37:29 -0400
Subject: [PATCH 2/8] Add back portal-wide metadata placeholder to CHANGELOG

---
 docs/CHANGELOG.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 29687a5..bf861ba 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -12,6 +12,11 @@ For more information about `AlexsLemonade/scpca-nf` versions, please see [the re
+## PLACEHOLDER FOR PORTAL WIDE METADATA
+
+* Metadata for all samples from all projects on the Portal can now be downloaded in a single tab-separated values file.
+* For more information on what to expect in the metadata file, see the {ref}`metadata section of the Downloadable files page `.
+
 ## 2024.08.13
 
 * A new column, `age_timing`, is now present in the sample metadata tables included with each download.
From 74e8b96be250f4ecd22c943ce5c9b27d93af9f89 Mon Sep 17 00:00:00 2001
From: Dongze He <32473855+DongzeHE@users.noreply.github.com>
Date: Sun, 15 Sep 2024 15:07:51 -0700
Subject: [PATCH 3/8] update alevin-fry and salmon references

---
 docs/faq.md | 4 ++--
 docs/processing_information.md | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index 46b34b0..6e9ec60 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -3,7 +3,7 @@
 ## Why did we use Alevin-fry for processing?
 
 We aimed to process all of the data in the portal such that it is comparable to widely used pipelines, namely Cell Ranger from 10x Genomics.
-In our own benchmarking, we found that [Alevin-fry](https://github.com/COMBINE-lab/alevin-fry) produces very similar results to [Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count), while allowing faster, more memory efficient processing of single-cell and single-nuclei RNA-sequencing data.
+In our own benchmarking, we found that [Alevin-fry](https://github.com/COMBINE-lab/alevin-fry) ([He _et al._ 2022](https://doi.org/10.1038/s41592-022-01408-3)) produces very similar results to [Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count), while allowing faster, more memory efficient processing of single-cell and single-nuclei RNA-sequencing data.
 In the configuration that we are using ("selective alignment" mapping to a human transcriptome that includes introns), Alevin-fry uses approximately 12-16 GB of memory per sample and completes mapping and quantification in less than an hour.
 By contrast, Cell Ranger uses up to 25-30 GB of memory per sample and takes anywhere from 2-8 hours to align and quantify one sample.
 Quantification of samples processed with both Alevin-fry and Cell Ranger resulted in similar distributions of mapped UMI count per cell and genes detected per cell for both tools.
@@ -17,7 +17,7 @@ We also compared the mean gene expression reported for each gene by both methods
 ![](https://github.com/AlexsLemonade/alsf-scpca/blob/c0c2442d7242f6e06a5ac6d1e45bd1951780da14/analysis/docs-figures/plots/gene_exp_correlation.png?raw=true)
 
 Recent reports from others support our findings.
-[He _et al._ (2021)](https://doi.org/10.1101/2021.06.29.450377) demonstrated that `alevin-fry` can process single-cell and single-nuclei data more quickly and efficiently then other available methods, while also decreasing the false positive rate of gene detection that is commonly seen in methods that utilize transcriptome alignment.
+[He _et al._ (2022)](https://doi.org/10.1038/s41592-022-01408-3) demonstrated that `alevin-fry` can process single-cell and single-nuclei data more quickly and efficiently than other available methods, while also decreasing the false positive rate of gene detection that is commonly seen in methods that utilize transcriptome alignment.
 [You _et al._ (2021)](https://doi.org/10.1101/2021.06.17.448895) and [Tian _et al._ (2019)](https://doi.org/10.1038/s41592-019-0425-8) have also noted that results from different pre-processing workflows for single-cell RNA-sequencing analysis tend to result in compatible results downstream.
 
 ## How do I use the provided RDS files in R?
diff --git a/docs/processing_information.md b/docs/processing_information.md
index 80e059f..456bc94 100644
--- a/docs/processing_information.md
+++ b/docs/processing_information.md
@@ -4,7 +4,7 @@
 
 ### Mapping and quantification using alevin-fry
 
-We used [`salmon alevin`](https://salmon.readthedocs.io/en/latest/alevin.html) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/) to generate gene by cell counts matrices for all single-cell and single-nuclei samples.
+We used [`salmon`](https://salmon.readthedocs.io/en/latest) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/) to generate gene by cell counts matrices for all single-cell and single-nuclei samples.
 In brief, we utilized [selective alignment](#selective-alignment) to the [`splici` index](#reference-transcriptome-index) for all single-cell and single-nuclei samples.
 
 #### Reference transcriptome index
@@ -12,7 +12,7 @@ In brief, we utilized [selective alignment](#selective-alignment) to the [`splic
 For all samples, we aligned FASTQ files to a reference transcriptome index referred to as the `splici` index.
 The [`splici` index](https://combine-lab.github.io/alevin-fry-tutorials/2021/improving-txome-specificity/) is built using transcripts from both spliced cDNA and intronic regions.
 Inclusion of intronic regions in the index used for alignment allowed us to capture both reads from mature, spliced cDNA and nascent, unspliced cDNA.
-Alignment of RNA-seq data to an index containing intronic regions has been shown to reduce spuriously detected genes ([He _et al._ 2021](https://doi.org/10.1101/2021.06.29.450377), [Kaminow _et al._ 2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1.full#sec-5)).
+Alignment of RNA-seq data to an index containing intronic regions has been shown to reduce spuriously detected genes ([He _et al._ 2022](https://doi.org/10.1038/s41592-022-01408-3), [Kaminow _et al._ 2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1.full#sec-5)).
 In our hands, we have found that use of the `splici` index led to a more comparable distribution of unique genes found per cell to Cell Ranger than did use of an index obtained from spliced cDNA transcripts only.
 
 #### Selective alignment
@@ -21,7 +21,7 @@ We mapped reads to the transcriptome index using `salmon` with the default "sele
 Briefly, selective alignment uses a mapping score validated approach to identify maximal exact matches between reads and the provided index.
 For all samples, we used selective alignment to the `splici` index.
 
-A more detailed description of the mapping strategy invoked by `salmon` in conjunction with `alevin-fry` can be found in [Srivastava _et al._ (2020)](https://doi.org/10.1186/s13059-020-02151-8).
+More detailed descriptions of the mapping strategy invoked by `salmon` in conjunction with `alevin-fry` can be found in [Srivastava _et al._ (2020)](https://doi.org/10.1186/s13059-020-02151-8) and [He _et al._ (2022)](https://doi.org/10.1038/s41592-022-01408-3).
 
 #### Alevin-fry parameters
 
@@ -99,7 +99,7 @@ In these cases, the cell type annotations obtained from the submitter will be pr
 
 ## ADT quantification from CITE-seq experiments
 
-CITE-seq libraries with reads from antibody-derived tags (ADTs) were also quantified using [`salmon alevin`](https://salmon.readthedocs.io/en/latest/alevin.html) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/), rounded to integer values.
+CITE-seq libraries with reads from antibody-derived tags (ADTs) were also quantified using [`salmon`](https://salmon.readthedocs.io/en/latest) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/), with counts rounded to integer values.
 Reference indices were constructed from the submitter-provided list of antibody barcode sequences corresponding to each library using the `--features` flag of `salmon index`.
 Mapping to these indices followed the same procedures as for RNA-seq data, including mapping with [selective alignment](#selective-alignment) and subsequent [quantification via alevin-fry](#alevin-fry-parameters).
 
@@ -130,7 +130,7 @@ Multiplexed libraries, or libraries with cells or nuclei from more than one biol
 
 ### Hashtag oligonucleotide (HTO) quantification
 
-HTO reads were also quantified using [`salmon alevin`](https://salmon.readthedocs.io/en/latest/alevin.html) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/), rounded to integer values.
+HTO reads were also quantified using [`salmon`](https://salmon.readthedocs.io/en/latest) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/), with counts rounded to integer values.
 Reference indices were constructed from the submitter-provided list of HTO sequences corresponding to each library using the `--features` flag of `salmon index`.
 Mapping to these indices followed the same procedures as for RNA-seq data, including mapping with [selective alignment](#selective-alignment) and subsequent [quantification via alevin-fry](#alevin-fry-parameters).
 

From 02b45b0c2948fd2aeaec49811e35822e680f8f4e Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Mon, 16 Sep 2024 09:07:04 -0400
Subject: [PATCH 4/8] update spellcheck action

---
 .github/workflows/spell-check.yml | 40 ++++++++++++++++--------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/.github/workflows/spell-check.yml b/.github/workflows/spell-check.yml
index cdb0812..11589d1 100644
--- a/.github/workflows/spell-check.yml
+++ b/.github/workflows/spell-check.yml
@@ -1,4 +1,3 @@
-
 name: Spell check Markdown files
 
 # Controls when the action will run.
@@ -14,29 +13,32 @@ jobs:
   # This workflow contains a single job called "spell check"
   spell-check:
     runs-on: ubuntu-latest
-    container:
-      image: rocker/tidyverse:4.3.2
 
     # Steps represent a sequence of tasks that will be executed as part of the job
     steps:
-      - uses: actions/checkout@v2
-
-      - name: Install packages
-        run: Rscript --vanilla -e "install.packages('spelling', repos = c(CRAN = 'https://cloud.r-project.org'))"
+      - name: Checkout
+        uses: actions/checkout@v4
 
-      - name: Run spell check
-        id: spell_check_run
+      - name: Remove files that do not need to be spellchecked
         run: |
-          results=$(Rscript --vanilla "scripts/spell-check.R")
-          echo "::set-output name=sp_chk_results::$results"
-          cat spell_check_errors.tsv
-      - name: Archive spelling errors
-        uses: actions/upload-artifact@v2
+          rm ./LICENSE
+
+      - name: Spell check action
+        uses: alexslemonade/spellcheck@v0
+        id: spell
         with:
-          name: spell-check-results
+          dictionary: components/dictionary.txt
+
+      - name: Upload spell check errors
+        uses: actions/upload-artifact@v4
+        id: artifact-upload-step
+        with:
+          name: spell_check_errors
           path: spell_check_errors.tsv
 
-      # If there are too many spelling errors, this will stop the workflow
-      - name: Check spell check results - fail if too many errors
-        if: ${{ steps.spell_check_run.outputs.sp_chk_results > 0 }}
-        run: exit 1
+      - name: Fail if there are spelling errors
+        if: steps.spell.outputs.error_count > 0
+        run: |
+          echo "There were ${{ steps.spell.outputs.error_count }} errors"
+          column -t spell_check_errors.tsv
+          exit 1

From 140b23fb6d3752047d6a79236473f0db73701392 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Mon, 16 Sep 2024 09:16:08 -0400
Subject: [PATCH 5/8] remove scripts

---
 scripts/spell-check.R | 27 ---------------------------
 1 file changed, 27 deletions(-)
 delete mode 100644 scripts/spell-check.R

diff --git a/scripts/spell-check.R b/scripts/spell-check.R
deleted file mode 100644
index 469d76c..0000000
--- a/scripts/spell-check.R
+++ /dev/null
@@ -1,27 +0,0 @@
-#!/usr/bin/env Rscript
-#
-# Run spell check and save results
-# Adapted from: https://github.com/AlexsLemonade/refinebio-examples/blob/33cdeff66d57f9fe8ee4fcb5156aea4ac2dce07f/scripts/spell-check.R
-# and https://github.com/AlexsLemonade/training-modules/blob/04bea3d2707975e04b57b714ba8b709c77594706/scripts/spell-check.R
-
-# Find .git root directory
-root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))
-
-# Read in dictionary
-dictionary <- readLines(file.path(root_dir, 'components', 'dictionary.txt'))
-
-# The only files we want to check are Markdown files
-files <- list.files(root_dir, pattern = '\\.md$', recursive = TRUE, full.names = TRUE)
-
-
-# Run spell check
-spelling_errors <- spelling::spell_check_files(files, ignore = dictionary) |>
-  data.frame() |>
-  tidyr::unnest(cols = found) |>
-  tidyr::separate(found, into = c("file", "lines"), sep = ":")
-
-# Print out how many spell check errors
-write(nrow(spelling_errors), stdout())
-
-# Save spell errors to file temporarily
-readr::write_tsv(spelling_errors, 'spell_check_errors.tsv')

From 5e252f17c56ee35c4eb47b9e28ea2694765f1e34 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Tue, 17 Sep 2024 11:50:59 -0400
Subject: [PATCH 6/8] Update CHANGELOG.md heading

Add partial date of metadata changes
---
 docs/CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index bf861ba..a0aa03f 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -12,7 +12,7 @@ For more information about `AlexsLemonade/scpca-nf` versions, please see [the re
-## PLACEHOLDER FOR PORTAL WIDE METADATA
+## 2024.09.XX
 
 * Metadata for all samples from all projects on the Portal can now be downloaded in a single tab-separated values file.
 * For more information on what to expect in the metadata file, see the {ref}`metadata section of the Downloadable files page `.
From 10e043353b8e78ed2b097e0c51d98c753e0da080 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Mon, 23 Sep 2024 17:25:35 -0400
Subject: [PATCH 7/8] Add note about column names for multiplexed

---
 docs/download_files.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/download_files.md b/docs/download_files.md
index b9f00f9..425190c 100644
--- a/docs/download_files.md
+++ b/docs/download_files.md
@@ -183,6 +183,7 @@ Because we do not perform demultiplexing to separate cells from multiplexed libr
 For more on the specific contents of multiplexed library `SingleCellExperiment` objects, see the {ref}`Additional SingleCellExperiment components for multiplexed libraries ` section.
 
 The [metadata file](#metadata) for multiplexed libraries (`single_cell_metadata.tsv`) will have the same format as for individual samples, but each row will represent a particular sample/library pair, meaning that there may be multiple rows for each `scpca_library_id`, one for each `scpca_sample_id` within that library.
+In addition, an estimate of the total number of cells for each sample after demultiplexing will be found in the `sample_cell_estimate` column (as opposed to the `sample_cell_count_estimate` column used for non-multiplexed samples).
 
 ## Merged object downloads
 

From 902bf5955ca4cc9ba1a638bd16b0429ff0f0d507 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Tue, 24 Sep 2024 16:13:54 -0400
Subject: [PATCH 8/8] Finish date

---
 docs/CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index a0aa03f..bf7e0f4 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -12,7 +12,7 @@ For more information about `AlexsLemonade/scpca-nf` versions, please see [the re
-## 2024.09.XX
+## 2024.09.24
 
 * Metadata for all samples from all projects on the Portal can now be downloaded in a single tab-separated values file.
* For more information on what to expect in the metadata file, see the {ref}`metadata section of the Downloadable files page `.
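
Patch 7 documents two things about multiplexed libraries: the metadata file has one row per sample/library pair, so a `scpca_library_id` can repeat, and the per-sample cell estimate is in `sample_cell_estimate` rather than `sample_cell_count_estimate`. The Python sketch below shows one way a downstream script might handle both layouts. It is illustrative only and rests on a few assumptions. The file is tab-separated with a header row. A column that does not apply to a row is empty or `NA`. The helper names and the sample and library IDs are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as described in the download_files.md note from patch 7.
MULTIPLEXED_COLUMN = "sample_cell_estimate"
SINGLE_COLUMN = "sample_cell_count_estimate"

# Assumption: a column that does not apply to a row is empty or "NA".
MISSING_VALUES = {"", "NA"}


def read_metadata(handle):
    """Parse a tab-separated metadata file into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def cell_estimate(row):
    """Return a sample's cell estimate, checking the multiplexed column first."""
    for column in (MULTIPLEXED_COLUMN, SINGLE_COLUMN):
        value = row.get(column)
        if value is not None and value.strip() not in MISSING_VALUES:
            return int(float(value))
    return None


def samples_per_library(rows):
    """Count rows per library; a count above 1 indicates a multiplexed library."""
    return Counter(row["scpca_library_id"] for row in rows)


if __name__ == "__main__":
    # Hypothetical rows: SCPCL999990 is multiplexed with two samples.
    example = (
        "scpca_sample_id\tscpca_library_id\t"
        "sample_cell_estimate\tsample_cell_count_estimate\n"
        "SCPCS999990\tSCPCL999990\t1200\tNA\n"
        "SCPCS999991\tSCPCL999990\t950\tNA\n"
        "SCPCS999992\tSCPCL999991\tNA\t3100\n"
    )
    rows = read_metadata(io.StringIO(example))
    print(samples_per_library(rows))
    print([cell_estimate(row) for row in rows])
```

The helper checks `sample_cell_estimate` first, following the note in patch 7. A file that contains only one of the two columns still works, because `row.get` returns `None` for the absent column and the loop falls through to the other one.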