update info text + minor help changes (#10)
* update info text + minor help changes

* remove extra space
Lioscro authored Oct 27, 2019
1 parent 9801aea commit 3fe58e9
Showing 5 changed files with 65 additions and 7 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,2 +1,3 @@
+ include kb_python/info.txt
include kb_python/whitelists/*
recursive-include kb_python/bins *
6 changes: 3 additions & 3 deletions kb_python/config.py
@@ -45,19 +45,19 @@
REFERENCES = [
Reference(
'human',
- 'https://github.com/pachterlab/kallisto-transcriptome-indices/releases/download/ensembl-96/homo_sapiens.tar.gz',
+ 'https://caltech.box.com/shared/static/v1nm7lpnqz5syh8dyzdk2zs8bglncfib.gz',
None, None
),
Reference(
'mouse',
- 'https://github.com/pachterlab/kallisto-transcriptome-indices/releases/download/ensembl-96/mus_musculus.tar.gz',
+ 'https://caltech.box.com/shared/static/vcaz6cujop0xuapdmz0pplp3aoqc41si.gz',
None, None
)
]
REFERENCES_MAPPING = {r.name: r for r in REFERENCES}
# File names that are in the tar.gz file.
INDEX_FILENAME = 'transcriptome.idx'
- T2G_FILENAME = 'transcript_to_genes.txt'
+ T2G_FILENAME = 'transcripts_to_genes.txt'


class UnsupportedOSException(Exception):
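The `REFERENCES_MAPPING` comprehension in this hunk turns the list of references into a name-keyed lookup table. A minimal sketch of that pattern follows, under stated assumptions: the `Reference` type here is a stand-in `namedtuple` whose last two field names (`extra_a`, `extra_b`) are hypothetical, the `example.com` URLs are placeholders rather than the real download links, and `lookup_reference` is an illustrative helper that does not appear in the commit.

```python
from collections import namedtuple

# Stand-in for the Reference type used in config.py. Only `name` is confirmed
# by the diff (via `r.name`); the remaining field names are assumptions.
Reference = namedtuple('Reference', ['name', 'url', 'extra_a', 'extra_b'])

# Placeholder URLs, not the real index locations.
REFERENCES = [
    Reference('human', 'https://example.com/human.tar.gz', None, None),
    Reference('mouse', 'https://example.com/mouse.tar.gz', None, None),
]

# Same comprehension as in the diff: index the references by species name.
REFERENCES_MAPPING = {r.name: r for r in REFERENCES}


def lookup_reference(name):
    """Return the Reference for `name`, or raise ValueError listing valid names."""
    try:
        return REFERENCES_MAPPING[name]
    except KeyError:
        valid = ', '.join(sorted(REFERENCES_MAPPING))
        raise ValueError(
            'Unknown reference {!r}; expected one of: {}'.format(name, valid)
        )
```

Keying by name keeps a lookup such as the `-d mouse` option shown in info.txt to a single dictionary access, and changing a download URL only touches the `REFERENCES` list.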
2 changes: 2 additions & 0 deletions kb_python/constants.py
@@ -1,3 +1,5 @@
+ INFO_FILENAME = 'info.txt'
+
# Default filenames
CDNA_FILENAME = 'cdna.fa'
INTRON_FILENAME = 'introns.fa'
36 changes: 36 additions & 0 deletions kb_python/info.txt
@@ -0,0 +1,36 @@
kb is a python package for rapidly pre-processing single-cell RNA-seq data. It is a wrapper for the methods described in

Melsted, Booeshaghi et al., Modular and efficient pre-processing of single-cell RNA-seq, bioRxiv, 2019

The goal of the wrapper is to simplify downloading and running of the kallisto [1] and bustools [2] programs. It was inspired by Sten Linnarsson’s loompy fromfq command (http://linnarssonlab.org/loompy/kallisto/index.html)

The kb program consists of two parts:

The `kb ref` command builds or downloads a species-specific index for pseudoalignment of reads. This command must be run prior to `kb count`, and it runs the `kallisto index` [1].

The `kb count` command runs the kallisto [1] and bustools [2] programs. It can be used for pre-processing of data from a variety of single-cell RNA-seq technologies, and for a number of different workflows (e.g. production of gene count matrices, RNA velocity analyses, etc.). The output can be saved in a variety of formats including mtx and loom. Examples are provided below.


Examples
========
(1) kb ref -i transcriptome.idx -g transcripts_to_genes.txt -f1 cdna.fa Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.98.gtf
(2) kb count -i transcriptome.idx -g transcripts_to_genes.txt -x 10xv2 -o output --loom Reads1.fastq.gz Reads2.fasta.gz
Build a Kallisto index and transcripts-to-genes mapping using the mouse transcriptome, generated from the provided genomic FASTA and GTF. Then, generate count matrices with the built index and transcripts-to-genes mapping. Convert the final count matrix to a .loom file.

(1) kb ref -i transcriptome.idx -g transcripts_to_genes.txt -f1 cdna.fa Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.98.gtf
(2) kb count -i transcriptome.idx -g transcripts_to_genes.txt -x 10xv2 -w 10xv2_whitelist -o output --h5ad Reads1.fastq.gz Reads2.fasta.gz
Build a Kallisto index and transcripts-to-genes mapping using the mouse transcriptome, generated from the provided genomic FASTA and GTF. Then, generate count matrices with the built index, transcripts-to-genes mapping and provided whitelist. Convert the final count matrix to a .h5ad file.

(1) kb ref -i transcriptome.idx -g transcripts_to_genes.txt -d mouse
(2) kb count -i transcriptome.idx -g transcripts_to_genes.txt -x 10xv2 -o output Reads1.fastq.gz Reads2.fasta.gz
Instead of building a Kallisto index locally, download a pre-built index. Then, generate count matrices with the built index and transcripts-to-genes mapping.

(1) kb ref -i transcriptome.idx -g transcripts_to_genes.txt -f1 cdna.fa -f2 introns.fa -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.98.gtf
(2) kb count -i transcriptome.idx -g transcripts_to_genes.txt -x 10xv2 -o output -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno Reads1.fastq.gz Reads2.fasta.gz
Prepare files (Kallisto index, transcripts-to-genes mapping, cDNA transcripts to capture, intron transcripts to capture) for RNA velocity based on La Manno et al. 2018 logic. Then, calculate RNA velocity using the prepared files.


References
==========
[1] Bray, N. L., Pimentel, H., Melsted, P., & Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nature biotechnology, 34(5), 525.
[2] Melsted, P., Booeshaghi, A. S., Gao, F., da Veiga Beltrame, E., Lu, L., Hjorleifsson, K. E., ... & Pachter, L. (2019). Modular and efficient pre-processing of single-cell RNA-seq. BioRxiv, 673285.
27 changes: 23 additions & 4 deletions kb_python/main.py
@@ -3,9 +3,11 @@
import os
import shutil
import sys
+ import textwrap

from . import __version__
- from .config import REFERENCES_MAPPING, TECHNOLOGIES, TEMP_DIR
+ from .config import PACKAGE_PATH, REFERENCES_MAPPING, TECHNOLOGIES, TEMP_DIR
+ from .constants import INFO_FILENAME
from .count import count, count_lamanno
from .ref import download_reference, ref, ref_lamanno
from .utils import get_bustools_version, get_kallisto_version
@@ -18,7 +20,16 @@ def display_info():
kallisto: {}
bustools: {}
'''.format(__version__, kallisto_version, bustools_version)
- print(info)
+ with open(os.path.join(PACKAGE_PATH, INFO_FILENAME), 'r') as f:
+     print(
+         '{}\n{}'.format(
+             info, '\n'.join([
+                 line.strip()
+                 if line.startswith('(') else textwrap.fill(line, width=80)
+                 for line in f.readlines()
+             ])
+         )
+     )
sys.exit(1)
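
The formatting rule added to `display_info` can be tried in isolation. The sketch below assumes a hypothetical helper name, `format_info`, which is not part of kb_python. It applies the same rule as the hunk: lines that start with `(` (the numbered example commands in info.txt) are only stripped, so each command stays on one line, while every other line is wrapped to 80 columns with `textwrap.fill`.

```python
import textwrap


def format_info(lines, width=80):
    """Wrap prose lines to `width`, leaving '('-prefixed command lines unwrapped.

    `lines` is any iterable of strings, e.g. the result of f.readlines().
    """
    return '\n'.join(
        # Command examples such as "(1) kb ref ..." are kept on a single line
        # so they can be copied into a shell without stray line breaks.
        line.strip() if line.startswith('(') else textwrap.fill(line, width=width)
        for line in lines
    )


if __name__ == '__main__':
    sample = [
        'The kb program consists of two parts: ' + 'a reference step and a count step. ' * 3 + '\n',
        '\n',
        '(1) kb ref -i transcriptome.idx -g transcripts_to_genes.txt -d mouse\n',
    ]
    print(format_info(sample))
```

Note that `textwrap.fill` returns an empty string for a blank line, so paragraph breaks in the text file survive as empty lines in the output.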


@@ -135,6 +146,8 @@ def setup_ref_args(parser, parent):
help='Build a kallisto index and transcript-to-gene mapping',
parents=[parent],
)
+ parser_ref._actions[0].help = parser_ref._actions[0].help.capitalize()
+
required_ref = parser_ref.add_argument_group('required arguments')
required_ref.add_argument(
'-i',
@@ -194,7 +207,10 @@ def setup_ref_args(parser, parent):
)
parser_ref.add_argument(
'--lamanno',
- help='Prepare files for RNA velocity based on Lamanno',
+ help=(
+     'Prepare files for RNA velocity based on '
+     'La Manno et al. 2018 logic'
+ ),
action='store_true'
)
parser_ref.add_argument(
@@ -225,6 +241,8 @@ def setup_count_args(parser, parent):
help='Generate count matrices from a set of single-cell FASTQ files',
parents=[parent],
)
+ parser_count._actions[0].help = parser_count._actions[0].help.capitalize()
+
required_count = parser_count.add_argument_group('required arguments')
required_count.add_argument(
'-i',
@@ -299,7 +317,7 @@ def setup_count_args(parser, parent):

parser_count.add_argument(
'--lamanno',
- help='Calculate RNA velocity based on Lamanno',
+ help='Calculate RNA velocity based on La Manno et al. 2018 logic',
action='store_true'
)
parser_count.add_argument(
@@ -327,6 +345,7 @@ def main():
parser = argparse.ArgumentParser(
description='kb_python {}'.format(__version__)
)
+ parser._actions[0].help = parser._actions[0].help.capitalize()
parser.add_argument(
'--list',
help='Display list of supported single-cell technologies',
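
The `_actions[0].help.capitalize()` lines in this file adjust the text of argparse's auto-generated `-h/--help` option so it matches the capitalized help strings of the other arguments. The sketch below shows that approach next to one possible alternative that avoids the private `_actions` attribute; it is an illustration under stated assumptions, not a proposed change to the commit. The function names are hypothetical, and the assumption that the help action sits at index 0 holds in current CPython but is not a documented guarantee.

```python
import argparse


def build_parser_private():
    """Mirror the diff: capitalize the default help text in place."""
    parser = argparse.ArgumentParser(description='demo')
    # Assumption: the auto-added -h/--help action is the first registered
    # action. `_actions` is private, so this may change between versions.
    parser._actions[0].help = parser._actions[0].help.capitalize()
    return parser


def build_parser_public():
    """Alternative using only public API: register the help option manually."""
    parser = argparse.ArgumentParser(description='demo', add_help=False)
    parser.add_argument(
        '-h',
        '--help',
        action='help',
        default=argparse.SUPPRESS,
        help='Show this help message and exit',
    )
    return parser


if __name__ == '__main__':
    build_parser_private().print_help()
    build_parser_public().print_help()
```

One detail worth keeping in mind: `str.capitalize()` also lowercases every character after the first, which is harmless for the default help string but would alter a string containing acronyms or proper nouns.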
