-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #17 from COMBINE-lab/dev
Dev
- Loading branch information
Showing
10 changed files
with
313 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
version: "2" | ||
|
||
build: | ||
os: "ubuntu-22.04" | ||
tools: | ||
python: "3.10" | ||
|
||
python: | ||
install: | ||
- requirements: docs/requirements.txt | ||
|
||
sphinx: | ||
configuration: docs/source/conf.py |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = source | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
@ECHO OFF | ||
|
||
pushd %~dp0 | ||
|
||
REM Command file for Sphinx documentation | ||
|
||
if "%SPHINXBUILD%" == "" ( | ||
set SPHINXBUILD=sphinx-build | ||
) | ||
set SOURCEDIR=source | ||
set BUILDDIR=build | ||
|
||
if "%1" == "" goto help | ||
|
||
%SPHINXBUILD% >NUL 2>NUL | ||
if errorlevel 9009 ( | ||
echo. | ||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | ||
echo.installed, then set the SPHINXBUILD environment variable to point | ||
echo.to the full path of the 'sphinx-build' executable. Alternatively you | ||
echo.may add the Sphinx directory to PATH. | ||
echo. | ||
echo.If you don't have Sphinx installed, grab it from | ||
echo.http://sphinx-doc.org/ | ||
exit /b 1 | ||
) | ||
|
||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
goto end | ||
|
||
:help | ||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% | ||
|
||
:end | ||
popd |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
sphinx==7.1.2 | ||
furo |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
API | ||
=== | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
lumache |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
|
||
# -- Project information | ||
|
||
project = 'piscem-infer' | ||
copyright = '2022-, Rob Patro' | ||
author = 'Rob Patro' | ||
master_doc = 'index' | ||
|
||
release = '0.1' | ||
version = '0.1.1' | ||
|
||
# -- General configuration | ||
|
||
extensions = [ | ||
'sphinx.ext.duration', | ||
'sphinx.ext.doctest', | ||
'sphinx.ext.autodoc', | ||
'sphinx.ext.autosummary', | ||
'sphinx.ext.intersphinx', | ||
] | ||
|
||
intersphinx_mapping = { | ||
'python': ('https://docs.python.org/3/', None), | ||
'sphinx': ('https://www.sphinx-doc.org/en/master/', None), | ||
} | ||
intersphinx_disabled_domains = ['std'] | ||
|
||
templates_path = ['_templates'] | ||
|
||
# -- Options for HTML output | ||
|
||
html_theme = 'furo' | ||
pygments_style = "sphinx" | ||
pygments_dark_style = "monokai" | ||
|
||
# -- Options for EPUB output | ||
epub_show_urls = 'footnote' | ||
|
||
|
||
|
||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# This file only contains a selection of the most common options. For a full | ||
# list see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Path setup -------------------------------------------------------------- | ||
|
||
# If extensions (or modules to document with autodoc) are in another directory, | ||
# add these directories to sys.path here. If the directory is relative to the | ||
# documentation root, use os.path.abspath to make it absolute, like shown here. | ||
# | ||
# import os | ||
# import sys | ||
# sys.path.insert(0, os.path.abspath('.')) | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
Formats | ||
======= | ||
|
||
+----------------------------------------+-----------------------------------------------+--------------------------------------------------------------+---------------------------------------------------------+ | ||
| target_name | eeln | tpm | ecount | | ||
+========================================+===============================================+==============================================================+=========================================================+ | ||
| The identifer of the quantified target | The expected *effective length* of the target | The abundance of the target in transcripts per million (TPM) | The expected number of fragments assigned to the target | | ||
+----------------------------------------+-----------------------------------------------+--------------------------------------------------------------+---------------------------------------------------------+ | ||
|
||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
piscem-infer |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
Welcome to piscem-infers's documentation! | ||
========================================= | ||
|
||
What is `piscem-infer`? | ||
----------------------- | ||
|
||
``piscem-infer`` is a tool to consume bulk-RAD files (produced by `piscem <https://github.com/COMBINE-lab/piscem>`_ or `piscem-cpp <https://github.com/COMBINE-lab/piscem-cpp>`_) | ||
and to produce from them abundance estimates of the targets against which the reads were mapped. For example, a basic RNA-seq pipeline could consist of mapping the reads | ||
against the transcriptome using `piscem`, and then quantifying transcript abundance using `piscem-infer`. Likewise, one could use the pair of tools on metagenomic index and | ||
metagenomic sequencing reads to perform taxonomic abundance estimation. The main goal of `piscem-infer` is to separate the statistical inference algorithms and code from the | ||
code that performs indexing and mapping. This allows faster and more agile development of new improvements to the inference method, as well as eases the maintenance burden. | ||
|
||
The `piscem-infer` program is written in `rust`, which makes it easy for end-users to compile, and which also makes it easy for us to deploy without the need for end-users | ||
to compile it (it's easy to create statically-linked, pre-compiled executables, and to put the package on `bioconda`). At the same time, this gives us access to the safety | ||
guarantees of `rust`, making the code easier to develop and maintain with confidence while retaining the efficiency of a high-performance, statically-typed, compiled language. | ||
|
||
While `piscem-infer` is in *active development*, it is already very usable! We encourage folks who are curious to try it out, to open Github issues for any questions or feature | ||
requests, and to provide feedback on how you'd like to see this next "evolution" of (bulk sequencing) abundance estimation tools evolve! | ||
|
||
Check out the :doc:`usage` section for further information. | ||
|
||
.. note:: | ||
|
||
This project is under active development. | ||
|
||
Contents | ||
-------- | ||
|
||
.. toctree:: | ||
|
||
usage | ||
formats |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
Usage | ||
===== | ||
|
||
.. _installation: | ||
|
||
Installation | ||
------------ | ||
|
||
To install piscem-infer, use `cargo <https://github.com/rust-lang/cargo>`_: | ||
|
||
.. code-block:: console | ||
$ cargo install piscem-infer | ||
or install it from source | ||
|
||
.. code-block:: console | ||
$ git clone https://github.com/COMBINE-lab/piscem-infer | ||
$ cd piscem-infer | ||
$ cargo build --release | ||
An example run | ||
-------------- | ||
|
||
Obtaining the reference | ||
~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
First, we'll grab the reference transcriptome we will use for quantification - in this case Gencode's v44 annotation | ||
of the human transcriptome. | ||
|
||
.. code-block:: console | ||
$ wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.transcripts.fa.gz | ||
Building the index | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
|
||
Next, we'll build the index. You'll only have to do this once (or whenever you want to update the annotation you're using). To | ||
build the index and map the reads, we'll need ``piscem``. You can either build it from source according to the instructions | ||
on the `GitHub page <https://github.com/COMBINE-lab/piscem>`_, or you can install it from ``biconda`` using ``conda install piscem``. | ||
Once you have it installed, you can build the index with: | ||
|
||
.. code-block:: console | ||
$ piscem build -s gencode.v44.transcripts.fa.gz -k 31 -m 19 -t 16 -o gencode_v44_idx | ||
Obtaining the reads | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
To obtain some sample read data, we'll use the excellent |fastqdl|_ tool that you can install | ||
via either ``pip`` or bioconda (through ``conda`` or ``mamba``). | ||
|
||
.. code-block:: console | ||
$ fastq-dl -a SRR1039508 | ||
Mapping the reads | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
Next, we'll use ``piscem`` again to map the reads. The following command will do it for us (you can check out ``piscem map-bulk -h`` for | ||
a descripton of all the options): | ||
|
||
.. code-block:: console | ||
$ piscem map-bulk -t 16 -i gencode_v44_idx -1 SRR1039508_1.fastq.gz -2 SRR1039508_1.fastq.gz -o SRR1039508_mapped | ||
Quantification with ``piscem-infer`` | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Now that we've mapped the reads to produce a bulk-oriented ``RAD`` file, we're ready to quantify with ``piscem-infer``! | ||
There exist other options we can pass (e.g. we can perform bootstrap sampling to produce inferential replicates, in which | ||
case we can also request the use of multiple threads, but here we just invoke the most basic quantification process). | ||
|
||
.. code-block:: console | ||
$ piscem-infer quant -i SRR1039508_map -l IU -o quant/SRR1039508 | ||
Note that we pass to the ``-o`` flag a file *stem* prefixed with a path (in this case ``quant``). This is because ``piscem-infer`` | ||
will produce several output files. All of them will share the same *stem*. If we pass a stem that is prefixed with some path | ||
(e.g. a directory) then this directory will be created if it doesn't exist. We also let ``piscem-infer`` know the library type | ||
(i.e. how we expect the reads to map), where ``piscem-infer`` uses `salmon's library type specification <https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype>`_. | ||
Here we expect the library to be unstranded and the paired-end reads to map "inward" (i.e. facing each other). | ||
|
||
.. code-block:: console | ||
$ ls -la quant/ | ||
.rw-rw-r--@ 3.1k rob 30 Sep 12:33 SRR1039508.fld.pq | ||
.rw-rw-r--@ 628 rob 30 Sep 12:33 SRR1039508.meta_info.json | ||
.rw-rw-r--@ 33M rob 30 Sep 12:33 SRR1039508.quant | ||
The file ``SRR1039508.quant`` contains the quantification estimates, and is of a very similar format to e.g. a ``salmon`` ("quant.sf") format file. | ||
|
||
|
||
|
||
.. |fastqdl| replace:: ``fastq-dl`` | ||
.. _fastqdl: https://github.com/rpetit3/fastq-dl | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters