diff --git a/doc/source/_static/benchmark.pdf b/doc/source/_static/benchmark.pdf new file mode 100644 index 0000000..60b3930 Binary files /dev/null and b/doc/source/_static/benchmark.pdf differ diff --git a/doc/source/_static/benchmark.png b/doc/source/_static/benchmark.png new file mode 100644 index 0000000..82f4089 Binary files /dev/null and b/doc/source/_static/benchmark.png differ diff --git a/doc/source/_static/pipeline.pdf b/doc/source/_static/pipeline.pdf index 6d742ee..3ce8848 100644 Binary files a/doc/source/_static/pipeline.pdf and b/doc/source/_static/pipeline.pdf differ diff --git a/doc/source/_static/pipeline.png b/doc/source/_static/pipeline.png index bb25ced..0a2ae9f 100644 Binary files a/doc/source/_static/pipeline.png and b/doc/source/_static/pipeline.png differ diff --git a/doc/source/_static/schema.pdf b/doc/source/_static/schema.pdf index 97b39ab..59a972c 100644 Binary files a/doc/source/_static/schema.pdf and b/doc/source/_static/schema.pdf differ diff --git a/doc/source/_static/schema.png b/doc/source/_static/schema.png index f66c08d..c4083c8 100644 Binary files a/doc/source/_static/schema.png and b/doc/source/_static/schema.png differ diff --git a/doc/source/conf.py b/doc/source/conf.py index 2bf97c2..8b8c968 100644 --- a/doc/source/conf.py +++ b/doc/source/conf.py @@ -58,9 +58,9 @@ # built documents. # # The short X.Y version. -version = '1.3' +version = '1.4' # The full version, including alpha/beta/rc tags. -release = '1.3' +release = '1.4' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/doc/source/index.rst b/doc/source/index.rst index d79edcf..0b7e4ae 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -1,8 +1,5 @@ .. IntegronFinder - Detection of Integron in DNA sequences - documentation master file, created by - sphinx-quickstart on Mon Jul 27 15:07:43 2015. 
- You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. + Welcome to IntegronFinder's documentation! ========================================== @@ -10,58 +7,8 @@ Welcome to IntegronFinder's documentation! IntegronFinder is a program that detects integrons in DNA sequences. The program is available on a webserver :ref:`(Mobyle) `, or by command line (`IntegronFinder on github`_). -You already read the :ref:`paper ` and want to install it ? Click :ref:`here ` - -Integrons are major genetic element, notorious for their major implication in the spread of antibiotic resistance genes. More generally, integrons are gene-capturing device, whose broader evolutionary role remains poorly understood. IntegronFinder is able to detect with high accuracy integron in DNA sequences. It is accurate because it combines the use of HMM profiles for the detection of the essential protein, the site-specific integron integrase, and the use of Covariance Models for the detection of the recombination site, the *attC* site. - -|integron schema| - -**How does it work ?** - -- First, IntegronFinder annotates the DNA sequence's CDS with Prodigal. - -- Second, IntegronFinder detects independently integron integrase and *attC* - recombination sites. The Integron integrase is detected by using the intersection - of two HMM profiles: - - - one specific of tyrosine-recombinase (PF00589) - - one specific of the integron integrase, near the patch III domain of tyrosine recombinases. - -The *attC* recombination site is detected with a covariance model (CM), which -models the secondary structure in addition to the few conserved sequence -positions. 
- - -- Third, the results are integrated, and IntegronFinder distinguishes 3 types of - elements: - - - complete integron - Integron with integron integrase nearby *attC* site(s) - - In0 element - Integron integrase only, without any *attC* site nearby - - CALIN element - *attC* sites only, without integron integrase nearby. - A rule of thumb to avoid false positive is to filter out singleton of - *attC* site. - -IntegronFinder can also annotate gene cassettes (CDS nearby *attC* sites) using -Resfams, a database of HMM profiles aiming at annotating antibiotic resistance -genes. This database is provided but the user can add any other HMM profiles -database of its own interest. - -When available, IntegronFinder annotates the promoters and attI sites by pattern -matching. - -.. image:: _static/pipeline.* - :width: 400px - :align: middle - :alt: IntegronFinder Pipeline - -.. |integron schema| image:: _static/schema.* - :align: middle - :width: 300px - :alt: Integron Schema - +- You have already read the :ref:`paper ` and want to install it? Click :ref:`here ` +- You have not read the paper (yet) but would like a quick introduction to integrons and the program? Click :ref:`here ` .. _`IntegronFinder on github`: https://github.com/gem-pasteur/Integron_Finder @@ -69,6 +16,7 @@ matching. .. toctree:: :maxdepth: 2 + introduction installation tutorial mobyle diff --git a/doc/source/introduction.rst b/doc/source/introduction.rst new file mode 100644 index 0000000..5a3687b --- /dev/null +++ b/doc/source/introduction.rst @@ -0,0 +1,76 @@ +.. IntegronFinder - Detection of Integron in DNA sequences + +.. _introduction: + +************ +Introduction +************ + +Integrons are major genetic elements, notorious for their role in the spread of antibiotic resistance genes. More generally, integrons are gene-capturing platforms whose broader evolutionary role remains poorly understood. IntegronFinder detects integrons in DNA sequences with high accuracy. 
It is accurate because it combines HMM profiles for the detection of the essential protein, the site-specific integron integrase, with covariance models for the detection of the recombination site, the *attC* site. + +|integron schema| + +**How does it work?** + +- First, IntegronFinder annotates the CDS of the DNA sequence with Prodigal. + +- Second, IntegronFinder independently detects the integron integrase and *attC* + recombination sites. The integron integrase is detected by using the intersection + of two HMM profiles: + + - one specific to tyrosine recombinases (PF00589) + - one specific to the integron integrase, near the patch III domain of tyrosine recombinases. + +The *attC* recombination site is detected with a covariance model (CM), which +models the secondary structure in addition to the few conserved sequence +positions. + + +- Third, the results are integrated, and IntegronFinder distinguishes 3 types of + elements: + + - complete integron (panel B above) + Integron with an integron integrase near *attC* site(s) + - In0 element (panel C above) + Integron integrase only, without any *attC* site nearby + - CALIN element (panel D above) + *attC* sites only, without an integron integrase nearby. + A rule of thumb to avoid false positives is to filter out singleton + *attC* sites. + +IntegronFinder can also annotate gene cassettes (CDS near *attC* sites) using +Resfams, a database of HMM profiles aimed at annotating antibiotic resistance +genes. This database is provided, but users can add any other HMM profile +database of interest. + +When available, IntegronFinder annotates the promoters and *attI* sites by pattern +matching. + +|pipeline| + +**Does it work?** + +Yes! The estimated sensitivity is 61% on average with the default options and goes up to 88% with the ``--local_max`` option. The missing *attC* sites are usually at the end of the array. 
The false positive rate with the ``--local_max`` option is estimated to be between 0.03 and 0.72 false positives per megabase (FP/Mb). This leads to a probability of finding 2 consecutive *attC* sites within 4 kb that lies between 7 x 10^-9 and 4 x 10^-6. Finally, these parameters do not depend on the G+C content of the given replicon. + +|benchmark| + +The times in the table correspond to the average time per run with a pseudogenome having *attC* sites on a Mac Pro, 2 x 2.4 GHz 6-Core Intel Xeon, 16 GB RAM, with options ``--cpu 20`` and ``--no-proteins``. + +.. Note:: + If the replicon does not contain any *attC* site, the run time is about a couple of seconds and does not depend on the mode (default or ``--local_max``). + + +.. |benchmark| image:: _static/benchmark.* + :width: 400px + :align: middle + :alt: IntegronFinder Benchmark + +.. |pipeline| image:: _static/pipeline.* + :width: 400px + :align: middle + :alt: IntegronFinder Pipeline + +.. |integron schema| image:: _static/schema.* + :align: middle + :width: 300px + :alt: Integron Schema diff --git a/doc/source/tutorial.rst b/doc/source/tutorial.rst index 01b1681..9c9bc63 100644 --- a/doc/source/tutorial.rst +++ b/doc/source/tutorial.rst @@ -86,16 +86,13 @@ Parallelization --------------- The time limiting part are HMMER and INFERNAL. So IntegronFinder does not have -parallel implementation, but the user can set the number of CPU used by HMMER and +parallel implementation (yet?), but the user can set the number of CPUs used by HMMER and INFERNAL:: integron_finder mysequence.fst --cpu 4 Default is 1. -To start IntegronFinder on many nucleotide sequences, one can use "manual" -parallelization by calling multiple times IntegronFinder in ``bash``. - .. _advance: Advanced use