Organizing documentation (scikit-bio#1999)
* removed link to api stability

* updated io documentation

* organized sequence and alignment documents

* organized tree and statistics

* working on diversity and workflow

* reordered modules

* worked on the rest

* fixed duplicated class members display

* worked on metadata document

* reduced font size

* more edits

* fixed typo

* fixed readme badges
qiyunzhu authored Mar 27, 2024
1 parent 3e022f8 commit caf13c1
Showing 23 changed files with 503 additions and 337 deletions.
52 changes: 27 additions & 25 deletions README.rst
@@ -1,28 +1,4 @@
.. image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
:alt: License
:target: https://opensource.org/licenses/BSD-3-Clause
.. image:: https://github.com/scikit-bio/scikit-bio/actions/workflows/ci.yml/badge.svg
:alt: Build Status
:target: https://github.com/scikit-bio/scikit-bio/actions/workflows/ci.yml
.. image:: https://codecov.io/gh/scikit-bio/scikit-bio/graph/badge.svg?token=1qbzC6d2F5
:alt: Coverage Status
:target: https://codecov.io/gh/scikit-bio/scikit-bio
.. image:: https://img.shields.io/badge/benchmarked%20by-asv-green.svg
:alt: ASV Benchmarks
:target: https://s3-us-west-2.amazonaws.com/scikit-bio.org/benchmarks/main/index.html
.. image:: https://img.shields.io/github/v/release/scikit-bio/scikit-bio.svg
:alt: Release
:target: https://github.com/scikit-bio/scikit-bio/releases
.. image:: https://img.shields.io/pypi/dm/scikit-bio.svg?label=PyPI%20downloads
:alt: PyPI Downloads
:target: https://pypi.org/project/scikit-bio/
.. image:: https://img.shields.io/conda/dn/conda-forge/scikit-bio.svg?label=Conda%20downloads
:alt: Conda Downloads
:target: https://anaconda.org/conda-forge/scikit-bio
.. image:: https://badges.gitter.im/Join%20Chat.svg
:alt: Gitter
:target: https://gitter.im/biocore/scikit-bio

|license| |build| |coverage| |bench| |release| |pypi| |conda| |gitter|

.. image:: logos/logo.svg
:width: 600 px
@@ -147,3 +123,29 @@ Pre-history
scikit-bio began from code derived from `PyCogent <https://github.com/pycogent/pycogent>`_ and `QIIME <https://github.com/biocore/qiime>`_, and the contributors and/or copyright holders have agreed to make the code they wrote for PyCogent and/or QIIME available under the BSD license. The contributors to PyCogent and/or QIIME modules that have been ported to scikit-bio are listed below:

- Rob Knight (@rob-knight), Gavin Huttley (@gavinhuttley), Daniel McDonald (@wasade), Micah Hamady, Antonio Gonzalez (@antgonza), Sandra Smit, Greg Caporaso (@gregcaporaso), Jai Ram Rideout (@jairideout), Cathy Lozupone (@clozupone), Mike Robeson (@mikerobeson), Marcin Cieslik, Peter Maxwell, Jeremy Widmann, Zongzhi Liu, Michael Dwan, Logan Knecht (@loganknecht), Andrew Cochran, Jose Carlos Clemente (@cleme), Damien Coy, Levi McCracken, Andrew Butterfield, Will Van Treuren (@wdwvt1), Justin Kuczynski (@justin212k), Jose Antonio Navas Molina (@josenavas), Matthew Wakefield (@genomematt) and Jens Reeder (@jensreeder).


.. |license| image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
:alt: License
:target: https://opensource.org/licenses/BSD-3-Clause
.. |build| image:: https://github.com/scikit-bio/scikit-bio/actions/workflows/ci.yml/badge.svg
:alt: Build Status
:target: https://github.com/scikit-bio/scikit-bio/actions/workflows/ci.yml
.. |coverage| image:: https://codecov.io/gh/scikit-bio/scikit-bio/graph/badge.svg?token=1qbzC6d2F5
:alt: Coverage Status
:target: https://codecov.io/gh/scikit-bio/scikit-bio
.. |bench| image:: https://img.shields.io/badge/benchmarked%20by-asv-green.svg
:alt: ASV Benchmarks
:target: https://s3-us-west-2.amazonaws.com/scikit-bio.org/benchmarks/main/index.html
.. |release| image:: https://img.shields.io/github/v/release/scikit-bio/scikit-bio.svg
:alt: Release
:target: https://github.com/scikit-bio/scikit-bio/releases
.. |pypi| image:: https://img.shields.io/pypi/dm/scikit-bio.svg?label=PyPI%20downloads
:alt: PyPI Downloads
:target: https://pypi.org/project/scikit-bio/
.. |conda| image:: https://img.shields.io/conda/dn/conda-forge/scikit-bio.svg?label=Conda%20downloads
:alt: Conda Downloads
:target: https://anaconda.org/conda-forge/scikit-bio
.. |gitter| image:: https://badges.gitter.im/Join%20Chat.svg
:alt: Gitter
:target: https://gitter.im/biocore/scikit-bio
7 changes: 7 additions & 0 deletions doc/source/_static/css/style.css
@@ -18,6 +18,13 @@ https://pydata-sphinx-theme.readthedocs.io/en/stable/user_guide/styling.html

html {
--pst-icon-external-link: unset;

--pst-font-size-h1: 2rem;
--pst-font-size-h2: 1.5rem;
--pst-font-size-h3: 1.25rem;
--pst-font-size-h4: 1.1rem;
--pst-font-size-h5: 1.0rem;
--pst-font-size-h6: 1.0rem;
}


10 changes: 10 additions & 0 deletions doc/source/conf.py
@@ -96,6 +96,16 @@
}


# -- numpydoc configuration --------------------------------------------------

# References:
# https://numpydoc.readthedocs.io/en/latest/install.html#configuration

numpydoc_class_members_toctree = False
numpydoc_show_class_members = False
numpydoc_show_inherited_class_members = False


# -- PyData Theme configuration ----------------------------------------------

# References:
9 changes: 3 additions & 6 deletions doc/source/index.rst
@@ -54,24 +54,21 @@
About <https://scikit.bio/about.html>


scikit-bio |version|
====================
scikit-bio |version| documentation
==================================

scikit-bio (canonically pronounced *sigh-kit-buy-oh*) is a library for working with biological data in Python 3. scikit-bio is open source, BSD-licensed software that is currently under active development.

API Reference
-------------

.. toctree::
:maxdepth: 2

io
sequence
alignment
tree
workflow
diversity
stats
table
metadata
workflow
util
43 changes: 25 additions & 18 deletions skbio/alignment/__init__.py
@@ -1,22 +1,26 @@
r"""Alignments (:mod:`skbio.alignment`)
===================================
r"""Sequence Alignments (:mod:`skbio.alignment`)
============================================
.. currentmodule:: skbio.alignment
This module provides functionality for computing and manipulating sequence
alignments. DNA, RNA, and protein sequences can be aligned, as well as
sequences with custom alphabets.
Data Structures
---------------
Alignment structure
-------------------
.. autosummary::
:toctree: generated/
TabularMSA
Optimized (i.e., production-ready) Alignment Algorithms
-------------------------------------------------------
Alignment algorithms
--------------------
.. rubric:: Optimized (i.e., production-ready) algorithms
.. autosummary::
:toctree: generated/
@@ -25,8 +29,7 @@
AlignmentStructure
local_pairwise_align_ssw
Slow (i.e., educational-purposes only) Alignment Algorithms
-----------------------------------------------------------
.. rubric:: Slow (i.e., educational-purposes only) algorithms
.. autosummary::
:toctree: generated/
@@ -38,16 +41,21 @@
local_pairwise_align_protein
local_pairwise_align
General functionality
---------------------
Deprecated functionality
^^^^^^^^^^^^^^^^^^^^^^^^
.. autosummary::
:toctree: generated/
make_identity_substitution_matrix
Data Structure Examples
-----------------------
Tutorial
--------
Alignment data structure
^^^^^^^^^^^^^^^^^^^^^^^^
Load two DNA sequences that have been previously aligned into a ``TabularMSA``
object, using sequence IDs as the MSA's index:
@@ -67,11 +75,9 @@
>>> msa.index
Index(['seq1', 'seq2'], dtype='object')
Alignment Algorithm Examples
----------------------------
Using the optimized alignment algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Optimized Alignment Algorithm Examples
--------------------------------------
Using the convenient ``local_pairwise_align_ssw`` function:
>>> from skbio.alignment import local_pairwise_align_ssw
@@ -131,8 +137,9 @@
>>> print(alignments[0].aligned_target_sequence)
ACT-AGGCTCCCTTCTACCCCTCTCAGAGA
Slow Alignment Algorithm Examples
---------------------------------
Using the slow alignment algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scikit-bio also provides pure-Python implementations of Smith-Waterman and
Needleman-Wunsch alignment. These are much slower than the methods described
above, but serve as useful educational examples as they're simpler to
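To illustrate the kind of pure-Python dynamic programming that the passage above describes, here is a minimal Smith-Waterman scoring sketch. It is not scikit-bio's implementation: the function name ``smith_waterman_score`` is hypothetical, the scoring values are arbitrary assumptions, and it uses a linear gap penalty for brevity.

```python
def smith_waterman_score(query, target, match=2, mismatch=-3, gap=-5):
    """Return the best local alignment score (illustrative sketch only).

    The scoring parameters are assumed values, not scikit-bio defaults.
    """
    # Only the previous row of the dynamic-programming matrix is needed
    # when we want the score but not the traceback.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Local alignment never lets a cell drop below zero.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


score = smith_waterman_score("GGACGTCC", "TTACGTAA")
```

Because every cell is a Python-level operation, the run time grows with the product of the two sequence lengths, which is why an approach like this suits teaching better than production use.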
134 changes: 77 additions & 57 deletions skbio/diversity/__init__.py
@@ -1,31 +1,69 @@
"""Diversity calculations (:mod:`skbio.diversity`)
===============================================
r"""Community Diversity (:mod:`skbio.diversity`)
============================================
.. currentmodule:: skbio.diversity
This package provides functionality for analyzing biological diversity. It
implements metrics of alpha and beta diversity, and provides two "driver
functions" that are intended to be the primary interface for computing alpha
and beta diversity with scikit-bio. Functions are additionally provided that
support discovery of the available diversity metrics. This document provides a
high-level discussion of how to work with the ``skbio.diversity`` module, and
should be the first document you read before working with the module.
Driver functions
----------------
The driver functions, ``skbio.diversity.alpha_diversity`` and
``skbio.diversity.beta_diversity``, are designed to compute alpha diversity for
one or more samples, or beta diversity for one or more pairs of samples. The
diversity driver functions accept a matrix containing vectors of frequencies of
taxa within each sample.
The term "taxon" (plural: "taxa") describes a group of biologically related
organisms that constitute a unit in the community. Taxa are usually defined at
a uniform taxonomic rank, such as species, genus or family. In community
ecology, taxon is usually referred to as "species" (singular = plural), but its
definition is not limited to species as a taxonomic rank. The term "taxonomic
group" is a synonym of taxon in many situations.
This module provides functionality for analyzing biodiversity of communities
-- groups of organisms living in the same area. It implements various metrics
of alpha (within-community) and beta (between-community) diversity, and
provides "driver functions" for computing alpha and beta diversity for an
entire data table. Additional utilities are provided to support discovery of
available diversity metrics. While diversity metrics were originally designed
to study biological communities, they can be generalized to the analysis of
various biological data types.
Alpha diversity
---------------
.. rubric:: Alpha diversity metrics
.. autosummary::
:toctree: generated/
alpha
get_alpha_diversity_metrics
.. rubric:: Driver function
.. autosummary::
:toctree: generated/
alpha_diversity
Beta diversity
--------------
.. rubric:: Beta diversity metrics
.. autosummary::
:toctree: generated/
beta
get_beta_diversity_metrics
.. rubric:: Driver functions
.. autosummary::
:toctree: generated/
beta_diversity
partial_beta_diversity
block_beta_diversity
Introduction
------------
A community (i.e., sample) is represented by a vector of frequencies of taxa
within the sample. The term "taxon" (plural: "taxa") describes a group of
biologically related organisms that constitute a unit in the community. Taxa
are usually defined at a uniform taxonomic rank, such as species, genus or
family. In community ecology, taxon is usually referred to as "species"
(singular = plural), but its definition is not limited to species as a
taxonomic rank. The term "taxonomic group" is a synonym of taxon in many
situations.
In scikit-bio, the term "taxon/taxa" is used very loosely, as these can in
practice represent diverse feature types including organisms, genes, and
@@ -50,16 +88,19 @@
single sample as a *counts vector* or ``counts`` throughout the documentation.
Counts vectors are `array_like`: anything that can be converted into a 1-D
numpy array is acceptable input. For example, you can provide a numpy array or
a native Python list and the results will be identical. As mentioned above, the
driver functions accept one or more of these vectors (representing one or more
samples) in a matrix which is also `array_like`. Each row in the matrix
represents a single sample's count vector, so that rows represent samples and
columns represent taxa.
a native Python list and the results will be identical.
The driver functions :func:`alpha_diversity` and :func:`beta_diversity` are
designed to compute alpha diversity for one or more samples, or beta diversity
for one or more pairs of samples. The driver functions accept a matrix
containing vectors of frequencies of taxa within each sample. Each row in the
matrix represents a single sample's count vector, so that rows represent
samples and columns represent taxa.
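The row-per-sample layout can be sketched in plain Python. This is an illustration under stated assumptions, not scikit-bio code: the data are made up, and the helpers ``observed_richness``, ``shannon`` and ``alpha_for_each_sample`` are hypothetical stand-ins for what the driver function does in a single call.

```python
import math


def observed_richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon entropy of one counts vector (the base is an assumed choice)."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def alpha_for_each_sample(metric, matrix, ids):
    """Apply a per-sample metric to every row of the matrix."""
    return {sample_id: metric(row) for sample_id, row in zip(ids, matrix)}


# Rows are samples, columns are taxa (made-up counts).
data = [
    [23, 64, 14, 0, 0, 3, 1],
    [0, 3, 35, 42, 0, 12, 1],
    [0, 5, 5, 0, 40, 40, 0],
]
ids = ["A", "B", "C"]

richness = alpha_for_each_sample(observed_richness, data, ids)
```

Each value in ``richness`` corresponds to one row of ``data``, so adding a sample means adding a row, while adding a taxon means adding a column to every row.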
Some diversity metrics incorporate relationships between the taxa in their
computation through reference to a phylogenetic tree. These metrics
additionally take a ``skbio.TreeNode`` object and a list of taxa mapping the
values in the counts vector to tips in the tree.
additionally take a :class:`skbio.TreeNode` object and a list of taxa mapping
the values in the counts vector to tips in the tree.
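How a list of taxa ties the counts vector to tree tips can be sketched with a toy phylogenetic diversity calculation. This is a simplified illustration, not scikit-bio's algorithm: the tree is a plain dictionary instead of a ``TreeNode``, and the names ``faith_pd_sketch`` and ``parent_of`` are hypothetical.

```python
def faith_pd_sketch(counts, taxa, parent_of):
    """Sum the branch lengths on the paths from observed tips to the root.

    ``taxa[i]`` names the tip that ``counts[i]`` refers to. ``parent_of``
    maps each non-root node to a (parent, branch_length) pair.
    """
    visited = set()
    total = 0.0
    for count, taxon in zip(counts, taxa):
        if count <= 0:
            continue  # unobserved taxa contribute no branch length
        node = taxon
        # Walk toward the root, counting each branch only once.
        while node in parent_of and node not in visited:
            parent, length = parent_of[node]
            visited.add(node)
            total += length
            node = parent
    return total


# Toy tree: ((t1:1.0, t2:2.0)n1:0.5, t3:3.0)root
parent_of = {
    "t1": ("n1", 1.0),
    "t2": ("n1", 2.0),
    "n1": ("root", 0.5),
    "t3": ("root", 3.0),
}
taxa = ["t1", "t2", "t3"]

pd_two_tips = faith_pd_sketch([4, 1, 0], taxa, parent_of)
```

The order of ``taxa`` must match the order of the columns in the counts vector; the tree itself can list its tips in any order.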
The driver functions are optimized so that computing a diversity metric more
than one time (i.e., for more than one sample for alpha diversity metrics, or
@@ -72,7 +113,7 @@
compute beta diversity for all pairs of counts vectors in the matrix.
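Computing a distance for all pairs of counts vectors can be sketched as follows. This is a plain-Python illustration with made-up data, not the optimized driver: ``bray_curtis`` and ``pairwise_distances`` are hypothetical helpers, and returning 0.0 for two empty samples is an assumption made here for simplicity.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two counts vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Assumed convention: two all-zero samples are treated as identical.
    return numerator / denominator if denominator else 0.0


def pairwise_distances(metric, matrix, ids):
    """Apply a two-sample metric to every unordered pair of rows."""
    result = {}
    for a, b in combinations(range(len(ids)), 2):
        result[(ids[a], ids[b])] = metric(matrix[a], matrix[b])
    return result


# Rows are samples, columns are taxa (made-up counts).
data = [
    [10, 0, 5],
    [10, 0, 5],
    [0, 15, 0],
]
ids = ["A", "B", "C"]

distances = pairwise_distances(bray_curtis, data, ids)
```

For ``n`` samples this evaluates the metric ``n * (n - 1) / 2`` times, which is why doing the work once inside a driver tends to pay off compared with calling a metric pair by pair.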
Input validation
----------------
^^^^^^^^^^^^^^^^
The driver functions perform validation of input by default. Validation can be
slow so it is possible to disable this step by passing ``validate=False``. This
@@ -102,7 +143,7 @@
* all provided taxa correspond to tip names in the provided tree
Count vectors
-------------
^^^^^^^^^^^^^
There are different ways that count vectors are represented in the ecological
literature and in related software. The diversity measures provided here
@@ -131,7 +172,7 @@
Always use the first representation (a counts vector) with this module.
Specifying a diversity metric
-----------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The driver functions take a parameter, ``metric``, that specifies which
diversity metric should be applied. The value that you provide for ``metric``
@@ -160,29 +201,8 @@
passed as strings which won't be listed here, such as those implemented in
``scipy.spatial.distance.pdist``.
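One way a driver could turn a ``metric`` argument into something callable is sketched below. It assumes the argument may be either a name or a function, and it is not scikit-bio's actual dispatch logic: ``resolve_metric`` and the ``registry`` dictionary are hypothetical.

```python
def resolve_metric(metric, registry):
    """Return a callable for ``metric``, given a name or a function.

    ``registry`` maps known metric names to functions (a hypothetical
    stand-in for whatever lookup a real driver performs).
    """
    if callable(metric):
        return metric  # the caller supplied the function directly
    try:
        return registry[metric]
    except KeyError:
        raise ValueError(f"Unknown metric: {metric!r}") from None


def observed_richness(counts):
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


registry = {"richness": observed_richness}

by_name = resolve_metric("richness", registry)
by_function = resolve_metric(observed_richness, registry)
```

Accepting both forms keeps common metrics easy to request by name while still letting a caller plug in a custom function.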
Subpackages
-----------
.. autosummary::
:toctree: generated/
alpha
beta
Functions
---------
.. autosummary::
:toctree: generated/
alpha_diversity
beta_diversity
partial_beta_diversity
block_beta_diversity
get_alpha_diversity_metrics
get_beta_diversity_metrics
Examples
Tutorial
--------
Create a matrix containing 6 samples (rows) and 7 taxa (columns):