diff --git a/docs/api/data.rst b/docs/api/data.rst new file mode 100644 index 00000000..810630d8 --- /dev/null +++ b/docs/api/data.rst @@ -0,0 +1,29 @@ +=========== +Data +=========== +.. module:: cassiopeia.data +.. currentmodule:: cassiopeia + +CassiopeiaTrees +~~~~~~~~~~~~~~~~~~~ + +The main data structure that Cassiopeia uses for all tree-based analyses is the CassiopeiaTree: + +.. autosummary:: + :toctree: reference/ + + data.CassiopeiaTree + +Utilities +~~~~~~~~~~~~~~~~~~~ + +We also have several utilities that are useful for working with various data related to phylogenetics: + +.. autosummary:: + :toctree: reference/ + + data.compute_dissimilarity_map + data.get_lca_characters + data.sample_bootstrap_allele_tables + data.sample_bootstrap_character_matrices + data.to_newick \ No newline at end of file diff --git a/docs/api/index.rst b/docs/api/index.rst new file mode 100644 index 00000000..63d25871 --- /dev/null +++ b/docs/api/index.rst @@ -0,0 +1,17 @@ +=== +API +=== + + +Import Cassiopeia as:: + + import cassiopeia as cas + +.. toctree:: + :maxdepth: 1 + + preprocess + data + solver + simulator + plotting \ No newline at end of file diff --git a/docs/api/plotting.rst b/docs/api/plotting.rst new file mode 100644 index 00000000..c738dba3 --- /dev/null +++ b/docs/api/plotting.rst @@ -0,0 +1,18 @@ +========== +Plotting +========== + +.. module:: cassiopeia.pl +.. currentmodule:: cassiopeia + +Plotting +~~~~~~~~~~~~~~~~~~~ + +Currently, our plotting functionality is linked to the rich iTOL framework: + +.. autosummary:: + :toctree: reference/ + + pl.upload_and_export_itol + + \ No newline at end of file diff --git a/docs/api/preprocess.rst b/docs/api/preprocess.rst new file mode 100644 index 00000000..f79eb921 --- /dev/null +++ b/docs/api/preprocess.rst @@ -0,0 +1,43 @@ +=========== +Preprocess +=========== +.. module:: cassiopeia.pp +.. 
currentmodule:: cassiopeia + +Data Preprocessing +~~~~~~~~~~~~~~~~~~~ + +We have several functions that are part of our pipeline for processing sequencing data from single-cell lineage tracing technologies: + +.. autosummary:: + :toctree: reference/ + + pp.align_sequences + pp.call_alleles + pp.call_lineage_groups + pp.collapse_umis + pp.convert_fastqs_to_unmapped_bam + pp.error_correct_cellbcs_to_whitelist + pp.error_correct_intbcs_to_whitelist + pp.error_correct_umis + pp.filter_bam + pp.filter_molecule_table + pp.filter_cells + pp.filter_umis + pp.resolve_umi_sequence + + + + +Data Utilities +~~~~~~~~~~~~~~~~~~~ + +We also have several functions that are useful for converting between data formats for downstream analyses: + +.. autosummary:: + :toctree: reference/ + + pp.compute_empirical_indel_priors + pp.convert_alleletable_to_character_matrix + pp.convert_alleletable_to_lineage_profile + pp.convert_lineage_profile_to_character_matrix \ No newline at end of file diff --git a/docs/api/simulator.rst b/docs/api/simulator.rst new file mode 100644 index 00000000..45d16f96 --- /dev/null +++ b/docs/api/simulator.rst @@ -0,0 +1,41 @@ +=========== +Simulator +=========== +.. module:: cassiopeia.sim +.. currentmodule:: cassiopeia + + +Our simulators for cassiopeia are split up into those that simulate topologies and those that simulate data on top of the topologies. + +Tree Simulators +~~~~~~~~~~~~~~~~~~~ + +We have several frameworks available for simulating topologies: + +.. autosummary:: + :toctree: reference/ + + sim.BirthDeathFitnessSimulator + sim.CompleteBinarySimulator + sim.SimpleFitSubcloneSimulator + + +Data Simulators +~~~~~~~~~~~~~~~~~~~ + +These simulators are subclasses of the `DataSimulator` class and implement the `overlay_data` method which simulates data according to a given topology. + +.. 
autosummary:: + :toctree: reference/ + + sim.Cas9LineageTracingDataSimulator + +Leaf SubSamplers +~~~~~~~~~~~~~~~~~~~ +These are utilities for subsampling lineages for benchmarking purposes. For example, sampling a random proportion of leaves or grouping together cells into clades to model spatial data. + +.. autosummary:: + :toctree: reference/ + + sim.SupercellularSampler + sim.UniformLeafSubsampler \ No newline at end of file diff --git a/docs/api/solver.rst b/docs/api/solver.rst new file mode 100644 index 00000000..1215dac1 --- /dev/null +++ b/docs/api/solver.rst @@ -0,0 +1,41 @@ +=========== +Solver +=========== +.. module:: cassiopeia.solver +.. currentmodule:: cassiopeia + +CassiopeiaSolvers +~~~~~~~~~~~~~~~~~~~ + +We have several algorithms available for solving phylogenies: + +.. autosummary:: + :toctree: reference/ + + solver.HybridSolver + solver.ILPSolver + solver.MaxCutSolver + solver.MaxCutGreedySolver + solver.NeighborJoiningSolver + solver.PercolationSolver + solver.SharedMutationJoiningSolver + solver.SpectralSolver + solver.SpectralGreedySolver + solver.UPGMASolver + solver.VanillaGreedySolver + + +Dissimilarity Maps +~~~~~~~~~~~~~~~~~~~ + +For use in our distance-based solver and for comparing character states, we also have available several dissimilarity functions: + +.. 
autosummary:: + :toctree: reference/ + + solver.dissimilarity_functions.cluster_dissimilarity + solver.dissimilarity_functions.hamming_distance + solver.dissimilarity_functions.hamming_similarity_normalized_over_missing + solver.dissimilarity_functions.hamming_similarity_without_missing + solver.dissimilarity_functions.weighted_hamming_distance + solver.dissimilarity_functions.weighted_hamming_similarity \ No newline at end of file diff --git a/notebooks/benchmark.ipynb b/notebooks/benchmark.ipynb index b695fd38..d005cab3 100755 --- a/notebooks/benchmark.ipynb +++ b/notebooks/benchmark.ipynb @@ -8,9 +8,9 @@ "\n", "This notebook serves as an entry point for understanding how to interface with Cassiopeia for the purposes of simulating trees, data, benchmarking algorithms.\n", "\n", - "You can install Cassiopeia by following the guide [here](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/installation).\n", + "You can install Cassiopeia by following the guide [here](https://cassiopeia-lineage.readthedocs.io/en/latest/installation).\n", "\n", - "All of our documentation is hosted [here](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/)." + "All of our documentation is hosted [here](https://cassiopeia-lineage.readthedocs.io/en/latest/)." ] }, { @@ -50,7 +50,7 @@ "\n", "We can use a simple birth-death model with fitness to simulate trees.\n", "\n", - "Specifically, this is a continuous-time birth-death process in which birth and death events are sampled from indepedent waiting distributions. Importnatly, we can incorporate fitness into this framework by modulating the `scale` of the birth waiting distribution. This is done by sampling a random number of fitness events per generation, each with a fitness effect drawn from a distribution. The documentation for this class can be found [here](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/reference/cassiopeia.sim.BirthDeathFitnessSimulator.html). 
" + "Specifically, this is a continuous-time birth-death process in which birth and death events are sampled from independent waiting distributions. Importantly, we can incorporate fitness into this framework by modulating the `scale` of the birth waiting distribution. This is done by sampling a random number of fitness events per generation, each with a fitness effect drawn from a distribution. The documentation for this class can be found [here](https://cassiopeia-lineage.readthedocs.io/en/latest/api/reference/cassiopeia.sim.BirthDeathFitnessSimulator.html). " ] }, { @@ -204,7 +204,7 @@ "\n", "Cassiopeia has implemented several CassiopeiaSolvers for reconstructing trees. Each of these can take in several class-specific parameters and at a minimum implements the `solve` routine which operates on a CassiopeiaTree. \n", "\n", - "The full list of solvers can be found [here](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/solver.html). For a full tutorial on tree reconstruction, refer to the [Tree Reconstruction notebook](https://github.com/YosefLab/Cassiopeia/blob/testdeployment/notebooks/reconstruct.ipynb).\n", + "The full list of solvers can be found [here](https://cassiopeia-lineage.readthedocs.io/en/latest/api/solver.html). For a full tutorial on tree reconstruction, refer to the [Tree Reconstruction notebook](https://github.com/YosefLab/Cassiopeia/blob/latest/notebooks/reconstruct.ipynb).\n", "\n", "Here we use the VanillaGreedySolver, which was described in the [Cassiopeia paper published in 2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02000-8)." ] diff --git a/notebooks/preprocess.ipynb b/notebooks/preprocess.ipynb index 93b06a4d..3564a841 100644 --- a/notebooks/preprocess.ipynb +++ b/notebooks/preprocess.ipynb @@ -24,7 +24,7 @@ "\n", "\n", "## Pipeline API\n", - "All of the key modules of the preprocessing pipeline can be invoked by a call from `cassiopeia.pp`. 
Assuming the user would like to begin at the beginning of the pipeline, we'll start with the `convert` stage. You can find all documentation on our [main site](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/).\n", + "All of the key modules of the preprocessing pipeline can be invoked by a call from `cassiopeia.pp`. Assuming the user would like to begin at the beginning of the pipeline, we'll start with the `convert` stage. You can find all documentation on our [main site](https://cassiopeia-lineage.readthedocs.io/en/latest/).\n", "\n", "An alternative to running the pipeline interactively is to take advantage of the command line tool `cassiopeia-preprocess`, which takes in a configuration file (for example in Cassiopeia/data/preprocess.cfg) and runs the pipeline end-to-end. For example, if you have a config called `example_config.cfg`, this can be invoked from the command line with:\n", "\n", @@ -333,7 +333,7 @@ "\n", "The `min_umi_per_cell` and `min_avg_reads_per_umi` behave the same as the \"resolve\" step.\n", "\n", - "See the [documentation](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/reference/cassiopeia.pp.filter_molecule_table.html#cassiopeia.pp.filter_molecule_table) for more details." + "See the [documentation](https://cassiopeia-lineage.readthedocs.io/en/latest/api/reference/cassiopeia.pp.filter_molecule_table.html#cassiopeia.pp.filter_molecule_table) for more details." ] }, { @@ -365,7 +365,7 @@ "\n", "The `min_umi_per_cell` and `min_avg_reads_per_umi` behave the same as the \"resolve\" step.\n", "\n", - "See the [documentation](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/reference/cassiopeia.pp.call_lineage_groups.html#cassiopeia.pp.call_lineage_groups) for more details." + "See the [documentation](https://cassiopeia-lineage.readthedocs.io/en/latest/api/reference/cassiopeia.pp.call_lineage_groups.html#cassiopeia.pp.call_lineage_groups) for more details." 
] }, { diff --git a/notebooks/reconstruct.ipynb b/notebooks/reconstruct.ipynb index 28a50c61..cde32200 100644 --- a/notebooks/reconstruct.ipynb +++ b/notebooks/reconstruct.ipynb @@ -6,7 +6,7 @@ "source": [ "# Reconstructing trees with Cassiopeia\n", "\n", - "Cassiopeia offers several utilities for reconstructing phylogenies, carrying users from the allele tables they've created in the [benchmarking tutorial]() to the full phylogeneis. This tutorial serves as a general overview of the tools that Cassiopeia offers for tree reconstruction." + "Cassiopeia offers several utilities for reconstructing phylogenies, carrying users from the allele tables they've created in the preprocessing tutorial to the full phylogenies. This tutorial serves as a general overview of the tools that Cassiopeia offers for tree reconstruction." ] }, { @@ -1115,7 +1115,7 @@ "source": [ "### Creating and working with CassiopeiaSolvers\n", "\n", - "As mentioned previously, Cassiopeia works with a general class of CassiopeiaSolvers. We have implemented several solvers, which you can find [here](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/solver.html).\n", + "As mentioned previously, Cassiopeia works with a general class of CassiopeiaSolvers. We have implemented several solvers, which you can find [here](https://cassiopeia-lineage.readthedocs.io/en/latest/api/solver.html).\n", "\n", "Perhaps the most popular are the `VanillaGreedySolver`, `ILPSolver`, `HybridSolver`, and `NeighborJoiningSolver`. Here, we'll provide a quick overview of each of these.\n", "\n", @@ -1272,7 +1272,7 @@ "\n", "The `ILPSolver` is an implementaion of Steiner-Tree approach described in [Jones et al, 2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02000-8). 
The constructor takes in several options controlling the size and complexity of the potential graph to infer as well as stopping criteria for the integer-linear program (ILP) optimization routine.\n", "\n", - "There are several parameters of interest which can all be explored on our [documentation website](https://cassiopeia-lineage.readthedocs.io/en/testdeployment/api/reference/cassiopeia.solver.ILPSolver.html#cassiopeia.solver.ILPSolver). Because this process can take a long time, we'll restrict the potential graph layer size to 500 nodes and the convergence time to 500s. A more realistic solver might use our defaults - namely, a maximum potential graph layer size of 10,000 and a convergence time of 12,600s (3.5hr).\n", + "There are several parameters of interest which can all be explored on our [documentation website](https://cassiopeia-lineage.readthedocs.io/en/latest/api/reference/cassiopeia.solver.ILPSolver.html#cassiopeia.solver.ILPSolver). Because this process can take a long time, we'll restrict the potential graph layer size to 500 nodes and the convergence time to 500s. A more realistic solver might use our defaults - namely, a maximum potential graph layer size of 10,000 and a convergence time of 12,600s (3.5hr).\n", "\n", "The `ILPSolver` logs the progress of the potential graph inference and optimization in a user-defined logfile (by default, `stdout.log`). This logfile will also be output here.\n", "\n",