This repo tracks policies, management and communication for MetaSUB Bioinformatics.
This is an appropriate place to ask questions and make suggestions. Please use GitHub's issue feature to do so.
We use a number of programs for bioinformatics at MetaSUB. Everyone in MetaSUB is welcome to contribute to any relevant codebase, including by filing issues.
- MetaSUB Data Packet v1.2.0: analyzed data tables for downstream analysis
- Metadata for Sequenced Samples: a fully reproducible metadata table for all sequenced samples
- The Core Analysis Pipeline (CAP): a pipeline for analysis of short-read metagenomic data
- The Core QC Pipeline: a pipeline to perform quality control of samples
- Data Table Generator: a tool to build data tables from the output of the CAP
- Main Paper Figures: code used to build figures in the MetaSUB manuscript
- Bioinformatics Management, Policies, and Communications: this repository
- Utility Scripts: small programs for managing MetaSUB data, transferring storage, and handling analysis; serves as a catch-all
We provide the results of the CAP as a set of data tables that can be easily analyzed using tools like Python, R, and Excel. These data tables include information on taxonomy, functional profiles, and resistomes for MetaSUB samples. The data packet can be accessed via Dropbox or on GitHub (private repo); please contact David Danko or Chris Mason for access.
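As a sketch of working with these tables in Python, the snippet below ranks rows of a taxonomy table by relative abundance using only the standard library. The filename-free inline table and its column names (`sample_id`, `taxon`, `relative_abundance`) are hypothetical stand-ins; the actual tables in the data packet may use different names and layouts.

```python
# Minimal sketch of reading a CAP-style taxonomy table (hypothetical columns).
import csv
import io

# A tiny stand-in for a taxonomy table as it might appear in the data packet.
EXAMPLE_TABLE = """\
sample_id,taxon,relative_abundance
sample_01,Cutibacterium acnes,0.21
sample_01,Staphylococcus epidermidis,0.14
sample_02,Cutibacterium acnes,0.09
"""

def top_taxa(table_text, n=2):
    """Return the n rows with the highest relative abundance."""
    rows = list(csv.DictReader(io.StringIO(table_text)))
    rows.sort(key=lambda r: float(r["relative_abundance"]), reverse=True)
    return rows[:n]

for row in top_taxa(EXAMPLE_TABLE):
    print(row["sample_id"], row["taxon"], row["relative_abundance"])
```

The same tables load directly into pandas or R data frames for heavier analysis.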
The MetaSUB data packet includes a curated subset of the data useful for teaching or demos. This subset includes 8 samples from each of 16 cities with complete metadata and results.
We are storing the MetaSUB data on Wasabi Storage, an object-storage service with an S3-compatible API. We have a single bucket on Wasabi that contains:
- Raw Sequence Data
- CAP Outputs
- Assemblies
- Data Packet
We maintain utilities to download data from this service. Please contact David Danko for API keys.
See this file for more information on how MetaSUB's data is stored.
MetaGenScope is available for automated visualization of MetaSUB data. All MetaSUB cities with sufficient data have been visualized and may be accessed with appropriate credentials. It is possible to create arbitrary sample groups. Please contact David Danko for access and questions.
There are no plans to make the data publicly available until after the consortium has published a manuscript. Any published data will have human sequences removed.