Update README.md

Gibbons-Lab · Oct 11, 2021 · a1144a6
1 parent 0754fa0
Showing 1 changed file with 21 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -1,2 +1,21 @@
-# 2021_microbiome_course_data
-Additional data and cache for the ISB Virtual Microbiome Series
+## 2021 ISB Virtual Microbiome Symposium   
+# Day 2 course data
+
+This repository contains cached data and processing steps for day 2 of the symposium. The work is split into two major pipelines: (1) obtaining data from the BioML data set and processing it into assemblies, and (2) generating CarveMe reconstructions for all assemblies with a clear GTDB bacterial assignment.
+
+**Required compute:** About 1000 CPU hours
+
+### Obtaining data and processing it
+
+This is all wrapped into a Nextflow pipeline provided with this repository: [assembly.nf](assembly.nf). There is a [conda environment file](conda.yml) to set up all required dependencies. The pipeline covers the following steps:
+
+1. Downloading the first 1000 isolate genomes from the [BioML paper](https://doi.org/10.1038/s41591-019-0559-3).
+2. Quality filtering and trimming with fastp.
+3. Assembly with MEGAHIT.
+4. Taxonomic placement with the GTDB toolkit.
+
+After that, the data is curated by hand to remove isolates with no clear GTDB bacterial assignment. The curation is contained in an [RStudio notebook](curation.rmd) and leaves slightly fewer than 980 assemblies.
+
+### Model reconstruction
+
+This is done using the [Gibbons Lab model builder pipeline](https://github.com/Gibbons-Lab/pipelines/tree/master/model_builder). The required media are provided in this repository as well.
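
The README's curation step, which removes isolates with no clear GTDB bacterial assignment, is done by hand in an R notebook that is not shown in this diff. As a rough illustration only, the Python sketch below shows one way such a filter could look. It assumes a tab-separated summary with `user_genome` and `classification` columns holding GTDB-style lineage strings, and it assumes that "clear" means the domain is Bacteria and the lineage is resolved at least to genus. The function names and that threshold are hypothetical and are not taken from the repository.

```python
import csv
import io

# Rank prefixes of a GTDB-style lineage string, from domain to species.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def parse_lineage(classification):
    """Split a lineage such as 'd__Bacteria;p__...;s__...' into {prefix: name}."""
    ranks = {}
    for field in classification.split(";"):
        field = field.strip()
        prefix, name = field[:3], field[3:]
        if prefix in RANK_PREFIXES:
            ranks[prefix] = name
    return ranks


def has_clear_bacterial_assignment(classification, min_rank="g__"):
    """Assumed criterion: domain is Bacteria and `min_rank` is not empty."""
    ranks = parse_lineage(classification)
    if ranks.get("d__") != "Bacteria":
        return False
    return bool(ranks.get(min_rank))


def keep_genomes(tsv_text):
    """Return the genome IDs whose classification passes the filter."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["user_genome"]
        for row in reader
        if has_clear_bacterial_assignment(row["classification"])
    ]


# Toy input for illustration; these rows are made up, not repository data.
DEMO_TSV = (
    "user_genome\tclassification\n"
    "iso1\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis\n"
    "iso2\tUnclassified\n"
    "iso3\td__Bacteria;p__Firmicutes;c__;o__;f__;g__;s__\n"
    "iso4\td__Archaea;p__Methanobacteriota;c__Methanobacteria;"
    "o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter;s__\n"
)

if __name__ == "__main__":
    print(keep_genomes(DEMO_TSV))
```

With the toy rows, only `iso1` passes: `iso2` has no lineage, `iso3` stops at phylum, and `iso4` is not bacterial. A stricter or looser cutoff can be tried by passing a different `min_rank`, for example `"s__"` or `"f__"`.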