Merge branch 'master' of https://github.com/hzi-bifo/Haploflow
AlphaSquad committed Apr 6, 2021
2 parents e535e65 + 3e8f0b4 commit 8becaec
Showing 1 changed file (`README.md`) with 1 addition and 1 deletion.
@@ -76,6 +76,6 @@ reused in another run. This file is possibly huge (because uncompressed), so use
## Toy data set
A small test data set of reads from three HIV strains, `HIV_3_toy.fq`, is included alongside Haploflow.
After compiling Haploflow, you can assemble this data set using the following simple command: `./haploflow --read-file ../HIV_3_toy.fq --out test --log test/log`
If everything worked, assembling this data set should take about 1 minute and produce an output folder named after the `--out` argument (`test` in the command above), containing:

- `contigs.fa`: a FASTA file with three contigs.
- `Coverages`: a sub-folder with the coverage distributions of all connected components. Since the three HIV strains are closely related, they form a single connected component, so this folder holds only one file, `Cov0.tsv`. This tab-separated file lists each *k*-mer coverage value and the number of *k*-mers with that coverage.
- `Graphs`: a sub-folder with the initial unitig graph (`Graph.dot`) as well as all temporary assembly graphs after each path removal step (`Graph0.dot` to `Graph13.dot`).
- `log`: the Haploflow log, recording the options used and the individual steps of Haploflow.
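To take a quick look at a coverage file such as `Coverages/Cov0.tsv`, the following minimal Python sketch (not part of Haploflow) loads it into a dictionary and reports a few summary values. It assumes each line holds two tab-separated integers (coverage, number of *k*-mers with that coverage) and no header row; the function names are hypothetical.

```python
from pathlib import Path


def parse_coverage_lines(lines):
    """Build {coverage: number_of_kmers} from tab-separated lines.

    Assumption: each non-empty line holds two integers separated by a tab
    (k-mer coverage, number of k-mers with that coverage), no header row.
    """
    histogram = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        coverage, count = (int(field) for field in line.split("\t")[:2])
        histogram[coverage] = histogram.get(coverage, 0) + count
    return histogram


def read_coverage_file(path):
    """Read a coverage file such as Coverages/Cov0.tsv into a histogram."""
    with Path(path).open() as handle:
        return parse_coverage_lines(handle)


def summarize(histogram):
    """Return (number of k-mers, count-weighted mean coverage, most common coverage)."""
    total = sum(histogram.values())
    if total == 0:
        return 0, 0.0, None
    mean = sum(cov * n for cov, n in histogram.items()) / total
    mode = max(histogram, key=histogram.get)  # coverage value with the most k-mers
    return total, mean, mode
```

For example, `summarize(read_coverage_file("Coverages/Cov0.tsv"))`, run from inside the output folder, returns the number of *k*-mers in the histogram, their mean coverage and the most frequent coverage value. Keeping the parsing in a function that takes lines rather than a path makes it easy to reuse on other coverage files or on test strings.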
The contig headers in the FASTA file follow the format `Contig_CONTIGNUMBER_flow_FLOWVALUE_cc_CONNECTEDCOMPONENT`; the abundance of each strain/contig is given by `FLOWVALUE`.
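To work with the `FLOWVALUE` field, the headers can be parsed with a regular expression. The Python sketch below is a hypothetical helper, not part of Haploflow. It assumes `CONTIGNUMBER` and `CONNECTEDCOMPONENT` are integers and `FLOWVALUE` is a decimal number, and it normalizes each flow by the total flow of its connected component; that normalization is an illustrative choice, not something Haploflow itself reports.

```python
import re

# Header layout from the README: Contig_CONTIGNUMBER_flow_FLOWVALUE_cc_CONNECTEDCOMPONENT
# Assumption: contig and component numbers are integers, the flow is a decimal number.
HEADER_RE = re.compile(
    r"^>?Contig_(?P<contig>\d+)"
    r"_flow_(?P<flow>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
    r"_cc_(?P<cc>\d+)"
)


def parse_header(header):
    """Split a contig header into (contig number, flow value, connected component)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unexpected contig header: {header!r}")
    return int(match["contig"]), float(match["flow"]), int(match["cc"])


def read_headers(fasta_path):
    """Collect the header lines (without '>') from a FASTA file such as contigs.fa."""
    with open(fasta_path) as handle:
        return [line[1:].strip() for line in handle if line.startswith(">")]


def relative_abundances(headers):
    """Map (contig, component) -> share of the component's total flow."""
    parsed = [parse_header(h) for h in headers]
    totals = {}
    for _, flow, cc in parsed:
        totals[cc] = totals.get(cc, 0.0) + flow
    return {
        (contig, cc): (flow / totals[cc] if totals[cc] > 0 else 0.0)
        for contig, flow, cc in parsed
    }
```

A call such as `relative_abundances(read_headers("contigs.fa"))` would then give, for each contig, its fraction of the flow within its connected component. Results are keyed by `(contig, component)` because the sketch does not assume contig numbers are unique across components.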
You can then try different *k*-mer and error-correction settings, or move on to your own data sets.
