diff --git a/README.md b/README.md
index 1b05c63..b800f94 100644
--- a/README.md
+++ b/README.md
@@ -76,32 +76,8 @@ optional arguments:
   -h, --help            show this help message and exit
   -t THREADS, --threads THREADS
                         Number of threads to use (default: 1)
-  --reference-trim-length TRIM_ENDS
-                        Exclude n bases at the ends of the reference sequences (default: 0)
-  --trim-min TRIM_MIN   Remove coverage that are below this percentile. Used for the Truncated Average
-                        Depth (TAD) calculation (default: 10)
-  --trim-max TRIM_MAX   Remove coverage that are above this percentile. Used for the Truncated Average
-                        Depth (TAD) calculation (default: 90)
   -p PREFIX, --prefix PREFIX
                         Prefix used for the output files (default: None)
-  -A MIN_READ_ANI, --min-read-ani MIN_READ_ANI
-                        Minimum read ANI to keep a read (default: 90.0)
-  -l MIN_READ_LENGTH, --min-read-length MIN_READ_LENGTH
-                        Minimum read length (default: 30)
-  -n MIN_READ_COUNT, --min-read-count MIN_READ_COUNT
-                        Minimum read count (default: 10)
-  -b MIN_EXPECTED_BREADTH_RATIO, --min-expected-breadth-ratio MIN_EXPECTED_BREADTH_RATIO
-                        Minimum expected breadth ratio (default: 0.5)
-  -e MIN_NORM_ENTROPY, --min-normalized-entropy MIN_NORM_ENTROPY
-                        Minimum normalized entropy (default: auto)
-  -g MIN_NORM_GINI, --min-normalized-gini MIN_NORM_GINI
-                        Minimum normalized Gini coefficient (default: None)
-  -B MIN_BREADTH, --min-breadth MIN_BREADTH
-                        Minimum breadth (default: 0)
-  -a MIN_AVG_READ_ANI, --min-avg-read-ani MIN_AVG_READ_ANI
-                        Minimum average read ANI (default: 90.0)
-  -c MIN_COVERAGE_EVENNESS, --min-coverage-evenness MIN_COVERAGE_EVENNESS
-                        Minimum coverage evenness (default: 0)
   -m SORT_MEMORY, --sort-memory SORT_MEMORY
                         Set maximum memory per thread for sorting; suffix K/M/G recognized (default:
                         1G)
@@ -122,6 +98,34 @@ optional arguments:
                         Chunk size for parallel processing (default: None)
   --debug               Print debug messages (default: False)
   --version             Print program version
+
+filtering arguments:
+  -A MIN_READ_ANI, --min-read-ani MIN_READ_ANI
+                        Minimum read ANI to keep a read (default: 90.0)
+  -l MIN_READ_LENGTH, --min-read-length MIN_READ_LENGTH
+                        Minimum read length (default: 30)
+  -n MIN_READ_COUNT, --min-read-count MIN_READ_COUNT
+                        Minimum read count (default: 10)
+  -b MIN_EXPECTED_BREADTH_RATIO, --min-expected-breadth-ratio MIN_EXPECTED_BREADTH_RATIO
+                        Minimum expected breadth ratio (default: 0.5)
+  -e MIN_NORM_ENTROPY, --min-normalized-entropy MIN_NORM_ENTROPY
+                        Minimum normalized entropy (default: auto)
+  -g MIN_NORM_GINI, --min-normalized-gini MIN_NORM_GINI
+                        Minimum normalized Gini coefficient (default: None)
+  -B MIN_BREADTH, --min-breadth MIN_BREADTH
+                        Minimum breadth (default: 0)
+  -a MIN_AVG_READ_ANI, --min-avg-read-ani MIN_AVG_READ_ANI
+                        Minimum average read ANI (default: 90.0)
+  -c MIN_COVERAGE_EVENNESS, --min-coverage-evenness MIN_COVERAGE_EVENNESS
+                        Minimum coverage evenness (default: 0)
+
+miscellaneous arguments:
+  --reference-trim-length TRIM_ENDS
+                        Exclude n bases at the ends of the reference sequences (default: 0)
+  --trim-min TRIM_MIN   Remove coverage that are below this percentile. Used for the Truncated Average
+                        Depth (TAD) calculation (default: 10)
+  --trim-max TRIM_MAX   Remove coverage that are above this percentile. Used for the Truncated Average
+                        Depth (TAD) calculation (default: 90)
 ```
 
 One would run filterBAM as:
@@ -169,6 +173,8 @@ The program will produce two main outputs:
   - **max_covered_bases**: Maximum number of bases covered in the reference
   - **mean_covered_bases**: Average number of bases covered in the reference
   - **coverage_mean**: Mean depth of the reference
+  - **coverage_mean_trunc**: Mean depth of the reference after removing coverage values below the 10th and above the 90th percentile (TAD80, default)
+  - **coverage_mean_trunc_len**: Length of the reference that remains after truncation by the TAD(X) percentiles
   - **coverage_covered_mean**: Mean depth of the reference only counting covered bases
   - **reference_length**: Real reference length
   - **bam_reference_length**: Length reported by the BAM file
@@ -186,6 +192,8 @@ The program will produce two main outputs:
   - **cov_evenness**: Eveness of coverage as calculated [here](https://www.nature.com/articles/jhg201621).
   - **tax_abund_read**: Counts estimated using the number of reads and normalized by the reference length.
   - **tax_abund_aln**: Counts estimated using the number of alignments and normalized by the reference length.
+  - **tax_abund_tad**: Counts estimated using the estimated number of reads in the TAD region and normalized by the length of the TAD region.
+  - **n_reads_tad**: Number of reads estimated in the TAD region using the formula *C = LN / G* (so *N = CG / L*), where C stands for the TAD coverage, N for the number of reads, G for the length of the TAD region and L for the average length of the reads mapped to the reference.
 
 ## Applications and recommendations