
Coverage metrics

Clay McLeod edited this page Sep 29, 2022 · 2 revisions

The Coverage Metrics facet reports statistics related to coverage. Statistics are generally reported on a per-sequence basis for the supported sequences. The report is delivered under the coverage key within the results.json file. You can examine the output of this facet by using jq:

cat results.json | jq .coverage

Outputs

This facet has the following top-level keys:

| Key | Description |
| --- | --- |
| mean_coverage | Mean coverage for a given supported sequence (as specified by the key). |
| median_coverage | Median coverage for a given supported sequence (as specified by the key). |
| median_over_mean_coverage | Median coverage divided by mean coverage for a given supported sequence (as specified by the key). |
| ignored | Statistics on how many positions were ignored (usually because they had too high coverage for our histogram to support). |
| histograms | One histogram per supported sequence that shows the distribution of coverage across positions on that sequence. |
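
As a rough illustration of how these keys might be consumed, the sketch below regroups the per-metric maps into one row per sequence. The nesting (metric name, then sequence name, then value), the sequence names, and all numbers are assumptions made for the example; check them against your own results.json before relying on it.

```python
import json

# Illustrative stand-in for results.json. The values are made up, and the
# nesting (metric -> sequence name -> value) is an assumption.
SAMPLE = json.loads("""
{
  "coverage": {
    "mean_coverage": {"chr1": 30.0, "chrX": 20.0},
    "median_coverage": {"chr1": 29.0, "chrX": 14.0},
    "median_over_mean_coverage": {"chr1": 0.9667, "chrX": 0.7}
  }
}
""")

METRICS = ("mean_coverage", "median_coverage", "median_over_mean_coverage")


def per_sequence_summary(results):
    """Regroup the per-metric maps into one dict per sequence."""
    coverage = results["coverage"]
    sequences = sorted(coverage["mean_coverage"])
    return {
        seq: {metric: coverage[metric][seq] for metric in METRICS}
        for seq in sequences
    }


summary = per_sequence_summary(SAMPLE)
for seq, row in summary.items():
    print(seq, row)
```

To run this against a real report, replace SAMPLE with the result of json.load on an open results.json file.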

Mean coverage

Mean coverage for a given supported sequence (as specified by the key).

Median coverage

Median coverage for a given supported sequence (as specified by the key).

Median over mean coverage

The median over mean (MoM) coverage for a given supported sequence (as specified by the key). The MoM is simply the median coverage divided by the mean coverage for the given sequence. This is an important statistic in determining whether a sequence is evenly covered.

  • If the MoM is less than 1, then the mean is (potentially much) greater than the median, suggesting there are areas of extremely high-depth coverage that are pulling the mean up. In other words, the sequence may be over-covered in some areas.
  • If the MoM is greater than 1, then the mean is (potentially much) less than the median, suggesting there are areas of extremely low-depth coverage that are pulling the mean down. In other words, the sequence may be under-covered in some areas.
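
The two cases above can be sketched with a few lines of Python. This is only an illustration of the arithmetic: the function names, the tolerance of 0.1 around 1, and the toy depth lists are hypothetical and are not taken from the tool.

```python
from statistics import mean, median


def median_over_mean(depths):
    """Return median / mean for a list of per-position depths, or None if the mean is 0."""
    avg = mean(depths)
    if avg == 0:
        return None
    return median(depths) / avg


def interpret_mom(mom, tolerance=0.1):
    """Map a MoM value to a rough label. The tolerance is an arbitrary, hypothetical cutoff."""
    if mom is None:
        return "no coverage"
    if mom < 1 - tolerance:
        return "possibly over-covered in some areas"
    if mom > 1 + tolerance:
        return "possibly under-covered in some areas"
    return "roughly even"


# Nine positions at depth 10 plus one pile-up at depth 1000: the mean is pulled up.
spiky = [10] * 9 + [1000]
# Nine positions at depth 30 plus one uncovered position: the mean is pulled down.
gappy = [30] * 9 + [0]

for name, depths in (("spiky", spiky), ("gappy", gappy)):
    mom = median_over_mean(depths)
    print(name, round(mom, 3), interpret_mom(mom))
```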

Ignored

The histogram that tallies the coverage at each position is of finite size. Thus, if a position has more coverage than our histogram supports, we mark that position as ignored and continue on. A large number of ignored positions is therefore another sign of extremely high coverage.
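
A minimal sketch of this idea, assuming a histogram stored as a list indexed by depth. The max_depth parameter and the function name are hypothetical; the actual histogram size used by the tool is not stated here.

```python
def build_histogram(depths, max_depth):
    """Count positions per depth, ignoring positions deeper than max_depth.

    Returns (histogram, ignored), where histogram[d] is the number of
    positions with depth exactly d and ignored is the number of positions
    that did not fit in the histogram.
    """
    histogram = [0] * (max_depth + 1)
    ignored = 0
    for depth in depths:
        if depth > max_depth:
            # Too deep for the finite histogram: count it and move on.
            ignored += 1
            continue
        histogram[depth] += 1
    return histogram, ignored


histogram, ignored = build_histogram([0, 1, 1, 5, 12], max_depth=4)
print(histogram, ignored)
```

Note that ignored positions never enter the histogram, so any statistic derived from the histogram alone will not reflect them.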

Histograms

Contains one histogram per supported sequence, where each bin holds the number of positions with that coverage. For example, if bin 10 has a value of 100000, then 100,000 positions on that sequence had a coverage of exactly 10.
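
Given a histogram of this shape, the mean and median coverage can be recovered from the bin counts. The sketch below assumes the histogram is a list where index d holds the number of positions with coverage d; the function names are hypothetical, and whether the tool derives its own statistics this way is not confirmed here.

```python
def mean_from_histogram(histogram):
    """Mean depth over all counted positions, or None for an empty histogram."""
    total = sum(histogram)
    if total == 0:
        return None
    return sum(depth * count for depth, count in enumerate(histogram)) / total


def _depth_at_rank(histogram, rank):
    """Depth of the position at the given 0-based rank when positions are sorted by depth."""
    seen = 0
    for depth, count in enumerate(histogram):
        seen += count
        if rank < seen:
            return depth
    raise IndexError("rank is beyond the number of counted positions")


def median_from_histogram(histogram):
    """Median depth, averaging the two middle positions when the count is even."""
    total = sum(histogram)
    if total == 0:
        return None
    lower = _depth_at_rank(histogram, (total - 1) // 2)
    upper = _depth_at_rank(histogram, total // 2)
    return (lower + upper) / 2


# Toy histogram: two positions at depth 1, three at depth 2, one at depth 4.
toy = [0, 2, 3, 0, 1]
print(mean_from_histogram(toy), median_from_histogram(toy))
```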