NGS & statistics
Abbreviation | Meaning |
---|---|
NGS | Next Generation Sequencing |
DEG | Differentially expressed gene(s) |
padj | Adjusted p-value |
FDR | False Discovery Rate |
Symbol | Meaning |
---|---|
| | Number of tests in a multiple-testing schema (e.g. number of genes in differential expression analysis) |
| | p-value |
| | e-value |
The multiple testing problem arises when a given statistical test is applied to a large number of cases. For example, in differential expression analysis, each gene/transcript is submitted to a test of equality between two conditions. A single analysis thus typically involves tens of thousands of tests.
The general problem with multiple testing is that the risk of a false positive indicated by the nominal p-value is incurred for each element tested, so false positives accumulate over the whole series of tests.
The nominal p-value is the p-value attached to one particular element in a series of multiple tests. For example, in differential analysis, one nominal p-value is computed for each gene. This p-value indicates the probability of obtaining an effect at least as strong as the one observed under the null hypothesis, i.e. in the absence of regulation.
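The accumulation of false positives can be illustrated with a small simulation. This is a minimal sketch, assuming that every null hypothesis is true (no gene is regulated), so that the nominal p-values are uniformly distributed between 0 and 1. The function name `count_false_positives`, the number of tests and the threshold are illustrative assumptions, not values taken from this page.

```python
import random


def count_false_positives(n_tests, alpha, seed=0):
    """Count tests declared positive when every null hypothesis is true."""
    rng = random.Random(seed)
    # Assumption: under the null hypothesis, p-values are uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


n_tests = 20000  # hypothetical number of genes
alpha = 0.05     # nominal p-value threshold
observed = count_false_positives(n_tests, alpha)
expected = n_tests * alpha
print(f"expected about {expected:.0f} false positives, simulated {observed}")
```

Even though no gene is truly regulated in this simulation, roughly 5% of the tests pass the nominal threshold.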
The e-value indicates the number of false positives expected by chance, for a given threshold of p-value.
It is obtained by multiplying the nominal p-value by the number of tests.
Note that the e-value is a positive number ranging from 0 to the number of tests.
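The e-value can be sketched as the product of the nominal p-value and the number of tests. The function name `e_value` and the example figures are assumptions chosen for illustration.

```python
def e_value(p_value, n_tests):
    """Expected number of false positives at this p-value threshold."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # The result ranges from 0 (p = 0) to n_tests (p = 1).
    return p_value * n_tests


# Hypothetical example: a threshold of 0.001 applied to 20000 genes.
print(e_value(0.001, 20000))
```

Because the result can exceed 1, it is read as an expected count rather than as a probability.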
The Family-Wise Error Rate (FWER) indicates the probability of observing at least one false positive among the multiple tests.
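To show how quickly the FWER grows with the number of tests, here is a minimal sketch. It assumes that the tests are independent and that all null hypotheses are true, which is a simplification (gene expression tests are generally not independent). The function name `fwer_independent` is hypothetical.

```python
def fwer_independent(alpha, n_tests):
    """P(at least one false positive) for independent tests under the null."""
    # Probability that none of the n_tests tests passes the threshold alpha,
    # subtracted from 1.
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 10, 100):
    print(n_tests, round(fwer_independent(0.05, n_tests), 3))
```

With a nominal threshold of 0.05, at least one false positive is almost certain once a hundred tests are performed.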
The False Discovery Rate (FDR) indicates the expected proportion of false positives among the cases declared positive. For example, if a differential analysis reports 200 differentially expressed genes with an FDR threshold of 0.05, we should expect to have about 10 false positives among them ($200 \times 0.05 = 10$).
An adjusted p-value is a statistic derived from the nominal p-value in order to correct for the effects of multiple testing.
Various types of corrections for multiple testing have been defined (Bonferroni, e-value, FWER, FDR). Note that some of these corrections are not actual "adjusted p-values".
- The original Bonferroni correction consists in adapting the $\alpha$ threshold rather than correcting the p-value.
- The e-value is a number that can exceed 1; it is thus not a probability and therefore not a p-value.
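The two ways of applying the Bonferroni correction can be compared in a short sketch: dividing the $\alpha$ threshold by the number of tests, or multiplying each p-value by the number of tests and capping the result at 1 (the cap is what distinguishes a Bonferroni-adjusted p-value from the e-value). The function names and the example p-values are assumptions for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Original form: adapt the alpha threshold to the number of tests."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Adjusted p-value form: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


p_values = [0.0001, 0.004, 0.03, 0.2]  # hypothetical nominal p-values
alpha = 0.05

threshold = bonferroni_threshold(alpha, len(p_values))
by_threshold = [p <= threshold for p in p_values]
by_adjusted = [p_adj <= alpha for p_adj in bonferroni_adjust(p_values)]
print(by_threshold)
print(by_adjusted)
```

Both forms select the same tests in this example; they differ only in whether the threshold or the p-value is modified.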
The most usual correction is the FDR, which can be estimated in various ways.
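One widely used way to estimate the FDR is the Benjamini-Hochberg procedure. The sketch below is a plain-Python illustration of that procedure, not necessarily the method used by any particular differential expression tool; the function name `benjamini_hochberg` and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n_tests = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease when the nominal p-value increases.
    for rank in range(n_tests, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n_tests / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical nominal p-values
padj = benjamini_hochberg(p_values)
n_significant = sum(p <= 0.05 for p in padj)
print(padj, n_significant)
```

Each p-value is multiplied by the number of tests and divided by its rank, so the correction is milder than multiplying every p-value by the full number of tests.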