
Statistical Tests Selection

Milan Jelisavcic edited this page Apr 21, 2018 · 21 revisions

Selecting the proper statistical test is essential for analysing the data retrieved from an experiment. Depending on the type of data, a given test can lead to truthful or misleading conclusions. The division below groups common tests by goal and by type of data to make the selection process easier and more accurate.

Type of Data
| Goal | Measurement (from Gaussian population) | Rank, score, or measurement (from non-Gaussian population) | Binomial (two possible outcomes) | Survival time |
|---|---|---|---|---|
| Describe one group | mean, SD | median, interquartile range | proportion | Kaplan-Meier survival curve |
| Compare one group to a hypothetical value | one-sample t-test | Wilcoxon test | chi-square or binomial test** | |
| Compare two unpaired groups | unpaired t-test | Mann-Whitney test | Fisher's test (chi-square for large samples) | log-rank test or Mantel-Haenszel* |
| Compare two paired groups | paired t-test | Wilcoxon test | McNemar's test | conditional proportional hazards regression* |
| Compare three or more unmatched groups | one-way ANOVA | Kruskal-Wallis test | chi-square test | Cox proportional hazard regression** |
| Compare three or more matched groups | repeated-measures ANOVA | Friedman test | Cochran's Q** | conditional proportional hazards regression** |
| Quantify association between two variables | Pearson correlation | Spearman correlation | contingency coefficients** | |
| Predict value from another measured variable | simple linear regression or nonlinear regression | nonparametric regression** | simple logistic regression* | Cox proportional hazard regression* |
| Predict value from several measured or binomial variables | multiple linear regression* or multiple nonlinear regression** | | multiple logistic regression* | Cox proportional hazard regression* |
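For use inside an analysis script, the comparison rows of the selection table above can be encoded as a plain lookup. This is a hypothetical sketch: the goal keys, the `DATA_TYPES` labels and `suggest_test` are illustrative names, not part of any library, and the asterisk annotations from the table are dropped.

```python
# Hypothetical encoding of the "compare" rows of the selection table.
# Tuple positions follow the table columns: Gaussian, non-Gaussian, binomial, survival.
DATA_TYPES = ("gaussian", "non-gaussian", "binomial", "survival")

TESTS = {
    "one group vs hypothetical value": (
        "one-sample t-test", "Wilcoxon test", "chi-square or binomial test", None),
    "two unpaired groups": (
        "unpaired t-test", "Mann-Whitney test",
        "Fisher's test (chi-square for large samples)",
        "log-rank test or Mantel-Haenszel"),
    "two paired groups": (
        "paired t-test", "Wilcoxon test", "McNemar's test",
        "conditional proportional hazards regression"),
    "three or more unmatched groups": (
        "one-way ANOVA", "Kruskal-Wallis test", "chi-square test",
        "Cox proportional hazard regression"),
    "three or more matched groups": (
        "repeated-measures ANOVA", "Friedman test", "Cochran's Q",
        "conditional proportional hazards regression"),
}


def suggest_test(goal: str, data_type: str) -> str:
    """Return the test the table lists for a goal and a type of data."""
    column = DATA_TYPES.index(data_type)  # ValueError for an unknown data type
    test = TESTS[goal][column]  # KeyError for a goal not encoded here
    if test is None:
        raise LookupError(f"the table lists no test for {goal!r} with {data_type!r} data")
    return test


print(suggest_test("two paired groups", "non-gaussian"))  # Wilcoxon test
```

The lookup only restates the table; it does not check the assumptions of the test it returns, which still have to be verified as described in the sections below.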

Statistical Tests

One-way ANOVA

Average value

Mean

Median

Standard Deviation

Kruskal-Wallis test

Pearson correlation

One-sample t-test

The one-sample t-test is a statistical test used for testing the null hypothesis that the population mean is equal to a specified value mu_0.

Assumptions

  1. The dependent variable must be continuous (interval/ratio).
  2. The observations are independent of one another.
  3. The dependent variable should be approximately normally distributed.
  4. The dependent variable should not contain any outliers.

Implementations

  • Implementation in R: t.test(a, mu=mu_0)
  • Implementation in Python: scipy.stats.ttest_1samp(a, popmean, axis=0)
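Both calls above are built on the statistic t = (x̄ - mu_0) / (s / √n) with n - 1 degrees of freedom, where x̄ is the sample mean and s the sample standard deviation; the p-value is then read from the t distribution. A standard-library sketch of the statistic only, assuming a hypothetical helper name `one_sample_t` and made-up illustrative data:

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    t = (mean - mu0) / (s / math.sqrt(n))  # distance from mu0 in standard errors
    return t, n - 1


t, df = one_sample_t([2, 4, 6, 8], mu0=3)
print(round(t, 4), df)  # 1.5492 3
```

For the p-value itself, `scipy.stats.ttest_1samp` (or `t.test` in R) should report the same statistic on the same data.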

Two-sample t-test

The two-sample t-test is a statistical test used for testing the null hypothesis that the means of two populations are equal.

Assumptions

  1. The populations from which the samples have been drawn should be normally distributed; appropriate statistical methods exist for testing this assumption (e.g. the Kolmogorov-Smirnov non-parametric test).
  2. The standard deviations of the populations are unknown but assumed to be equal. This assumption can be tested by the F-test.
  3. The samples have to be drawn randomly and independently of each other.

Implementations

  • Implementation in R: t.test(a,b, var.equal=TRUE, paired=FALSE)
  • Implementation in Python: scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')
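With `var.equal=TRUE` / `equal_var=True`, as in the calls above, the test pools the two sample variances: s_p² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) and t = (x̄1 - x̄2) / (s_p √(1/n1 + 1/n2)) with n1 + n2 - 2 degrees of freedom. A standard-library sketch of that statistic (no p-value), using a hypothetical helper name and illustrative data:

```python
import math
import statistics


def pooled_two_sample_t(a, b):
    """Student's two-sample t statistic, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


t, df = pooled_two_sample_t([2, 4, 6, 8], [1, 3, 5])
print(t, df)
```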

Paired t-test

The paired t-test is a statistical test used for testing the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. The dependent variable must be continuous (interval/ratio).

Assumptions

  1. The pairs are independent of one another.
  2. The differences between paired observations should be approximately normally distributed.
  3. The differences should not contain any outliers.

Implementations

  • Implementation in R: t.test(a,b, paired=TRUE)
  • Implementation in Python: scipy.stats.ttest_rel(a, b, axis=0, nan_policy='propagate')
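A paired t-test is a one-sample t-test of the within-pair differences against zero. The standard-library sketch below makes that explicit; it takes the differences as a - b, the same order as `t.test(a, b, paired=TRUE)` and `scipy.stats.ttest_rel(a, b)`. The helper name `paired_t` and the data are illustrative assumptions, and no p-value is computed.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic: one-sample t-test of the differences a - b against 0."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(a, b)]  # one difference per statistical unit
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([5, 7, 9, 6], [3, 4, 8, 5])
print(t, df)
```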

Unpaired t-test

The unpaired t-test is a statistical test used for testing the null hypothesis that the means of two independent (unpaired) groups are equal.

Assumptions

  1. The observations are independent of one another.
  2. The dependent variable should be approximately normally distributed.
  3. The dependent variable should not contain any outliers.
  4. The data is continuous.
  5. The groups should have equal variance.

Implementations

  • Implementation in R: t.test(x, y, alternative="two.sided", var.equal=TRUE)
  • Implementation in Python: scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')
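If assumption 5 (equal variances) is doubtful, Welch's t-test is the usual alternative: it is the default of R's `t.test` (`var.equal=FALSE`) and what `scipy.stats.ttest_ind` computes with `equal_var=False`. It keeps the two variances separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. A standard-library sketch of the statistic only, with a hypothetical helper name and illustrative data:

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1  # squared standard error of mean(a)
    v2 = statistics.variance(b) / n2  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([2, 4, 6, 8], [1, 3, 5])
print(t, df)
```

Unlike the pooled test, the degrees of freedom here are generally not an integer.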

Wilcoxon test

The Wilcoxon test is a non-parametric statistical test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ.

Assumptions

  1. Data are paired and come from the same population.
  2. Each pair is chosen randomly and independently.
  3. The data are measured on at least an interval scale when, as is usual, within-pair differences are calculated to perform the test (though it does suffice that within-pair comparisons are on an ordinal scale).

Implementations

  • Implementation in R: wilcox.test(a,b, paired=TRUE)
  • Implementation in Python: scipy.stats.wilcoxon(x, y=None, zero_method='wilcox', correction=False)
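To make the Wilcoxon signed-rank statistic concrete, here is a standard-library sketch under stated assumptions: zero differences are dropped (as with `zero_method='wilcox'`), tied absolute differences receive average ranks, and the reported statistic is min(W+, W-), which is what `scipy.stats.wilcoxon` returns for a two-sided test. P-values are not computed; `average_ranks` and `signed_rank_statistic` are hypothetical helper names, and the data are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_statistic(x, y):
    """Return (W+, W-, min(W+, W-)) for paired samples x and y."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


print(signed_rank_statistic([10, 12, 11, 14, 13], [11, 10, 15, 13, 20]))  # (4.5, 10.5, 4.5)
```

As a sanity check, W+ + W- always equals n(n + 1)/2 for the n non-zero differences (15 for the five pairs above).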

[1] Table summary is extracted from https://www.graphpad.com/support/faqid/1790/
