---
title: "Simulation to investigate the Central Limit Theorem"
author: "Iain Russell"
date: "04/09/2020"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Overview
This is an investigation of the Central Limit Theorem (CLT) using the exponential distribution. A sample of 40 exponential random variables is simulated 1000 times and the mean of each sample is computed; the distribution of the 1000 sample means is then compared with the distribution of 1000 individual exponential draws to demonstrate the CLT.
A histogram of 1000 random draws from the exponential distribution with rate 0.2 shows its characteristic shape: strongly right-skewed, with the density decaying exponentially from its maximum at zero.
```{r}
set.seed(3433)
hist(rexp(1000, rate=0.2))
```
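As a rough sketch, the same draws can be plotted on a density scale with the theoretical exponential density overlaid. The names `rate_demo` and `draws_demo` are hypothetical and not part of the original analysis; the seed is reused so that the draws, and the state of the random number generator afterwards, match the chunk above.

```{r}
# Sketch: redraw the same 1000 exponentials (same seed as above)
set.seed(3433)
rate_demo <- 0.2                          # hypothetical name for the rate
draws_demo <- rexp(1000, rate = rate_demo)

# Histogram on a density scale so it is comparable with the density curve
hist(draws_demo, freq = FALSE, breaks = 30,
     main = "1000 exponential draws", xlab = "x")

# Theoretical density f(x) = rate * exp(-rate * x)
curve(dexp(x, rate = rate_demo), from = 0, to = max(draws_demo),
      add = TRUE, lwd = 2)

# Both should be near 1/rate = 5 for an exponential distribution
c(mean = mean(draws_demo), sd = sd(draws_demo))
```

The mean and standard deviation of the draws should both be near 5, since both equal 1/rate for an exponential distribution.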
## Simulations
Generate 1000 simulations, each of 40 random exponentials with rate lambda = 0.2, and compute the mean of each simulation.
Here lambda (0.2) is the rate parameter of the exponential distribution, n (40) is the number of exponentials in each simulation, nosims (1000) is the number of simulations, and sims is the vector of 1000 simulated means.
```{r}
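lambda = 0.2    # rate parameter of the exponential distribution
n = 40          # number of exponentials per simulation
nosims = 1000   # number of simulations
# One simulation per row (nosims rows, n columns); take the mean of each row
sims = apply(matrix(rexp(nosims * n, rate=lambda), nosims), 1, mean)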
```
For the exponential distribution the theoretical mean, mu, and standard deviation, sigma, are both equal to 1/lambda. In accordance with the CLT, the sample means are centred by subtracting mu and scaled by the standard error, sigma/sqrt(n), so that their distribution can be compared with a standard normal distribution.
```{r}
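mu = sigma = 1/lambda          # mean and sd of the exponential are both 1/lambda
se = sigma/sqrt(n)             # standard error of the mean of n observations
sims_std = (sims - mu) / se    # centre and scale the simulated means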
```
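In symbols, the standardization above can be written as follows, where $\bar{X}_n$ denotes the mean of one sample of size $n$ (notation introduced here for illustration). For large $n$ the CLT says that $Z_n$ is approximately standard normal.

$$
Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}, \qquad
\mu = \sigma = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79
$$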
How well the CLT approximation works depends strongly on the sample size, n (40 in our case), which may or may not be large enough for the distribution of the sample mean to be close to normal, particularly for a skewed population such as the exponential. The larger n is, the better the normal approximation, the closer the sample statistics (mean, variance) tend to be to those of the population, and the narrower the confidence interval for the population mean.
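The effect of sample size can be illustrated with a small sketch that repeats the simulation for several values of n and compares the standard deviation of the simulated means with sigma/sqrt(n). The sample sizes, the seed, and the names ending in `_demo` are assumptions chosen for illustration, not part of the original analysis.

```{r}
# Sketch: spread of the sample mean for several sample sizes
set.seed(1)                            # arbitrary seed (assumption)
rate_demo <- 0.2
sizes_demo <- c(10, 40, 160, 640)      # hypothetical sample sizes

# For each size k, simulate 1000 samples and take the sd of their means
sd_of_means <- sapply(sizes_demo, function(k) {
  means_k <- rowMeans(matrix(rexp(1000 * k, rate = rate_demo), nrow = 1000))
  sd(means_k)
})

# Compare with the theoretical standard error sigma / sqrt(n)
data.frame(n = sizes_demo,
           simulated_sd = sd_of_means,
           theoretical_sd = (1 / rate_demo) / sqrt(sizes_demo))
```

Each fourfold increase in n should roughly halve the standard deviation of the sample means.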
## Sample Mean versus Theoretical Mean
The mean of the simulated sample means is approximately 5, in agreement with the theoretical population mean, mu.
```{r}
mean(sims)
mu
```
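A 95% interval centred on the mean of the simulated means, built with the standard error sigma/sqrt(n) of a single sample of 40, contains the theoretical mean mu (5). This is consistent with the sample mean being a good estimate of the population mean.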
```{r}
mean(sims) + c(-1,1) * qnorm(.975) * se
```
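One way to check what such an interval means is to build the same ±1.96 standard error interval around each simulated mean and count how often it contains mu. If the normal approximation is reasonable, the proportion should be close to 0.95. The following is a self-contained sketch under that assumption; the seed and the names ending in `_demo` are hypothetical.

```{r}
# Sketch: empirical coverage of the 95% interval across simulations
set.seed(2)                            # arbitrary seed (assumption)
lambda_demo <- 0.2
n_demo <- 40
nosims_demo <- 1000
mu_demo <- 1 / lambda_demo
se_demo <- (1 / lambda_demo) / sqrt(n_demo)

# One simulated sample mean per row
means_demo <- rowMeans(matrix(rexp(nosims_demo * n_demo, rate = lambda_demo),
                              nrow = nosims_demo))

# Interval around each simulated mean, using the known sigma
lower_demo <- means_demo - qnorm(0.975) * se_demo
upper_demo <- means_demo + qnorm(0.975) * se_demo

# Proportion of intervals that contain the theoretical mean
coverage_demo <- mean(lower_demo <= mu_demo & mu_demo <= upper_demo)
coverage_demo
```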
## Sample Variance versus Theoretical Variance
The theoretical variance of the sample mean is sigma^2/n, where sigma = 1/lambda is the standard deviation of the population.
```{r}
sigma^2/n
```
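The value above follows from the independence of the n observations in each sample. Writing $X_1, \dots, X_n$ for the observations and $\bar{X}_n$ for their mean (notation introduced here for illustration):

$$
\mathrm{Var}(\bar{X}_n)
= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i)
= \frac{\sigma^2}{n}
= \frac{(1/\lambda)^2}{n}
= \frac{25}{40}
= 0.625
$$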
The sample variance computed from the simulated data is very close to this.
```{r}
var(sims)
```
Therefore we may conclude that the variance of the distribution of sample means is well approximated by the theoretical variance.
## Distribution
A histogram of the simulated means shows that their distribution is approximately normal.
```{r}
hist(sims)
```
A histogram of the standardized simulated means shows that their distribution is approximately normal with a mean close to zero and most of the values between -2 and 2, as expected for a standard normal distribution.
```{r}
hist(sims_std)
```
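The comparison with the standard normal distribution can be made more direct by overlaying the normal density on the histogram and computing the share of standardized means within ±1.96, which should be close to 0.95. This is a self-contained sketch that regenerates standardized means with an arbitrary seed; the names ending in `_demo` are hypothetical, and within this document `sims_std` could be used in their place.

```{r}
# Sketch: standardized sample means against the standard normal density
set.seed(3)                            # arbitrary seed (assumption)
lambda_demo <- 0.2
n_demo <- 40
mu_demo <- 1 / lambda_demo
se_demo <- (1 / lambda_demo) / sqrt(n_demo)

# 1000 simulated sample means, centred and scaled
means_demo <- rowMeans(matrix(rexp(1000 * n_demo, rate = lambda_demo),
                              nrow = 1000))
std_means_demo <- (means_demo - mu_demo) / se_demo

# Histogram on a density scale with the N(0, 1) density overlaid
hist(std_means_demo, freq = FALSE, breaks = 30,
     main = "Standardized sample means", xlab = "z")
curve(dnorm(x), add = TRUE, lwd = 2)

# Share of standardized means within +/- 1.96
share_within_demo <- mean(abs(std_means_demo) < qnorm(0.975))
share_within_demo
```

A normal quantile plot, `qqnorm(std_means_demo)` followed by `qqline(std_means_demo)`, is another common check; a slight upward curve in the right tail would reflect the skewness inherited from the exponential distribution.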