---
title: "Ch 7 Notes"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(fpp3)
```
# Chapter 7: Time Series Regression Models
__Basic Concept__: Forecast the time series $y$ assuming it has a linear relationship
with other time series $x$
__Forecast Variable__: $y$; also called the regressand, dependent, or explained variable
__Predictor Variable__: $x$; also called the regressor, independent, or explanatory variable
## 7.1 The Linear Model
### Simple Linear
Formula:
$$y_t = \beta_0 + \beta_1x_t + \epsilon_t$$
* $\beta_0$ - the intercept: the predicted value of $y$ when $x = 0$
* $\beta_1$ - the slope of the line, representing the average predicted change in
$y$ resulting from a one unit increase in $x$
* $\epsilon_t$ - the error term: the deviation from the underlying straight-line model,
capturing anything that affects $y_t$ other than $x_t$
__Example__
How is consumption expenditure $y$ affected by disposable income $x$?
```{r}
us_change %>%
  pivot_longer(c(Consumption, Income), names_to = "Series") %>%
  autoplot(value) +
  labs(y = "% change")
```
The same data as a scatter plot of consumption changes against income changes, with a regression line:
```{r}
us_change %>%
  ggplot(aes(x = Income, y = Consumption)) +
  labs(y = "Consumption (Quarterly % Change)",
       x = "Income (Quarterly % Change)") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```
The equation is estimated using `TSLM()`
```{r}
us_change %>%
  model(TSLM(Consumption ~ Income)) %>%
  report()
```
The fitted line has a positive slope, indicating a positive relationship between the two
variables. The slope coefficient shows that a one unit increase in $x$ results in an average
increase of 0.27 in $y$: a 1% increase in personal disposable income results in
an average increase of 0.27 percentage points in personal consumption expenditure.
### Multiple Linear Regression
This term is used when there are two or more predictor variables (general form below).
- Each predictor variable must be numeric
- Each coefficient measures the marginal effect of its predictor, taking into account the
effects of the other predictors
- Can produce more accurate forecasts, since more information relevant to $y$ is captured
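For reference, the general form of the multiple regression model with $k$ predictors is
$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \epsilon_t$$
where each $\beta_j$ measures the effect of $x_{j,t}$ after accounting for the other predictors.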
### Assumptions
We assume the following with a linear regression:
- the model is a reasonable approximation to reality
- the errors have a mean of zero; otherwise the forecasts are systematically biased
- the errors are not autocorrelated; otherwise there is more information in the data
that could be used in the prediction
- the errors are unrelated to the predictor variables
It is also helpful for the errors to be normally distributed with constant variance.
***
## 7.2 Least Squares Estimation
Least squares estimation provides a way of choosing the coefficients by minimising the sum
of the squared errors, as shown below.
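Concretely, the least squares estimates $\hat\beta_0, \dots, \hat\beta_k$ are the values that minimise
$$\sum_{t=1}^{T}\epsilon_t^2 = \sum_{t=1}^{T}\left(y_t - \beta_0 - \beta_1 x_{1,t} - \dots - \beta_k x_{k,t}\right)^2$$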
`TSLM()` fits a linear regression model to time series data.
__Example__
```{r}
fit_consMR <- us_change %>%
  model(tslm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)
```
- First column is the estimate for the coefficient of each predictor variable
- Second column gives its standard error
- 't value' is the ratio of an estimated coefficient to its standard error
- 'p value' is the measure of significance for the predictor variable
The last two columns are not particularly helpful in forecasting.
### Fitted Values
Predictions of $y$ can be obtained by using the estimated coefficients and setting
the error term to zero. (Note these are predictions of data used to estimate the
model, not forecasts of future values of $y_t$).
__Example__
```{r}
augment(fit_consMR) %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Consumption, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = NULL,
       title = "Percent Change in US Consumption Expenditure") +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
  guides(colour = guide_legend(title = NULL))
```
```{r}
augment(fit_consMR) %>%
  ggplot(aes(x = Consumption, y = .fitted)) +
  geom_point() +
  labs(y = "Fitted Values",
       x = "Actual Values",
       title = "Percent Change in US Consumption Expenditure") +
  geom_abline(intercept = 0, slope = 1)
```
The correlation between these variables is $r = 0.877$, so $R^2 = 0.768$: the
model explains 76.8% of the variation in consumption. Compared to the $R^2$ value of 0.15 for
the simple regression in Section 7.1, adding the three extra predictors allowed more of the variation to be explained.
### Goodness of Fit
$R^2$ is the typical measure used to determine goodness of fit for a linear
model. It is calculated as the square of the correlation between the observed
$y$ values and the predicted $\hat{y}$ values.
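Written out, with $\bar{y}$ the mean of the observations:
$$R^2 = \frac{\sum_t\left(\hat{y}_t - \bar{y}\right)^2}{\sum_t\left(y_t - \bar{y}\right)^2}$$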
In a simple linear regression, the value of $R^2$ is also equal to the square of
the correlation between $x$ and $y$.
$R^2$ lies between 0 and 1
If predictions are close to actual values, we expect $R^2$ to be close to 1.
If predictions are unrelated to actual values, then $R^2 = 0$
Caveat: the $R^2$ value will never decrease when an extra predictor is added, which
can lead to overfitting. Ensure you validate the model on test data.
### Standard Error of the Regression
Another measure is the "residual standard error". It is related
to the size of the average error that the model produces. We can compare this error
to the sample mean of $y$ or to the standard deviation of $y$.
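For reference, with $T$ observations and $k$ predictors, the residual standard error is
$$\hat{\sigma}_e = \sqrt{\frac{1}{T-k-1}\sum_{t=1}^{T}e_t^2}$$
where the divisor $T-k-1$ accounts for the $k+1$ estimated coefficients.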
***
## 7.3 Evaluating the Regression Model
Residuals in a regression model are similar to the residuals we have seen so far, and
are calculated in the same way: as the difference between the observed values $y_t$ and the fitted values $\hat{y}_t$.
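That is, for each observation,
$$e_t = y_t - \hat{y}_t = y_t - \hat\beta_0 - \hat\beta_1 x_{1,t} - \dots - \hat\beta_k x_{k,t}$$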
After selecting the regression variables and fitting a model, it is necessary
to plot the residuals to check that the assumptions have been satisfied.
### ACF Plot of Residuals
It is very common to find autocorrelation in the residuals of a time series
regression model because time series values are highly likely to be similar to
the values preceding it.
### Histograms of the Residuals
Always check if the residuals are normally distributed.
__Example__
```{r}
fit_consMR %>% gg_tsresiduals()
```
```{r}
augment(fit_consMR) %>%
  features(.innov, ljung_box, lag = 10, dof = 5)
```
Regarding the above:
- the heteroskedasticity (changing variance over time) may make the prediction
interval coverage inaccurate
- the histogram shows the residuals are slightly skewed
- the ACF shows a significant spike at lag 7 and the Ljung-Box test is significant
at the 5% level; however, this is unlikely to have a noticeable impact on the forecasts
### Residual Plots Against Predictors
We expect the residuals to be random, without showing patterns. The best way to check
is to plot the residuals against each predictor variable, as well as against variables
that are not in the model. If any plot shows a pattern, the corresponding variable may need to be added
to the model.
__Example__
```{r}
us_change %>%
  left_join(residuals(fit_consMR), by = "Quarter") %>%
  pivot_longer(Income:Unemployment,
               names_to = "regressor", values_to = "x") %>%
  ggplot(aes(x = x, y = .resid)) +
  geom_point() +
  facet_wrap(. ~ regressor, scales = "free_x") +
  labs(y = "Residuals", x = "")
```
The residuals above appear to be randomly scattered.
### Residual Plots Against Fitted Values
A plot of the residuals against the fitted values should also show no pattern. If a
pattern is present, the errors may be heteroskedastic, meaning the variance of the
errors is not constant, and a transformation of the forecast variable may be needed.
__Example__
```{r}
augment(fit_consMR) %>%
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Fitted", y = "Residuals")
```
Above, the randomness of the scatterplot suggests errors are homoscedastic.
### Outliers and Influential Observations
Sometimes an extreme outlier can have an outsized influence on the estimated model. It is
important to determine whether the outlier is an incorrect data entry or is simply a genuinely different observation.
### Spurious Regression
Regressing non-stationary time series on each other can lead to spurious regressions. Signs
of spurious regression:
- high $R^2$ accompanied by high residual autocorrelation
```{r}
fit <- aus_airpassengers %>%
  filter(Year <= 2011) %>%
  left_join(guinea_rice, by = "Year") %>%
  model(TSLM(Passengers ~ Production))
report(fit)
```
```{r}
fit %>% gg_tsresiduals()
```
***
## 7.4 Some Useful Predictors
### Trend
A trend variable can be specified in the `TSLM()` function using `trend()`
### Dummy Variables
When you have a categorical variable that you would like to use as a predictor,
turn it into "dummy variables" by encoding each category as a 0/1 indicator column.
For example, a "yes/no" or "T/F" column would be converted to 1/0. You always need
one fewer dummy column than the number of unique categories; the omitted category becomes the baseline.
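As a quick illustration of the encoding, here is a minimal base-R sketch with a made-up three-level factor (the variable name `day` is just an example; this is the same encoding that `lm()`-style models apply under the hood):
```{r}
# Hypothetical example: a 3-level factor is encoded with 2 dummy columns;
# the omitted level ("Mon") is absorbed into the intercept as the baseline.
days <- data.frame(day = factor(c("Mon", "Tue", "Wed", "Tue")))
model.matrix(~ day, data = days)
```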
### Seasonal Dummy Variables
For example, if you wanted to use the day of the week as a predictor,
you would only need 6 columns of 0/1 values.
The `TSLM()` function handles this automatically when `season()` is included in the formula.
__Example__
```{r}
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
recent_production %>%
  autoplot(Beer) +
  labs(y = "Megalitres",
       title = "Australian Quarterly Beer Production")
```
```{r}
fit_beer <- recent_production %>%
  model(TSLM(Beer ~ trend() + season()))
report(fit_beer)
```
The above coefficients show:
- the trend accounts for an average decrease of 0.34 megalitres
per quarter
- Q2 has production 34 megalitres lower than Q1 on average
- Q3 has production 17 megalitres lower than Q1 on average
- Q4 has production 72 megalitres higher than Q1 on average
Note that each seasonal dummy coefficient is a comparison with the omitted (baseline) quarter, Q1.
```{r}
augment(fit_beer) %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Beer, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
  labs(y = "Megalitres",
       title = "Australian Quarterly Beer Production") +
  guides(colour = guide_legend(title = "Series"))
```
```{r}
augment(fit_beer) %>%
  ggplot(aes(x = Beer, y = .fitted,
             colour = factor(quarter(Quarter)))) +
  geom_point() +
  labs(y = "Fitted", x = "Actual values",
       title = "Australian Quarterly Beer Production") +
  geom_abline(intercept = 0, slope = 1) +
  guides(colour = guide_legend(title = "Quarter"))
```
### Intervention Variables
Events that affect the variable being forecast are called interventions (e.g.
competitor activity, advertising expenditure, etc.).
If the effect lasts for only one period, we use a "spike" variable. This is
equivalent to using a dummy variable to handle an outlier.
### Trading Days
The number of trading days per month can vary and have a substantial effect
on sales data. This can also be used as a predictor.
### Distributed Lags
If the effect of a predictor can persist after the event takes place (like the
effect of an advertising campaign), lagged values of the predictor must be included.
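A minimal sketch of the idea, assuming a hypothetical tsibble `ad_sales` with columns `Sales` and `AdSpend` (not a dataset from the book, so the chunk is not evaluated):
```{r, eval=FALSE}
# Hypothetical: include the current value plus two lagged values of the
# advertising predictor, so past campaigns can still affect current sales.
ad_sales %>%
  model(TSLM(Sales ~ AdSpend + lag(AdSpend, 1) + lag(AdSpend, 2)))
```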
### Easter
Easter differs from most holidays in that it is not held on the same day each
year, and its effect can last for several days.
### Fourier Series
A series of sine and cosine terms at the right frequencies can approximate any
periodic function. Fourier terms often need fewer predictors than seasonal dummy variables,
which makes them especially useful for long seasonal periods such as weekly data where
$m \approx 52$; they are less useful for short periods such as days of the week.
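For seasonal period $m$, the first few Fourier terms are
$$x_{1,t} = \sin\!\left(\tfrac{2\pi t}{m}\right),\quad x_{2,t} = \cos\!\left(\tfrac{2\pi t}{m}\right),\quad x_{3,t} = \sin\!\left(\tfrac{4\pi t}{m}\right),\quad x_{4,t} = \cos\!\left(\tfrac{4\pi t}{m}\right),\ \dots$$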
Fourier terms are produced using the `fourier()` function
__Example__
```{r}
fourier_beer <- recent_production %>%
  model(TSLM(Beer ~ trend() + fourier(K = 2)))
report(fourier_beer)
```
The `K` argument to `fourier()` specifies how many pairs of sine and cosine terms to
include. The maximum allowed is $K = m/2$, where $m$ is the seasonal period.
A regression model containing Fourier terms is often called a __harmonic__
regression because the successive Fourier terms represent harmonics of the
first two.
***
## 7.5 Selecting Predictors
Things not to do:
- Do not plot the forecast variable against every variable and drop predictors
that have no visible relationship. Not all relationships are easily seen by
the human eye in a scatter plot.
- Do not create a multiple linear regression with all variables and then drop the ones
that are not statistically significant (i.e. those with the largest *p*-values). Statistical
significance does not equal predictive power.
Instead, here are 5 measures found in the `glance()` function:
```{r}
glance(fit_consMR) %>%
  select(adj_r_squared, CV, AIC, AICc, BIC)
```
These values are then compared with those from other models. For adjusted $R^2$ we
want the highest value; for the rest, we want the lowest.
### Adjusted $R^2$ (or $\bar{R}^2$)
Maximizing $\bar{R}^2$ works well as a method of selecting predictors. It is
equivalent to minimising the standard error of the regression, $\hat{\sigma}_e$.
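For reference, with $T$ observations and $k$ predictors:
$$\bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-k-1}$$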
### Cross-validation
The procedure (leave-one-out cross-validation) uses the following steps:
1. Remove observation $t$ from the data set, fit the model using the remaining data,
and compute the error for the omitted observation.
2. Repeat step one for all $T$ observations.
3. Compute the MSE from all of the resulting errors (see the formula below).
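Writing $e_{[t]}$ for the error made when predicting observation $t$ from a model fitted without it, the statistic is
$$\text{CV} = \frac{1}{T}\sum_{t=1}^{T} e_{[t]}^2$$
In practice this can be computed without refitting the model $T$ times, using the ordinary residuals and the leverage (hat) values.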
### Akaike's Information Criterion
This criterion penalizes the fit of the model (measured by the SSE) by the number of
parameters that need to be estimated.
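For reference, the form used for regression models in the text (with $T$ observations and $k$ predictors) is
$$\text{AIC} = T\log\!\left(\frac{\text{SSE}}{T}\right) + 2(k+2)$$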
### Corrected Akaike's Information Criterion
For small numbers of observations, the AIC tends to select too many predictors; $AIC_c$ is
a bias-corrected version for small samples.
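The bias-corrected version is
$$\text{AIC}_c = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}$$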
### Schwarz's Bayesian Information Criterion
BIC penalizes the number of parameters more heavily than AIC.
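For reference:
$$\text{BIC} = T\log\!\left(\frac{\text{SSE}}{T}\right) + (k+2)\log(T)$$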
### Which measure to use?
It depends; everyone seems to have a preference, though for forecasting the text leans towards AICc, AIC, or CV.
### Stepwise Regression
If there is a large number of predictors, it is not possible to fit every possible model. Besides PCA,
another strategy to limit the number of models explored is __backward stepwise regression__
(a one-step sketch appears at the end of this subsection):
1. Start with the model containing all potential predictors.
2. Remove one predictor at a time; keep the new model if it improves the measure of
predictive accuracy.
3. Iterate until no further improvement.
If the number of potential predictors is too large for the above, employ
__forward stepwise regression__: start with a model that includes only the
intercept, then add predictors one at a time, keeping each addition that improves
predictive accuracy.
The stepwise approach is not guaranteed to attain the best model, but it will
likely lead to a good model.
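As a minimal sketch of one backward-elimination step, using the consumption model from Section 7.2 (the model names here are just labels chosen for illustration):
```{r}
# One backward step: fit the full model and every model that drops a single
# predictor, then compare the accuracy measures reported by glance().
fit_step <- us_change %>%
  model(
    full       = TSLM(Consumption ~ Income + Production + Unemployment + Savings),
    no_income  = TSLM(Consumption ~ Production + Unemployment + Savings),
    no_prod    = TSLM(Consumption ~ Income + Unemployment + Savings),
    no_unemp   = TSLM(Consumption ~ Income + Production + Savings),
    no_savings = TSLM(Consumption ~ Income + Production + Unemployment)
  )
glance(fit_step) %>%
  select(.model, adj_r_squared, CV, AIC, AICc, BIC)
```
If any reduced model beats the full model (e.g. lower AICc), keep it and repeat the step from there.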
### Beware of Inference After Selecting Predictors
The methods above are only useful when selecting a model for forecasting. They
are not helpful when looking to study the effect of a predictor. If looking at the
statistical significance of predictors, any procedure that selects predictors first
will invalidate the assumptions behind p-values.
***
## 7.6 Forecasting with Regression
### Ex-ante vs Ex-post Forecast
__Ex-ante__
Forecasts made using only the information that is available in advance. In order
to generate an ex-ante forecast, the model requires forecasts of the predictors,
typically obtained with a simple method from Section 5.2 or a pure time series approach from
Chapters 8 or 9.
__Ex-post__
Forecasts made using the actual, later-observed values of the predictors. These are not genuine forecasts,
but they are useful for studying the behaviour of forecasting models. They should not assume
any knowledge of the $y$ variable to be forecast.
__Example__
Normally you cannot know the future values of the predictors in advance, but in the example below the trend
and seasonal dummies are the only predictors, so their future values are known in advance (and ex-ante and ex-post forecasts coincide).
```{r}
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
fit_beer <- recent_production %>%
  model(TSLM(Beer ~ trend() + season()))
fc_beer <- forecast(fit_beer)
fc_beer %>%
  autoplot(recent_production) +
  labs(title = "Forecasts of beer production using regression",
       y = "megalitres")
```
### Scenario Based Forecasting
Here, the forecaster assumes possible scenarios for the predictor variables.
For example, a US policy maker may be interested in comparing the predicted
change in consumption when there is constant growth of 1% and 0.5% respectively
for income and savings, with no change in the unemployment rate, versus a respective
decline of 1% and 0.5%.
```{r}
fit.consBest <- us_change %>%
  model(lm = TSLM(Consumption ~ Income + Savings + Unemployment))
future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) %>%
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) %>%
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario")
fc <- forecast(fit.consBest, new_data = future_scenarios)
us_change %>%
  autoplot(Consumption) +
  autolayer(fc) +
  labs(title = "US Consumption", y = "% Change")
```
### Building a Predictive Regression Model
Regression models are good at capturing important relationships between the
forecast variable and the predictors, and they work well for scenario-based forecasting.
The challenge is that generating ex-ante forecasts requires future
values of each predictor.
An alternative is to use lagged predictors, so that the required predictor values are already observed at forecast time.
***
## 7.7 Nonlinear Regression
The easiest way to model a nonlinear relationship is to transform the forecast variable
and/or the predictor variable before estimating a regression. This provides a
non-linear functional form that is still linear in the parameters.
__log-log__: the natural log is taken of both $y$ and $x$
__log-linear__: only the forecast variable $y$ is logged
__linear-log__: only the predictor $x$ is logged
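For example, the log-log form is
$$\log y_t = \beta_0 + \beta_1 \log x_t + \epsilon_t$$
where $\beta_1$ can be interpreted as an elasticity: the average percentage change in $y$ resulting from a 1% increase in $x$.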
__Tip__: for a log transform, all variables must be > 0. If $x$ contains zeros,
use $log(x+1)$. This method has a neat side-effect that zeros in $x$ will still
be zeros in the transformed scale.
### Forecasting with a Nonlinear Trend
__piecewise linear__: introduce points where the slope of the trend function $f(x)$ can change.
These points are called __knots__.
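With a single knot at time $\tau$, the piecewise linear trend uses the two predictors
$$x_{1,t} = t, \qquad x_{2,t} = (t-\tau)_+ = \begin{cases} 0 & t < \tau \\ t-\tau & t \ge \tau \end{cases}$$
so the slope is $\beta_1$ before the knot and $\beta_1 + \beta_2$ after it.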
__Example__
Boston Marathon winning times
```{r}
boston_men <- boston_marathon %>%
  filter(Year >= 1924) %>%
  filter(Event == "Men's open division") %>%
  mutate(Minutes = as.numeric(Time) / 60)
```
```{r}
fit_trends <- boston_men %>%
  model(linear = TSLM(Minutes ~ trend()),
        exponential = TSLM(log(Minutes) ~ trend()),
        piecewise = TSLM(Minutes ~ trend(knots = c(1950, 1980))))
fc_trends <- fit_trends %>% forecast(h = 10)
boston_men %>%
  autoplot(Minutes) +
  geom_line(data = fitted(fit_trends),
            aes(y = .fitted, colour = .model)) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(y = "Minutes",
       title = "Boston Marathon Winning Times")
```
Above, the piecewise method shows its strength.
***
## 7.8 Correlation, Causation and Forecasting
Correlation is not causation.
__confounder__: a variable not included in the forecasting model that influences
$y$ and at least one $x$ variable.
### Forecasting with Correlated Predictors
Correlated predictors are not usually a problem for forecasting, unless the correlation coefficient is approaching +1 or -1.
### Multicollinearity and Forecasting
This happens when two predictors are highly correlated, as mentioned above. It is
best to drop one of these predictors.