-
Notifications
You must be signed in to change notification settings - Fork 0
/
EpiconceptStegenCaseStudy.Rmd
1455 lines (928 loc) · 48.9 KB
/
EpiconceptStegenCaseStudy.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Stegen case study: Are you confused or ready to interact when eating Tiramisu and drinking beer?"
output: worded::rdocx_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, echo = F, warning= F, message= F}
#load packages
#Those required for creating this markdown:
#"Worded": for styling and pagebreaks
#"knitr": for styling tables
required_packages <- c("worded", "knitr")
for (i in seq(along = required_packages)) {
library(required_packages[i], character.only = TRUE)
}
```
<!---CHUNK_PAGEBREAK--->
# Copyright and License
**Source:**
This case study was first designed by Alain Moren and Gilles Desve, EPIET. It is based on an investigation conducted by Anja Hauri, RKI, Berlin, 1998
Minor revisions were brought to this case study by IntoEpi 2009,2010, Hong Kong 2016
It was then translated in to *R* by Alexander Spina in 2018.
**Revisions:**
*If you modify this case study, please indicate below your name and changes you made*
**You are free:**
- **to Share** — to copy, distribute and transmit the work
- **to Remix** — to adapt the work
**Under the following conditions:**
- **Attribution** — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). The best way to do this is to keep as it is the list of contributors: sources, authors and reviewers.
- **Share Alike** — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. Your changes must be documented. Under that condition, you are allowed to add your name to the list of contributors.
- You cannot sell this work alone but you can use it as part of a teaching.
**With the understanding that:**
- **Waiver** — Any of the above conditions can be waived if you get permission from the copyright holder.
- **Public Domain** — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
- **Other Rights** — In no way are any of the following rights affected by the license:
- Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
- The author's moral rights;
- Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
- **Notice** — For any reuse or distribution, you must make clear to others the license terms of this work by keeping together this work and the current license.
This licence is based on http://creativecommons.org/licenses/by-sa/3.0/
<!---CHUNK_PAGEBREAK--->
# Objectives
At the end of the case study, using *R* for stratified analysis and logistic regression, participants should be able to analyse data from a food borne outbreak investigation and to sort out the respective roles played by several food vehicles.
<!---CHUNK_PAGEBREAK--->
# Introduction
# Session 1
On 26 June 1998 the St Sebastian High School in Stegen, Germany, celebrated the graduation from school by organising a party to which 250 to 350 participants were expected. Attendants included graduates from St Sebastian High School, their families and friends, teachers, 12th grade students and some graduates from the nearby Marie-Curie school of Kirchzarten.
A self service party buffet was supplied by a commercial caterer from Freiburg.
Food was prepared the day of the party and transported in a refrigerated van to the school.
Festivities started with a dinner buffet open from 8.30 pm and followed by a dessert buffet offered from 10 pm. The party and the buffet extended late during the night and alcoholic beverages were quite popular. All agreed it was a party to be remembered.
## The alert
On 2nd July 1998, the Freiburg Health office of the Federal Council Office of Breisgau-Hochschwarzwald reported to the Robert Koch Institute (RKI) in Berlin the occurrence of many cases of gastroenteritis following the graduation party described above. More than 100 cases were suspected among participants and some of them were admitted to nearby hospitals. Sick people suffered from fever, nausea, diarrhoea and vomiting lasting for several days. Most believed that the tiramisu consumed at dinner was responsible for their illness.
Salmonella Enteritidis was isolated from 19 stool samples.
The Freiburg health office sent a team to investigate the kitchen of the caterer. Food preparation procedures were reviewed. Food samples, except tiramisu (none was left over), were sent to the laboratory of Freiburg University. Microbiological analyses were performed on samples of the following: brown chocolate mousse, caramel cream, remoulade sauce, yoghurt dill sauce, and 10 raw eggs.
The Freiburg health office requested help from the RKI in the investigation to assess the magnitude of the outbreak and identify potential vehicle(s) and risk factors for transmission in order to better control the outbreak.
Cases were defined as any person attending the party at St Sebastian High School who suffered from diarrhoea (>= 3 loose stool for 24 hours) between 27 June and 29 June 1998; or who suffered from at least three of the following symptoms: vomiting, fever>= 38.5 ° C, nausea, abdominal pain, headache.
Students from both schools attending the party were asked through phone interviews to provide names of persons who attended the party.
Overall 291 responded to enquiries and 103 cases were identified (Attack rate = 35%). Among these cases, 84 received medical treatments and four were admitted to hospitals. Attack rates by age group were 36.6% for persons < 20 years, 32.1% for persons 20 to 29 years, and 36.8% for persons older than 29 years.
![](IntroCurve.png)
## Q1. In order to prepare your analysis, summarise the above information and formulate hypotheses to be tested. (5 min discussion)
<!---CHUNK_PAGEBREAK--->
The shape of the epidemic curve, the attendance to a single event (a buffet) pointed towards a food borne outbreak related to a point common source of infection.
Using the updated list of attendants, a retrospective cohort study including all attendants to the party (that could be reached) was conducted. All had received a standard questionnaire asking for demographic information, signs, symptoms and duration, admission to hospital, and food and beverages consumption at the party including amount consumed. Food specific attack rates were computed for more than 50 food items and beverages.
## Q2. Establish a plan of analysis. (10 min discussion)
<!---CHUNK_PAGEBREAK--->
## Help task 2
**Perform data cleaning**
- For each variable, look at range, unexpected values, missing values.
- Correct data using original forms if needed
**Describe each variable**
- For each variable, describe frequency distributions including missing values and if needed means, median, modes, quartiles, SD, outliers
- Make appropriate histograms and box plots
- Choose relevant characteristics to describe the population
**Identify the outbreak vehicle if any**
- Chose the appropriate measure of association (RR, RD or OR)
- Chose the appropriate statistical tests and appropriate level of confidence
- Compute food specific attack rates
- Look at proportion of cases exposed
- Compute attributable risk % among exposed
- Search for any dose response if appropriate
- Interpret the results
**Do a stratified analysis**
- Identify the variables that are potential effect modifiers (EM) and confounders
- Design appropriate stratification tables
- Stratify on each level taken by the EM and confounders
- Compute appropriate measurements to identify confounding and effect modification
- Apply appropriate statistical tests
- Interpret the results
**Do a multivariable analysis**
- Prepare the data set for the multivariable analysis
- Identify numerical, nominal, discrete, continuous variables and decide how to analyse them
- Create additional variables as needed (age groups, dummy variables, etc.)
- Recode as necessary
- Add variables one by one
- Check for confounding and interaction
- Select a final model
## Q3. Check and clean the dataset “stegen.dta”. Create labels where appropriate.
Verify your working directory and open stegen.dta. The data set “stegen.dta” includes the following variables:
```{r, echo = F}
kable(read.csv("IntroTable.csv", sep = ";" ,
stringsAsFactors = FALSE ))
```
<!---CHUNK_PAGEBREAK--->
## Help task 3
Describe your dataset: frequency distributions, means, median, modes, quartiles, SD, outliers.
Make appropriate histograms and box plots.
### Setting your working directory
You can check the path for your current working directory using the *getwd* function.
```{r, eval = F}
#Check your current working directory
getwd()
```
To set your working directory you can use the *setwd* function.
```{r, eval = F}
setwd("C:/Users/Username/Desktop/EpiconceptStegen")
```
### Reading in files
Import the dataset from a comma seperated value (.csv) file using the *read.csv* function, storing it as a dataframe within *R* called *tira.data*.For a CSV file the separator is normally a comma (","), however depending on the language of your operating system this can also be other values, for example a semi-colon (";"). Here we also specify that we do not want to read in string (character or grouped variables as factors).
```{r}
tira.data <- read.csv("stegen.csv", sep = ";", stringsAsFactors = FALSE )
```
### Describe your dataset
For example:
```{r, eval = F}
summary(tira.data)
table(tira.data$sex)
table(tira.data$beer, useNA = "always")
summary(tira.data$age)
aggregate(tira.data$age, by = list(tira.data$sex), FUN = summary)
prop.table( table( tira.data$ill) )
```
*Example: ill*
```{r, eval = F}
#get counts of ill
#save table as "counts"
counts <- table(tira.data$ill)
#get proportions for counts table
prop.table(counts)
#you could also multiple by 100 and round to 2 digits
round(prop.table(counts)*100, digits = 2)
```
*Example: sex*
```{r, eval = F}
#get counts
#save table as "counts"
counts <- table(tira.data$sex)
#get proportions for counts table
prop.table(counts)
#you could also multiple by 100 and round to 2 digits
round(prop.table(counts)*100, digits = 2)
```
It is also possible to use a custom function to pull these various lines of code together in a custom function. You do not need to understand this code at current. You can run this code below which saves the *big.table* function in your environment; then you can use it the same way any other function works.
```{r}
# Function to make tables with counts, proportions and cumulative sum
big.table <- function(vars, data, useNA = "always") {
# Create an empty list to hold the output of your loop
output <- list()
# Apply big.table to each element of the object in vars.
#In this loop, "var" is the indexing variable; any character can be used e.g. "i"
for (var in vars) {
# Within the [],
# the item before the comma refers to rows
# the item after the comma refers to columns
count <- table(data[ , var], useNA = useNA)
prop <- round(prop.table(count)*100, digits = 2)
cumulative <- cumsum(prop)
total <- t(rbind(count,
prop,
cumulative))
# assign the value of your tables (total) to the output list
#(note: double square brackets "[[]]" are used to subset elements of a list)
output[[var]] <- total
}
output
}
```
You can now use this function as any other function.
```{r}
# specify the variable in quotations and the dataset to use
big.table(var = "sex", data = tira.data)
```
*To show more than one table at a time:*
We could use the big.table function to show more than one table at a time.
```{r, eval = F}
# specify multiple vars using c()
big.table(var = c("tira", "pork", "salmon"), data = tira.data)
```
*Describing continuous variable, example: age*
```{r}
# use the aggregate function to group by sex
# sex must be as a list
# specify the function you would like to use (summary)
aggregate(tira.data$age, by = list(tira.data$sex), FUN = summary)
```
*Histograms*:
A frequency plot is the default for *hist*. In order to plot each unique value of age on the x-axis, specify number of breaks (in this case 100 years); set the x-axis to match this.
```{r, eval = F}
# Plot a histogram of age
# you can specify a bar for each age with "breaks"
# you can set your x axis from 0-100 using "xlim"
hist(tira.data$age,
xlab = "Age",
ylab = "Count",
breaks = 100,
xlim = c(0, 100)
)
```
To have a nicer title:
```{r}
# Plot a histogram of age
# you can specify a bar for each age with "breaks"
# you can set your x axis from 0-100 using "xlim"
# main is where you set your title
hist(tira.data$age,
xlab = "Age",
ylab = "Count",
breaks = 100,
xlim = c(0, 100),
main = "Number of cases"
)
```
To save you can plot, then use *dev.copy* to choose a file type and name; *dev.off* closes the connection.
```{r, eval = F}
# save histogram of age as a png file
dev.copy(png,'age.png')
dev.off()
```
If you believe that two age groups are identifiable you may want to create a new variable with two age classes (< 30 years and above). The following shows one way to do it:
```{r}
# create a binary variable for older than 30 years of age
tira.data$agegroup <- ifelse(tira.data$age >= 30, 1, 0 )
```
```{r, eval = F}
# check the age grouping
table(tira.data$agegroup)
```
The *str* function will provide an overview of which variable types are in your dataset. The *summary* function will give minimum, maximum, first and third quartiles as well as medians and means for variables which are not strings (characters). Each of these commands can be run for individual variables also. You can refer to an individual variable of a data set by using the **$**, for example, if you wanted to obtain a summary of the a numeric age variable, then you would write **summary(tira.data\$age)**.
```{r, eval=F}
# str provides an overview of the number of observations and variable types
str(tira.data$ill)
# summary provides mean, median and max values of your variables (where applicable NAs)
summary(tira.data$ill)
```
Codebook of the variables salmon, pork and horseradish show that a few records have the value 9. As in Q3 you are asked to clean these data, recode these values to missing.
```{r}
#for the rows where salmon is 9, overwrite with NA
tira.data$salmon[tira.data$salmon == 9] <- NA
#same for horseradish and pork
tira.data$horseradish[tira.data$horseradish == 9] <- NA
tira.data$pork[tira.data$pork == 9] <- NA
```
### Creating labels:
In order to add labels in *R* you have to change variables in to factors. This allows you to specify levels (the order in which categories appear in output) and then label these levels.
```{r}
#re-write the tira variable as a factor defining levels and labels
tira.data$tira <- factor(tira.data$tira,
levels = c(0, 1),
labels = c("No", "Yes")
)
#re-write the wmousse variable as a factor defining levels and labels
tira.data$wmousse <- factor(tira.data$wmousse,
levels = c(0, 1),
labels = c("No", "Yes")
)
#re-write the dmousse variable as a factor defining levels and labels
tira.data$dmousse <- factor(tira.data$dmousse,
levels = c(0, 1),
labels = c("No", "Yes")
)
```
You can label more than one variable at a time using a for-loop
```{r}
#define the variables you would like to recode
vars <- c("mousse", "beer", "redjelly",
"fruitsalad", "tomato", "mince",
"salmon", "horseradish", "chickenwin",
"roastbeef", "pork")
#for each var defined in vars above
for (var in vars) {
#select the column of tira.data in square brackets
#overwrite with a factor as above
tira.data[ , var] <- factor(tira.data[ , var],
levels = c(0, 1),
labels = c("No", "Yes")
)
}
```
And define different categories.
```{r}
#define the variables you would like to recode
vars <- c("tportion", "mportion")
#for each var defined in vars above
for (var in vars) {
#select the column of tira.data in square brackets
#overwrite with a factor as above
tira.data[ , var] <- factor(tira.data[ , var],
levels = c(0, 1, 2, 3),
labels = c("None", "One portion",
"Two portions", "Three portions")
)
}
```
<!---CHUNK_PAGEBREAK--->
### R-scripts
You may also want to develop an R-script in which you will keep all relevant commands and annotate / comment each step of your analysis.
You can select the "+" icon and select R-script from the dropdown (alternatively you could click File > New file > R-script) , insert your command and save with a specific “name.R”. An example is shown below:
![](Rscript.png)
You may want to create separate scripts for checking the dataset and for recoding the data (cleaning and creating labels).
<!---CHUNK_PAGEBREAK--->
## Q4. Describe the outbreak in terms of person and time
Note: A “Place” variable is not available. Use the cleaned dataset stegen1.dta dataset.
<!---CHUNK_PAGEBREAK--->
## Help task 4
```{r, eval = F}
load("stegen1.Rda")
big.table("sex", tira.data)
summary(tira.data$age)
big.table("agegroup", tira.data)
big.table("ill", tira.data)
big.table("dateonset", tira.data[tira.data$ill == 1, ])
```
### Key tables for task 4:
You can create these tables in Word or Excel from the data from the Stata output.
**Table 1. Descriptive epidemiology: Study population by sex**
```{r, echo = F}
kable( matrix( c(
"Sex", "N", "%",
"Male", "152", "48",
"Female", "139", "52",
"Total", "291", "100"
), ncol = 3, byrow = T)
)
```
**Table 2. Descriptive epidemiology: Study population by age group**
```{r, echo = F}
kable( matrix( c(
"Age group", "N", "%",
"<30", 215, 74,
"30+", 68, 23,
"Missing", 8, 3,
"Total", 291, 100
), ncol = 3, byrow = T)
)
```
**Mean age:** 26 years
**Median age:** 20 years
**Range:** 12 – 80 years
**Table 3. Descriptive epidemiology: Attack rate**
```{r, echo = F}
kable( matrix( c(
"Ill", "N", "%",
"No", 188, 65,
"Yes", 103, 35,
"Total", 291, 100
), ncol = 3, byrow = T)
)
```
**Table 4. Descriptive epidemiology: Cases by date of onset of illness**
```{r, echo = F}
kable( matrix( c(
"Ill", "N", "%",
"27 June 1998", 48, 47,
"28 June 1998", 46, 45,
"29 June 1998", 8, 8,
"Missing", 1, 1,
"Total", 103, 100
), ncol = 3, byrow = T)
)
```
### Copying tables in to Excel:
To copy tables in to Excel, save your output as an object and can use the *View* function on your output data frame. Then highlight this from the bottom right to the top left, then copy and paste to Excel.
![](Copying.png)
Alternatively if you could use the *write.csv* function to creat a CSV file and open this with Excel.
<!---CHUNK_PAGEBREAK--->
# Session 2
## Q5. Understanding that investigators did a cohort study, test each of the relevant hypotheses.
Using *R* and the cleaned dataset “stegen1.Rda”:
**Q5a)** Compute food specific attack rates and attack rates by age and sex
**Q5b)** Choose the appropriate measure of association and the appropriate statistical tests and appropriate level of confidence
**Q5c)** Look at proportion of cases exposed
**Q5d)** Compute attributable risk % among exposed
**Q5e)** Search for any dose response if appropriate
**Q5f)** Interpret the results and identify the outbreak vehicle if any.
<!---CHUNK_PAGEBREAK--->
## Help task 5
### Help Q5a) Calculating attack rate:
There are several ways to do this. The first involves using base-R code, the second is with a user-written function and the third is to install a package.
```{r}
#load your cleaned dataset
load("stegen1.Rda")
```
```{r, eval = F}
# The first element will be rows and the 2nd will be columns
count <- table(tira.data$sex, tira.data$ill, deparse.level = 2)
# Here we select row % of count by including ,1 in the prop.table section
prop <- round(prop.table(count, 1) * 100, digits = 2)
# We obtain the denominator using the rowSums function
denominator <- rowSums(count)[2]
# We combine all the elements together using cbind (binding by columns)
output <- cbind(Ill = count[2, ], N = denominator, Proportions = prop[2, ])
```
You could alternatively combine the above code in to a user-written function called *attack.rate*. Here too, you do not need to understand this code at this point.
First run the code to save the function in your environment.
```{r}
# Function to provide counts, denominator and proportions (equivalent of attack rate)
attack.rate <- function(exposure, outcome, data, rowcol = "cols") {
#create an empty list to store results
output <- list()
#for each variable named in exposure
for (var in exposure) {
counts <- table(data[, var], data[, outcome] )
if (rowcol == "cols") {
#get column proportions
prop <- round(prop.table(counts, 1) * 100, digits = 2)
#get row totals
denominator <- rowSums(counts)[2]
#pull counts together
intermediate <- cbind(Ill = counts[2, ], N = denominator, Proportions = prop[2, ])
}
if (rowcol == "rows") {
#get column proportions
prop <- round(prop.table(counts, 2) * 100, digits = 2)
#get column totals
denominator <- colSums(counts)[2]
#pull counts together
intermediate <- cbind(Exposed = counts[ , 2],
N = denominator, Proportions = prop[ , 2])
}
if (nrow(counts) > 2) {
#get column proportions
prop <- round(prop.table(counts, 1) * 100, digits = 2)
#get row totals
denominator <- rowSums(counts)
#pull counts together
intermediate <- cbind(Ill = counts[ , 2], N = denominator, Proportions = prop[ , 2])
}
#store your output table in the list
output[[var]] <- intermediate
}
return(output)
}
```
You can now use your function to get attack rates.
```{r, eval = F}
# specify the exposure, the outcome and the dataset
attack.rate( exposure = "sex", outcome = "ill", data = tira.data)
```
Alternatively you could use the *EpiStats* package written by Epiconcept.
```{r, eval = F}
# Install the package if you have not done this yet
install.packages("EpiStats")
```
```{r, message = F}
# Load the package to this session
library(EpiStats)
```
```{r, eval = F}
# Use the CS function as this is a cohort study
CS(tira.data, "ill", "sex")
```
## Help Q5b) Choose the appropriate measure of association and the appropriate statistical tests and appropriate level of confidence
As we are carrying out a cohort study, the appropriate measure of association is relative risk. The appropriate statistical test for determining a p-value is a Chi2 test of comparison of proportions. For our analyses we will use a 95% confidence level, as this is the standard used in public health.
## Help Q5c) Look at proportion of cases exposed
Here you could use the *attack.rate* function and specify that you would like rows.
```{r, eval = F}
# specify rows to get proportion of exposed cases
attack.rate(exposure = c("sex", "tira", "agegroup"),
outcome = "ill", data = tira.data,
rowcol = "rows")
```
Alternatively you could use the CS function and switch exposures and outcomes.
```{r, eval = F}
#reversed cs gives you one count
CS(tira.data, "tira", "ill")
```
We can see that many cases were exposed to Tiramisu.
```{r, echo = F}
attack.rate(exposure = "tira",
outcome = "ill", data = tira.data,
rowcol = "rows")
```
<!---CHUNK_PAGEBREAK--->
## Help Q5d) Compute attributable risk % among exposed
To calculate the attributable risk % among the exposed, we can use the formula:
$$AF_{exp}= \frac{RR-1}{RR} \times 100$$
The attributable risk % is the proportion of the disease among the exposed, which can be attributed to the exposure (or could have been prevented by eliminating the exposure).
We can also find the attributable risk % using the CS function:
```{r, eval = F}
#same command as above for CS
CS(tira.data, "tira", "ill")
```
```{r, echo = F}
kable( matrix( c(
"Risk factor", "Attributable risk %",
"Agegroupagegroup", 5.1,
"Tiramisutiramisu", 94.5,
"wmousse", 64.9,
"dmousse", 77.8,
"mousse", 79.9,
"redjelly", 52.0,
"fruitsalad", 60.0,
"tomato", 22.5,
"mince", 5.4,
"salmon", 3.2,
"horseradish", 20.4,
"chickenwin", 13.9,
"pork", 20.1
), ncol = 2, byrow = T)
)
```
NB: It makes sense to calculate the attributable risk % among the exposed among those with a RR>1. If there is a protective effect (RR<1), then we can calculate the prevented fraction among the exposed:
$\frac{risk_{unexp} - risk_{exp}}{risk_{unexp}} \times 100 = (1-RR) \times 100$
The prevented fraction among the exposed is mainly used in vaccine studies (where the exposure, the vaccine, is protective). In outbreak investigation studies, we rarely have protective exposure. We need to have verified biological plausibility for an exposure to be a risk factor, otherwise the calculation of the prevented fraction does not make sense.
Here, there are three risk factors for getting ill that are associated with a RR<1: sex, beer and roastbeef. The epidemiologists in the outbreak investigation team decide to assess if drinking beer is protective in this outbreak. We can calculate the prevented fraction among the exposed (see below). For the variables sex and roastbeef, it may be harder to determine any plausible biological reasons for them to be protective factors.
```{r, echo = F}
kable( matrix( c(
"Risk factor", "Prevented fraction %",
"Beer", 32.3
), ncol = 2, byrow = T)
)
```
## Help Q5e) Search for any dose response if appropriate
Use the variable tportion and tabulate it. Consider whether you would recode this variable so it has fewer categories, and actually do it.
```{r}
# Tabulate tportion variable against illness using attack.rate function
attack.rate(exposure = "tportion",
outcome = "ill",
data = tira.data)
```
```{r}
# Recode 3 portions of tportion as 2 portions
# Make a new variable called tportion2 that has the same values as tportion
tira.data$tportion2 <- tira.data$tportion
tira.data$tportion2[tira.data$tportion2 == "Three portions"] <- "Two portions"
#drop the resulting NA factor level
tira.data$tportion2 <- droplevels(tira.data$tportion2, NA)
```
```{r}
# Tabulate tportion2 variable against illness using attack.rate function
attack.rate(exposure = "tportion2",
outcome = "ill",
data = tira.data)
```
Here you can see that those who ate 2 or more portions of Tiramisu have a higher attack rate than those that ate only 1 portion of tiramisu. Those who ate 1 portion of tiramisu have a higher attack rate than those who ate no tiramisu.
## Help Q5f) Interpret the results and identify the outbreak vehicle if any.
Use the *CSTable* function from the *EpiStats* package. The output is automatically sorted by p-value however you can also get choose to sort by other values. This can be useful when you would like a summary table of attack rates and RRs.
```{r}
#use the same commands as for CS
CSTable(tira.data, cases = "ill", exposure = c("sex", "agegroup", "tira",
"beer", "mousse", "wmousse",
"dmousse", "redjelly", "fruitsalad",
"tomato", "mince", "salmon", "horseradish",
"chickenwin", "roastbeef", "pork"))
```
<!---CHUNK_PAGEBREAK--->
# Session 3
**Ppt Presentation: **
**Summary of results for food specific attack rates (10 minutes)**
Several food items seem to play a role in the occurrence of illness; tiramisu, dark and white chocolate mousse, fruit salad, and red jelly. They can potentially explain up to respectively (94, 76, 49, 46, and 45 of the 103 cases). Investigators decided to identify their respective role in the occurrence of illness.
From the crude analysis epidemiologists noticed that the occurrence of gastroenteritis was lower among those attendants who had drunk beer. They also decided to assess if beer had a protective effect on occurrence of gastroenteritis.
## Q6. Conduct any relevant stratified analysis, using the dataset stegen1.dta.
**Q6a)** Identify the variables which are potential effect modifiers and confounders: Start by looking if beer is an effect modifier or a confunder, look at the 2X2 tables
**Q6b)** Use the cs command
**Q6c)** To summarize your results, use csinter command
**Q6d)** Then design all appropriate stratification tables
**Q6e)** Stratify on each level taken by the effect modifiers and confounders and compute appropriate measurements to identify confounding and effect modification.
**Q6f)** Interpret the results
<!---CHUNK_PAGEBREAK--->
## Help Task 6
## Help Q6a) Identify the variables which are potential effect modifiers and confounders: Start by looking if beer is an effect modifier or a confounder, look at the 2X2 tables.
Do a tabulation to show the 2 by 2 tables of each stratum.
```{r}
# do a three way table with tira
# drop NAs specifying useNA = "no"
# put variable names using deparse.level = 2
table(tira.data$beer, tira.data$ill,
tira.data$tira, useNA = "no",
deparse.level = 2)
```
## Help Q6c) To summarize your results, use csinter function:
Use the *CSInter* function to summarise. The *CSInter* function produces 2 x 2 tables with stratum specific risk ratios, attributable risk among exposed and population attributable risk.
As a summary it gives the Mantel Haenszel RR and the result of a Woolf test for homogeneity.
```{r, eval = F}
# specify CSInter as otherwise, but also choose a by group
CSInter(tira.data, cases = "ill", exposure = "beer", by = "tira")
```
## Help Q6d) Then design all appropriate stratification tables
Other potential variables include:
- Dark chocolate mousse
- White chocolate mousse
- Beer
- Red jelly
- Fruit salad
- Age
- Sex
- Amount of Tiramisu eaten
## Help Q6e) Stratify on each level taken by the effect modifiers and confounders and compute appropriate measurements to identify confounding and effect modification.
```{r, echo = F}
kable( matrix( c(
"", "RR stratified for Tiramisu (Yes|No)",
"M-H Test for homogeneity (p-value)", "Crude RR", "Adjusted (MH-RR)",
"% change", "Eff.mod or conf?",
"White mousse", "", "", "", "", "", "",
"Dark mousse", "", "", "", "", "", "",
"Beer", "", "", "", "", "", "",
"Red jelly", "", "", "", "", "", "",
"Fruit salad", "", "", "", "", "", "",
"Sex", "", "", "", "", "", "",
"Age", "", "", "", "", "", ""
), ncol = 7, byrow = T)
)
```
By stratifying the analysis on tiramisu consumption we can measure the potential protective role of beer among those who ate tiramisu. The following tables show occurrence of gastroenteritis according to beer and tiramisu consumption.
**Table 5** Cases of gastroenteritis among attendants at a high school graduation ceremony by beer and tiramisu consumption, Stegen, Germany, 1998.
```{r, echo = F}
kable(CSInter(tira.data, cases = "ill", exposure = "beer", by = "tira")$df1)
```
It seems that consumption of beer may reduce the relative effect of tiramisu consumption on occurrence of gastroenteritis. The RR does not significantly differ in the two strata (0.8 vs. 1.0 and confidence intervals overlap). But some effect modification may be present. This association between beer consumption and occurrence of gastroenteritis is however not confounded by Tiramisu consumption since the adjusted RR is 0.8 and the crude RR was 0.7. A similar stratification was conducted assessing dose response for tiramisu consumption among beer drinkers and not.
```{r, echo = F}
kable(CSInter(tira.data, cases = "ill", exposure = "beer", by = "tportion2")$df1)
```
<!---CHUNK_PAGEBREAK--->
# Session 4
**Case study continued**
Results suggest that dark and white chocolate as well as fruit salad and red jelly consumption may have contributed to illness (since RR are high even among those who did not eat tiramisu). Such an association can be real (several contaminated food items, use of a single spoon to serve portions) or due to another unidentified confounding factor. Interpretation of results should also be cautious due to the small number of cases involved in this stratified analysis.
## Q7. Conduct a multivariable analysis without taking interaction into account.
**First explore the result of logistic regression**
- Start with the logit command
- Include one exposure variable (the most likely vehicle = tiramisu, outcome = ill)
- Look at the coefficients obtained from the logistical regression. With beta, compute the OR with a calculator and compare with output provided by *R*.
**Then add the other variables in the logistic regression**
<!---CHUNK_PAGEBREAK--->
## Help task 7
**Perform logistic regression**