forked from andysouth/testuclhbookdown
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path042-example-flowsheet.Rmd
85 lines (48 loc) · 3.25 KB
/
042-example-flowsheet.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
# Flowsheet example
Aim : to summarise EPIC questionnaire results captured from flowsheets.
This example demonstrates summarising a questionnaire where we are expecting one answer per patient for each of a series of question.
Starting with two data extractions from Caboodle.
1. Questionnaire results from Flowsheets
2. Patient demographics
The patient cohort was defined in the data extraction by clinic codes and Patient Durable Keys.
## Questionaire results from flowsheets
The flowsheet data are extracted by an SQL query. By default data are returned in a 'long' format with question identifiers stored in two columns (note that these may be renamed in the sql extraction) :
* **DisplayName** contains the text on the original form
* **Name** contains an identifier that is shorter
Values are stored in columns :
* **Value**
* **Unit** optional, may not be specified
It can be useful to reformat these data so that each question is stored in its own column and the values per patient are stored in rows.
To do that 'widening' we can use the `pivot_wider` function from the package `tidyr`. You need to specify which columns the names, values and identifiers come from.
You may need to decide whether to use the DisplayName or Name of the FlowsheetRow works better for what you want to do. DisplayName may be too long to provide a good column name, but Name may not give you a clear indication of what the value contains.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
```{r libraries}
#library(tidyverse)
library(readr)
library(dplyr)
library(tidyr)
```
In this first example, where there is just one answer recorded for each question-patient combination the pivot_wider should work fine.
```{r}
dfqs <- read_csv("data//qlong1.csv")
knitr::kable(dfqs)
```
```{r}
dfqswide <- dfqs %>% pivot_wider( names_from = DisplayName,
values_from = Value,
id_cols = PatientDurableKey )
knitr::kable(dfqswide)
```
[add how to deal with multiple values per patient-question combination either if there is a record-id column or not]
In some cases there can be duplicate values per patient-question combination. This can be because a form was started and not completed for a patient. In that case a warning message from pivot_wider gives an indication of how to detect the duplicates. Duplicate values can also be because the form has been completed multiple times. In that case values of EncounterKey or FlowsheetValueKey can help to identify which row you want.
Many other flowsheet data will contain repeated values per patient. In that case, to get to a single value per patient, you would need to summarise the data first before using pivot_wider.
## Patient Demographics
Patient demographics may be provided in a separate table extracted from Caboodle, also with a PatientDurableKey. The shared column can be used to join the two tables together to allow answers to be summarised by demographics.
```{r}
dfpatient <- read_csv("data//patient1.csv")
dfqswidejoin <- dfqswide %>% left_join(dfpatient, by="PatientDurableKey")
knitr::kable(dfqswidejoin)
```
One the data have been joined they can be plotted using `ggplot2` or further summarised using `dplyr`.