-
Notifications
You must be signed in to change notification settings - Fork 3
/
02-getting_data_in_R.Rmd
178 lines (117 loc) · 6.42 KB
/
02-getting_data_in_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
# Getting your data into R
```{block,type='rmdinfo'}
This optional chapter will tell you about how you can **import** data in RStudio. We will also show you a few commands to **manipulate data** directly in R.
```
## Using R projects
One advantage of using an **R project** is that the project directory is automatically set as the working directory. Just copy your data file to the folder that contains the *".Rproj"* file, and you will be able to load files by name.
## Importing Excel Files
```{r include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```
One way to get Excel files directly into R is by using the `readxl` package. Install the package, and try using the read_excel() function to load the data, and assign it to an object called `df`:
```{r, eval = FALSE}
# Run this only once, to download and install the package:
install.packages("readxl")
# Load the package:
library(readxl)
# Read an Excel file into 'df':
df <- read_excel("your_file.xlsx",
sheet = 1)
```
```{r, echo=FALSE, results = "hide", message=FALSE, eval = FALSE}
library(readxl)
df <- read_excel("happy.xlsx",
sheet = 1)
```
### Inspect the data
R does not work with a single spreadsheet (SPSS or Excel). Instead, it can keep many objects in memory. The object `df` is a `data.frame`; an object that behaves similar to a spreadsheet. To see a description of the object, look at the *Environment* tab in the top right of Rstudio, and click the arrow next to `df`.
```{r, echo=FALSE}
library(png)
knitr::include_graphics("environment.PNG")
```
As you can see, the on the top-right pane **Environment**, your file is now listed as a data set in your RStudio environment.
You can make a quick copy of this data set by assigning the `df` object to a new object. This way, you can edit one, and leave the other unchanged. Assign the object `df` to a new object called `df_backup`:
```{r}
df_backup <- df
```
You can also have a look at the contents of `df` by **clicking** the object in the Environment panel, or running the command `head(df)`.
<br><br>
---
## Importing SPSS Files
SPSS files can be loaded using the `foreign` package. All SPSS files for this course are available for download [here](https://github.com/cjvanlissa/TCSM_student).
<!--[here](https://github.com/cjvanlissa/Doing-Meta-Analysis-in-R/blob/master/Problem2.sav?raw=true).-->
```{r, eval = FALSE}
# Install the package, run this only once
install.packages("foreign")
# Load the `foreign` library
library(foreign)
# Read the SPSS data
df <- read.spss("sesam2.sav",
to.data.frame = TRUE)
```
```{r, echo = FALSE}
library(foreign)
# Read the SPSS data
df <- read.spss("TCSM_student/sesam2.sav",
to.data.frame = TRUE)
```
## Data manipulation (optional)
Now that we have the Meta-Analysis data in RStudio, let's do a **few manipulations with the data**. These functions might come in handy when were conducting analyses later on.
Going back to the output of the `str()` function, we see that this also gives us details on the type of column data we have stored in our data. There a different abbreviations signifying different types of data.
```{r,echo=FALSE}
library(kableExtra)
Package<-c("num","chr","log","factor")
type<-c("Numerical","Character","Logical","Factor")
Description<-c("This is all data stored as numbers (e.g. 1.02)","This is all data stored as words","These are variables which are binary, meaning that they signify that a condition is either TRUE or FALSE","Factors are stored as numbers, with each number signifying a different level of a variable. A possible factor of a variable might be 1 = low, 2 = medium, 3 = high")
m<-data.frame(Package,type,Description)
names<-c("Abbreviation", "Type","Description")
colnames(m)<-names
kable(m)
```
### Converting to factors {#convertfactors}
Let's look at the variable `df$VIEWCAT`. This is a categorical variable, coded as a numerical one. We can have a look at this variable by typing the name of our dataset, then adding the selector `$` and then adding the variable we want to have a look at.
This variable is currently a numeric vector. We want it to be a factor: That's a categorical variable.
To convert this to a **factor** variable now, we use the `factor()` function.
```{r, results = "hide"}
df$VIEWCAT <- factor(df$VIEWCAT)
```
We now see that the variable has been **converted to a factor with the levels "1", "2", "3", and "4"**. We can assign different value labels as follows:
```{r, results = "hide"}
df$VIEWCAT <- factor(df$VIEWCAT, labels = c("Rarely", "Sometimes", "Regularly", "Often"))
```
### Selecting specific cases {#select}
It may often come in handy to **select certain cases for further analyses**, or to **exclude some studies in further analyses** (e.g., if they are outliers).
To do this, we can use the `[]` operator to index our data.
Let's say we want to get only the first 5 cases. We can select them like so:
```{r}
df[1:5, ]
```
Or let's say we only want the children younger than 36 months in the dataset. In this case, we can use **boolean indexing**: We create a TRUE / FALSE statement, and select the cases that are TRUE:
```{r}
df[df$AGE < 36, ]
```
Note that this approach can be used for any other type of data and variable. We can also use it to e.g., only select studies where VIEWCAT was equal to "Often" "typical":
```{r, eval = FALSE}
df[df$VIEWCAT == "Often", ]
```
```{r, echo = FALSE}
df[df$VIEWCAT == "Often", ][1:5, ]
```
### Changing cell values
Sometimes, even when preparing your data in EXCEL, you might want to **change values in RStudio once you have imported your data**.
To do this, we have to select a cell in our data frame in RStudio. This can be done by adding `[x,y]` to our dataset name, where **x** signifies the number of the **row** we want to select, and **y** signifies the number of the **column**.
To see how this works, let's select a variable using this command first:
```{r}
df[8,1]
```
We now see the **8th study** in our dataframe, and the value of this study for **Column 1 (participant ID)** is displayed. Let's say we had a typo in this name and want to have it changed. In this case, we have to give this exact cell a new value.
```{r}
df[8,1] <- 1001
```
Let's check if the value has changed.
```{r}
df[8,1]
```
You can also use this function to change any other type of data, including numericals and logicals. Only for characters, you have to put the values you want to insert in `""`.
<br><br>
---