-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathonboarding.Rmd
108 lines (74 loc) · 6.01 KB
/
onboarding.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
---
title: "INTERACT Onboarding"
author: "Daniel Fuller"
date: '2018-06-18'
output:
html_document:
keep_md: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R + RStudio
The main tutorial, Intro To R, linked below, focuses on data wrangling and not on data analysis. If possible, please complete this tutorial using the RStudio software downloaded to your local computer, and not by using RStudio in your internet browser. Materials for the tutorial, including the instructional slides, can be downloaded as a ZIP file.
### Tutorial
Coding style is important. Follow the [Hadley](http://adv-r.had.co.nz/Style.html) or [Goolge](https://google.github.io/styleguide/Rguide.xml) style guide. We also use `snake_case` in INTERACT projects. All variables and data should be all lower case with an underscore between each word. All new variables you create or data you export should follow the same convention.
Complete this tutorial. https://github.com/AmeliaMN/IntroToR/blob/master/README.md
* Not that you should not use on the online RStudio environment but run everything locally on your computer. That means you will need to download the repo as discussed at the beginning of the tutorial. It also means that you will need to be familiary with installing R packages. Here is a quick demo:
- R can do many statistical and data analyses. They are organized in so-called packages or libraries. With the standard installation, most common packages are installed. There are lots of packages. It’s probable that if you have thought about the analysis, there is a package that is able to do it already. There are two basic steps to using a package:
**Installing the package**
`install.packages("ggplot2")`
**Loading the package**
`library(ggplot2)`
##### Common Errors in R ([from Chester Ismay and Patrick C. Kennedy](https://ismayc.github.io/rbasics-book/index.html))
**1. Error: `could not find function`**
This error usually occurs when a package has not been loaded into R via library, so R does not know where to find the specified function. It’s a good habit to use the library functions on all of the packages you will be using in the top R chunk in your R Markdown file, which is usually given the chunk name setup.
**2. Error: `object not found`**
This error usually occurs when your R Markdown document refers to an object that has not been defined in an R chunk at or before that chunk. You’ll frequently see this when you’ve forgotten to copy code from your R Console sandbox back into a chunk in R Markdown.
**3. Misspellings**
One of the most frustrating errors you can encounter in R is when you misspell the name of an object or function. R is not forgiving on this, and it won’t try to automatically figure out what you are referring to. You’ll usually be able to quite easily figure out that you made a typo because you’ll receive an object not found error.
Remember that R is also case-sensitive, so if you called an object Name and then try to call it name later on without name being defined, you’ll receive an error.
**4. Unmatched parenthesis**
Another common error is forgetting or neglecting to finish a call to a function with a closing ). An example of this follows:
`mean(x = c(1, 5, 10, 52)`
```
Error in parse(text = x, srcfile = src) :
<text>:2:0: unexpected end of input
1: mean(x = c(1, 5, 10, 52)
^
Calls: <Anonymous> ... evaluate -> parse_all -> parse_all.character -> parse
Execution halted
Exited with status 1.
```
In this case, there needs to be one more parenthesis added at the end of your call to mean:
`mean(x = c(1, 5, 10, 52))`
**5. General guidelines**
Try your best to not be intimidated by R errors. Oftentimes, you will find that you are able to understand what they mean by carefully reading over them. When you can’t, carefully look over your R Markdown file again. You might also want to clear out all of your R environment and start at the top by running the chunks. Remember to only include what you deem your reader will need to follow your analysis.
Even people who have worked with R and programmed for years still use Google and support websites like Stack Overflow to ask for help with their R errors or when they aren’t sure how to do something in R. I think you’ll be pleasantly surprised at just how much support is available.
### INTERACT Tasks
##### CCHS Data
1. Read in the data `cchs.csv` (use `read_csv`)
2. Quickly view the first 10 rows of data (use `head`)
3. Display the type of variable for all variables in the dataset
4. Create a scatterplot on height and weight (use `ggplot2`)
5. Clean the height and weight data so missing data coded as numbers (a common SPSS practice) become *NA* in R (use `forcats`)
6. Create a new scatterplot on height and weight with the clean data (use `ggplot2`)
7. Recode BMI to represent weight categories from underweight to obese (use `forcats`)
8. Recode Province to include the province names instead of numbers
9. Compute the mean and standard deviation of height and weight (use `dplyr::summarize`)
10. Compute the mean and standard deviation of height and weight based on Province. Then compute the mean and standard deviation of height and weight based on weight categories. (use `group_by` and `dplyr::summarize`)
##### Accel Data
1. Read in the 2 data files `accel.csv` (use `read_csv`)
2. Create a new variable that indicates participante 1 and participant 2
3. Append (stack) the 2 files together (use `dplyr::bind_rows`)
4. Quickly view the first 10 rows of data (use `head`)
5. Display the type of variable for all variables in the dataset
6. Create a scatterplot on x_axis and y_axis (use `ggplot2`)
7. Convert the time data to time format (use `lubridate`)
8. Compute the sum each of axis by second and by participant (use `group_by` and `dplyr::summarize`)
9. Compute the gravity subtracted vector magnitude `sqrt(x^2, y^2, z^2)-1` on the new data for each participant
## R Markdown
### Tutorial
https://www.youtube.com/watch?v=-apyD5f9nwg
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
### INTERACT Tasks