---
title: "statar"
author: "Matthieu Gomez"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data.frames function}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---
## sum_up = summarize
`sum_up` prints detailed summary statistics (it corresponds to Stata's `summarize`).
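```R
library(dplyr)
library(statar)

N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
# summary statistics for every variable
sum_up(df)
# detailed statistics for the variables whose names start with "v"
df %>% sum_up(starts_with("v"), d = TRUE)
# summary statistics within each group of v1
df %>% group_by(v1) %>% sum_up()
```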
## tab = tabulate
`tab` prints distinct rows with their count. Compared to the dplyr function `count`, this command adds frequency, percent, and cumulative percent.
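```R
N <- 1e2; K <- 10
df <- tibble(
  id = sample(c(NA, 1:5), N/K, TRUE),
  v1 = sample(1:5, N/K, TRUE)
)
# tabulate id, including missing values
tab(df, id)
# tabulate id, dropping missing values
tab(df, id, na.rm = TRUE)
# tabulate the distinct combinations of id and v1
tab(df, id, v1)
```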
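To make the comparison with `count` concrete, here is a minimal sketch that rebuilds the extra columns with dplyr alone. It is not how `tab` is implemented, and the column names `percent` and `cum_percent` are hypothetical, chosen only for illustration.

```R
library(dplyr)

# small hypothetical dataset with a missing id
df <- tibble(id = c(1, 1, 2, 3, 3, 3, NA, NA))

# count() returns only the number of rows per distinct value
counts <- df %>% count(id)

# add the share of each value and the running total of those shares
# (illustrative column names, not the labels printed by tab)
out <- counts %>%
  mutate(
    percent = 100 * n / sum(n),
    cum_percent = cumsum(percent)
  )
out
```

`tab(df, id)` is meant to report this kind of table in a single call.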
## join = merge
`join` is a wrapper around the dplyr join functions, with three additional options:
- The option `check` checks that there are no duplicates in the master or using dataset (as in Stata).
```r
# merge m:1 v1
join(x, y, kind = "full", check = m~1)
```
- The option `gen` specifies the name of a new variable that identifies matched and unmatched rows (as in Stata).
```r
# merge m:1 v1, gen(_merge)
join(x, y, kind = "full", gen = "_merge")
```
- The option `update` updates missing values in the master dataset with the corresponding values from the using dataset.
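
The options above can be tried on a small example. This is a sketch under assumptions: `x` (master) and `y` (using) are hypothetical tibbles that share the key `id` and the variable `v`, and the key is passed explicitly through `on`, assuming that argument name, so that `v` is not treated as a join key.

```r
library(dplyr)
library(statar)

# hypothetical master dataset: v is missing for id 2
x <- tibble(id = c(1, 2, 3), v = c(10, NA, 30))
# hypothetical using dataset: one row per id
y <- tibble(id = c(2, 3, 4), v = c(20, 99, 40))

# Stata analogue: merge m:1 id, update
# on = "id" is an assumption about the key argument (see lead-in)
out <- join(x, y, kind = "full", on = "id", check = m~1, update = TRUE)
out
```

Following the description of `update`, the missing `v` for `id` 2 should be filled from `y`, while the non-missing values in `x` are left unchanged.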