vignettes/data-frames.Rmd

---
title: "statar"
author: "Matthieu Gomez"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data.frames function}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---
### sum_up = summarize
`sum_up` prints detailed summary statistics (corresponds to Stata `summarize`)

```R
N <- 100
df <- tibble(
  id = 1:N,
  v1 = sample(5, N, TRUE),
  v2 = sample(1e6, N, TRUE)
)
sum_up(df)
df %>% sum_up(starts_with("v"), d = TRUE)
df %>% group_by(v1) %>%  sum_up()
```

## tab = tabulate
`tab` prints distinct rows with their count. Compared to the dplyr function `count`, this command adds frequency, percent, and cumulative percent.

```R
N <- 1e2 ; K = 10
df <- tibble(
  id = sample(c(NA,1:5), N/K, TRUE),
  v1 = sample(1:5, N/K, TRUE)       
)
tab(df, id)
tab(df, id, na.rm = TRUE)
tab(df, id, v1)
```

## join = merge
`join` is a wrapper for dplyr merge functionalities, with two added functions

- The option `check` checks there are no duplicates in the master or using data.tables (as in Stata).

  ```r
  # merge m:1 v1
  join(x, y, kind = "full", check = m~1) 
  ```
- The option `gen` specifies the name of a new variable that identifies non matched and matched rows (as in Stata).

  ```r
  # merge m:1 v1, gen(_merge) 
  join(x, y, kind = "full", gen = "_merge") 
  ```

- The option `update` allows to update missing values of the master dataset by the value in the using dataset