---
title: crosswalkr
output: md_document
---
# crosswalkr <img src="man/figures/logo.png" align="right" />
```{r, include = FALSE}
options(width = 100)
```
[![R build
status](https://github.com/btskinner/crosswalkr/workflows/R-CMD-check/badge.svg)](https://github.com/btskinner/crosswalkr/actions)
[![GitHub release](https://img.shields.io/github/release/btskinner/crosswalkr.svg)](https://github.com/btskinner/crosswalkr)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/crosswalkr)](http://cran.r-project.org/package=crosswalkr)
## Overview
This package offers a pair of functions, `renamefrom()` and
`encodefrom()`, for renaming and encoding data frames using external
crosswalk files. It is especially useful when constructing master
data sets from multiple smaller data sets that do not name or encode
variables consistently across files. It is based on the `renamefrom` and
`encodefrom` [Stata commands written by Sally Hudson and
team](https://github.com/slhudson/rename-and-encode).
## Installation
Install the latest release version from CRAN with
```{r, eval = FALSE}
install.packages('crosswalkr')
```
Install the latest development version from GitHub with
```{r, eval = FALSE}
devtools::install_github('btskinner/crosswalkr')
```
## Usage
```{r, message = FALSE}
library(crosswalkr)
library(dplyr)
library(haven)
```
```{r}
## starting data frame
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 fips = c(21,47,51),
                 region = c('South','South','South'))
df
## crosswalk with which to convert old names to new names with labels
cw <- data.frame(old_name = c('state','fips'),
                 new_name = c('stname','stfips'),
                 label = c('Full state name', 'FIPS code'))
cw
```
### Renaming
Convert old variable names to new names and add labels from crosswalk.
```{r}
df1 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name, label = label)
df1
```
Convert old variable names to new names using old names as labels
(ignoring labels in crosswalk).
```{r}
df2 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name, name_label = TRUE)
df2
```
Convert old variable names to new names, but keep unmatched old names
in the data frame.
```{r}
df3 <- renamefrom(df, cw_file = cw, raw = old_name, clean = new_name, drop_extra = FALSE)
df3
```
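For intuition, the default renaming behavior shown above can be approximated in base R with `match()`. This is only a minimal sketch, not the package's implementation: the object names (`toy_df`, `toy_cw`, `toy_renamed`) are hypothetical, labels are left out, and it assumes that columns missing from the crosswalk are dropped, as the `drop_extra = FALSE` example suggests is the default.

```{r}
## toy data and crosswalk (hypothetical names, mirroring the example above)
toy_df <- data.frame(state = c('Kentucky', 'Tennessee', 'Virginia'),
                     fips = c(21, 47, 51),
                     region = c('South', 'South', 'South'),
                     stringsAsFactors = FALSE)
toy_cw <- data.frame(old_name = c('state', 'fips'),
                     new_name = c('stname', 'stfips'),
                     stringsAsFactors = FALSE)

## row of the crosswalk that matches each column name (NA if unmatched)
idx <- match(names(toy_df), toy_cw$old_name)

## keep only the matched columns, then swap in the new names
toy_renamed <- toy_df[, !is.na(idx), drop = FALSE]
names(toy_renamed) <- toy_cw$new_name[idx[!is.na(idx)]]
names(toy_renamed)
```

Skipping the subsetting step and renaming only the matched positions would keep `region` in place, which is roughly what `drop_extra = FALSE` does in the example above.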
### Encoding
```{r}
## starting data frame
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 stfips = c(21,47,51),
                 cenregnm = c('South','South','South'))
df
## use state crosswalk data file from package
cw <- get(data(stcrosswalk))
cw
```
Create a new column with factor-encoded values.
```{r}
df$state2 <- encodefrom(df, var = state, cw_file = cw, raw = stname, clean = stfips, label = stabbr)
df
```
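The idea behind crosswalk-based encoding can be sketched in base R as a lookup followed by a call to `factor()`. This is an illustration under stated assumptions rather than what `encodefrom()` does internally: the objects `enc_df` and `enc_cw` are hypothetical stand-ins for the data frame and the `stcrosswalk` columns used above, and the level order here simply follows the toy crosswalk.

```{r}
## toy crosswalk with raw names, clean codes, and labels (hypothetical values)
enc_cw <- data.frame(stname = c('Kentucky', 'Tennessee', 'Virginia'),
                     stfips = c(21, 47, 51),
                     stabbr = c('KY', 'TN', 'VA'),
                     stringsAsFactors = FALSE)
enc_df <- data.frame(state = c('Virginia', 'Kentucky', 'Tennessee'),
                     stringsAsFactors = FALSE)

## look up the crosswalk row for each raw value, then pull the clean code
idx <- match(enc_df$state, enc_cw$stname)
codes <- enc_cw$stfips[idx]

## factor whose levels are the clean codes and whose labels come from the crosswalk
enc_df$state_fct <- factor(codes, levels = enc_cw$stfips, labels = enc_cw$stabbr)
enc_df
```

Raw values that do not appear in the toy crosswalk would come out as `NA` in this sketch, since `match()` returns `NA` for them.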
Create a new column with labelled values.
```{r}
## convert to tbl_df
df <- tibble::as_tibble(df)
df$state3 <- encodefrom(df, var = state, cw_file = cw, raw = stname, clean = stfips, label = stabbr)
```
Create a new column with factor-encoded values, ignoring the fact that `df` is a tibble.
```{r}
df$state4 <- encodefrom(df, var = state, cw_file = cw, raw = stname, clean = stfips, label = stabbr, ignore_tibble = TRUE)
```
Show factors with labels:
```{r}
as_factor(df)
```
Show factors without labels:
```{r}
zap_labels(df)
```