Changes to data_medstar_epcr_02_variable_management.Rmd
- Part of #27
- Replaced spaces with underscores in address street name.
- Deleted data_medstar_epcr_02_variable_management.nb.html. It's unnecessary and just takes up extra space.
- Saved medstar_epcr_02_variable_management.rds as RDS instead of Feather. It doesn't seem like Feather ever really caught on.
mbcann01 committed Feb 22, 2020
1 parent 8836b9f commit 45f59be
Showing 3 changed files with 31 additions and 983 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -12,3 +12,5 @@ markdown/pairs_possible_matches.rds
markdown/rpairs_epiwt.rds

markdown/rpairs_jar.rds

NOTES.txt
139 changes: 29 additions & 110 deletions markdown/data_medstar_epcr_02_variable_management.Rmd
@@ -1,11 +1,6 @@
---
title: "Manage Variables in MedStar EPCR Data"
date: "Created: 2018-12-26 <br> Updated: `r Sys.Date()`"
output:
html_notebook:
toc: true
toc_float: true
css: custom-css.css
---

# Overview
@@ -24,28 +19,21 @@ Sys.setenv(TZ = "US/Central")

```{r message=FALSE}
library(tidyverse)
library(bfuncs)
```

medstar_epcr.feather was created in data_medstar_epcr_01_import.Rmd

```{r}
medstar_epcr <- feather::read_feather("/Volumes/sph_research/DETECT/one_year_data/medstar_epcr_01_import.feather")
```{bash}
open 'smb://uctnascifs.uthouston.edu/sph_research/DETECT'
```

```{r}
about_data(medstar_epcr) # 35,557 observations and 32 variables
medstar_epcr <- feather::read_feather("/Volumes/DETECT/one_year_data/medstar_epcr_01_import.feather")
```

[top](#top)








```{r}
dim(medstar_epcr) # 35,557 32
```


# Standardize character strings
@@ -72,14 +60,6 @@ rm(vars)
[top](#top)










# Remove "city of" from address_city value

```{r}
@@ -90,14 +70,6 @@ medstar_epcr <- medstar_epcr %>%
[top](#top)










# Separate names, dob's, and street addresses

* Some names have three parts (e.g., Mary Jo Blake). Here, we split up full name into first name and last name. For now, we ignore middle name(s). We may need to change this later.
@@ -119,19 +91,18 @@ medstar_epcr <- medstar_epcr %>%
)
```

Replace spaces with underscores in address street name.

```{r}
about_data(medstar_epcr) # 35,557 observations and 39 variables
medstar_epcr <- medstar_epcr %>%
mutate(
address_street_name = stringr::str_replace_all(address_street_name, "\\s", "_")
)
```
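
To illustrate the replacement above, here is a minimal sketch that applies the same `str_replace_all()` call to a small made-up vector. The street names are hypothetical and are not taken from the MedStar data. Because `"\\s"` matches one whitespace character at a time, a double space becomes two underscores.

```{r}
library(stringr)

# Hypothetical street names, for illustration only
street_names <- c("main st", "martin luther king jr blvd", "oak  hill")

# Replace every whitespace character with an underscore
cleaned <- str_replace_all(street_names, "\\s", "_")
cleaned
```

If runs of whitespace should collapse to a single underscore, the pattern `"\\s+"` would do that instead. Whether that is wanted here depends on how the street names are matched downstream.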

[top](#top)








```{r}
dim(medstar_epcr) # 35,557 39
```


# Recode categories
@@ -158,9 +129,7 @@ medstar_epcr <- medstar_epcr %>%
) %>%
mutate_at(
vars(starts_with("detect")),
funs(
if_else(. == "N/A", NA_character_, .)
)
~ if_else(. == "N/A", NA_character_, .)
)
```

@@ -185,19 +154,9 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,557 observations and 40 variables
dim(medstar_epcr) # 35,557 40
```

[top](#top)










# Create indicator for completed DETECT screening

@@ -216,19 +175,9 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,557 observations and 56 variables
dim(medstar_epcr) # 35,557 56
```

[top](#top)










# Process numeric variables

@@ -240,19 +189,9 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,557 observations and 56 variables
dim(medstar_epcr) # 35,557 56
```

[top](#top)










# Duplicate (almost) rows

@@ -282,7 +221,7 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,557 observations and 57 variables
dim(medstar_epcr) # 35,557 57
```

How many pairs of duplicate pcr numbers are there?
@@ -302,14 +241,14 @@ For each of those pcr numbers, if the only thing that differs between the two ro
So, for each variable of interest, create a dummy variable that indicates whether values differ within an incident pcr number.

```{r}
medstar_epcr <- medstar_epcr %>%
medstar_epcr <- medstar_epcr %>%
mutate_at(
.vars = vars(
arrival_time, response_num, incident_pcr, incident_complaint, age,
name_full, dob, address_street, address_city, address_state,
address_zip, gender, race, symptoms, crew_sig, disposition
),
.funs = funs(diff = as.numeric(length(unique(.)) > 1))
.funs = list(diff = ~ as.numeric(length(unique(.)) > 1))
) %>%
ungroup()
```
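
The within-group difference indicator can be checked on a toy example. This is a minimal sketch, assuming the data are grouped by `incident_pcr` before the `mutate_at()` call (the grouping step is not visible in this hunk). The `toy` data frame and its values are hypothetical.

```{r}
library(dplyr)

# Hypothetical toy data: two rows share incident_pcr "A1"
toy <- tibble(
  incident_pcr = c("A1", "A1", "B2"),
  age          = c(70, 70, 81),
  symptoms     = c("fall", "chest pain", "fall")
)

toy_diff <- toy %>%
  # Assumed grouping: compare values within each pcr number
  group_by(incident_pcr) %>%
  mutate_at(
    .vars = vars(age, symptoms),
    # 1 if the group holds more than one distinct value, otherwise 0
    .funs = list(diff = ~ as.numeric(length(unique(.)) > 1))
  ) %>%
  ungroup()

toy_diff
```

Because the function list is named, each new column takes the form `<variable>_diff`, so the original columns are kept alongside the indicators. In this toy data only `symptoms_diff` is 1, and only for the two "A1" rows.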
@@ -325,8 +264,8 @@ medstar_epcr <- medstar_epcr %>%

```{r}
# Data checking
# medstar_epcr %>%
# filter(pcr_dup) %>%
# medstar_epcr %>%
# filter(pcr_dup) %>%
# select(incident_pcr, pcr_dup, ends_with("_diff"), aps_report, answered_count, diff_count)
```

@@ -365,7 +304,7 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,556 observations and 58 variables
dim(medstar_epcr) # 35,556 58
```


@@ -389,7 +328,7 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 35,555 observations and 58 variables
dim(medstar_epcr) # 35,555 58
```


@@ -449,39 +388,19 @@ medstar_epcr <- medstar_epcr %>%
```

```{r}
about_data(medstar_epcr) # 28,228 observations and 56 variables
dim(medstar_epcr) # 28,228 56
```

[top](#top)










# Save data

```{r}
feather::write_feather(
readr::write_rds(
medstar_epcr,
"/Volumes/sph_research/Detect/one_year_data/medstar_epcr_02_variable_management.feather"
"/Volumes/DETECT/one_year_data/medstar_epcr_02_variable_management.rds"
)
```
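
As a quick check that RDS preserves column types on a round trip, the following sketch writes a small data frame to a temporary file and reads it back. The `toy` data frame and the temporary path are hypothetical stand-ins for `medstar_epcr` and the project volume.

```{r}
library(readr)

# Hypothetical stand-in for medstar_epcr, with a factor and a date column
toy <- data.frame(
  disposition = factor(c("transported", "refused")),
  dob         = as.Date(c("1940-01-15", "1938-07-02"))
)

# Write to a temporary file rather than the project volume
rds_path <- tempfile(fileext = ".rds")
write_rds(toy, rds_path)

# Read it back and compare against the original object
toy_back <- read_rds(rds_path)
identical(toy, toy_back)
```

The next file in the pipeline would then load the data with `readr::read_rds()` in place of `feather::read_feather()`.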

[top](#top)










# Session information

873 changes: 0 additions & 873 deletions markdown/data_medstar_epcr_02_variable_management.nb.html

This file was deleted.
