# Commuting time {#commuting-time}
```{r commuting-time, include=FALSE}
chap <- 17
lc <- 0
rq <- 0
# **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`**
# **`r paste0("(RQ", chap, ".", (rq <- rq + 1), ")")`**
knitr::opts_chunk$set(
tidy = FALSE,
out.width = '\\textwidth',
fig.height = 4,
warning = FALSE
)
options(scipen = 99, digits = 3)
# Set random number generator seed value for replicable pseudorandomness. Why 76?
# https://www.youtube.com/watch?v=xjJ7FheCkCU
set.seed(76)
```
```{r}
do_eval=file.exists("commute_times.csv")
```
Commuting time is often cited as a valid reason for leaving one employer for another.
You are asked to calculate the commuting time for each employee working at your company. You decide that, hypothetically, they must all arrive at the office at 8.30 am. A quick search shows that Google offers an API for this, called the [Distance Matrix API](https://developers.google.com/maps/documentation/distance-matrix/start?csw=1), which works on a matrix of origins and destinations.
The information returned is based on the recommended route between start and end points and consists of rows containing duration and distance values for each pair. To use the Distance Matrix API, you must first activate the API in the Google Cloud Platform Console and obtain the proper authentication credentials. You need to provide your own API key in each request.
The documentation on how to do this is located here: https://developers.google.com/maps/documentation/distance-matrix/intro
A file with 200 fake names and addresses has been put together for you. The task is to add the commuting time next to each staff member. Normally Google will charge us one dollar for checking 200 commuting times.
Please note that `departure_time`, the desired time of departure, must be in POSIXct format and must be in the future (i.e. greater than `Sys.time()`). If no value is specified, it defaults to `Sys.time()`.
Please note that you can only specify one of `arrival_time` or `departure_time`, not both. If both are supplied, `departure_time` will be used.
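As a rough illustration of the POSIXct requirement, the sketch below builds an arrival time for the next Monday at 8.30 am that is guaranteed to lie in the future. The helper `next_monday_at()` and the `"Europe/London"` time zone are assumptions made for this example; they are not part of googleway.

```{r}
# Hypothetical helper (not part of googleway): the next Monday at a given
# local time, returned as POSIXct. `now` is an argument so it can be tested.
next_monday_at <- function(hour, minute, tz = "Europe/London", now = Sys.time()) {
  wday <- as.POSIXlt(now, tz = tz)$wday   # 0 = Sunday, 1 = Monday, ...
  days_ahead <- (8 - wday) %% 7
  if (days_ahead == 0) days_ahead <- 7    # on a Monday, use the following Monday
  target_date <- as.Date(now, tz = tz) + days_ahead
  as.POSIXct(sprintf("%s %02d:%02d:00", format(target_date), hour, minute), tz = tz)
}

arrival_by <- next_monday_at(8, 30)
class(arrival_by)          # a POSIXct object
arrival_by > Sys.time()    # TRUE: the time is in the future
```

A value built this way could be supplied as `arrival_time`, leaving `departure_time` unset.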
```{r}
library(googleway)
library(tidyverse)
library(lubridate)
# Set your own API key here
key = 'AIzaSyCOly69PDrlPlM42I378p2lmvNs8I2w'
# dmy_hm() parses day/month/year with hours and minutes (the string has no seconds)
d <- dmy_hm("10/09/2018 08:30")
arrival <- as.numeric(d)
from <- c("SE3+8UQ,UK")
to <- c("E14+5EU,UK")
test <- google_distance(origins = from,
                        destinations = to,
                        mode = "walking",
                        arrival_time = as.POSIXct(arrival, origin = "1970-01-01", tz = "UTC"),
                        key = key)
# Distance for the first (and only) origin-destination pair
test$rows$elements[[1]]$distance
```
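To show how a duration might be pulled out of such a response without spending an API call, here is a minimal sketch using a hand-built stand-in. The nested `rows`, `elements`, `distance` and `duration` layout mirrors the access pattern used above, but the mock values, the `status` check and the helper `extract_minutes()` are assumptions for illustration, not real API output.

```{r}
# Hand-built stand-in for a response (plain lists, made-up values).
mock_element <- list(
  distance = list(text = "5.0 km", value = 5000),          # metres
  duration = list(text = "1 hour 2 mins", value = 3720),   # seconds
  status   = "OK"
)
mock_response <- list(rows = list(elements = list(mock_element)))

# Hypothetical helper: travel time in minutes for the i-th origin,
# or NA when the element did not come back with status "OK".
extract_minutes <- function(response, i = 1) {
  element <- response$rows$elements[[i]]
  if (!identical(element$status, "OK")) return(NA_real_)
  element$duration$value / 60
}

extract_minutes(mock_response)
```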
To make all this simpler, we put together an R function that can look up many origins at once. Note that it queries every mode of transport for every employee, which makes it more costly than running a single mode per employee.
```{r}
#' Get distance data between two points based on all the travel mode options. Works for many origin points.
#'
#' @param x A vector of origins in address or postcode format
#' @param dest A single destination in address or postcode format
#' @param arrival_time A POSIXct datetime that folks need to arrive by
#' @param key A google distance API key
#' @param ... Additional options to pass to `google_distance()`
#'
#' @return Data.frame containing (typically) 4 rows per input element
google_distance_all = function(x, dest, arrival_time, key, ...){
  # Simple hygiene stuff: cache repeated calls and return "Fail" instead of erroring
  gd = purrr::possibly(
    memoise::memoise(
      google_distance)
    , "Fail"
  )
  # Prep dataset: one row per origin and travel mode
  interested_in = expand.grid(from=x,
                              mode=c("driving", "walking", "bicycling", "transit"),
                              stringsAsFactors = FALSE)
  # Perform google_distance calls for all combos
  purrr::map2(interested_in$from, interested_in$mode,
              # An ordinary function is used so that `...` refers to the outer arguments
              function(from, mode) gd(from, dest, mode=mode,
                                      arrival_time = arrival_time,
                                      key=key, ...)
  ) %>%
    # Extract relevant section
    purrr::map("rows") %>%
    purrr::map("elements") %>%
    purrr::flatten() %>%
    # Simplify the data.frames
    purrr::map(unclass) %>%
    purrr::map_df(purrr::flatten) %>%
    # Add original lookup values
    cbind(interested_in)
}
```
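The cost remark above follows from how the lookup grid is built: `expand.grid()` pairs every origin with every travel mode. The sketch below shows this on three illustrative postcodes, which are chosen only for the example.

```{r}
origins <- c("SE3 8UQ", "SW1A 1AA", "N1 9GU")   # illustrative postcodes only
modes   <- c("driving", "walking", "bicycling", "transit")

# One row per origin and mode, as in google_distance_all()
combos <- expand.grid(from = origins, mode = modes, stringsAsFactors = FALSE)
nrow(combos)      # 3 origins x 4 modes
head(combos, 4)   # the first argument varies fastest

# Number of lookups for 200 staff: all four modes versus a single mode
c(all_modes = 200 * length(modes), one_mode = 200)
```

If only one mode matters for the analysis, calling `google_distance()` for that mode alone would need a quarter of the lookups.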
This needs some packages to be installed in order to work.
```{r}
if(!require(purrr)) install.packages("purrr")
if(!require(memoise)) install.packages("memoise")
if(!require(googleway)) install.packages("googleway")
```
We also need to specify where our office is and the arrival time for which we are testing commute times. The API key was already set above.
```{r}
office = "E14 5EU"
monday_9am = as.POSIXct("2018-12-03 09:00")
```
Then we can use this function to get data for our 200 employees.
```{r eval=FALSE}
the_200 <- read_csv("https://hranalytics.netlify.com/data/200_staff_members.csv")
```
```{r read_data_commuting, echo=FALSE, warning=FALSE, message=FALSE}
the_200 <- read_csv("data/200_staff_members.csv")
```
```{r eval=do_eval}
results = google_distance_all(
the_200$Postcode,
office,
arrival_time = monday_9am,
key = key
)
write_csv(results, file.path("data", "commute_times.csv"))
```
```{r echo=FALSE, message=FALSE, warning=FALSE}
results = read_csv("data/commute_times.csv")
```
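Because every lookup is billed, it can help to reuse saved results when they exist and query only otherwise. The following is a minimal base R sketch of that idea. `cached_csv()` and `fake_lookup()` are hypothetical names, and the fake lookup stands in for the real API call so that the example runs offline.

```{r}
# Hypothetical helper: read `path` if it exists, otherwise compute and save.
cached_csv <- function(path, compute) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result <- compute()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(result, path, row.names = FALSE)
  result
}

# Stand-in for the expensive API call; counts how often it is invoked.
calls <- 0
fake_lookup <- function() {
  calls <<- calls + 1
  data.frame(from = c("A", "B"), value = c(1800, 2700))
}

demo_path <- file.path(tempdir(), "commute_demo", "commute_times.csv")
unlink(demo_path)                              # start from a clean slate
first  <- cached_csv(demo_path, fake_lookup)   # computes and writes the file
second <- cached_csv(demo_path, fake_lookup)   # reads the saved file
calls
```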
```{r}
results
```
Now that we have this data, we can answer questions about our employees' commute times.
Who has the longest commutes by car?
```{r}
results %>%
filter(mode == "driving") %>%
mutate(hours=value/60^2) %>%
top_n(10, hours) %>%
arrange(desc(hours))
```
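The conversion in the chunk above treats `value` as a duration in seconds, so dividing by `60^2` gives hours. Here is a small base R sketch of the same ranking on made-up data. The `mock_results` rows are invented for illustration and do not come from the API.

```{r}
# Made-up durations in seconds for three fictional origins
mock_results <- data.frame(
  from  = c("A", "B", "C", "A", "B", "C"),
  mode  = rep(c("driving", "transit"), each = 3),
  value = c(1800, 5400, 3600, 2700, 4500, 3000),
  stringsAsFactors = FALSE
)

driving <- mock_results[mock_results$mode == "driving", ]
driving$hours <- driving$value / 60^2            # seconds to hours
longest <- driving[order(-driving$hours), ]      # longest commute first
head(longest, 2)
```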
We can look at the distributions overall too.
```{r}
results %>%
mutate(hours=value/60^2) %>%
ggplot() +
aes(x=hours) +
geom_density() +
scale_x_log10() +
facet_wrap(~mode) +
theme_minimal()
```
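A numeric summary can complement the density plots. As a sketch on invented data, the median commute per mode can be computed with base R's `tapply()`. The `mock_by_mode` values are assumptions for illustration, again expressed in seconds.

```{r}
# Invented durations in seconds for two modes
mock_by_mode <- data.frame(
  mode  = rep(c("driving", "walking"), each = 3),
  value = c(1800, 3600, 5400, 7200, 10800, 14400),
  stringsAsFactors = FALSE
)

# Median commute in hours for each mode
median_hours <- tapply(mock_by_mode$value / 60^2, mock_by_mode$mode, median)
median_hours
```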