forked from MattCowgill/readabs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
167 lines (117 loc) · 6.92 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.retina = 2,
fig.path = "man/figures/README-"
)
version <- as.vector(read.dcf('DESCRIPTION')[, 'Version'])
version <- gsub('-', '.', version)
```
# readabs <img src="man/figures/logo.png" align="right" height="139" />
<!-- badges: start -->
[![R build status](https://github.com/mattcowgill/readabs/workflows/R-CMD-check/badge.svg)](https://github.com/mattcowgill/readabs/actions)
[![codecov status](https://img.shields.io/codecov/c/github/mattcowgill/readabs.svg)](https://codecov.io/gh/MattCowgill/readabs)
[![CRAN status](https://www.r-pkg.org/badges/version/readabs)](https://cran.r-project.org/package=readabs)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)
<!-- badges: end -->
## Overview
{readabs} helps you easily download, import, and tidy data from the Australian Bureau of Statistics within R.
This saves you time manually downloading and tediously tidying data and allows you to spend more time on your analysis.
## Installation
Install the latest CRAN version of {readabs} with:
```{r cran-installation, eval = FALSE}
install.packages("readabs")
```
You can install the development version of {readabs} from GitHub with:
```{r gh-installation, eval = FALSE}
# if you don't have devtools installed, first run:
# install.packages("devtools")
devtools::install_github("mattcowgill/readabs")
```
## Usage
The main function in {readabs} is `read_abs()`, which downloads, imports, and tidies time series data from the ABS website.
There are some other functions you may find useful.
* `read_abs_local()` imports and tidies time series data from ABS spreadsheets stored on a local drive. Thanks to Hugh Parsonage for contributing to this functionality.
* `separate_series()` splits the `series` column of a tidied ABS time series spreadsheet into multiple columns, reducing the manual wrangling that's needed to work with the data. Thanks to David Diviny for writing this function.
* `download_abs_data_cube()` downloads a data cube (ie. non-time series spreadsheet) from the ABS website. Thanks to David Diviny for writing this function.
* `read_cpi()` imports the Consumer Price Index numbers as a two-column tibble: `date` and `cpi`. This is useful for joining to other series to adjust data for changes in consumer prices.
* `read_payrolls()` downloads, imports, and tidies tables from the ABS Weekly Payroll Jobs dataset.
* `read_awe()` returns a long time series of Average Weekly Earnings data.
## Download, import, and tidy ABS time series data
To download all the time series data from an ABS catalogue number to your disk, and import the data to R as a single tidy data frame, use `read_abs()`.
First we'll load {readabs} and the {tidyverse}:
```{r load-packages, results=FALSE, warning=FALSE}
library(readabs)
library(tidyverse)
library(readxl)
```
Now we'll create one data frame that contains all the time series data from the Wage Price Index, catalogue number 6345.0:
```{r all-wpi}
all_wpi <- read_abs("6345.0")
```
This is what it looks like:
```{r str-wpi}
str(all_wpi)
```
It only takes you a few lines of code to make a graph from your data:
```{r all-in-one-example}
all_wpi %>%
filter(series == "Percentage Change From Corresponding Quarter of Previous Year ; Australia ; Total hourly rates of pay excluding bonuses ; Private and Public ; All industries ;",
!is.na(value)) %>%
ggplot(aes(x = date, y = value, col = series_type)) +
geom_line() +
theme_minimal() +
labs(y = "Annual wage growth (per cent)")
```
In the example above we downloaded all the time series from a catalogue number. This will often be overkill. If you know the data you need is in a particular table, you can just get that table like this:
```{r wpi1}
wpi_t1 <- read_abs("6345.0", tables = 1)
```
If you want multiple tables, but not the whole catalogue, that's easy too:
```{r wpi1_5}
wpi_t1_t5 <- read_abs("6345.0", tables = c("1", "5a"))
```
In most cases, the `series` column will contain multiple components, separated by ';'. The `separate_series()` function can help wrangling this column.
For more examples, please see the vignette on working with time series data (run `browseVignettes("readabs")`).
## Download ABS data cubes
The ABS (generally) releases time series data in a standard format, which allows `read_abs()` to download, import and tidy it (see above). But not all ABS data is time series data - the ABS also releases data as 'data cubes'. These are all formatted in their own, unique way.
Unfortunately, because data cubes are all formatted in their own way, there is no one function that can import tidy data cubes for you in the same way that `read_abs()` works with all time series. But `{readabs}` still has functions that can help.
### Doing it manually
The `download_abs_data_cube()` function can download an ABS data cube for you. It works with any data cube on the ABS website.
For example, let's say you wanted to download table 4 from _Weekly Payroll Jobs and Wages in Australia_. This code would do the trick:
```{r download-data-cube}
payrolls_t4_path <- download_abs_data_cube("weekly-payroll-jobs-and-wages-australia", "004")
payrolls_t4_path
```
The `download_abs_data_cube()` function downloads the file and returns the full file path to the saved file. You can then pipe that in to another function:
```{r}
payrolls_t4_path %>%
read_excel(sheet = "Payroll jobs index",
skip = 5)
```
### Using convenience functions for select data cubes
As it happens, if you want the ABS Weekly Payrolls data, you don't need to use `download_abs_data_cube()` directly. Instead, there is a convenience function available that downloads, imports, and tidies the data for you:
```{r}
read_payrolls()
```
There is also a convenience function available for data cube GM1 from the monthly Labour Force data, which contains labour force gross flows:
```{r}
read_lfs_grossflows()
```
## Bug reports and feedback
GitHub issues containing error reports or feature requests are welcome. Please try to make a [reprex](https://reprex.tidyverse.org) (a minimal, reproducible example) if possible.
Alternatively you can email the package maintainer at mattcowgill at gmail dot com.
## Disclaimer
The `{readabs}` package is not associated with the Australian Bureau of Statistics.
All data is provided subject to any restrictions and licensing arrangements
noted on the ABS website.
## Awesome Official Statistics Software
[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](https://github.com/SNStatComp/awesome-official-statistics-software)
We're pleased to be included in a [list of software](https://github.com/SNStatComp/awesome-official-statistics-software) that can be used to work with official statistics.