Demo of automation for the CA-CODE simple update to 2021.
- Clone the repository to your computer
- Add data inputs from the CA-CODE_Warehouse folder on Dropbox to the local /data folder
- The current files in /src/data-management are for the Simple Update 2000-2021. If producing estimates for a different set of years, replace them with the appropriate data-management code in /src/archive.
- Manually set variables in /src/prepare-session/set-inputs
- Do not make changes to any other scripts
- Run the make file
- View results locally in /gen/results/output and /gen/visualizations/output
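Once the make file has finished, the two output folders can be inspected from an R session. The following is a minimal sketch, assuming the working directory is the repository root. The helper name `list_outputs` is hypothetical and not part of the repository.

```r
# Hypothetical helper (not part of the repository): list the files in each
# output folder, returning an empty vector for folders that do not exist yet.
list_outputs <- function(dirs) {
  found <- lapply(dirs, function(d) {
    if (dir.exists(d)) list.files(d) else character(0)
  })
  names(found) <- dirs
  found
}

# Output folders named in the steps above, relative to the repository root.
output_dirs <- c("gen/results/output", "gen/visualizations/output")
outputs <- list_outputs(output_dirs)

# Print a one-line summary per folder.
for (d in names(outputs)) {
  cat(d, ":", length(outputs[[d]]), "file(s)\n")
}
```

A count of zero for either folder suggests the pipeline did not reach that stage, or that the session was started from a different working directory.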
This project framework was conceptualized using resources from the Tilburg Science Hub, in accordance with recommended workflow and data management principles for research projects.
Source code is made available in the src folder, with sub-folders for each stage of the project pipeline. Source code contains all code that is required to execute the project's pipeline. There is a make.R file in the main directory folder which makes explicit how the source code needs to be run.
Our pipeline consists of seven main stages:
- prepare-session
- data-management
- estimation
- prediction
- squeezing
- uncertainty
- results
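To illustrate how a make file can make the stage order explicit, here is a minimal sketch in R. It is not the repository's actual make.R. It assumes that each stage folder under src holds .R scripts that can be run in alphabetical order and that scripts share the global environment; the helper `run_stage` is hypothetical.

```r
# Pipeline stages in the order listed above.
stages <- c("prepare-session", "data-management", "estimation",
            "prediction", "squeezing", "uncertainty", "results")

# Hypothetical helper: source every .R script found in one stage folder.
# Assumption: scripts within a stage can be run in alphabetical order.
run_stage <- function(stage, src_dir = "src") {
  scripts <- sort(list.files(file.path(src_dir, stage),
                             pattern = "\\.R$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", script)
    # source() evaluates in the global environment by default, so objects
    # created by an earlier stage remain visible to later ones.
    source(script)
  }
  invisible(scripts)
}

# Run all stages in order, but only when a src folder is present.
if (dir.exists("src")) {
  for (stage in stages) run_stage(stage)
}
```

Keeping the stage order in a single vector means the sequence is stated in one place, which is the role the text above gives to make.R.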
There are additional folders in /src which contain code not referenced in the make.R file. These folders are:
- adhoc-requests: Code used to complete one-off requests that are not part of the routine estimation process.
- aggregation: Age/sex aggregation of estimates. Can only be run after results are generated for all age/sex groups.
- archive: Contains data-management source code from previous update rounds.
- visualizations: Code used to generate ad-hoc visualizations after producing results.
Generated files are all files that are created by running the source code (/src) on the raw data (/data). They are stored in the gen folder. The /gen subdirectories match the pipeline stages.
Each subdirectory in gen contains the following subdirectories:
- input: any required input files to run this step of the pipeline
- temp: temporary files, such as an Excel dataset that needs to be converted into a CSV
- output: stores the final result of the pipeline stage
- audit: quality checks, diagnostic information on the performance of each step in the pipeline. For example, in /data-management/audit this could be a txt file with information on missing observations in the final dataset.
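The layout described above, together with the audit example, can be sketched in R as follows. This is an illustration under stated assumptions, not code from the repository: it uses the seven stage names listed earlier, writes to a temporary folder rather than the real gen folder, and uses made-up data. The helpers `create_gen_layout` and `write_missing_audit`, the file name missing-observations.txt, and the column names are all hypothetical.

```r
stages <- c("prepare-session", "data-management", "estimation",
            "prediction", "squeezing", "uncertainty", "results")
subdirs <- c("input", "temp", "output", "audit")

# Hypothetical helper: create <gen_dir>/<stage>/<subdir> for every combination.
create_gen_layout <- function(gen_dir = "gen") {
  paths <- file.path(gen_dir, rep(stages, each = length(subdirs)), subdirs)
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  invisible(paths)
}

# Hypothetical helper: write the number of missing values per column
# to a txt file in the given audit folder.
write_missing_audit <- function(dat, audit_dir) {
  n_missing <- colSums(is.na(dat))
  lines <- paste0(names(n_missing), ": ", n_missing, " missing of ", nrow(dat))
  out_file <- file.path(audit_dir, "missing-observations.txt")
  writeLines(lines, out_file)
  invisible(out_file)
}

# Usage on a temporary folder with made-up data (not real CA-CODE inputs).
gen_dir <- file.path(tempdir(), "gen")
create_gen_layout(gen_dir)
toy_data <- data.frame(iso3 = c("AAA", "BBB", NA), deaths = c(10, NA, NA))
audit_file <- write_missing_audit(
  toy_data, file.path(gen_dir, "data-management", "audit")
)
print(readLines(audit_file))
```

Writing the audit as plain text keeps it readable without R, which suits a file whose purpose is a quick quality check.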