An end-to-end example IDI research project for training and encouraging good practice.
New Zealand’s Integrated Data Infrastructure (IDI) enables incredible research opportunities. However, it can be an intimidating environment for researchers who are unfamiliar with it. This exemplar introduces new researchers to the IDI through a simple end-to-end project, focused on the practical aspects of managing a project and manipulating the data. The project reflects our current best practice, and we hope it provides a useful guide for researchers to learn from.
This repository provides the code and working files for the exemplar project. It should be used alongside the guidance document "IDI exemplar project: Guidance and training", which describes the project and guides a new researcher through each of its steps. We have also included a range of tips to help researchers develop good practices in the IDI.
The exemplar project includes demonstrations of our R tools for assembling datasets, summarising and confidentialising results, and self-checking output. While you can follow the exemplar project without using any of these tools, they make our delivery more efficient, so we recommend them to other researchers. Reference material on these tools can be found on the Agency's guidance page. Key references are:
- Assembly tool primer and guide
- Assembly tool intro and training
- Assembly tool training presentation
- Assembly tool demonstration
- Summarise and confidentialise training
- Self-checking tools training
The exemplar can be downloaded as a zip file from GitHub. We also aim to make a version of the exemplar available inside the data lab.
A small number of files have been excluded from the version of the exemplar on GitHub. These are primarily files that cannot be removed from the data lab for confidentiality reasons. For example, the version in the lab should include files both before and after confidentiality rules are applied, to demonstrate how the rules work, but only the files with confidentiality rules applied can be removed from the lab.
Most of these omitted files are generated by running the code included in the GitHub version of the exemplar, so researchers should learn much the same lessons whichever version they use.
To begin using the exemplar:
- Either download it from GitHub or locate it on the data lab wiki.
- Copy it to your working location and unzip it.
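If you prefer to script the unzip step, the sketch below shows one way to do it with Python's standard `zipfile` module. It is a minimal illustration only, not part of the exemplar: the function name `unpack_exemplar` and the paths are hypothetical, and it assumes the archive (like GitHub's source downloads) usually contains a single top-level folder.

```python
import zipfile
from pathlib import Path


def unpack_exemplar(zip_path: Path, working_dir: Path) -> Path:
    """Extract the exemplar zip into working_dir and return the extracted folder.

    Hypothetical helper for illustration; it is not part of the exemplar itself.
    """
    working_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(working_dir)
        # Collect the first path component of every entry in the archive.
        top_levels = {Path(name).parts[0] for name in archive.namelist()}
    # GitHub archives usually wrap everything in one top-level folder;
    # if so, return that folder, otherwise return the working directory.
    if len(top_levels) == 1:
        return working_dir / top_levels.pop()
    return working_dir
```

For example, `unpack_exemplar(Path("idi_exemplar_project-main.zip"), Path("my_project"))` would extract a download saved under that (hypothetical) file name into a `my_project` folder.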
The exemplar also contains a copy of the Dataset Assembly Tool. If you plan to use this tool, please see its separate repository and guidance document for the latest version and installation instructions.
This exemplar has been written to guide researchers working with integrated data in the data lab. While some parts of this guide are specific to the data lab environment (such as the confidentiality and output process), many of the underlying ideas apply to analytic projects in general. We therefore encourage researchers looking for an effective way to organise their analytic projects to use this exemplar as a starting point for developing their own best practice.
Social Investment Agency (2022). IDI exemplar project. Source code. https://github.com/nz-social-investment-agency/idi_exemplar_project
From time to time, the Social Investment Agency provides training in the use of the IDI, including the exemplar.
General and training enquiries can be sent to [email protected]