Merge pull request #1 from aim-rsf/prep-share
copy files from cprd-share repo
Showing 23 changed files with 3,143 additions and 3 deletions.
@@ -0,0 +1,62 @@
{
  "files": [
    "README.md"
  ],
  "imageSize": 100,
  "commit": false,
  "commitType": "docs",
  "commitConvention": "angular",
  "contributors": [
    {
      "login": "RayStick",
      "name": "Rachael Stickland",
      "avatar_url": "https://avatars.githubusercontent.com/u/50215726?v=4",
      "profile": "http://linkedin.com/in/rstickland-phd",
      "contributions": [
        "maintenance",
        "projectManagement",
        "code",
        "doc",
        "ideas"
      ]
    },
    {
      "login": "Rainiefantasy",
      "name": "Mahwish Mohammad",
      "avatar_url": "https://avatars.githubusercontent.com/u/43926907?v=4",
      "profile": "https://github.com/Rainiefantasy",
      "contributions": [
        "maintenance",
        "projectManagement",
        "code",
        "doc",
        "ideas"
      ]
    },
    {
      "login": "BatoolMM",
      "name": "Batool Almarzouq",
      "avatar_url": "https://avatars.githubusercontent.com/u/53487593?v=4",
      "profile": "https://batool-almarzouq.netlify.app/",
      "contributions": [
        "review"
      ]
    },
    {
      "login": "amallon",
      "name": "Ann-Marie Mallon",
      "avatar_url": "https://avatars.githubusercontent.com/u/35258603?v=4",
      "profile": "https://github.com/amallon",
      "contributions": [
        "projectManagement",
        "ideas"
      ]
    }
  ],
  "contributorsPerLine": 7,
  "skipCi": true,
  "repoType": "github",
  "repoHost": "https://github.com",
  "projectName": "cprd-data-wrangle",
  "projectOwner": "aim-rsf"
}
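
To sanity-check a configuration like the one above, a minimal Python sketch can load it, count the contributors and list each person's contribution types as emoji. `EMOJI_KEY` covers only the six types used here (a subset of the all-contributors emoji key). The default path `.all-contributorsrc` and the helper names `load_config`, `badge_count` and `summarise` are assumptions for illustration and are not part of the all-contributors tooling.

```python
import json

# Emoji for the contribution types used in the config above
# (a subset of the all-contributors emoji key).
EMOJI_KEY = {
    "code": "💻",
    "doc": "📖",
    "ideas": "🤔",
    "maintenance": "🚧",
    "projectManagement": "📆",
    "review": "👀",
}


def load_config(path: str = ".all-contributorsrc") -> dict:
    """Read the JSON config; the default path is an assumption."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def badge_count(config: dict) -> int:
    """The number the 'all_contributors-N' badge should display."""
    return len(config["contributors"])


def summarise(config: dict) -> list[str]:
    """One 'Name (login): emoji ...' line per contributor, in file order."""
    lines = []
    for person in config["contributors"]:
        # Types missing from EMOJI_KEY are shown by name instead of failing.
        badges = " ".join(EMOJI_KEY.get(c, c) for c in person["contributions"])
        lines.append(f"{person['name']} ({person['login']}): {badges}")
    return lines
```

For the file above, `badge_count(load_config())` should return 4, which matches the `all_contributors-4` badge in the README, and `summarise` lists Batool Almarzouq with 👀 only.
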
@@ -0,0 +1,36 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: 'bug'
assignees: ''

---

**Description of the bug:**
<!--A clear description of what the bug is.-->
-
-

**Steps to reproduce the behaviour:**
<!-- include screenshots if helpful -->
<!-- 1. Go to '...' -->
<!-- 2. Click on '....' -->
<!-- 3. Scroll down to '....' -->
<!-- 4. See error -->
-
-

**Expected behaviour:**
<!-- A clear description of what you expected to happen. -->
-
-

**My software set-up:**
<!-- E.g. Windows, MacOS -->
<!-- E.g. Python version 3.9 -->
-
-

**Additional context:**
<!-- Add any other context about the problem here.-->
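
The "My software set-up" section asks for the operating system and Python version. A small sketch using only the standard-library `platform` module can collect these and format them as bullets to paste into the template. The helper names `software_setup` and `as_markdown` are hypothetical and not part of this repository.

```python
import platform


def software_setup() -> dict[str, str]:
    """Collect the details the bug report template asks for."""
    return {
        # e.g. "Windows", "Darwin" (macOS) or "Linux"
        "os": platform.system(),
        "os_release": platform.release(),
        # e.g. "3.9.18"
        "python": platform.python_version(),
        "python_impl": platform.python_implementation(),
    }


def as_markdown(setup: dict[str, str]) -> str:
    """Format the details as bullet points for the 'My software set-up' section."""
    return "\n".join(f"- {key}: {value}" for key, value in setup.items())


print(as_markdown(software_setup()))
```
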
@@ -0,0 +1,21 @@
---
name: Feature request
about: Suggest a specific idea for this project
title: ''
labels: ''
assignees: ''

---

<!-- Please add the appropriate labels to this issue, if possible -->

**I would like to see this change implemented:**
<!-- A clear description of what change you want to see and why. -->
<!-- A clear description of any solutions or features you've considered.-->
-
-

**Additional context or further questions:**
<!-- Add any other context about the feature request here, including any questions you might have.-->
-
-
@@ -0,0 +1,13 @@
---
name: Question or Discussion
about: Not a bug report or direct feature request
title: ''
labels: 'question'
assignees: ''

---

**Question/topic for discussion:**
<!--A clear description of what you are asking and why.-->
-
-
@@ -0,0 +1,22 @@
# Configuration for welcome - https://github.com/behaviorbot/welcome

# Configuration for new-issue-welcome - https://github.com/behaviorbot/new-issue-welcome

# Comment to be posted on first-time issues
newIssueWelcomeComment: >
  Thank you for opening your first issue in this repo! Please check our [contribution guidelines](CONTRIBUTING.md) and be guided by the issue template.
# Configuration for new-pr-welcome - https://github.com/behaviorbot/new-pr-welcome

# Comment to be posted on PRs from first-time contributors in your repository
newPRWelcomeComment: >
  Thanks for opening your first pull request in this repo! Please check our [contribution guidelines](CONTRIBUTING.md) and be guided by the PR template.
# Configuration for first-pr-merge - https://github.com/behaviorbot/first-pr-merge

# Comment to be posted on pull requests merged by a first-time user
firstPRMergeComment: >
  Congrats on merging your first pull request in this repo! We appreciate your contribution!
@@ -0,0 +1,11 @@
Documentation:
- changed-files:
  - any-glob-to-any-file: ['README.md','CONTRIBUTING.md','cprd-code-browser.md']

Internal:
- changed-files:
  - any-glob-to-any-file: ['.github/*','.all-contributorsrc','.gitignore']

# Testing:
# - changed-files:
#   - any-glob-to-any-file: []
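
The `any-glob-to-any-file` rule applies a label when at least one of its globs matches at least one changed file. The sketch below approximates that rule. It is not the labeler's own implementation. It assumes minimatch-style globs without `**`, compared path segment by segment so that `*` never crosses a `/`, and it does not model dotfile options or brace expansion. The helper names and example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Label -> glob patterns, copied from the labeler configuration above.
LABEL_GLOBS = {
    "Documentation": ["README.md", "CONTRIBUTING.md", "cprd-code-browser.md"],
    "Internal": [".github/*", ".all-contributorsrc", ".gitignore"],
}


def glob_matches(pattern: str, path: str) -> bool:
    """Rough stand-in for minimatch on patterns without '**'.

    Paths are compared segment by segment, so '*' never matches across '/'.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(name, pat) for pat, name in zip(pattern_parts, path_parts)
    )


def labels_for(changed_files: list[str]) -> list[str]:
    """Labels with at least one glob matching at least one changed file."""
    return [
        label
        for label, globs in LABEL_GLOBS.items()
        if any(glob_matches(g, f) for g in globs for f in changed_files)
    ]


print(labels_for(["README.md", ".github/labeler.yml"]))  # ['Documentation', 'Internal']
print(labels_for([".github/workflows/auto-label.yml"]))  # []
```

If the labeler's matching behaves like this approximation, files nested under `.github/` (for example in `.github/workflows/`) would not receive the Internal label. A `.github/**` pattern is the usual minimatch way to match any depth.
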
@@ -0,0 +1,19 @@
<!-- Note which issues are linked to this pull request (PR) -->
<!-- If this PR is enough to close them you can write something like "Closes #10 and closes #12" -->
<!-- If you just want to reference them without closing them, you can add something like "References #10" -->
Closes #

<!-- Add a short description of the PR here - what you have changed and why -->

## Proposed Changes
<!-- List major changes here, so that the reviewers can have a bit more context -->
-
-
-

## Checklist before review:
<!-- You're invited to open a draft PR so people can see what you are working on sooner -->
- [ ] Please comment on my PR while it's a draft and give me feedback on the development!
- [ ] I added everything I wanted to add to this PR, please review!
- [ ] The title of this PR is clear and self-explanatory.
- [ ] I added any appropriate labels to this PR.
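
The template relies on GitHub's closing keywords (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved). A keyword followed by `#N` closes the issue when the PR merges, and a plain mention such as "References #10" only links it. The regex sketch below approximates that distinction for same-repository `#N` references only. It ignores cross-repository forms such as `owner/repo#N`, and the helper names are hypothetical.

```python
import re

# Closing keywords (case-insensitive), optionally followed by a colon, then "#<number>".
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)
REFERENCE = re.compile(r"#(\d+)")


def closed_issues(body: str) -> list[int]:
    """Issue numbers the PR body would close when merged."""
    return [int(n) for n in CLOSING.findall(body)]


def referenced_only(body: str) -> list[int]:
    """Issues mentioned as '#N' without a preceding closing keyword."""
    closed = set(closed_issues(body))
    return [int(n) for n in REFERENCE.findall(body) if int(n) not in closed]


body = "Closes #10 and closes #12\nReferences #7"
print(closed_issues(body), referenced_only(body))  # [10, 12] [7]
```

Note that the template's own placeholder line `Closes #` has no number, so left unedited it links nothing.
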
@@ -0,0 +1,17 @@
name: Auto Author Assign

on:
  issues:
    types: [ opened, reopened ]
  pull_request_target:
    types: [ opened, reopened ]

permissions:
  pull-requests: write
  issues: write

jobs:
  assign-author:
    runs-on: ubuntu-latest
    steps:
      - uses: toshimaru/[email protected]
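
This workflow delegates the work to a third-party action. Conceptually, assigning the author corresponds to GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues/{issue_number}/assignees`, which also accepts pull request numbers. The sketch below builds, but does not send, such a request from a simplified event payload. It is not the action's actual implementation. The payload shape is reduced to the fields used, and the PR number and token are hypothetical placeholders.

```python
import json
import urllib.request


def build_assign_request(event: dict, repo: str, token: str) -> urllib.request.Request:
    """Build (without sending) a request that assigns the issue/PR author to it."""
    # Issues and pull requests share one number space and one assignees endpoint.
    item = event.get("issue") or event["pull_request"]
    number = item["number"]
    author = item["user"]["login"]
    url = f"https://api.github.com/repos/{repo}/issues/{number}/assignees"
    body = json.dumps({"assignees": [author]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_assign_request(
    {"action": "opened", "pull_request": {"number": 2, "user": {"login": "RayStick"}}},
    repo="aim-rsf/cprd-data-wrangle",
    token="<hypothetical-token>",
)
print(req.method, req.full_url)
```

Sending it would require `urllib.request.urlopen(req)` with a token that has the `issues: write` / `pull-requests: write` permissions that the workflow grants.
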
@@ -0,0 +1,15 @@
name: auto-label
concurrency:
  group: ${{ github.workflow }}-${{ github.event.number }}-${{ github.event.ref }}
  cancel-in-progress: true
on:  # yamllint disable-line rule:truthy
  pull_request_target

jobs:
  pr:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/labeler@v5
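
The `concurrency` block means that a new run sharing a group with a queued or running one cancels the older run. The sketch below mimics how the group string is formed. It assumes that a property missing from the event payload (as `github.event.ref` appears to be for `pull_request_target` events) evaluates to null and interpolates as an empty string. It also assumes every run overlaps with the previous one in its group. The helper names are hypothetical.

```python
def concurrency_group(workflow: str, event: dict) -> str:
    """Mimic `${{ github.workflow }}-${{ github.event.number }}-${{ github.event.ref }}`."""
    def fmt(value) -> str:
        # Missing properties are treated as null, rendered as ''.
        return "" if value is None else str(value)

    return f"{fmt(workflow)}-{fmt(event.get('number'))}-{fmt(event.get('ref'))}"


def runs_to_cancel(runs: list[tuple[str, dict]]) -> list[int]:
    """Indices of runs cancelled by a later run in the same group
    (cancel-in-progress: true, assuming all runs overlap in time)."""
    latest = {}
    cancelled = []
    for i, (workflow, event) in enumerate(runs):
        group = concurrency_group(workflow, event)
        if group in latest:
            cancelled.append(latest[group])
        latest[group] = i
    return cancelled


print(concurrency_group("auto-label", {"number": 3}))  # auto-label-3-
runs = [("auto-label", {"number": 3}), ("auto-label", {"number": 4}), ("auto-label", {"number": 3})]
print(runs_to_cancel(runs))  # [0]
```

Under these assumptions, each pull request gets its own group, so rapid updates to one PR cancel that PR's stale labelling runs without affecting other PRs.
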
@@ -0,0 +1 @@

@@ -0,0 +1,29 @@
# Contributing to cprd-data-wrangle

We warmly welcome contributions to the cprd-data-wrangle project, however small or large.

This document provides guidelines for contributing to this repository.

## How to Contribute

### Reporting Issues

- **Bug Reports**: If you find a bug, please open an issue with a clear description of the problem and steps to reproduce it.
- **Feature Requests**: Suggestions for new features or improvements are always welcome. Please open an issue to discuss your ideas.

### Making Changes

1. **Fork the Repository**: Start by forking the repository to your GitHub account.
2. **Create a Feature Branch**: Create a new branch for your feature or fix.
3. **Make Your Changes**: Implement your changes, adhering to coding standards and best practices for Markdown and Python.
4. **Test Your Changes**: Ensure your changes do not break any existing functionality.
5. **Document Your Changes**: Include comments alongside your code, and update any relevant documentation files, as needed.
6. **Submit a Pull Request**: Open a pull request from your feature branch to the main branch of the original repository.

## Questions or Need Help?

If you have questions or need help, feel free to open an issue for discussion or reach out to the maintainers directly:

Rachael Stickland ([email protected]) and Mahwish Mohammad ([email protected]).

Thank you for contributing to cprd-data-wrangle!
@@ -1,5 +1,86 @@
# cprd-data-wrangle

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-4-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

📢 This repository will be populated soon! 📢 In the meantime, check out this other repository for an introduction to synthetic data, in the context of health-care and biomedical research: https://github.com/aim-rsf/Synthetic-Data

# Welcome

## 👥 Who is this repository for?

This repository is for anyone new to working with datasets released by the [Clinical Practice Research Datalink (CPRD)](https://cprd.com). Researchers tasked with understanding the database tables, then querying and filtering to create a research cohort, may find our pre-processing pipeline and interactive notebooks a helpful guide to getting started.

**Please note:**

- **You need your own copy of CPRD's synthetic/real data to run the code. This repository does not contain any data files.**

- **CPRD are moving towards a Trusted Research Environment (TRE) model of data access, instead of researchers downloading data onto their own computers. Read more [here](https://www.cprd.com/cprd-safe-our-trusted-research-environment).**

- **This is a work-in-progress repository. If you would like to suggest or contribute a change, please read our [contributor guide](CONTRIBUTING.md).**

# Project Goals

We aim to streamline the process for researchers using CPRD datasets, with the creation of clear documentation, efficient data management strategies and analytical pipelines. We will start by developing workflows that use CPRD's medium-fidelity synthetic datasets, because they resemble
> "the real world CPRD data with respect to the data types, data values, data formats, data structure and table relationships" [ref](https://cprd.com/synthetic-data).

**New to Synthetic Data?** Read an introduction [here](https://github.com/aim-rsf/Synthetic-Data).

We will create and share documentation and code in openly available languages. We will start by loading the data into a relational database and summarising some of its main features.

By working with our research collaborators, we aim to test workflows written with synthetic datasets on the real datasets to ensure transferability and utility. We expect the main mismatches to be the size of the data files and possibly variations in file format. Please reach out to us if you want to test our code on your real CPRD data, or if you have any feedback on improving transferability and utility.
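
One common way to cope with large data files is to stream them into the database in fixed-size batches rather than reading a whole file into memory. The sketch below illustrates this under several assumptions. Python's built-in `sqlite3` stands in for PostgreSQL. The input is assumed to be tab-delimited text with a header row. The table and column names in the usage lines (`patient`, `patid`, `pracid`, `yob`) are hypothetical placeholders to be checked against CPRD's data specification.

```python
import csv
import io
import itertools
import sqlite3


def load_in_batches(conn: sqlite3.Connection, table: str, lines,
                    batch_size: int = 10_000) -> int:
    """Stream tab-delimited text (header row first) into `table` in batches.

    Only `batch_size` rows are held in memory at once. Every column is stored
    as TEXT; casting to proper types is left to a later step. `table` and the
    header names are interpolated into SQL, so use trusted inputs only.
    """
    # QUOTE_NONE: treat quote characters as ordinary data in tab-delimited files.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    col_defs = ", ".join(f'"{c}" TEXT' for c in header)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    insert_sql = f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})'

    total = 0
    while True:
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total


# Hypothetical sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("patid\tpracid\tyob\n1001\t20\t1960\n1002\t20\t1975\n1003\t21\t1982\n")
conn = sqlite3.connect(":memory:")
n = load_in_batches(conn, "patient", sample, batch_size=2)
print(n)  # 3
print(conn.execute(
    "SELECT pracid, COUNT(*) FROM patient GROUP BY pracid ORDER BY pracid"
).fetchall())  # [('20', 2), ('21', 1)]
```

With PostgreSQL, the same batching idea applies, although its bulk `COPY` command is typically faster than row inserts.
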

CPRD's most recently released data specifications can be found [here for the real datasets](https://cprd.com/primary-care-data-public-health-research) and [here for the synthetic datasets](https://cprd.com/synthetic-data).

# 💻 Current content

We include information on [CPRD's Code Browser tool](cprd-code-browser.md) and how to request access to it.

The [code-for-aurum](code-for-aurum) folder uses `Python` and `PostgreSQL` to create a pre-processing workflow for CPRD Aurum data. The workflow converts the data file format for compatibility and then reads the data into tables in a relational database. Workbooks have been created to familiarise users with the CPRD Aurum tables, including how they link together and how to build a sample cohort:

https://github.com/user-attachments/assets/9a636d4c-8170-4145-b6fc-60ac0f4c16d1
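
The workbooks themselves are not reproduced here. As a rough illustration of what building a sample cohort can look like once data sit in relational tables, the sketch below joins a patient table to an observation table. It keeps patients with at least one observation whose code is in a code list and who were born in or before a cutoff year. SQLite stands in for PostgreSQL. The tables, columns and codes are hypothetical simplifications and are not the CPRD Aurum schema.

```python
import sqlite3

# Hypothetical, heavily simplified tables for illustration only.
SCHEMA = """
CREATE TABLE patient (patid INTEGER PRIMARY KEY, yob INTEGER);
CREATE TABLE observation (patid INTEGER REFERENCES patient(patid),
                          medcodeid TEXT, obsdate TEXT);
"""

COHORT_SQL = """
SELECT DISTINCT p.patid
FROM patient AS p
JOIN observation AS o ON o.patid = p.patid
WHERE o.medcodeid IN (SELECT code FROM codelist)
  AND p.yob <= ?
ORDER BY p.patid
"""


def build_cohort(conn: sqlite3.Connection, codes, born_on_or_before: int) -> list[int]:
    """Patients with any observation coded in `codes`, born on or before the given year."""
    # A temporary code-list table keeps the query the same for any list length.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS codelist (code TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM codelist")
    conn.executemany("INSERT OR IGNORE INTO codelist VALUES (?)", [(c,) for c in codes])
    return [row[0] for row in conn.execute(COHORT_SQL, (born_on_or_before,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, 1950), (2, 1990), (3, 1960)])
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [(1, "A1", "2010-01-05"), (1, "A1", "2012-03-01"),
     (2, "A1", "2015-06-30"), (3, "B2", "2011-02-14")],
)
print(build_cohort(conn, ["A1"], born_on_or_before=1970))  # [1]
```

`DISTINCT` matters here because a patient with several matching observations would otherwise appear once per observation.
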

# Contributions and Acknowledgments

We acknowledge and thank these groups for making this project possible:

- [National Institute for Health and Care Research (NIHR)](https://www.nihr.ac.uk/) for funding the AIM-RSF programme of work [NIHR202647] - see below.
- The [AI for Multiple Long Term Conditions Research Support Facility (AIM-RSF)](https://github.com/aim-rsf) programme for facilitating the delivery of this project.
- This repository was created and is maintained by the AIM-RSF, led by [Data Wranglers](https://book.the-turing-way.org/collaboration/research-infrastructure-roles/data-wrangler.html) Rachael Stickland & Mahwish Mohammad.
- [Clinical Practice Research Datalink (CPRD)](https://cprd.com) for access to synthetic versions of their datasets [synthetic data request no: SD000021].
- [The Alan Turing Institute](https://www.turing.ac.uk/). This project was supported in part through computational resources provided by The Alan Turing Institute under EPSRC grant EP/N510129/1.

The views expressed within any file in this repository are those of the author(s) within the AIM-RSF programme, and not necessarily those of the NIHR, the Department of Health and Social Care, the Medicines and Healthcare products Regulatory Agency (MHRA) or CPRD.

## Thanks to specific contributors

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification, using the [emoji key](https://allcontributors.org/docs/en/emoji-key):
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
  <tbody>
    <tr>
      <td align="center" valign="top" width="14.28%"><a href="http://linkedin.com/in/rstickland-phd"><img src="https://avatars.githubusercontent.com/u/50215726?v=4?s=100" width="100px;" alt="Rachael Stickland"/><br /><sub><b>Rachael Stickland</b></sub></a><br /> <a href="#projectManagement-RayStick" title="Project Management">📆</a> <a href="#maintenance-RayStick" title="Maintenance">🚧</a> <a href="https://github.com/aim-rsf/cprd/commits?author=RayStick" title="Code">💻</a> <a href="https://github.com/aim-rsf/cprd/commits?author=RayStick" title="Documentation">📖</a> <a href="#ideas-RayStick" title="Ideas, Planning, & Feedback">🤔</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/Rainiefantasy"><img src="https://avatars.githubusercontent.com/u/43926907?v=4?s=100" width="100px;" alt="Mahwish Mohammad"/><br /><sub><b>Mahwish Mohammad</b></sub></a><br /><a href="#maintenance-Rainiefantasy" title="Maintenance">🚧</a> <a href="https://github.com/aim-rsf/cprd/commits?author=Rainiefantasy" title="Code">💻</a> <a href="https://github.com/aim-rsf/cprd/commits?author=Rainiefantasy" title="Documentation">📖</a> <a href="#ideas-Rainiefantasy" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/aim-rsf/cprd/pulls?q=is%3Apr+reviewed-by%3ARainiefantasy" title="Reviewed Pull Requests">👀</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://batool-almarzouq.netlify.app/"><img src="https://avatars.githubusercontent.com/u/53487593?v=4?s=100" width="100px;" alt="Batool Almarzouq"/><br /><sub><b>Batool Almarzouq</b></sub></a><br /><a href="https://github.com/aim-rsf/cprd/pulls?q=is%3Apr+reviewed-by%3ABatoolMM" title="Reviewed Pull Requests">👀</a> <a href="#ideas-BatoolMM" title="Ideas, Planning, & Feedback">🤔</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/amallon"><img src="https://avatars.githubusercontent.com/u/35258603?v=4?s=100" width="100px;" alt="Ann-Marie Mallon"/><br /><sub><b>Ann-Marie Mallon</b></sub></a><br /><a href="#projectManagement-amallon" title="Project Management">📆</a> <a href="#ideas-amallon" title="Ideas, Planning, & Feedback">🤔</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/KirstieJane"><img src="https://avatars.githubusercontent.com/u/3626306?v=4" width="100px;" alt="Kirstie Whitaker"/><br /><sub><b>Kirstie Whitaker</b></sub></a><br /> <a href="#ideas-KirstieJane" title="Ideas, Planning, & Feedback">🤔</a></td>
    </tr>
  </tbody>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

**Would you like to contribute?** Please read our [contributor guide](CONTRIBUTING.md).

## ♻️ Licences

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

---

You got to the end of the README? You get our :seal: of approval!