Merge pull request #1 from aim-rsf/prep-share
copy files from cprd-share repo
RayStick authored Sep 5, 2024
2 parents 9f1649d + 186d5bb commit c02c91b
Showing 23 changed files with 3,143 additions and 3 deletions.
62 changes: 62 additions & 0 deletions .all-contributorsrc
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
{
  "files": [
    "README.md"
  ],
  "imageSize": 100,
  "commit": false,
  "commitType": "docs",
  "commitConvention": "angular",
  "contributors": [
    {
      "login": "RayStick",
      "name": "Rachael Stickland",
      "avatar_url": "https://avatars.githubusercontent.com/u/50215726?v=4",
      "profile": "http://linkedin.com/in/rstickland-phd",
      "contributions": [
        "maintenance",
        "projectManagement",
        "code",
        "doc",
        "ideas"
      ]
    },
    {
      "login": "Rainiefantasy",
      "name": "Mahwish Mohammad",
      "avatar_url": "https://avatars.githubusercontent.com/u/43926907?v=4",
      "profile": "https://github.com/Rainiefantasy",
      "contributions": [
        "maintenance",
        "projectManagement",
        "code",
        "doc",
        "ideas"
      ]
    },
    {
      "login": "BatoolMM",
      "name": "Batool Almarzouq",
      "avatar_url": "https://avatars.githubusercontent.com/u/53487593?v=4",
      "profile": "https://batool-almarzouq.netlify.app/",
      "contributions": [
        "review"
      ]
    },
    {
      "login": "amallon",
      "name": "Ann-Marie Mallon",
      "avatar_url": "https://avatars.githubusercontent.com/u/35258603?v=4",
      "profile": "https://github.com/amallon",
      "contributions": [
        "projectManagement",
        "ideas"
      ]
    }
  ],
  "contributorsPerLine": 7,
  "skipCi": true,
  "repoType": "github",
  "repoHost": "https://github.com",
  "projectName": "cprd-data-wrangle",
  "projectOwner": "aim-rsf"
}
36 changes: 36 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,36 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: 'bug'
assignees: ''

---

**Description of the bug:**
<!--A clear description of what the bug is.-->
-
-

**Steps to reproduce the behaviour:**
<!-- include screenshots if helpful -->
<!-- 1. Go to '...' -->
<!-- 2. Click on '....' -->
<!-- 3. Scroll down to '....' -->
<!-- 4. See error -->
-
-

**Expected behaviour:**
<!-- A clear description of what you expected to happen. -->
-
-

**My software set-up:**
<!-- E.g. Windows, MacOS -->
<!-- E.g. Python version 3.9 -->
-
-

**Additional context:**
<!-- Add any other context about the problem here.-->
21 changes: 21 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,21 @@
---
name: Feature request
about: Suggest a specific idea for this project
title: ''
labels: ''
assignees: ''

---

<!-- Please add the appropriate labels to this issue, if possible -->

**I would like to see this change implemented:**
<!-- A clear description of what change you want to see and why. -->
<!-- A clear description of any solutions or features you've considered.-->
-
-

**Additional context or further questions:**
<!-- Add any other context about the feature request here, including any questions you might have.-->
-
-
13 changes: 13 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,13 @@
---
name: Question or Discussion
about: Not a bug report or direct feature request
title: ''
labels: 'question'
assignees: ''

---

**Question/topic for discussion:**
<!--A clear description of what you are asking and why.-->
-
-
22 changes: 22 additions & 0 deletions .github/config.yml
@@ -0,0 +1,22 @@
# Configuration for welcome - https://github.com/behaviorbot/welcome

# Configuration for new-issue-welcome - https://github.com/behaviorbot/new-issue-welcome

# Comment to be posted on first-time issues
newIssueWelcomeComment: >
  🎉 Thank you for opening your first issue in this repo! Please check our [contribution guidelines](CONTRIBUTING.md) and be guided by the issue template.
# Configuration for new-pr-welcome - https://github.com/behaviorbot/new-pr-welcome

# Comment to be posted on PRs from first-time contributors in your repository
newPRWelcomeComment: >
  🎉 Thanks for opening your first pull request in this repo! Please check our [contribution guidelines](CONTRIBUTING.md) and be guided by the PR template.
# Configuration for first-pr-merge - https://github.com/behaviorbot/first-pr-merge

# Comment to be posted on pull requests merged by a first-time user
firstPRMergeComment: >
  🎉 Congrats on merging your first pull request in this repo! We appreciate your contribution!
11 changes: 11 additions & 0 deletions .github/labeler.yml
@@ -0,0 +1,11 @@
Documentation:
- changed-files:
  - any-glob-to-any-file: ['README.md','CONTRIBUTING.md','cprd-code-browser.md']

Internal:
- changed-files:
  - any-glob-to-any-file: ['.github/*','.all-contributorsrc','.gitignore']

# Testing:
# - changed-files:
#   - any-glob-to-any-file: []
19 changes: 19 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,19 @@
<!-- Note which issues are linked to this pull request (PR) -->
<!-- If this PR is enough to close them you can write something like "Closes #10 and closes #12" -->
<!-- If you just want to reference them without closing them, you can add something like "References #10" -->
Closes #

<!-- Add a short description of the PR here - what you have changed and why -->

## Proposed Changes
<!-- List major changes here, so that the reviewers can have a bit more context -->
-
-
-

## Checklist before review:
<!-- You're invited to open a draft PR so people can see what you are working on sooner -->
- [ ] Please comment on my PR while it's a draft and give me feedback on the development!
- [ ] I added everything I wanted to add to this PR, please review!
- [ ] The title of this PR is clear and self-explanatory.
- [ ] I added any appropriate labels to this PR.
17 changes: 17 additions & 0 deletions .github/workflows/auto-author-assign.yml
@@ -0,0 +1,17 @@
name: Auto Author Assign

on:
  issues:
    types: [ opened, reopened ]
  pull_request_target:
    types: [ opened, reopened ]

permissions:
  pull-requests: write
  issues: write

jobs:
  assign-author:
    runs-on: ubuntu-latest
    steps:
      - uses: toshimaru/[email protected]
15 changes: 15 additions & 0 deletions .github/workflows/auto-label.yml
@@ -0,0 +1,15 @@
name: auto-label
concurrency:
  group: ${{ github.workflow }}-${{ github.event.number }}-${{ github.event.ref }}
  cancel-in-progress: true
on:  # yamllint disable-line rule:truthy
  pull_request_target

jobs:
  pr:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/labeler@v5
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@

29 changes: 29 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,29 @@
# Contributing to cprd-data-wrangle

We warmly welcome contributions to the cprd-data-wrangle project, however small or large.

This document provides guidelines for contributing to this repository.

## How to Contribute

### Reporting Issues

- **Bug Reports**: If you find a bug, please open an issue with a clear description of the problem and steps to reproduce it.
- **Feature Requests**: Suggestions for new features or improvements are always welcome. Please open an issue to discuss your ideas.

### Making Changes

1. **Fork the Repository**: Start by forking the repository to your GitHub account.
2. **Create a Feature Branch**: Create a new branch for your feature or fix.
3. **Make Your Changes**: Implement your changes, adhering to coding standards and best practices for Markdown and Python.
4. **Test Your Changes**: Ensure your changes do not break any existing functionality.
5. **Document Your Changes**: Include comments alongside your code, and update any relevant documentation files, as needed.
6. **Submit a Pull Request**: Open a pull request from your feature branch to the main branch of the original repository.

## Questions or Need Help?

If you have questions or need help, feel free to open an issue for discussion or reach out to the maintainers directly:

Rachael Stickland ([email protected]) and Mahwish Mohammad ([email protected]).

Thank you for contributing to cprd-data-wrangle!
87 changes: 84 additions & 3 deletions README.md
@@ -1,5 +1,86 @@
# cprd-data-wrangle

This repository is for anyone new to working with datasets released by the Clinical Practice Research Datalink (CPRD). Researchers tasked with understanding the database tables, then querying and filtering to create a research cohort, may find our pre-processing pipeline and interactive notebooks a helpful guide to getting started.
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-4-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

📢 This repository will be populated soon! 📢 In the meantime, check out this other repository for an introduction to synthetic data, in the context of health-care and biomedical research: https://github.com/aim-rsf/Synthetic-Data

# 👋 Welcome

## 👥 Who is this repository for?

This repository is for anyone new to working with datasets released by the [Clinical Practice Research Datalink (CPRD)](https://cprd.com). Researchers tasked with understanding the database tables, then querying and filtering to create a research cohort, may find our pre-processing pipeline and interactive notebooks a helpful guide to getting started.

**Please note:**

- **You need your own copy of CPRD's synthetic/real data to run the code. This repository does not contain any data files.**

- **CPRD are moving towards a TRE model of data access, instead of a researcher downloading data onto their own computer. Read more [here](https://www.cprd.com/cprd-safe-our-trusted-research-environment).**

- **This is a work in progress repository. If you would like to suggest or contribute a change, please read our [contributor guide](CONTRIBUTING.md).**

# 🥅 Project Goals

We aim to streamline the process for researchers using CPRD datasets, with the creation of clear documentation, efficient data management strategies and analytical pipelines. We will start with development of workflows utilising CPRD's medium fidelity synthetic datasets because they resemble
> "the real world CPRD data with respect to the data types, data values, data formats, data structure and table relationships" [ref](https://cprd.com/synthetic-data).
**New to Synthetic Data?** Read an introduction [here](https://github.com/aim-rsf/Synthetic-Data).

We will create and share documentation & code, in openly available languages. We will start by loading the data into a relational database and summarising some of its main features.

By working with our research collaborators, we aim to test workflows written with synthetic datasets on the real datasets to ensure transferability and utility. An anticipated mismatch will be the size of the data files and possibly the variability in file format. Please reach out to us if you want to test our code on your real CPRD data, or have any feedback on improving transferability and utility.

CPRD's most recently released data specifications can be found [here for the real datasets](https://cprd.com/primary-care-data-public-health-research) and [here for the synthetic datasets](https://cprd.com/synthetic-data).

# 💻 Current content

We include information on [CPRD's Code Browser tool](cprd-code-browser.md) and how to request access to it.

The [code-for-aurum](code-for-aurum) folder uses `Python` and `PostgreSQL` to build a pre-processing workflow for CPRD Aurum data: it converts the data files to a compatible format, then reads them into tables in a relational database. Workbooks familiarise users with the CPRD Aurum tables, including how they link together and how to build a sample cohort:

https://github.com/user-attachments/assets/9a636d4c-8170-4145-b6fc-60ac0f4c16d1
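
As a rough illustration of the "read the data into tables in a relational database" step, here is a minimal sketch. It is not the code in the `code-for-aurum` folder: it assumes a tab-delimited input, uses Python's built-in `sqlite3` in place of PostgreSQL so that it runs without a database server, and the table name `patient`, the columns `patid`, `yob` and `gender`, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical two-row extract standing in for a tab-delimited patient file.
# These values are invented for illustration and are not CPRD data.
SAMPLE_TSV = "patid\tyob\tgender\n1001\t1956\t1\n1002\t1987\t2\n"


def load_patients(conn, tsv_file):
    """Create a `patient` table and fill it from a tab-delimited file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient ("
        "patid TEXT PRIMARY KEY, yob INTEGER, gender INTEGER)"
    )
    reader = csv.DictReader(tsv_file, delimiter="\t")
    rows = [(r["patid"], int(r["yob"]), int(r["gender"])) for r in reader]
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# SQLite in memory is an assumption made so the sketch is self-contained;
# the repository itself targets PostgreSQL.
conn = sqlite3.connect(":memory:")
n_loaded = load_patients(conn, io.StringIO(SAMPLE_TSV))

# A toy cohort filter: patients born before 1960.
born_before_1960 = conn.execute(
    "SELECT patid FROM patient WHERE yob < 1960"
).fetchall()
print(n_loaded, born_before_1960)
```

With a real PostgreSQL instance the same pattern would apply through a database driver, though connection handling, placeholder syntax and bulk-loading options would differ.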

# 🀝 Contributions and Acknowledgments

We acknowledge and thank these groups for making this project possible:

- [National Institute for Health and Care Research (NIHR)](https://www.nihr.ac.uk/) for funding the AIM-RSF programme of work [NIHR202647] - see below.
- The [AI for Multiple Long Term Conditions Research Support Facility (AIM-RSF)](https://github.com/aim-rsf) programme for facilitating the delivery of this project.
- This repository was created and is maintained by the AIM-RSF, led by [Data Wranglers](https://book.the-turing-way.org/collaboration/research-infrastructure-roles/data-wrangler.html) Rachael Stickland & Mahwish Mohammad.
- [Clinical Practice Research Datalink (CPRD)](https://cprd.com) for access to synthetic versions of their datasets [synthetic data request no: SD000021].
- [The Alan Turing Institute](https://www.turing.ac.uk/). This project was supported in part through computational resources provided by The Alan Turing Institute under EPSRC grant EP/N510129/1.

The views expressed within any file in this repository are those of the author(s) within the AIM-RSF programme, and not necessarily those of the: NIHR, Department of Health and Social Care, Medicines and Healthcare products Regulatory Agency (MHRA) or CPRD.

## Thanks to specific contributors

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification, using the [emoji key](https://allcontributors.org/docs/en/emoji-key):
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
<tbody>
<tr>
<td align="center" valign="top" width="14.28%"><a href="http://linkedin.com/in/rstickland-phd"><img src="https://avatars.githubusercontent.com/u/50215726?v=4?s=100" width="100px;" alt="Rachael Stickland"/><br /><sub><b>Rachael Stickland</b></sub></a><br /> <a href="#projectManagement-RayStick" title="Project Management">📆</a> <a href="#maintenance-RayStick" title="Maintenance">🚧</a> <a href="https://github.com/aim-rsf/cprd/commits?author=RayStick" title="Code">💻</a> <a href="https://github.com/aim-rsf/cprd/commits?author=RayStick" title="Documentation">📖</a> <a href="#ideas-RayStick" title="Ideas, Planning, & Feedback">🤔</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/Rainiefantasy"><img src="https://avatars.githubusercontent.com/u/43926907?v=4?s=100" width="100px;" alt="Mahwish Mohammad"/><br /><sub><b>Mahwish Mohammad</b></sub></a><br /><a href="#maintenance-Rainiefantasy" title="Maintenance">🚧</a> <a href="https://github.com/aim-rsf/cprd/commits?author=Rainiefantasy" title="Code">💻</a> <a href="https://github.com/aim-rsf/cprd/commits?author=Rainiefantasy" title="Documentation">📖</a> <a href="#ideas-Rainiefantasy" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/aim-rsf/cprd/pulls?q=is%3Apr+reviewed-by%3ABatoolMM" title="Reviewed Pull Requests">👀</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://batool-almarzouq.netlify.app/"><img src="https://avatars.githubusercontent.com/u/53487593?v=4?s=100" width="100px;" alt="Batool Almarzouq"/><br /><sub><b>Batool Almarzouq</b></sub></a><br /><a href="https://github.com/aim-rsf/cprd/pulls?q=is%3Apr+reviewed-by%3ABatoolMM" title="Reviewed Pull Requests">👀</a> <a href="#ideas-amallon" title="Ideas, Planning, & Feedback">🤔</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/amallon"><img src="https://avatars.githubusercontent.com/u/35258603?v=4?s=100" width="100px;" alt="Ann-Marie Mallon"/><br /><sub><b>Ann-Marie Mallon</b></sub></a><br /><a href="#projectManagement-amallon" title="Project Management">📆</a> <a href="#ideas-amallon" title="Ideas, Planning, & Feedback">🤔</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/amallon"><img src="https://avatars.githubusercontent.com/u/3626306?v=4" width="100px;" alt="Kirstie Whitaker"/><br /><sub><b>Kirstie Whitaker</b></sub></a><br /> <a href="#ideas-KirstieJane" title="Ideas, Planning, & Feedback">🤔</a></td>
</tr>
</tbody>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

**Would you like to contribute?** Please read our [contributor guide](CONTRIBUTING.md).

## ♻️ Licences

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

---

You got to the end of the README? You get our :seal: of approval!