Updating documentation
Updating and fixing each module's documentation for the help pages on the website
nhall6 committed Oct 21, 2024
1 parent 6be5c12 commit 391933c
Showing 5 changed files with 64 additions and 21 deletions.
37 changes: 29 additions & 8 deletions vignettes/Characterization.Rmd
@@ -1,6 +1,6 @@
---
title: "Characterization"
author: "Nathan Hall"
author: "Nathan Hall and Jenna Reps"
date: '`r Sys.Date()`'
header-includes:
- \usepackage{fancyhdr}
@@ -35,18 +35,39 @@ knitr::opts_chunk$set(echo = TRUE)

# Introduction

Characterization, a fundamental aspect of observational health data research, serves as a cornerstone for understanding and analyzing populations based on a myriad of characteristics. The methodologies of characterization play a pivotal role in generating hypotheses about the determinants of health and disease by providing descriptive insights into population demographics, medical history, treatment patterns, and incidence rates of outcomes. There are various methods for characterization, including database-level characterization, cohort characterization, treatment pathways analysis, and incidence measurement. Each of these methods aims to describe populations relative to an event known as the index date, which anchors the analysis of baseline, pre-index, and post-index time periods. Through the lens of use-cases such as disease natural history, treatment utilization, and quality improvement, characterizing cohorts of patients empowers researchers to glean actionable insights from observational healthcare databases.
The OHDSI Characterization package lets users extract descriptive analyses from observational healthcare data sets mapped to the OMOP CDM. There are currently four different types of characterization analyses: incidence rates, time-to-event, dechallenge-rechallenge, and various aggregate covariate cohort comparisons.

The Characterization package currently lets users answer the following questions:

* **Incidence Rate**: How often does ```<add outcome>``` occur within ```<add time-at-risk>``` after the first record of ```<add exposure/indication>```?
* **Time-to-event**: When does ```<add outcome>``` occur relative to the first record of ```<add exposure/indication>```? Is it more common before or after ```<add exposure>```?
* **Dechallenge-rechallenge**: Is there any evidence of ```<add outcome>``` causing ```<add exposure>``` to be discontinued, and of ```<add outcome>``` recurring once ```<add exposure>``` restarts?
* **Cohort Comparison**: What is different at index between patients in ```<add exposure/outcome/indication>``` and patients in ```<add different exposure/outcome/indication>```?
* **Database Comparison**: What is different at index between patients in ```<add exposure/outcome/indication>``` across two or more OMOP CDM databases?
* **Risk factors**: What are the risk factors for ```<add outcome>``` occurring within ```<add time-at-risk>``` in those exposed to ```<add exposure>```?
* **Case-series**: What happens to cases (those exposed to ```<add exposure>``` who have ```<add outcome>``` during ```<add time-at-risk>```) before exposure, between exposure and outcome start, and after outcome start? How poor is the prognosis of the cases?
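To make the time-to-event question concrete, here is a minimal Python sketch (not the Characterization package's API, which is R) that takes hypothetical (target start, outcome start) date pairs and tallies whether each outcome falls before, on, or after the target start. The function name `summarize_time_to_event` and the input format are assumptions made for illustration.

```python
from datetime import date

def summarize_time_to_event(pairs):
    """Count outcomes occurring before, on, or after the target start.

    `pairs` is a list of (target_start, outcome_start) dates for people who
    have both records (hypothetical input format).
    """
    counts = {"before": 0, "same_day": 0, "after": 0}
    offsets = []
    for target_start, outcome_start in pairs:
        days = (outcome_start - target_start).days  # negative = before index
        offsets.append(days)
        if days < 0:
            counts["before"] += 1
        elif days == 0:
            counts["same_day"] += 1
        else:
            counts["after"] += 1
    return counts, offsets

pairs = [
    (date(2020, 1, 1), date(2019, 12, 1)),  # outcome 31 days before index
    (date(2020, 1, 1), date(2020, 1, 1)),   # outcome on the index date
    (date(2020, 1, 1), date(2020, 3, 1)),   # outcome 60 days after index
]
counts, offsets = summarize_time_to_event(pairs)
print(counts)   # {'before': 1, 'same_day': 1, 'after': 1}
print(offsets)  # [-31, 0, 60]
```

Binning `offsets` into 1-, 30-, or 365-day periods would give the kind of distribution the time-to-event plots summarize.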


# Features and Functionalities

The Characterization module is dedicated to investigating these factors within and between cohorts, and it contains several useful features that allow for this exploration, including:
Defining a **target cohort** as a set of patients with an exposure of interest and/or with evidence of having an indication of interest, and an **outcome cohort** as a set of patients with evidence of the outcome of interest, we run the following analyses:

1. **Cohort Summary** - computes aggregate covariate summaries for cohorts (targets and/or outcomes), offering a granular view of each cohort's demographics, conditions, drug exposures, and more. This enables a deeper understanding of the cohort's characteristics at various time points at or relative to the index date:
+ **Database Comparison**: Lets you compare the same cohort across two or more databases, and adds the standardized mean difference calculation when exactly two databases are selected. This is a measure of association between the feature and the database, identifying which features differ across databases.
+ **Cohort Comparison**: Lets you compare two or more cohorts within a database, and adds the standardized mean difference calculation when exactly two cohorts are selected. This is a measure of association between the feature and the cohort, identifying which features differ across cohorts.

2. **Exposed Case Series** - characterizations that look at people in the target cohort who have the outcome during some specified time-at-risk:
+ **Risk Factor**: Compares aggregate covariate summaries for patients in the target who have the outcome during the time-at-risk period vs patients in the target who do not have the outcome during the time-at-risk period. The standardized mean difference is added to identify covariates that differ between the cohorts.
+ **Case Series**: Compares aggregate covariate summaries before target start, between target start and outcome start, and after outcome start for people in the target cohort who have the outcome during some specified time-at-risk. This lets you see covariates that are common before exposure and what happens afterwards.
+ **Time to Event**: Shows the distribution of when the outcome occurs relative to the target start. This can show you whether the outcome occurs more often before or after target exposure.
+ **Dechallenge Rechallenge**: Offers the ability to compute dechallenge (withdrawal of a drug or treatment) and rechallenge (reintroduction) results. This analysis is critical for understanding the causality between exposures and outcomes, especially in pharmacovigilance studies and when adverse events following exposure to a drug may occur.
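The standardized mean difference used in the comparisons above can be illustrated for a binary covariate with the common pooled-standard-deviation form, SMD = (p1 - p2) / sqrt((p1(1 - p1) + p2(1 - p2)) / 2). The Python sketch below assumes that formula; the exact formula used by the Characterization package is not stated here and may differ, and the feature names, proportions, and 0.1 flagging threshold are hypothetical.

```python
import math

def smd_binary(p1: float, p2: float) -> float:
    """Standardized mean difference between two proportions (binary covariate)."""
    if p1 == p2:
        return 0.0
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        # e.g. p1 = 1 and p2 = 0: the standardized difference is unbounded
        return math.copysign(math.inf, p1 - p2)
    return (p1 - p2) / pooled_sd

# Hypothetical feature prevalences: (proportion in group 1, proportion in group 2)
features = {"diabetes": (0.30, 0.10), "hypertension": (0.50, 0.48)}
smds = {name: smd_binary(p1, p2) for name, (p1, p2) in features.items()}

# Flag features whose absolute SMD exceeds an assumed 0.1 threshold
flagged = [name for name, value in smds.items() if abs(value) > 0.1]
print(flagged)  # ['diabetes']
```

Because the difference is divided by a pooled standard deviation, features with very different base rates can be compared on one scale, which is what makes the SMD useful for ranking features that differ between two groups.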

1. **Target Viewer**: Computes detailed results for target cohorts, offering a granular view of the cohort's demographics, conditions, drug exposures, and more. This enables a deeper understanding of the cohort's characteristics at various time points at or relative to the index date.
2. **Outcome Stratified**: For both target and outcome cohorts, the package calculates binary features during the designated time at risk. This analysis helps in identifying specific attributes or exposures that are present or absent in the cohort members, aiding in the differentiation/comparability between cohorts.
3. **Incidence Rate**: Utilizing the [CohortIncidence](https://github.com/OHDSI/CohortIncidence "CohortIncidence") R package, this set of analyses computes incidence rates for both target and outcome cohorts during the time at risk selected. This feature is essential for assessing the frequency of outcomes or conditions within the specified timeframe, providing a quantitative measure of risk or occurrence. Incidence measures are provided in both tabular and graphical form, and can be stratified across calendar year, age, and sex.
4. **Time to Event**: Generates plots for the number of events across different time periods (1, 30, or 365 days) for the selected target and outcome cohorts. These plots visualize the temporal distribution of events, allowing researchers to observe patterns over time and make temporal comparisons between cohorts.
5. **Dechallenge Rechallenge**: Offers the ability to compute dechallenge (withdrawal of a drug or treatment) and rechallenge (reintroduction) results. This analysis is critical for understanding the causality between exposures and outcomes, especially in pharmacovigilance studies and when adverse events following exposure to a drug may occur.
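The incidence figures described above reduce to two quantities: the proportion of people at risk who have the outcome during the time-at-risk, and the number of outcomes per unit of person-time. The Python sketch below assumes a simplified input of (time-at-risk length in days, days from time-at-risk start to outcome or `None`) per person, stops counting person-time at the outcome, and reports rates per 1,000 person-years using 365.25 days per year. It is not the CohortIncidence implementation, whose conventions (for example, how day boundaries or repeat outcomes are handled) may differ.

```python
from typing import Optional, Sequence, Tuple

def incidence(people: Sequence[Tuple[int, Optional[int]]]):
    """Compute outcome count, person-years, incidence proportion and rate.

    Each person is (tar_days, days_to_outcome); days_to_outcome is None when
    no outcome is recorded. Outcomes outside the time-at-risk are ignored.
    """
    outcomes = 0
    person_days = 0
    for tar_days, days_to_outcome in people:
        if days_to_outcome is not None and 0 <= days_to_outcome <= tar_days:
            outcomes += 1
            person_days += days_to_outcome  # time at risk ends at the outcome
        else:
            person_days += tar_days
    person_years = person_days / 365.25
    proportion = outcomes / len(people) if people else 0.0
    rate_per_1000_py = 1000 * outcomes / person_years if person_years else 0.0
    return outcomes, person_years, proportion, rate_per_1000_py

# Four hypothetical people, each with a 365-day time-at-risk
people = [(365, None), (365, 100), (365, None), (365, 500)]  # 500 > 365: ignored
outcomes, person_years, proportion, rate = incidence(people)
print(outcomes, round(person_years, 2), proportion, round(rate, 1))  # 1 3.27 0.25 305.6
```

Stratifying by calendar year, age, or sex amounts to running the same calculation on each subgroup of `people`.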

# Utility and Application

Characterization serves as a powerful tool for researchers aiming to dissect and understand the nuances of patient cohorts in observational health data. Its capabilities allow for the detailed examination of cohort attributes, the incidence of health outcomes, and the effects of treatment exposures over time. By facilitating a comprehensive analysis of target and comparator cohorts, Characterization enables researchers to draw meaningful conclusions about patient care, treatment efficacy, and health outcomes, thereby contributing to the advancement of evidence-based medicine. For more information on the Characterization R package, please see [here](https://github.com/OHDSI/Characterization "Characterization").
Characterization serves as a powerful tool for researchers aiming to dissect and understand the nuances of patient cohorts in observational health data. Its capabilities allow for the detailed examination of cohort attributes, the incidence of health outcomes, and the effects of treatment exposures over time. By facilitating a comprehensive analysis of target and comparator cohorts, Characterization enables researchers to draw meaningful conclusions about patient care, treatment efficacy, and health outcomes, thereby contributing to the advancement of evidence-based medicine.

To find out more about the analysis execution details and to see examples, please see [here](https://ohdsi.github.io/OhdsiShinyModules/articles/Characterization.html).

To see the code behind the Characterization R package, please see [here](https://github.com/OHDSI/Characterization).
19 changes: 10 additions & 9 deletions vignettes/CohortDiagnostics.Rmd
@@ -35,27 +35,28 @@ knitr::opts_chunk$set(echo = TRUE)

# Introduction

In the realm of observational research, where data heterogeneity and complexity are common, assessing and diagnosing the characteristics of cohorts is fundamental to ensuring the reliability and credibility of research findings. This is also an essential step in phenotype development. The OHDSI community has developed an R package, CohortDiagnostics, which provides researchers with a systematic approach to examine various facets of cohorts, enabling them to identify potential biases, assess data completeness, and validate the suitability of cohorts for analysis. This tool is crucial for researchers working within the Observational Health Data Sciences and Informatics (OHDSI) ecosystem, enabling them to ensure the accuracy and reliability of cohort definitions through a detailed examination of incidence rates, cohort characteristics, and the specific codes triggering cohort inclusion criteria. CohortDiagnostics streamlines the process of cohort evaluation by:
Assessing and diagnosing the characteristics of cohorts and phenotypes is fundamental to ensuring the reliability and credibility of observational research on OMOP CDM-compliant data. This is also an essential step in phenotype development. The OHDSI community has developed an R package, CohortDiagnostics, which provides researchers with a systematic approach to examining various facets of cohorts, enabling them to identify potential biases, assess data completeness, compare characteristics of cohorts, and validate the suitability of cohorts for analysis. This tool is crucial for researchers working within the Observational Health Data Sciences and Informatics (OHDSI) ecosystem, enabling them to ensure the accuracy and reliability of cohort definitions through a detailed examination of incidence rates, cohort characteristics, and the specific codes triggering cohort inclusion criteria. CohortDiagnostics streamlines the process of cohort evaluation by:

1. Generating a broad spectrum of diagnostics against a CDM database - see more details here: [Features and Functionalities]
2. Providing an interactive R Shiny application within the package for an intuitive exploration and visualization of these diagnostics. For more information on R Shiny, see [here](https://www.rstudio.com/products/shiny/ "R Shiny").
3. Providing detailed documentation, available on the package's documentation site [here](https://ohdsi.github.io/CohortDiagnostics/index.html).

# Features and Functionalities

CohortDiagnostics offers a suite of features designed to deepen the understanding of cohort dynamics and the intricacies of cohort definitions, including:

1. **Cohort Definition**: Facilitates the examination and validation of the logic behind cohort definitions, ensuring they accurately capture the intended population.
2. **Concepts in Data Source**: Identifies the specific concepts present within the data source that are relevant to the cohort definitions, enabling a deeper understanding of data coverage and content.
1. **Cohort Definition**: Facilitates the examination and validation of the logic behind cohort definitions, ensuring they accurately capture the intended population, and allows for the assessment of cohort inclusion rule logic and attrition.
2. **Concepts in Data Source**: Identifies the specific concepts present within the data source that are relevant to the cohort definitions, enabling a deeper understanding of standard and non-standard concepts present in the underlying patient population for each database.
3. **Orphan Concepts**: Highlights concepts that, despite their relevance, are not captured within a cohort's definition. This helps in refining concept sets and cohort criteria to ensure comprehensiveness and relevance.
4. **Cohort Counts**: Provides counts of individuals and records within cohorts, offering a basic measure of cohort size and scope.
5. **Incidence Rate**: Calculates the incidence rate of cohorts, stratified by various demographic and temporal factors such as age, sex, and calendar year, to assess the frequency of patients/records in the cohort and potential patterns over these strata.
6. **Time Distributions**: Examines the distribution of time-related variables within cohorts, such as observation time before and after cohort index date as well as cohort duration, offering insights into cohort dynamics over time and available observation time.
7. **Index Event Breakdown**: Breaks down the specific events that qualify individuals for cohort inclusion, providing clarity on how inclusion criteria are met.
8. **Visit Context**: Analyzes the healthcare context (e.g., inpatient, outpatient) of the index events, offering insights into where and how cohort members are identified within the healthcare system.
9. **Cohort Overlap**: Assesses the degree of overlap between cohorts, which can inform on potential biases, errors, or redundancies in cohort construction, as well as shared characteristics between cohorts of patients.
7. **Index Event Breakdown**: Summarizes the specific concepts (both standard and non-standard) through which patients enter the cohort in each database.
8. **Visit Context**: Analyzes the healthcare context (e.g., inpatient, outpatient, laboratory visit) of the index events, highlighting the relationship between the cohort start date and the visits recorded in each database before, during, and after cohort entry.
9. **Cohort Overlap**: Assesses the degree of patient overlap between cohorts, which can reveal potential biases, errors, or redundancies in cohort construction.
10. **Cohort Characterization**: Characterizes cohorts by detailing prevalent conditions, medication use, procedures, and more, to understand the clinical profile of cohort members over various time periods relative to index.
11. **Compare Cohort Characterization**: Enables the direct comparison of characteristics between cohorts, facilitating the identification of unique or shared features across different cohorts and across time points.
12. **Meta Data**: Provides meta-information about the data and analyses conducted, ensuring transparency and reproducibility of the cohort diagnostics process.
11. **Compare Cohort Characterization**: Enables the direct comparison of characteristics between cohorts, facilitating the identification of unique or shared features across different cohorts and across time points (both before and after index).
12. **Meta Data**: Provides meta-information about the data and analyses conducted.
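Cohort counts and cohort overlap are both simple set operations on the cohort table. The Python sketch below uses hypothetical (subject ID, cohort start date) rows for two cohorts in one database to show why subject and record counts can differ, and how subjects split into those in only one cohort or in both. It illustrates the idea only and is not CohortDiagnostics code.

```python
def cohort_counts(rows):
    """Return (subject count, record count) for rows of (subject_id, start_date)."""
    return len({subject_id for subject_id, _ in rows}), len(rows)

def cohort_overlap(rows_a, rows_b):
    """Split subjects into those only in A, only in B, or in both cohorts."""
    a = {subject_id for subject_id, _ in rows_a}
    b = {subject_id for subject_id, _ in rows_b}
    return {"only_a": len(a - b), "only_b": len(b - a), "both": len(a & b)}

# Hypothetical cohort entries; subject 1 enters cohort A twice
cohort_a = [(1, "2020-01-01"), (1, "2021-06-01"), (2, "2020-02-01"),
            (3, "2020-03-01"), (4, "2020-04-01")]
cohort_b = [(3, "2020-05-01"), (4, "2020-01-15"), (5, "2020-07-01")]

print(cohort_counts(cohort_a))             # (4, 5)
print(cohort_overlap(cohort_a, cohort_b))  # {'only_a': 2, 'only_b': 1, 'both': 2}
```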

Together, these features equip researchers with the tools necessary for a thorough examination of cohort definitions, enhancing the quality and reliability of observational health research.

@@ -64,6 +65,6 @@ Together, these features equip researchers with the tools necessary for a thorou
CohortDiagnostics significantly contributes to the field of observational health research by providing a robust framework for the evaluation and validation of cohort definitions. Its utility spans several critical areas:

1. **Enhancing Cohort Definition Confidence**: By offering detailed diagnostics, CohortDiagnostics helps researchers refine their cohort definitions, ensuring they accurately capture the intended population. This is a critical step in phenotype development, which is a cornerstone of modern observational health data research.
2. **Identifying Data Quality Issues**: Through the identification of orphan concepts and the detailed breakdown of index events, researchers can pinpoint data quality issues or gaps in cohort definitions. Iterating over multiple potential cohort definitions after analyzing these diagnostics is an encouraged and common practice.
2. **Identifying Missing Concepts & Cohort Entry Events**: Through the identification of orphan concepts and the detailed breakdown of index events, researchers can pinpoint gaps or misspecifications in cohort definitions. Iterating over multiple potential cohort definitions after analyzing these diagnostics is an encouraged and common practice.
3. **Facilitating the Ideas Behind Comparative Analyses**: The package's capabilities to characterize and compare cohorts, as well as to analyze cohort overlaps, are invaluable for researchers looking to understand the nuances and dynamics of their study populations. These diagnostics can help inform comparative studies in the future, after the cohorts and phenotypes are refined and finalized.
4. **Supporting Transparent Research**: By enabling the listing of source codes, data source information, and providing a platform for detailed diagnostics exploration, CohortDiagnostics fosters a culture of transparency and reproducibility in observational research.
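The orphan-concept idea can be sketched as a set difference: start from the concepts related to a concept set, drop those already in the set, and keep the ones that actually occur in the data. In practice related concepts come from the OMOP vocabulary; the Python sketch below instead assumes a small, hard-coded `related` mapping, and the function name, concept IDs, and counts are all hypothetical.

```python
def find_orphan_concepts(concept_set, related, observed_counts, min_count=1):
    """Related concepts observed in the data but missing from the concept set."""
    candidates = set()
    for concept_id in concept_set:
        candidates.update(related.get(concept_id, ()))
    orphans = {
        concept_id: observed_counts[concept_id]
        for concept_id in candidates - set(concept_set)
        if observed_counts.get(concept_id, 0) >= min_count
    }
    # Most frequently observed orphans first
    return dict(sorted(orphans.items(), key=lambda item: -item[1]))

concept_set = {100, 101}                      # hypothetical concept set
related = {100: [101, 102, 103], 101: [104]}  # hypothetical related concepts
observed_counts = {100: 500, 101: 40, 102: 25, 103: 0, 104: 7}  # record counts

orphans = find_orphan_concepts(concept_set, related, observed_counts)
print(orphans)  # {102: 25, 104: 7}
```

Reviewing a list like `orphans` and deciding whether to add each concept to the concept set is one iteration of the refinement loop described above.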
8 changes: 8 additions & 0 deletions vignettes/Cohorts.Rmd
@@ -60,3 +60,11 @@ OHDSI offers a suite of open-source tools to support cohort identification withi
1. [ATLAS](http://atlas-demo.ohdsi.org/ "ATLAS"): A web-based tool that enables researchers to define, execute, and share cohort definitions using standardized terminologies and criteria. ATLAS streamlines the cohort creation process by providing an intuitive interface for specifying cohort criteria and visualizing cohort characteristics.
2. [SQL](https://www.geeksforgeeks.org/what-is-sql/ "What is SQL?"): Structured Query Language (SQL) provides a powerful means for defining cohorts through custom queries. OHDSI encourages the use of SQL for advanced cohort definition tasks and complex analyses.
3. [CohortGenerator](https://github.com/OHDSI/CohortGenerator?tab=readme-ov-file "Cohort Generator"): The CohortGenerator R package is a tool within the Observational Health Data Sciences and Informatics (OHDSI) ecosystem designed to facilitate the creation of cohorts from observational healthcare data stored in databases adhering to the OMOP Common Data Model (CDM). It streamlines the cohort creation process, allowing for efficient and reproducible cohort identification across different datasets.
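As a small illustration of defining a cohort with SQL, the Python sketch below uses the standard-library `sqlite3` module and simplified versions of the OMOP CDM `condition_occurrence` and `cohort` tables to build a cohort from each person's first record of any concept in a hypothetical concept set. Real cohort definitions (for example, those designed in ATLAS and generated with CohortGenerator) also account for observation periods, inclusion rules, and exit criteria, which this toy query omits; the concept IDs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
CREATE TABLE cohort (
    cohort_definition_id INTEGER,
    subject_id INTEGER,
    cohort_start_date TEXT,
    cohort_end_date TEXT
);
""")
cur.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [
        (1, 9001, "2020-03-01"),
        (1, 9001, "2019-07-15"),  # earlier record: becomes person 1's index
        (2, 9002, "2021-01-10"),
        (3, 5555, "2020-05-05"),  # concept not in the set: excluded
    ],
)
# Hypothetical concept set {9001, 9002}; first qualifying record per person.
# End date = start date is a simplification (no exit criteria).
cur.execute("""
INSERT INTO cohort
SELECT 1, person_id, MIN(condition_start_date), MIN(condition_start_date)
FROM condition_occurrence
WHERE condition_concept_id IN (9001, 9002)
GROUP BY person_id
""")
rows = cur.execute(
    "SELECT subject_id, cohort_start_date FROM cohort ORDER BY subject_id"
).fetchall()
print(rows)  # [(1, '2019-07-15'), (2, '2021-01-10')]
```

ISO-formatted date strings sort chronologically, so `MIN` picks the earliest record here; a real CDM would store proper `DATE` columns.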

# The Cohorts Module

The Cohorts tab of the OHDSI Analysis Viewer has three main sections, each with its own tab, that the user can explore:

1. **Cohort Counts**: Gives the counts of both the number of subjects and the number of records for each cohort and each database.
2. **Cohort Generation**: Gives the cohort generation information for each cohort and each database, including an indicator if the cohort was generated (or not), the generation start and end time, and the duration (in minutes) of the cohort generation.
3. **Inclusion Rules & Attrition**: Gives both a tabular and graphical representation of cohort attrition statistics for each cohort and each database. The user may select whether they want to view the results at the subject or record-level.
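The subject-level versus record-level attrition view can be illustrated by applying inclusion rules in sequence and counting what remains after each one. The Python sketch below uses hypothetical cohort entries and two made-up rules; it only shows the counting logic, not how CohortDiagnostics or ATLAS compute or store inclusion-rule statistics.

```python
def attrition(records, rules):
    """Apply (name, predicate) rules in order; report subjects and records left."""
    remaining = list(records)
    table = [("All entries", len({r["subject_id"] for r in remaining}), len(remaining))]
    for name, predicate in rules:
        remaining = [r for r in remaining if predicate(r)]
        table.append((name, len({r["subject_id"] for r in remaining}), len(remaining)))
    return table

# Hypothetical cohort entries (subject 1 has two records)
records = [
    {"subject_id": 1, "age": 45, "prior_days": 400},
    {"subject_id": 1, "age": 46, "prior_days": 800},
    {"subject_id": 2, "age": 17, "prior_days": 500},
    {"subject_id": 3, "age": 60, "prior_days": 100},
]
rules = [
    ("Age >= 18", lambda r: r["age"] >= 18),
    ("At least 365 days prior observation", lambda r: r["prior_days"] >= 365),
]
table = attrition(records, rules)
for name, subjects, n_records in table:
    print(f"{name}: {subjects} subjects, {n_records} records")
```

Subject counts can drop more slowly than record counts (or vice versa) because one subject may contribute several records, which is why the module offers both views.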