diff --git a/DESCRIPTION b/DESCRIPTION
index 5ef0f9e6..5887ab8d 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -2,7 +2,7 @@ Package: CohortMethod
Type: Package
Title: New-User Cohort Method with Large Scale Propensity and Outcome Models
Version: 5.2.0
-Date: 2023-09-04
+Date: 2023-12-21
Authors@R: c(
    person("Martijn", "Schuemie", , "schuemie@ohdsi.org", role = c("aut", "cre")),
    person("Marc", "Suchard", role = c("aut")),
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 407e84fd..13a10877 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -59,6 +59,7 @@ reference:
      - plotCovariateBalanceScatterPlot
      - plotCovariatePrevalence
      - plotTimeToEvent
+     - getGeneralizabilityTable
  - title: "Running multiple analyses"
    desc: >
      Functions for running multiple analyses in an efficient way.
@@ -83,10 +84,14 @@ reference:
      - getInteractionResultsSummary
      - createCmDiagnosticThresholds
      - exportToCsv
-     - getResultsDataModel
+     - getResultsDataModelSpecifications
      - insertExportedResultsInSqlite
      - launchResultsViewerUsingSqlite
+     - createResultsDataModel
+     - migrateDataModel
+     - getDataMigrator
      - uploadExportedResults
+     - uploadResults
      - launchResultsViewer
  - title: "Simulation"
    desc: >
diff --git a/docs/404.html b/docs/404.html
index 82ce24d5..98f4ed77 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -32,7 +32,7 @@
diff --git a/docs/articles/MultipleAnalyses.html b/docs/articles/MultipleAnalyses.html
index acf4fd72..42f81f0c 100644
--- a/docs/articles/MultipleAnalyses.html
+++ b/docs/articles/MultipleAnalyses.html
@@ -33,7 +33,7 @@
@@ -89,7 +89,7 @@
vignettes/MultipleAnalyses.Rmd
MultipleAnalyses.Rmd
-connectionDetails <- createConnectionDetails(dbms = "postgresql",
- server = "localhost/ohdsi",
- user = "joe",
+connectionDetails <- createConnectionDetails(dbms = "postgresql",
+ server = "localhost/ohdsi",
+ user = "joe",
password = "supersecret")
cdmDatabaseSchema <- "my_cdm_data"
-resultsDatabaseSchema <- "my_results"
-options(sqlRenderTempEmulationSchema = NULL)
-outputFolder <- "./CohortMethodOutput"
The last three lines define the cdmDatabaseSchema
,
-resultSchema
, and outputFolder
variables.
-We’ll use these later to tell R where the data in CDM format live, where
-we want to write intermediate tables, and where the intermediate and
-output files should be stored in the local file system. Note that for
+cohortDatabaseSchema <- "my_results"
+cohortTable <- "my_cohorts"
+options(sqlRenderTempEmulationSchema = NULL)
+
The last few lines define the cdmDatabaseSchema
,
+cohortDatabaseSchema
, and cohortTable
+variables. We’ll use these later to tell R where the data in CDM format
+live, and where we want to write intermediate tables. Note that for
Microsoft SQL Server, database schemas need to specify both the database
and the schema, so for example
-cdmDatabaseSchema <- "my_cdm_data.dbo"
.
We also need to prepare our exposures and outcomes of interest. The
-drug_era table in the OMOP Common Data Model already contains
-prespecified cohorts of users at the ingredient level, so we will use
-that for the exposures. For the outcomes, we want to restrict our
-analysis only to those outcomes that are recorded in an inpatient
-setting, so we will need to create a custom cohort table. For this
-example, we want to include GI bleed (concept ID 192671) as well as a
-set of 35 negative controls. Negative controls are defined as those
-outcomes where there is no evidence that either the target drug
-(celecoxib) or comparator drug (diclofenac) causes the outcome.
-We create a text file called VignetteOutcomes.sql with the
-following content:
-/***********************************
-File VignetteOutcomes.sql
-***********************************/
-DROP TABLE IF EXISTS @resultsDatabaseSchema.outcomes;
-
-SELECT ancestor_concept_id AS cohort_definition_id,
-  condition_start_date AS cohort_start_date,
-  condition_end_date AS cohort_end_date,
-  condition_occurrence.person_id AS subject_id
-INTO @resultsDatabaseSchema.outcomes
-FROM @cdmDatabaseSchema.condition_occurrence
-INNER JOIN @cdmDatabaseSchema.visit_occurrence
-  ON condition_occurrence.visit_occurrence_id = visit_occurrence.visit_occurrence_id
-INNER JOIN @cdmDatabaseSchema.concept_ancestor
-  ON condition_concept_id = descendant_concept_id
- WHERE ancestor_concept_id IN (192671, 24609, 29735, 73754, 80004, 134718, 139099,
-141932, 192367, 193739, 194997, 197236, 199074, 255573, 257007, 313459, 314658,
-316084, 319843, 321596, 374366, 375292, 380094, 433753, 433811, 436665, 436676,
-436940, 437784, 438134, 440358, 440374, 443617, 443800, 4084966, 4288310)
-AND visit_occurrence.visit_concept_id IN (9201, 9203);
This is parameterized SQL which can be used by the
-SqlRender
package. We use parameterized SQL so we do not
-have to pre-specify the names of the CDM and result schemas. That way,
-if we want to run the SQL on a different schema, we only need to change
-the parameter values; we do not have to change the SQL code. By also
-making use of translation functionality in SqlRender
, we
-can make sure the SQL code can be run in many different
-environments.
-library(SqlRender)
-sql <- readSql("VignetteOutcomes.sql")
-sql <- render(sql,
- cdmDatabaseSchema = cdmDatabaseSchema,
- resultsDatabaseSchema = resultsDatabaseSchema)
-sql <- translate(sql, targetDialect = connectionDetails$dbms)
+cdmDatabaseSchema <- "my_cdm_data.dbo"
. For database
+platforms that do not support temp tables, such as Oracle, it is also
+necessary to provide a schema where the user has write access that can
+be used to emulate temp tables. PostgreSQL supports temp tables, so we
+can set options(sqlRenderTempEmulationSchema = NULL)
(or
+not set the sqlRenderTempEmulationSchema
at all.)
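For a platform that does need temp table emulation, a minimal sketch (the schema name below is hypothetical; it just has to be a schema where you have write access):
options(sqlRenderTempEmulationSchema = "my_scratch_schema") # hypothetical scratch schema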
+We need to define the exposures and outcomes for our study. Here, we
+will define our exposures using the OHDSI Capr
package. We
+define two cohorts, one for celecoxib and one for diclofenac. For each
+cohort we require a prior diagnosis of ‘osteoarthritis of knee’, and 365
+days of continuous prior observation. We restrict to the first exposure
+per person:
+
-In this code, we first read the SQL from the file into memory. In the
-next line, we replace the two parameter names with the actual values. We
-then translate the SQL into the dialect appropriate for the DBMS we
-already specified in the connectionDetails
. Next, we
-connect to the server, and submit the rendered and translated SQL.
-
The first group of arguments define the target, comparator, and -outcome. Here we demonstrate how to create one set, and add that set to -a list:
+osteoArthritisOfKneeConceptId <- 4079750
+celecoxibConceptId <- 1118084
+diclofenacConceptId <- 1124300
+osteoArthritisOfKnee <- cs(
+  descendants(osteoArthritisOfKneeConceptId),
+  name = "Osteoarthritis of knee"
+)
+attrition = attrition(
+  "prior osteoarthritis of knee" = withAll(
+    atLeast(1, condition(osteoArthritisOfKnee),
+            duringInterval(eventStarts(-Inf, 0)))
+  )
+)
+celecoxib <- cs(
+  descendants(celecoxibConceptId),
+  name = "Celecoxib"
+)
+diclofenac <- cs(
+  descendants(diclofenacConceptId),
+  name = "Diclofenac"
+)
+celecoxibCohort <- cohort(
+  entry = entry(
+    drug(celecoxib, firstOccurrence()),
+    observationWindow = continuousObservation(priorDays = 365)
+  ),
+  attrition = attrition,
+  exit = exit(endStrategy = drugExit(celecoxib,
+                                     persistenceWindow = 30,
+                                     surveillanceWindow = 0))
+)
+diclofenacCohort <- cohort(
+  entry = entry(
+    drug(diclofenac, firstOccurrence()),
+    observationWindow = continuousObservation(priorDays = 365)
+  ),
+  attrition = attrition,
+  exit = exit(endStrategy = drugExit(diclofenac,
+                                     persistenceWindow = 30,
+                                     surveillanceWindow = 0))
+)
We’ll pull the outcome definition from the OHDSI
+PhenotypeLibrary
:
+library(PhenotypeLibrary)
+outcomeCohorts <- getPlCohortDefinitionSet(77) # GI bleed
In addition to the outcome of interest, we also want to include a +large set of negative control outcomes:
-outcomeOfInterest <- createOutcome(outcomeId = 192671,
- outcomeOfInterest = TRUE)
-
-negativeControlIds <- c(192671, 29735, 140673, 197494,
+negativeControlIds <- c(29735, 140673, 197494,
198185, 198199, 200528, 257315,
314658, 317376, 321319, 380731,
432661, 432867, 433516, 433701,
@@ -280,26 +276,80 @@ Specifying hypotheses of interest 444252, 444429, 4131756, 4134120,
4134454, 4152280, 4165112, 4174262,
4182210, 4270490, 4286201, 4289933)
+negativeControlCohorts <- tibble(
+ cohortId = negativeControlIds,
+ cohortName = sprintf("Negative control %d", negativeControlIds),
+ outcomeConceptId = negativeControlIds
+)
We combine the exposure and outcome cohort definitions, and use
+CohortGenerator
to generate the cohorts:
+library(CirceR)
+# For exposures, create a cohort definition set table as required by CohortGenerator:
+exposureCohorts <- tibble(cohortId = c(1,2),
+ cohortName = c("Celecoxib", "Diclofenac"),
+ json = c(as.json(celecoxibCohort),
+ as.json(diclofenacCohort)))
+exposureCohorts$sql <- sapply(exposureCohorts$json,
+ buildCohortQuery,
+ options = createGenerateOptions())
+allCohorts <- bind_rows(outcomeCohorts,
+ exposureCohorts)
+library(CohortGenerator)
+cohortTableNames <- getCohortTableNames(cohortTable = cohortTable)
+createCohortTables(connectionDetails = connectionDetails,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTableNames = cohortTableNames)
+generateCohortSet(connectionDetails = connectionDetails,
+ cdmDatabaseSchema = cdmDatabaseSchema,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTableNames = cohortTableNames,
+ cohortDefinitionSet = allCohorts)
+generateNegativeControlOutcomeCohorts(
+ connectionDetails = connectionDetails,
+ cdmDatabaseSchema = cdmDatabaseSchema,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTable = cohortTable,
+ negativeControlOutcomeCohortSet = negativeControlCohorts
+)
If all went well, we now have a table with the cohorts of interest. +We can see how many entries per cohort:
+
+connection <- DatabaseConnector::connect(connectionDetails)
+sql <- "SELECT cohort_definition_id, COUNT(*) AS count FROM @cohortDatabaseSchema.@cohortTable GROUP BY cohort_definition_id"
+DatabaseConnector::renderTranslateQuerySql(
+ connection = connection,
+ sql = sql,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTable = cohortTable
+)
+DatabaseConnector::disconnect(connection)
The first group of arguments define the target, comparator, and +outcome. Here we demonstrate how to create one set, and add that set to +a list:
+
+outcomeOfInterest <- createOutcome(outcomeId = 77,
+ outcomeOfInterest = TRUE)
negativeControlOutcomes <- lapply(
negativeControlIds,
function(outcomeId) createOutcome(outcomeId = outcomeId,
outcomeOfInterest = FALSE,
trueEffectSize = 1)
)
-
-
-
tcos <- createTargetComparatorOutcomes(
- targetId = 1118084,
- comparatorId = 1124300,
+ targetId = 1,
+ comparatorId = 2,
outcomes = append(list(outcomeOfInterest),
negativeControlOutcomes)
)
-
targetComparatorOutcomesList <- list(tcos)
We first define the outcome of interest (GI-bleed, concept ID -192671), explicitly stating this is an outcome of interest +
We first define the outcome of interest (GI-bleed, cohort ID 77),
+explicitly stating this is an outcome of interest
(outcomeOfInterest = TRUE
), meaning we want the full set of
artifacts generated for this outcome. We then create a set of negative
control outcomes. Because we specify
@@ -308,8 +358,8 @@
A convenient way to save targetComparatorOutcomesList
to
file is by using the saveTargetComparatorOutcomesList
function, and we can load it again using the
@@ -325,11 +375,9 @@
createTrimByPsArgs()
function. These companion functions
can be used to create the arguments to be used during execution:
-
-nsaids <- 21603933
-
-covarSettings <- createDefaultCovariateSettings(
- excludedCovariateConceptIds = nsaids,
+
+covarSettings <- createDefaultCovariateSettings(
+ excludedCovariateConceptIds = c(1118084, 1124300),
addDescendantsToExclude = TRUE
)
@@ -356,7 +404,7 @@ Specifying analyses
+
cmAnalysis1 <- createCmAnalysis(
analysisId = 1,
description = "No matching, simple outcome model",
@@ -371,7 +419,7 @@ Specifying analyses
+
createPsArgs <- createCreatePsArgs() # Use default settings only
matchOnPsArgs <- createMatchOnPsArgs(maxRatio = 100)
@@ -427,7 +475,7 @@ Specifying analyses= fitOutcomeModelArgs3
)
## Note: Using propensity scores but not computing covariate balance
-
+
fitOutcomeModelArgs4 <- createFitOutcomeModelArgs(
useCovariates = TRUE,
modelType = "cox",
@@ -444,8 +492,12 @@ Specifying analyses= fitOutcomeModelArgs4
)
## Note: Using propensity scores but not computing covariate balance
-
-interactionCovariateIds <- c(8532001, 201826210, 21600960413) # Female, T2DM, concurent use of antithrombotic agents
+
+interactionCovariateIds <- c(
+ 8532001, # Female
+ 201826210, # T2DM
+ 21600960413 # concurrent use of antithrombotic agents
+)
fitOutcomeModelArgs5 <- createFitOutcomeModelArgs(
modelType = "cox",
@@ -464,7 +516,7 @@ Specifying analyses)
## Note: Using propensity scores but not computing covariate balance
These analyses can be combined in a list:
-
+
cmAnalysisList <- list(cmAnalysis1,
cmAnalysis2,
cmAnalysis3,
@@ -512,18 +564,17 @@ Executing multiple analysesrunCmAnalyses()
.
-
+
multiThreadingSettings <- createDefaultMultiThreadingSettings(parallel::detectCores())
result <- runCmAnalyses(
connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
- exposureDatabaseSchema = cdmDatabaseSchema,
- exposureTable = "drug_era",
- outcomeDatabaseSchema = resultsDatabaseSchema,
- outcomeTable = "outcomes",
+ exposureDatabaseSchema = cohortDatabaseSchema,
+ exposureTable = cohortTable,
+ outcomeDatabaseSchema = cohortDatabaseSchema,
+ outcomeTable = cohortTable,
outputFolder = folder,
- cdmVersion = cdmVersion,
cmAnalysisList = cmAnalysisList,
targetComparatorOutcomesList = targetComparatorOutcomesList,
multiThreadingSettings = multiThreadingSettings
@@ -536,7 +587,7 @@ Executing multiple analysesparallel::detectCores()
function).
We call runCmAnalyses()
, providing the arguments for
connecting to the database, which schemas and tables to use, as well as
-the analyses and hypotheses of interest. The outputFolder
+the analyses and hypotheses of interest. The folder
specifies where the outcome models and intermediate files will be
written.
@@ -557,42 +608,28 @@ Retrieving the results
-
-psFile <- result$psFile[result$targetId == 1118084 &
- result$comparatorId == 1124300 &
- result$outcomeId == 192671 &
- result$analysisId == 5]
-ps <- readRDS(file.path(outputFolder, psFile))
+
+psFile <- result %>%
+ filter(targetId == 1,
+ comparatorId == 2,
+ outcomeId == 77,
+ analysisId == 5) %>%
+ pull(psFile)
+ps <- readRDS(file.path(folder, psFile))
plotPs(ps)
-
Note that some of the file names will appear several times in the
table. For example, analysis 3 and 5 only differ in terms of the outcome
model, and will share the same propensity score and stratification
files.
We can always retrieve the file reference table again using the
getFileReference()
function:
-
+
result <- getFileReference(folder)
We can get a summary of the results using
getResultsSummary()
:
-
-resultsSum <- getResultsSummary(outputFolder)
-head(resultsSum)
-## # A tibble: 6 x 27
-## analysisId targetId comparatorId outcomeId trueEffectSize targetSubjects
-## <int> <int> <int> <int> <dbl> <int>
-## 1 1 1118084 1124300 29735 1 86294
-## 2 1 1118084 1124300 140673 1 86718
-## 3 1 1118084 1124300 192671 NA 84447
-## 4 1 1118084 1124300 197494 1 86718
-## 5 1 1118084 1124300 198185 1 86718
-## 6 1 1118084 1124300 198199 1 86718
-## # i 21 more variables: comparatorSubjects <int>, targetDays <dbl>,
-## # comparatorDays <dbl>, targetOutcomes <dbl>, comparatorOutcomes <dbl>,
-## # rr <dbl>, ci95Lb <dbl>, ci95Ub <dbl>, p <dbl>, logRr <dbl>, seLogRr <dbl>,
-## # llr <dbl>, mdrr <dbl>, attritionFraction <dbl>, calibratedRr <dbl>,
-## # calibratedCi95Lb <dbl>, calibratedCi95Ub <dbl>, calibratedP <dbl>,
-## # calibratedLogRr <dbl>, calibratedSeLogRr <dbl>, ease <dbl>
+
+resultsSum <- getResultsSummary(folder)
+resultsSum
This tells us, per target-comparator-outcome-analysis combination,
the estimated relative risk and 95% confidence interval, as well as the
number of people in the treated and comparator group (after trimming and
@@ -608,63 +645,96 @@
Empirical calib
the yellow diamond represents our health outcome of interest: GI bleed.
An unbiased, well-calibrated analysis should have 95% of the negative
controls between the dashed lines (ie. 95% should have p > .05).
-
+
install.packages("EmpiricalCalibration")
library(EmpiricalCalibration)
# Analysis 1: No matching, simple outcome model
-negCons <- resultsSum[resultsSum$analysisId == 1 & resultsSum$outcomeId != 192671, ]
-hoi <- resultsSum[resultsSum$analysisId == 1 & resultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(negCons$logRr, negCons$seLogRr, hoi$logRr, hoi$seLogRr, null)
-
-
+ncs <- resultsSum %>%
+ filter(analysisId == 1,
+ outcomeId != 77)
+hoi <- resultsSum %>%
+ filter(analysisId == 1,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
+ logRrPositives = hoi$logRr,
+ seLogRrPositives = hoi$seLogRr, null)
+
# Analysis 2: Matching
-negCons <- resultsSum[resultsSum$analysisId == 2 & resultsSum$outcomeId != 192671, ]
-hoi <- resultsSum[resultsSum$analysisId == 2 & resultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(negCons$logRr, negCons$seLogRr, hoi$logRr, hoi$seLogRr, null)
-
-
+ncs <- resultsSum %>%
+ filter(analysisId == 2,
+ outcomeId != 77)
+hoi <- resultsSum %>%
+ filter(analysisId == 2,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
+ logRrPositives = hoi$logRr,
+ seLogRrPositives = hoi$seLogRr, null)
+
# Analysis 3: Stratification
-negCons <- resultsSum[resultsSum$analysisId == 3 & resultsSum$outcomeId != 192671, ]
-hoi <- resultsSum[resultsSum$analysisId == 3 & resultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(negCons$logRr, negCons$seLogRr, hoi$logRr, hoi$seLogRr, null)
-
-
+ncs <- resultsSum %>%
+ filter(analysisId == 3,
+ outcomeId != 77)
+hoi <- resultsSum %>%
+ filter(analysisId == 3,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
+ logRrPositives = hoi$logRr,
+ seLogRrPositives = hoi$seLogRr, null)
+
# Analysis 4: Inverse probability of treatment weighting
-negCons <- resultsSum[resultsSum$analysisId == 4 & resultsSum$outcomeId != 192671, ]
-hoi <- resultsSum[resultsSum$analysisId == 4 & resultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(negCons$logRr, negCons$seLogRr, hoi$logRr, hoi$seLogRr, null)
-
-
+ncs <- resultsSum %>%
+ filter(analysisId == 4,
+ outcomeId != 77)
+hoi <- resultsSum %>%
+ filter(analysisId == 4,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
+ logRrPositives = hoi$logRr,
+ seLogRrPositives = hoi$seLogRr, null)
+
# Analysis 5: Stratification plus full outcome model
-negCons <- resultsSum[resultsSum$analysisId == 5 & resultsSum$outcomeId != 192671, ]
-hoi <- resultsSum[resultsSum$analysisId == 5 & resultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(negCons$logRr, negCons$seLogRr, hoi$logRr, hoi$seLogRr, null)
-
+ncs <- resultsSum %>%
+ filter(analysisId == 5,
+ outcomeId != 77)
+hoi <- resultsSum %>%
+ filter(analysisId == 5,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
+ logRrPositives = hoi$logRr,
+ seLogRrPositives = hoi$seLogRr, null)
Analysis 6 explored interactions with certain variables. The
estimates for these interaction terms are stored in a separate results
summary. We can examine whether these estimates are also consistent with
-the null. In this example we consider the interaction with ‘gender =
-female’ (covariate ID 8532001):
-
-interactionResultsSum <- getInteractionResultsSummary(outputFolder)
-
+the null. In this example we consider the interaction with ‘concurrent
+use of antithrombotic agents’ (covariate ID 21600960413):
+
+interactionResultsSum <- getInteractionResultsSummary(folder)
# Analysis 6: Stratification plus interaction terms
-negCons <- interactionResultsSum[interactionResultsSum$analysisId == 6 & interactionResultsSum$outcomeId != 192671, ]
-hoi <- interactionResultsSum[interactionResultsSum$analysisId == 6 & interactionResultsSum$outcomeId == 192671, ]
-null <- fitNull(negCons$logRr, negCons$seLogRr)
-plotCalibrationEffect(logRrNegatives = negCons$logRr,
- seLogRrNegatives = negCons$seLogRr,
+ncs <- interactionResultsSum %>%
+ filter(analysisId == 6,
+ interactionCovariateId == 21600960413,
+ outcomeId != 77)
+hoi <- interactionResultsSum %>%
+ filter(analysisId == 6,
+ interactionCovariateId == 21600960413,
+ outcomeId == 77)
+null <- fitNull(ncs$logRr, ncs$seLogRr)
+plotCalibrationEffect(logRrNegatives = ncs$logRr,
+ seLogRrNegatives = ncs$seLogRr,
logRrPositives = hoi$logRr,
seLogRrPositives = hoi$seLogRr, null)
-## Warning: Removed 1 rows containing missing values (`geom_vline()`).
-
@@ -679,9 +749,9 @@ Exporting to CSV
-
+
exportToCsv(
- outputFolder,
+ folder,
exportFolder = file.path(folder, "export"),
databaseId = "My CDM",
minCellCount = 5,
@@ -694,24 +764,24 @@ Exporting to CSVminCellCount = 5
, the count will be reported to be -5,
which in the Shiny app will be displayed as ‘<5’.
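As a hypothetical illustration of that rule (the helper below is not part of CohortMethod; it only mimics the behaviour described above):
# Counts below minCellCount are reported as the negated threshold:
censorCount <- function(count, minCellCount = 5) {
  ifelse(count < minCellCount, -minCellCount, count)
}
censorCount(c(3, 12)) # returns -5 12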
Information on the data model used to generate the CSV files can be
-retrieved using getResultsDataModel()
:
-
-## # A tibble: 171 x 7
-## table_name column_name data_type is_required primary_key min_cell_count
-## <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 cm_attrition sequence_n~ int Yes Yes No
-## 2 cm_attrition description varchar Yes No No
-## 3 cm_attrition subjects int Yes No Yes
-## 4 cm_attrition exposure_id int Yes Yes No
-## 5 cm_attrition target_id int Yes Yes No
-## 6 cm_attrition comparator~ int Yes Yes No
-## 7 cm_attrition analysis_id int Yes Yes No
-## 8 cm_attrition outcome_id int Yes Yes No
-## 9 cm_attrition database_id varchar Yes Yes No
-## 10 cm_follow_up_di~ target_id int Yes Yes No
-## # i 161 more rows
-## # i 1 more variable: description <chr>
+retrieved using getResultsDataModelSpecifications()
:
+
+## # A tibble: 188 × 8
+## tableName columnName dataType isRequired primaryKey minCellCount deprecated
+## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
+## 1 cm_attriti… sequence_… int Yes Yes No No
+## 2 cm_attriti… descripti… varchar Yes No No No
+## 3 cm_attriti… subjects int Yes No Yes No
+## 4 cm_attriti… exposure_… int Yes Yes No No
+## 5 cm_attriti… target_id int Yes Yes No No
+## 6 cm_attriti… comparato… int Yes Yes No No
+## 7 cm_attriti… analysis_… int Yes Yes No No
+## 8 cm_attriti… outcome_id int Yes Yes No No
+## 9 cm_attriti… database_… varchar Yes Yes No No
+## 10 cm_follow_… target_id int Yes Yes No No
+## # ℹ 178 more rows
+## # ℹ 1 more variable: description <chr>
View results in a Shiny app
@@ -724,12 +794,12 @@ View results in a Shiny app
+
cohorts <- data.frame(
cohortId = c(
- 1118084,
- 1124300,
- 192671),
+ 1,
+ 2,
+ 77),
cohortName = c(
"Celecoxib",
"Diclofenac",
@@ -743,7 +813,7 @@ View results in a Shiny app= cohorts
)
Next we launch the Shiny app using:
-
+
launchResultsViewerUsingSqlite(
sqliteFileName = file.path(folder, "myResults.sqlite")
)
@@ -758,7 +828,7 @@ Acknowledgments
Considerable work has been dedicated to provide the
CohortMethod
package.
-
+
citation("CohortMethod")
##
## To cite package 'CohortMethod' in publications use:
@@ -771,16 +841,14 @@ Acknowledgments## A BibTeX entry for LaTeX users is
##
## @Manual{,
-## title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome
-## Models},
+## title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome Models},
## author = {Martijn Schuemie and Marc Suchard and Patrick Ryan},
## year = {2023},
-## note = {https://ohdsi.github.io/CohortMethod,
-## https://github.com/OHDSI/CohortMethod},
+## note = {https://ohdsi.github.io/CohortMethod, https://github.com/OHDSI/CohortMethod},
## }
Further, CohortMethod
makes extensive use of the
Cyclops
package.
-
+
citation("Cyclops")
##
## To cite Cyclops in publications use:
diff --git a/docs/articles/SingleStudies.html b/docs/articles/SingleStudies.html
index 7c2e6d54..c8a5061a 100644
--- a/docs/articles/SingleStudies.html
+++ b/docs/articles/SingleStudies.html
@@ -33,7 +33,7 @@
@@ -89,7 +89,7 @@ Single studies using the CohortMethod
Martijn J.
Schuemie, Marc A. Suchard and Patrick Ryan
- 2023-04-17
+ 2023-12-21
Source: vignettes/SingleStudies.Rmd
SingleStudies.Rmd
@@ -113,8 +113,8 @@ IntroductionInstallation instructions
Before installing the CohortMethod
package make sure you
-have Java available. Java can be downloaded from www.java.com. For Windows users, RTools
-is also necessary. RTools can be downloaded from CRAN.
+have Java available. For Windows users, RTools is also necessary. See these instructions
+for properly configuring your R environment.
The CohortMethod
package is currently maintained in a Github repository, and
has dependencies on other packages in Github. All of these packages can
be downloaded and installed from within R using the drat
@@ -138,154 +138,145 @@
Configuring the connection to
We need to tell R how to connect to the server where the data are.
CohortMethod
uses the DatabaseConnector
package, which provides the createConnectionDetails
-function. Type ?createConnectionDetails
for the specific
+function. Type ?createConnectionDetails
for the specific
settings required for the various database management systems (DBMS).
For example, one might connect to a PostgreSQL database using this
code:
-connectionDetails <- createConnectionDetails(dbms = "postgresql",
+connectionDetails <- createConnectionDetails(dbms = "postgresql",
server = "localhost/ohdsi",
user = "joe",
password = "supersecret")
cdmDatabaseSchema <- "my_cdm_data"
-resultsDatabaseSchema <- "my_results"
+cohortDatabaseSchema <- "my_results"
+cohortTable <- "my_cohorts"
options(sqlRenderTempEmulationSchema = NULL)
-The last two lines define the cdmDatabaseSchema
and
-resultSchema
variables. We’ll use these later to tell R
-where the data in CDM format live, and where we want to write
-intermediate tables. Note that for Microsoft SQL Server, databaseschemas
-need to specify both the database and the schema, so for example
-cdmDatabaseSchema <- "my_cdm_data.dbo"
.
+The last few lines define the cdmDatabaseSchema
,
+cohortDatabaseSchema
, and cohortTable
+variables. We’ll use these later to tell R where the data in CDM format
+live, and where we want to write intermediate tables. Note that for
+Microsoft SQL Server, database schemas need to specify both the database
+and the schema, so for example
+cdmDatabaseSchema <- "my_cdm_data.dbo"
. For database
+platforms that do not support temp tables, such as Oracle, it is also
+necessary to provide a schema where the user has write access that can
+be used to emulate temp tables. PostgreSQL supports temp tables, so we
+can set options(sqlRenderTempEmulationSchema = NULL)
(or
+not set the sqlRenderTempEmulationSchema
at all.)
Preparing the exposures and outcome(s)
-We need to define the exposures and outcomes for our study. One could
-use an external cohort definition tools, but in this example we do this
-by writing SQL statements against the OMOP CDM that populate a table of
-events in which we are interested. The resulting table should have the
-same structure as the cohort
table in the CDM. This means
-it should have the fields cohort_definition_id
,
-cohort_start_date
, cohort_end_date
,and
-subject_id
.
-For our example study, we have created a file called
-coxibVsNonselVsGiBleed.sql with the following contents:
-/***********************************
-File coxibVsNonselVsGiBleed.sql
-***********************************/
-
-DROP TABLE IF EXISTS @resultsDatabaseSchema.coxibVsNonselVsGiBleed;
-
-CREATE TABLE @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
-  cohort_definition_id INT,
-  cohort_start_date DATE,
-  cohort_end_date DATE,
-  subject_id BIGINT
-  );
-
-INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
-  cohort_definition_id,
-  cohort_start_date,
-  cohort_end_date,
-  subject_id
-  )
-SELECT 1, -- Exposure
-  drug_era_start_date,
-  drug_era_end_date,
-  person_id
-FROM @cdmDatabaseSchema.drug_era
-WHERE drug_concept_id = 1118084; -- celecoxib
-
-INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
-  cohort_definition_id,
-  cohort_start_date,
-  cohort_end_date,
-  subject_id
-  )
-SELECT 2, -- Comparator
-  drug_era_start_date,
-  drug_era_end_date,
-  person_id
-FROM @cdmDatabaseSchema.drug_era
-WHERE drug_concept_id = 1124300; -- diclofenac
-
-INSERT INTO @resultsDatabaseSchema.coxibVsNonselVsGiBleed (
-  cohort_definition_id,
-  cohort_start_date,
-  cohort_end_date,
-  subject_id
-  )
-SELECT 3, -- Outcome
-  condition_start_date,
-  condition_end_date,
-  condition_occurrence.person_id
-FROM @cdmDatabaseSchema.condition_occurrence
-INNER JOIN @cdmDatabaseSchema.visit_occurrence
-  ON condition_occurrence.visit_occurrence_id = visit_occurrence.visit_occurrence_id
-WHERE condition_concept_id IN (
-  SELECT descendant_concept_id
-  FROM @cdmDatabaseSchema.concept_ancestor
-  WHERE ancestor_concept_id = 192671 -- GI - Gastrointestinal haemorrhage
-  )
-AND visit_occurrence.visit_concept_id IN (9201, 9203);
-This is parameterized SQL which can be used by the
-SqlRender
package. We use parameterized SQL so we do not
-have to pre-specify the names of the CDM and result schemas. That way,
-if we want to run the SQL on a different schema, we only need to change
-the parameter values; we do not have to change the SQL code. By also
-making use of translation functionality in SqlRender
, we
-can make sure the SQL code can be run in many different
-environments.
-
-library(SqlRender)
-sql <- readSql("coxibVsNonselVsGiBleed.sql")
-sql <- render(sql,
- cdmDatabaseSchema = cdmDatabaseSchema,
- resultsDatabaseSchema = resultsDatabaseSchema)
-sql <- translate(sql, targetDialect = connectionDetails$dbms)
+We need to define the exposures and outcomes for our study. Here, we
+will define our exposures using the OHDSI Capr
package. We
+define two cohorts, one for celecoxib and one for diclofenac. For each
+cohort we require a prior diagnosis of ‘osteoarthritis of knee’, and 365
+days of continuous prior observation. We restrict to the first exposure
+per person:
+
-In this code, we first read the SQL from the file into memory. In the
-next line, we replace the two parameter names with the actual values. We
-then translate the SQL into the dialect appropriate for the DBMS we
-already specified in the connectionDetails
. Next, we
-connect to the server, and submit the rendered and translated SQL.
-If all went well, we now have a table with the events of interest. We
-can see how many events per type:
+osteoArthritisOfKneeConceptId <- 4079750
+celecoxibConceptId <- 1118084
+diclofenacConceptId <- 1124300
+osteoArthritisOfKnee <- cs(
+ descendants(osteoArthritisOfKneeConceptId),
+ name = "Osteoarthritis of knee"
+)
+attrition = attrition(
+ "prior osteoarthritis of knee" = withAll(
+ atLeast(1, condition(osteoArthritisOfKnee), duringInterval(eventStarts(-Inf, 0)))
+ )
+)
+celecoxib <- cs(
+ descendants(celecoxibConceptId),
+ name = "Celecoxib"
+)
+diclofenac <- cs(
+ descendants(diclofenacConceptId),
+ name = "Diclofenac"
+)
+celecoxibCohort <- cohort(
+ entry = entry(
+ drug(celecoxib, firstOccurrence()),
+ observationWindow = continuousObservation(priorDays = 365)
+ ),
+ attrition = attrition,
+ exit = exit(endStrategy = drugExit(celecoxib,
+ persistenceWindow = 30,
+ surveillanceWindow = 0))
+)
+diclofenacCohort <- cohort(
+ entry = entry(
+ drug(diclofenac, firstOccurrence()),
+ observationWindow = continuousObservation(priorDays = 365)
+ ),
+ attrition = attrition,
+ exit = exit(endStrategy = drugExit(diclofenac,
+ persistenceWindow = 30,
+ surveillanceWindow = 0))
+)
+We’ll pull the outcome definition from the OHDSI
+PhenotypeLibrary
:
+
+library(PhenotypeLibrary)
+outcomeCohorts <- getPlCohortDefinitionSet(77) # GI bleed
+We combine the exposure and outcome cohort definitions, and use
+CohortGenerator
to generate the cohorts:
-sql <- paste("SELECT cohort_definition_id, COUNT(*) AS count",
- "FROM @resultsDatabaseSchema.coxibVsNonselVsGiBleed",
- "GROUP BY cohort_definition_id")
-sql <- render(sql, resultsDatabaseSchema = resultsDatabaseSchema)
-sql <- translate(sql, targetDialect = connectionDetails$dbms)
+library(CirceR)
+# For exposures, create a cohort definition set table as required by CohortGenerator:
+exposureCohorts <- tibble(cohortId = c(1,2),
+ cohortName = c("Celecoxib", "Diclofenac"),
+ json = c(as.json(celecoxibCohort),
+ as.json(diclofenacCohort)))
+exposureCohorts$sql <- sapply(exposureCohorts$json,
+ buildCohortQuery,
+ options = createGenerateOptions())
+allCohorts <- bind_rows(outcomeCohorts,
+ exposureCohorts)
-querySql(connection, sql)
-## cohort_concept_id count
-## 1 1 50000
-## 2 2 50000
-## 3 3 15000
+library(CohortGenerator)
+cohortTableNames <- getCohortTableNames(cohortTable = cohortTable)
+createCohortTables(connectionDetails = connectionDetails,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTableNames = cohortTableNames)
+generateCohortSet(connectionDetails = connectionDetails,
+ cdmDatabaseSchema = cdmDatabaseSchema,
+ cohortDatabaseSchema = cohortDatabaseSchema,
+ cohortTableNames = cohortTableNames,
+ cohortDefinitionSet = allCohorts)
+If all went well, we now have a table with the cohorts of interest.
+We can see how many entries per cohort:
+
+connection <- DatabaseConnector::connect(connectionDetails)
+sql <- "SELECT cohort_definition_id, COUNT(*) AS count FROM @cohortDatabaseSchema.@cohortTable GROUP BY cohort_definition_id"
+DatabaseConnector::renderTranslateQuerySql(connection, sql, cohortDatabaseSchema = cohortDatabaseSchema, cohortTable = cohortTable)
+DatabaseConnector::disconnect(connection)
+## cohort_concept_id count
+## 1 1 109307
+## 2 2 176675
+## 3 77 733601
Extracting the data from the server
-Now we can tell CohortMethod
to define the cohorts based
-on our events, construct covariates, and extract all necessary data for
-our analysis.
+Now we can tell CohortMethod
to extract the cohorts,
+construct covariates, and extract all necessary data for our
+analysis.
Important: The target and comparator drug must not
be included in the covariates, including any descendant concepts. You
will need to manually add the drugs and descendants to the
excludedCovariateConceptIds
of the covariate settings. In
-this example code we exclude all NSAIDs from the covariates by pointing
-to the concept ID of the NSAID class and specifying
-addDescendantsToExclude = TRUE
.
-
-nsaids <- 21603933
-
-# Define which types of covariates must be constructed:
-covSettings <- createDefaultCovariateSettings(excludedCovariateConceptIds = nsaids,
- addDescendantsToExclude = TRUE)
+this example code we exclude the concepts for celecoxib and diclofenac
+and specify addDescendantsToExclude = TRUE
:
+
+# Define which types of covariates must be constructed:
+covSettings <- createDefaultCovariateSettings(
+ excludedCovariateConceptIds = c(diclofenacConceptId, celecoxibConceptId),
+ addDescendantsToExclude = TRUE
+)
#Load data:
cohortMethodData <- getDbCohortMethodData(
@@ -293,42 +284,14 @@ Extracting the data from the server
cdmDatabaseSchema = cdmDatabaseSchema,
targetId = 1,
comparatorId = 2,
- outcomeIds = 3,
- studyStartDate = "",
- studyEndDate = "",
- exposureDatabaseSchema = resultsDatabaseSchema,
- exposureTable = "coxibVsNonselVsGiBleed",
- outcomeDatabaseSchema = resultsDatabaseSchema,
- outcomeTable = "coxibVsNonselVsGiBleed",
- cdmVersion = cdmVersion,
- firstExposureOnly = TRUE,
- removeDuplicateSubjects = "remove all",
- restrictToCommonPeriod = FALSE,
- washoutPeriod = 180,
+ outcomeIds = 77,
+ exposureDatabaseSchema = cohortDatabaseSchema,
+ exposureTable = cohortTable,
+ outcomeDatabaseSchema = cohortDatabaseSchema,
+ outcomeTable = cohortTable,
covariateSettings = covSettings
)
cohortMethodData
-## # CohortMethodData object
-##
-## Target cohort ID: 1
-## Comparator cohort ID: 2
-## Outcome cohort ID(s): 3
-##
-## Inherits from CovariateData:
-## # CovariateData object
-##
-## All cohorts
-##
-## Inherits from Andromeda:
-## # Andromeda object
-## # Physical location: C:\Users\admin_mschuemi\AppData\Local\Temp\2\RtmpQZKV1W\file312c37fb2b73.sqlite
-##
-## Tables:
-## $analysisRef (analysisId, analysisName, domainId, startDay, endDay, isBinary, missingMeansZero)
-## $cohorts (rowId, personSeqId, personId, treatment, cohortStartDate, daysFromObsStart, daysToCohortEnd, daysToObsEnd)
-## $covariateRef (covariateId, covariateName, analysisId, conceptId)
-## $covariates (rowId, covariateId, covariateValue)
-## $outcomes (rowId, outcomeId, daysToEvent)
There are many parameters, but they are all documented in the
CohortMethod
manual. The
createDefaultCovariateSettings
function is described in the
@@ -349,22 +312,6 @@
Extracting the data from the server
view some more information of the data we extracted:
summary(cohortMethodData)
-## CohortMethodData object summary
-##
-## Target cohort ID: 1
-## Comparator cohort ID: 2
-## Outcome cohort ID(s): 3
-##
-## Target persons: 50000
-## Comparator persons: 50000
-##
-## Outcome counts:
-## Event count Person count
-## 3 36380 7447
-##
-## Covariates:
-## Number of covariates: 62004
-## Number of non-zero covariate values: 41097434
Saving the data to file
@@ -374,7 +321,7 @@ Saving the data to fileAndromeda, we cannot use R’s regular save function.
Instead, we’ll have to use the saveCohortMethodData()
function:
-
+
saveCohortMethodData(cohortMethodData, "coxibVsNonselVsGiBleed.zip")
We can use the loadCohortMethodData()
function to load
the data in a future session.
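For example, a minimal sketch of a later session (assuming the zip file created above is in the working directory):
library(CohortMethod)
cohortMethodData <- loadCohortMethodData("coxibVsNonselVsGiBleed.zip")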
@@ -388,8 +335,8 @@ Defining new users
-When creating the cohorts in the database, for example when using a
-cohort definition tool.
+When creating the cohorts in the database, for example using
+Capr
.
When loading the cohorts using the
getDbCohortMethodData
function, you can use the
firstExposureOnly
, removeDuplicateSubjects
,
@@ -423,7 +370,7 @@ Defining the study population
+
studyPop <- createStudyPopulation(
cohortMethodData = cohortMethodData,
outcomeId = 3,
@@ -456,16 +403,8 @@ Defining the study population
+
getAttritionTable(studyPop)
-## # A tibble: 5 x 5
-## description targetPersons comparatorPersons targetExposures comparatorExposures
-## <chr> <dbl> <dbl> <dbl> <dbl>
-## 1 Original cohorts 856973 915830 1946114 1786318
-## 2 First exp. only & removed s ... 373874 541386 373874 541386
-## 3 Random sample 50000 50000 50000 50000
-## 4 No prior outcome 48700 48715 48700 48715
-## 5 Have at least 1 days at ris ... 48667 48688 48667 48688
One additional filtering step that is often used is matching or
trimming on propensity scores, as will be discussed next.
@@ -483,7 +422,7 @@ Fitting a propensity model
We can fit a propensity model using the covariates constructed by the
getDbcohortMethodData()
function:
-
+
ps <- createPs(cohortMethodData = cohortMethodData, population = studyPop)
The createPs()
function uses the Cyclops
package to fit a large-scale regularized logistic regression.
@@ -501,36 +440,31 @@ Propensity score diagnostics
+
computePsAuc(ps)
-## [1] 0.81
We can also plot the propensity score distribution, although we
prefer the preference score distribution:
-
+
plotPs(ps,
scale = "preference",
showCountsLabel = TRUE,
showAucLabel = TRUE,
showEquiposeLabel = TRUE)
-
It is also possible to inspect the propensity model itself by showing
the covariates that have non-zero coefficients:
-
+
getPsModel(ps, cohortMethodData)
-## # A tibble: 6 x 3
-## coefficient covariateId covariateName
-## <dbl> <dbl> <chr>
-## 1 -3.47 1150871413 ...gh 0 days relative to index: misoprostol
-## 2 -2.45 2016006 index year: 2016
-## 3 -2.43 2017006 index year: 2017
-## 4 -2.39 2018006 index year: 2018
-## 5 -2.37 2015006 index year: 2015
-## 6 -2.30 2014006 index year: 2014
One advantage of using the regularization when fitting the propensity
model is that most coefficients will shrink to zero and fall out of the
model. It is a good idea to inspect the remaining variables for anything
that should not be there, for example variations of the drugs of
interest that we forgot to exclude.
+Finally, we can inspect the percent of the population in equipoise,
+meaning they have a preference score between 0.3 and 0.7:
+
+CohortMethod::computeEquipoise(ps)
+A low equipoise indicates there is little overlap between the target
+and comparator populations.
Using the propensity score
@@ -538,43 +472,30 @@ Using the propensity scoreWe can use the propensity scores to trim, stratify, match, or weigh
our population. For example, one could trim to equipoise, meaning only
subjects with a preference score between 0.25 and 0.75 are kept:
-
+
trimmedPop <- trimByPsToEquipoise(ps)
plotPs(trimmedPop, ps, scale = "preference")
-
Instead (or additionally), we could stratify the population based on
the propensity score:
-
+
stratifiedPop <- stratifyByPs(ps, numberOfStrata = 5)
plotPs(stratifiedPop, ps, scale = "preference")
-
We can also match subjects based on propensity scores. In this
example, we’re using one-to-one matching:
-
+
matchedPop <- matchOnPs(ps, caliper = 0.2, caliperScale = "standardized logit", maxRatio = 1)
plotPs(matchedPop, ps)
-
Note that for both stratification and matching it is possible to
specify additional matching criteria such as age and sex using the
stratifyByPsAndCovariates()
and
matchOnPsAndCovariates()
functions, respectively.
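A hedged sketch of the matching variant (the covariateIds argument name is an assumption taken from the package manual; check ?matchOnPsAndCovariates before use):
# Sketch only: match on the propensity score and additionally on gender (covariate ID 8532001);
# argument names assumed, verify against the CohortMethod documentation.
matchedPop2 <- matchOnPsAndCovariates(ps,
                                      caliper = 0.2,
                                      caliperScale = "standardized logit",
                                      maxRatio = 1,
                                      cohortMethodData = cohortMethodData,
                                      covariateIds = 8532001)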
We can see the effect of trimming and/or matching on the population
using the getAttritionTable
function:
-
+
getAttritionTable(matchedPop)
-## # A tibble: 6 x 5
-## description targetPersons comparatorPersons targetExposures comparatorExposures
-## <chr> <dbl> <dbl> <dbl> <dbl>
-## 1 Original cohorts 856973 915830 1946114 1786318
-## 2 First exp. only & removed s ... 373874 541386 373874 541386
-## 3 Random sample 50000 50000 50000 50000
-## 4 No prior outcome 48700 48715 48700 48715
-## 5 Have at least 1 days at ris ... 48667 48688 48667 48688
-## 6 Matched on propensity score 22339 22339 22339 22339
Or, if we like, we can plot an attrition diagram:
-
+
drawAttritionDiagram(matchedPop)
-
Evaluating covariate balance
@@ -582,15 +503,12 @@ Evaluating covariate balance
+
balance <- computeCovariateBalance(matchedPop, cohortMethodData)
-
+
plotCovariateBalanceScatterPlot(balance, showCovariateCountLabel = TRUE, showMaxLabel = TRUE)
-## Warning: Removed 23590 rows containing missing values (`geom_point()`).
-
-
+
plotCovariateBalanceOfTopVariables(balance)
-
The ‘before matching’ population is the population as extracted by
the getDbCohortMethodData
function, so before any further
filtering steps.
@@ -603,113 +521,34 @@ Inspecting select populati
matching/stratification/trimming. This is usually the first table, and
so will be referred to as ‘table 1’. To generate this table, you can use
the createCmTable1
function:
-
+
createCmTable1(balance)
- Before matching After matching
- Target Comparator Target Comparator
- Characteristic % % Std. diff % % Std. diff
- Age group
- 25 - 29 0.0 0.0
- 30 - 34 0.0 0.0
- 40 - 44 0.0 0.0 0.00 0.0 0.0 -0.01
- 45 - 49 0.1 0.1 0.00 0.0 0.1 -0.01
- 50 - 54 0.2 0.2 0.00 0.2 0.2 0.00
- 55 - 59 0.5 0.5 -0.01 0.5 0.6 -0.01
- 60 - 64 0.8 1.2 -0.04 1.1 1.1 -0.01
- 65 - 69 25.8 29.0 -0.07 29.6 29.2 0.01
- 70 - 74 25.0 25.0 0.00 25.1 24.9 0.01
- 75 - 79 20.6 18.9 0.04 18.8 19.0 -0.01
- 80 - 84 15.2 13.2 0.06 13.4 13.7 -0.01
- 85 - 89 8.3 8.0 0.01 7.8 7.8 0.00
- 90 - 94 3.0 3.2 -0.01 2.9 2.8 0.01
- 95 - 99 0.6 0.7 -0.02 0.6 0.6 0.00
- 100 - 104 0.1 0.1 0.00 0.1 0.0 0.02
- Gender: female 59.5 61.8 -0.05 60.0 60.3 0.00
- Medical history: General
- Acute respiratory disease 18.4 22.2 -0.10 20.2 20.4 0.00
- Attention deficit hyperactivity disorder 0.1 0.2 -0.03 0.2 0.2 0.00
- Chronic liver disease 0.8 1.4 -0.05 1.0 1.1 -0.01
- Chronic obstructive lung disease 10.2 10.4 -0.01 9.4 9.6 -0.01
- Crohn's disease 0.3 0.4 -0.02 0.3 0.3 0.00
- Dementia 2.9 3.3 -0.03 2.8 2.9 -0.01
- Depressive disorder 6.2 9.3 -0.12 8.0 7.8 0.01
- Diabetes mellitus 18.6 25.3 -0.16 21.7 21.5 0.00
- Gastroesophageal reflux disease 10.0 15.1 -0.16 13.3 13.0 0.01
- Gastrointestinal hemorrhage 3.6 3.3 0.02 2.3 2.2 0.01
- Human immunodeficiency virus infection 0.0 0.1 -0.02 0.0 0.1 -0.01
- Hyperlipidemia 31.5 49.0 -0.36 42.7 41.8 0.02
- Hypertensive disorder 50.2 61.3 -0.22 56.8 56.5 0.01
- Lesion of liver 0.6 0.8 -0.02 0.6 0.6 0.00
- Obesity 3.8 7.3 -0.15 5.6 5.3 0.01
- Osteoarthritis 47.0 49.3 -0.05 50.6 50.0 0.01
- Pneumonia 4.7 4.7 0.00 4.0 4.2 -0.01
- Psoriasis 1.0 1.4 -0.03 1.3 1.2 0.01
- Renal impairment 4.7 10.5 -0.22 6.3 6.1 0.01
- Rheumatoid arthritis 2.4 3.0 -0.04 2.9 2.9 0.00
- Schizophrenia 0.1 0.1 -0.01 0.1 0.1 -0.01
- Ulcerative colitis 0.4 0.5 -0.02 0.4 0.4 0.00
- Urinary tract infectious disease 8.9 10.8 -0.06 9.7 9.6 0.00
- Viral hepatitis C 0.1 0.3 -0.04 0.2 0.2 0.00
- Medical history: Cardiovascular disease
- Atrial fibrillation 8.7 9.6 -0.03 8.2 8.4 -0.01
- Cerebrovascular disease 10.3 10.5 -0.01 9.5 9.6 0.00
- Coronary arteriosclerosis 17.2 18.4 -0.03 16.4 16.7 -0.01
- Heart disease 38.5 38.7 0.00 35.8 36.1 -0.01
- Heart failure 7.8 8.2 -0.01 6.3 6.4 0.00
- Ischemic heart disease 8.7 8.6 0.00 7.1 7.3 -0.01
- Peripheral vascular disease 7.3 10.1 -0.10 7.9 8.0 0.00
- Pulmonary embolism 0.7 0.8 -0.02 0.6 0.7 -0.01
- Venous thrombosis 1.9 2.5 -0.04 2.2 2.2 0.00
- Medical history: Neoplasms
- Malignant lymphoma 0.8 0.9 -0.01 0.8 0.8 0.00
- Malignant neoplasm of anorectum 0.5 0.3 0.03 0.4 0.3 0.00
- Malignant neoplastic disease 17.3 17.8 -0.01 17.4 17.5 0.00
- Malignant tumor of breast 3.0 3.2 -0.01 3.1 3.1 0.00
- Malignant tumor of colon 0.9 0.7 0.02 0.8 0.7 0.00
- Malignant tumor of lung 0.7 0.5 0.02 0.6 0.6 0.01
- Malignant tumor of urinary bladder 0.8 0.8 0.00 0.8 0.8 0.00
- Primary malignant neoplasm of prostate 3.9 3.4 0.02 3.6 3.6 0.00
- Medication use
- Agents acting on the renin-angiotensin system 44.2 49.6 -0.11 47.9 48.2 -0.01
- Antibacterials for systemic use 60.9 65.6 -0.10 61.8 62.4 -0.01
- Antidepressants 25.7 26.7 -0.02 26.2 26.5 -0.01
- Antiepileptics 14.7 17.8 -0.08 16.8 16.8 0.00
- Antiinflammatory and antirheumatic products 25.0 28.6 -0.08 28.9 28.8 0.00
- Antineoplastic agents 4.1 5.1 -0.05 4.8 4.7 0.00
- Antipsoriatics 0.7 1.1 -0.04 0.8 0.9 -0.01
- Antithrombotic agents 22.1 19.7 0.06 19.0 19.4 -0.01
- Beta blocking agents 34.4 38.1 -0.08 35.3 35.6 -0.01
- Calcium channel blockers 26.6 28.7 -0.05 26.9 27.2 -0.01
- Diuretics 42.4 41.2 0.02 40.5 41.3 -0.02
- Drugs for acid related disorders 37.1 38.9 -0.04 35.6 36.0 -0.01
- Drugs for obstructive airway diseases 38.2 44.7 -0.13 42.5 42.4 0.00
- Drugs used in diabetes 17.3 21.1 -0.10 18.5 18.6 0.00
- Immunosuppressants 3.0 4.9 -0.10 4.2 4.2 0.00
- Lipid modifying agents 49.1 56.0 -0.14 54.5 54.3 0.00
- Opioids 36.4 34.2 0.05 34.9 35.4 -0.01
- Psycholeptics 30.2 29.8 0.01 30.0 30.5 -0.01
- Psychostimulants, agents used for adhd and nootropics 1.5 1.8 -0.02 1.9 1.9 -0.01
-Inserting the population cohort in the database
+Generalizability
-
For various reasons it might be necessary to insert the study
-population back into the database, for example because we want to use an
-external cohort characterization tool. We can use the
-insertDbPopulation
function for this purpose:
-
-insertDbPopulation(
- population = matchedPop,
- cohortIds = c(101,100),
- connectionDetails = connectionDetails,
- cohortDatabaseSchema = resultsDatabaseSchema,
- cohortTable = "coxibVsNonselVsGiBleed",
- createTable = FALSE,
- cdmVersion = cdmVersion
-)
-This function will store the population in a table with the same
-structure as the cohort
table in the CDM, in this case in
-the same table where we had created our original cohorts.
+The goal of any propensity score adjustments is typically to make the
+target and comparator cohorts comparably, to allow proper causal
+inference. However, in doing so, we often need to modify our population,
+for example dropping subjects that have no counterpart in the other
+exposure cohort. The population we end up estimating an effect for may
+end up being very different from the population we started with. An
+important question is: how different? And it what ways? If the
+populations before and after adjustment are very different, our
+estimated effect may not generalize to the original population (if
+effect modification is present). The
+getGeneralizabilityTable()
function informs on these
+differences:
+
+getGeneralizabilityTable(balance)
+In this case, because we used PS matching, we are likely aiming to
+estimate the average treatment effect in the treated (ATT). For this
+reason, the getGeneralizabilityTable()
function
+automatically selected the target cohort as the basis for evaluating
+generalizability: it shows, for each covariate, the mean value before
+and after PS adjustment in the target cohort. Also shown is the standardized
+difference of mean, and the table is reverse sorted by the absolute
+standardized difference of mean (ASDM).
@@ -725,7 +564,7 @@ Follow-up and power
+
computeMdrr(
population = studyPop,
modelType = "cox",
@@ -733,13 +572,11 @@ Follow-up and power= 0.8,
twoSided = TRUE
)
-## targetPersons comparatorPersons targetExposures comparatorExposures targetDays comparatorDays totalOutcomes mdrr se
-## 1 48667 48688 48667 48688 7421404 3693928 554 1.26878 0.08497186
In this example we used the studyPop
object, so the
population before any matching or trimming. If we want to know the MDRR
after matching, we use the matchedPop
object we created
earlier instead:
-
+
computeMdrr(
population = matchedPop,
modelType = "cox",
@@ -747,8 +584,6 @@ Follow-up and power= 0.8,
twoSided = TRUE
)
-## targetPersons comparatorPersons targetExposures comparatorExposures targetDays comparatorDays totalOutcomes mdrr se
-## 1 22339 22339 22339 22339 3118703 1801997 226 1.451674 0.133038
Even though the MDRR in the matched population is higher, meaning we
have less power, we should of course not be fooled: matching most likely
eliminates confounding, and is therefore preferred to not matching.
@@ -757,16 +592,12 @@ Follow-up and power
+
getFollowUpDistribution(population = matchedPop)
-## 100% 75% 50% 25% 0% Treatment
-## 1 2 60 60 126 4184 1
-## 2 2 45 60 67 2996 0
The output is telling us number of days of follow-up each quantile of
the study population has. We can also plot the distribution:
-
+
plotFollowUpDistribution(population = matchedPop)
-
Outcome models
@@ -779,52 +610,28 @@ Fitting a simple outcome model
+
outcomeModel <- fitOutcomeModel(population = studyPop,
modelType = "cox")
outcomeModel
-## Model type: cox
-## Stratified: FALSE
-## Use covariates: FALSE
-## Use inverse probability of treatment weighting: FALSE
-## Status: OK
-##
-## Estimate lower .95 upper .95 logRr seLogRr
-## treatment 1.25115 1.03524 1.51802 0.22406 0.0976
But of course we want to make use of the matching done on the
propensity score:
-
+
outcomeModel <- fitOutcomeModel(population = matchedPop,
modelType = "cox",
stratified = TRUE)
outcomeModel
-## Model type: cox
-## Stratified: TRUE
-## Use covariates: FALSE
-## Use inverse probability of treatment weighting: FALSE
-## Status: OK
-##
-## Estimate lower .95 upper .95 logRr seLogRr
-## treatment 0.9942982 0.7212731 1.3608869 -0.0057181 0.162
Note that we define the sub-population to be only those in the
matchedPop
object, which we created earlier by matching on
the propensity score. We also now use a stratified Cox model,
conditioning on the propensity score match sets.
Instead of matching or stratifying we can also perform Inverse
Probability of Treatment Weighting (IPTW):
-
+
outcomeModel <- fitOutcomeModel(population = ps,
modelType = "cox",
inversePtWeighting = TRUE)
outcomeModel
-## Model type: cox
-## Stratified: FALSE
-## Use covariates: FALSE
-## Use inverse probability of treatment weighting: TRUE
-## Status: OK
-##
-## Estimate lower .95 upper .95 logRr seLogRr
-## treatment 1.15095 0.83724 1.61142 0.14059 0.167
Adding interaction terms
@@ -833,7 +640,7 @@ Adding interaction terms
-
+
interactionCovariateIds <- c(8532001, 201826210, 21600960413)
# 8532001 = Female
# 201826210 = Type 2 Diabetes
@@ -843,28 +650,16 @@ Adding interaction terms stratified = TRUE,
interactionCovariateIds = interactionCovariateIds)
outcomeModel
-## Model type: cox
-## Stratified: TRUE
-## Use covariates: FALSE
-## Use inverse probability of treatment weighting: FALSE
-## Status: OK
-##
-## Estimate lower .95 upper .95 logRr seLogRr
-## treatment 1.24991 0.87272 1.80961 0.22307 0.1860
-## treatment * condition_era group during day -365 through 0 days relative to index: Type 2 diabetes mellitus 1.05089 0.68593 1.62105 0.04964 0.2194
-## treatment * drug_era group during day 0 through 0 days relative to index: ANTITHROMBOTIC AGENTS 0.63846 0.42639 0.96305 -0.44870 0.2078
-## treatment * gender = FEMALE 0.78988 0.54227 1.14572 -0.23587 0.1908
Note that you can use the grepCovariateNames
to find
covariate IDs.
It is prudent to verify that covariate balance has also been achieved
in the subgroups of interest. For example, we can check the covariate
balance in the subpopulation of females:
-
+
balanceFemale <- computeCovariateBalance(population = matchedPop,
cohortMethodData = cohortMethodData,
subgroupCovariateId = 8532001)
plotCovariateBalanceScatterPlot(balanceFemale)
-
Adding covariates to the outcome model
@@ -875,48 +670,33 @@ Adding covariates to the outcome
remove bias. For this we use the regularized Cox regression in the
Cyclops
package. (Note that the treatment variable is
automatically excluded from regularization.)
-
+
outcomeModel <- fitOutcomeModel(population = matchedPop,
cohortMethodData = cohortMethodData,
modelType = "cox",
stratified = TRUE,
useCovariates = TRUE)
outcomeModel
-## Model type: cox
-## Stratified: TRUE
-## Use covariates: TRUE
-## Use inverse probability of treatment weighting: FALSE
-## Status: OK
-## Prior variance: 0.0374185437128226
-##
-## Estimate lower .95 upper .95 logRr seLogRr
-## treatment 0.9985318 0.6870719 1.4438456 -0.0014693 0.1894
Inspecting the outcome model
We can inspect more details of the outcome model:
-
+
-## 900000010805
-## 0.9985318
-
+
-## [1] 0.6870719 1.4438456
We can also see the covariates that ended up in the outcome
model:
-
+
getOutcomeModel(outcomeModel, cohortMethodData)
-## coefficient id name
-## 1 -0.001469294 9e+11 Treatment
Kaplan-Meier plot
We can create the Kaplan-Meier plot:
-
+
plotKaplanMeier(matchedPop, includeZero = FALSE)
-
Note that the Kaplan-Meier plot will automatically adjust for any
stratification, matching, or trimming that may have been applied.
@@ -927,9 +707,9 @@ Time-to-event plot
+
plotTimeToEvent(cohortMethodData = cohortMethodData,
- outcomeId = 3,
+ outcomeId = 77,
firstExposureOnly = FALSE,
washoutPeriod = 0,
removeDuplicateSubjects = "keep all",
@@ -938,7 +718,6 @@ Time-to-event plot= "cohort start",
riskWindowEnd = 30,
endAnchor = "cohort end")
-
Note that this plot does not show any adjustment for the propensity
score.
@@ -948,7 +727,7 @@ Acknowledgments
Considerable work has been dedicated to provide the
CohortMethod
package.
-
+
citation("CohortMethod")
##
## To cite package 'CohortMethod' in publications use:
@@ -959,16 +738,14 @@ Acknowledgments## A BibTeX entry for LaTeX users is
##
## @Manual{,
-## title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome
-## Models},
+## title = {CohortMethod: New-User Cohort Method with Large Scale Propensity and Outcome Models},
## author = {Martijn Schuemie and Marc Suchard and Patrick Ryan},
## year = {2023},
-## note = {https://ohdsi.github.io/CohortMethod,
-## https://github.com/OHDSI/CohortMethod},
+## note = {https://ohdsi.github.io/CohortMethod, https://github.com/OHDSI/CohortMethod},
## }
Further, CohortMethod
makes extensive use of the
Cyclops
package.
-
+
citation("Cyclops")
##
## To cite Cyclops in publications use:
diff --git a/docs/articles/index.html b/docs/articles/index.html
index f2de5a33..f7083f19 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -17,7 +17,7 @@
diff --git a/docs/authors.html b/docs/authors.html
index cbad701f..c6f6ab79 100644
--- a/docs/authors.html
+++ b/docs/authors.html
@@ -17,7 +17,7 @@
diff --git a/docs/index.html b/docs/index.html
index 39433ada..2567e06a 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -44,7 +44,7 @@
diff --git a/docs/news/index.html b/docs/news/index.html
index af639beb..b4dca012 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -17,7 +17,7 @@
@@ -63,6 +63,29 @@ Changelog
Source: NEWS.md
+
+CohortMethod 5.2.0
+Changes:
+The computeCovariateBalance()
function now also computes standardized difference of mean comparing cohorts before and after PS adjustment, which can inform on generalizability.
+Added the getGeneralizabilityTable()
function.
+Improved computation of the overall standard deviation when computing covariate balance (actually computing the SD instead of taking the mean of the target and comparator SDs). This should produce more accurate balance estimates.
+Generated population objects now keep track of the likely target estimator (e.g. ‘ATT’ or ‘ATE’). This informs the selection of the base population when calling getGeneralizabilityTable()
.
+Deprecated the attritionFractionThreshold
argument of the createCmDiagnosticThresholds
function, and instead added the generalizabilitySdmThreshold
argument.
+-
+
The results schema specifications of the exportToCsv()
function have changed:
+- Removed the
attrition_fraction
and attrition_diagnostic
fields from the cm_diagnostics_summary
table.
+- Added the
target_estimator
field to the cm_result
and cm_interaction_result
tables.
+- Added the
generalizability_max_sdm
and generalizability_diagnostic
fields to the cm_diagnostics_summary
table.
+- Added the
mean_before
, mean_after
, target_std_diff
, comparator_std_diff
, and target_comparator_std_diff
fields to both the cm_covariate_balance
and cm_shared_covariate_balance
tables.
+
+Improved speed of covariate balance computation.
+Added one-sided (calibrated) p-values to the results summary and results model.
+Added the unblind_for_evidence_synthesis
field to cm_diagnostics_summary
table.
+The cm_diagnostics_summary
table now also contains negative controls.
+
Bugfixes:
+Fixed runCmAnalyses()
when using refitPsForEveryOutcome = TRUE
.
+Handled an edge case when exporting the preference distribution and the target or comparator has only 1 subject.
+
CohortMethod 5.1.0
Changes:
@@ -103,7 +126,7 @@ CohortMethod
Added empirical calibration to the getResultsSummary()
function. Controls can be identified by the trueEffectSize
argument in the createOutcome()
function.
Dropping arguments like createPs
and fitOutcomeModel
from the createCmAnalysis()
function. Instead, not providing createPsArgs
or fitOutcomeModelArgs
is assumed to mean skipping propensity score creation or outcome model fitting, respectively.
-Added the exportToCsv()
function for exporting study results to CSV files that do not contain patient-level information and can therefore be shared between sites. The getResultsDataModel()
function returns the data model for these CSV files.
+Added the exportToCsv()
function for exporting study results to CSV files that do not contain patient-level information and can therefore be shared between sites. The getResultsDataModel()
function returns the data model for these CSV files.
Added the uploadExportedResults()
and insertExportedResultsInSqlite()
functions for uploading the results from the CSV files in a database. The launchResultsViewer()
and launchResultsViewerUsingSqlite()
functions were added for launching a Shiny app to view the results in the (SQLite) database.
Bug fixes:
- Fixed error when using integer
maxWeight
when performing IPTW.
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index eac4527f..26d02b32 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -4,5 +4,5 @@ pkgdown_sha: ~
articles:
MultipleAnalyses: MultipleAnalyses.html
SingleStudies: SingleStudies.html
-last_built: 2023-09-04T12:25Z
+last_built: 2023-12-21T10:18Z
diff --git a/docs/pull_request_template.html b/docs/pull_request_template.html
index 4483fdd5..72060d2f 100644
--- a/docs/pull_request_template.html
+++ b/docs/pull_request_template.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/CohortMethod-package.html b/docs/reference/CohortMethod-package.html
index c5e40f0c..0f5c06dd 100644
--- a/docs/reference/CohortMethod-package.html
+++ b/docs/reference/CohortMethod-package.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/CohortMethodData-class.html b/docs/reference/CohortMethodData-class.html
index 1ff6e539..321c8d0a 100644
--- a/docs/reference/CohortMethodData-class.html
+++ b/docs/reference/CohortMethodData-class.html
@@ -21,7 +21,7 @@
diff --git a/docs/reference/adjustedKm.html b/docs/reference/adjustedKm.html
index 99d107ee..13a28688 100644
--- a/docs/reference/adjustedKm.html
+++ b/docs/reference/adjustedKm.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/checkCmInstallation.html b/docs/reference/checkCmInstallation.html
index b3abbdde..bb354370 100644
--- a/docs/reference/checkCmInstallation.html
+++ b/docs/reference/checkCmInstallation.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/cohortMethodDataSimulationProfile.html b/docs/reference/cohortMethodDataSimulationProfile.html
index 53cbae00..044a8043 100644
--- a/docs/reference/cohortMethodDataSimulationProfile.html
+++ b/docs/reference/cohortMethodDataSimulationProfile.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/computeCovariateBalance.html b/docs/reference/computeCovariateBalance.html
index c33c618e..b0d9f47e 100644
--- a/docs/reference/computeCovariateBalance.html
+++ b/docs/reference/computeCovariateBalance.html
@@ -1,7 +1,8 @@
-Compute covariate balance before and after matching and trimming — computeCovariateBalance • CohortMethod Compute covariate balance before and after PS adjustment — computeCovariateBalance • CohortMethod
@@ -19,7 +20,7 @@
@@ -61,15 +62,16 @@
- Compute covariate balance before and after matching and trimming
+ Compute covariate balance before and after PS adjustment
Source: R/Balance.R
computeCovariateBalance.Rd
For every covariate, prevalence in treatment and comparator groups before and after
-matching/trimming are computed. When variable ratio matching was used the balance score will be
-corrected according the method described in Austin et al (2008).
+matching/trimming/weighting are computed. When variable ratio matching was used
+the balance score will be corrected according to the method described in Austin et
+al (2008).
@@ -85,8 +87,7 @@ Compute covariate balance before and after matching and trimming
Arguments
- population
-A data frame containing the people that are remaining after matching
-and/or trimming.
+A data frame containing the people that are remaining after PS adjustment.
- cohortMethodData
@@ -117,7 +118,43 @@ Arguments
Value
-Returns a tibble describing the covariate balance before and after matching/trimming.
+Returns a tibble describing the covariate balance before and after PS adjustment,
+with one row per covariate, with the same data as the covariateRef
table in the CohortMethodData
object,
+and the following additional columns:
beforeMatchingMeanTarget: The (weighted) mean value in the target before PS adjustment.
+beforeMatchingMeanComparator: The (weighted) mean value in the comparator before PS adjustment.
+beforeMatchingSumTarget: The (weighted) sum value in the target before PS adjustment.
+beforeMatchingSumComparator: The (weighted) sum value in the comparator before PS adjustment.
+beforeMatchingSdTarget: The standard deviation of the value in the target before PS adjustment.
+beforeMatchingSdComparator: The standard deviation of the value in the comparator before PS adjustment.
+beforeMatchingMean: The mean of the value across target and comparator before PS adjustment.
+beforeMatchingSd: The standard deviation of the value across target and comparator before PS adjustment.
+afterMatchingMeanTarget: The (weighted) mean value in the target after PS adjustment.
+afterMatchingMeanComparator: The (weighted) mean value in the comparator after PS adjustment.
+afterMatchingSumTarget: The (weighted) sum value in the target after PS adjustment.
+afterMatchingSumComparator: The (weighted) sum value in the comparator after PS adjustment.
+afterMatchingSdTarget: The standard deviation of the value in the target after PS adjustment.
+afterMatchingSdComparator: The standard deviation of the value in the comparator after PS adjustment.
+afterMatchingMean: The mean of the value across target and comparator after PS adjustment.
+afterMatchingSd: The standard deviation of the value across target and comparator after PS adjustment.
+beforeMatchingStdDiff: The standardized difference of means when comparing the target to
+the comparator before PS adjustment.
+afterMatchingStdDiff: The standardized difference of means when comparing the target to
+the comparator after PS adjustment.
+targetStdDiff: The standardized difference of means when comparing the target
+before PS adjustment to the target after PS adjustment.
+comparatorStdDiff: The standardized difference of means when comparing the comparator
+before PS adjustment to the comparator after PS adjustment.
+targetComparatorStdDiff: The standardized difference of means when comparing the entire
+population before PS adjustment to the entire population after
+PS adjustment.
+
The 'beforeMatchingStdDiff' and 'afterMatchingStdDiff' columns inform on the balance:
+are the target and comparator sufficiently similar in terms of baseline covariates to
+allow for valid causal estimation?
+
+
+The 'targetStdDiff', 'comparatorStdDiff', and 'targetComparatorStdDiff' columns inform on
+the generalizability: are the cohorts after PS adjustment sufficiently similar to the cohorts
+before adjustment to allow generalizing the findings to the original cohorts?
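As a minimal sketch of how these columns might be used (assuming the matchedPop and cohortMethodData objects from the vignette), covariates that remain imbalanced after PS adjustment can be listed with base R subsetting:

balance <- computeCovariateBalance(matchedPop, cohortMethodData)
# Covariates with an absolute standardized difference above 0.1 after PS adjustment:
balance[!is.na(balance$afterMatchingStdDiff) & abs(balance$afterMatchingStdDiff) > 0.1, ]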
Details
diff --git a/docs/reference/computeEquipoise.html b/docs/reference/computeEquipoise.html
index a2722142..5ee05a17 100644
--- a/docs/reference/computeEquipoise.html
+++ b/docs/reference/computeEquipoise.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/computeMdrr.html b/docs/reference/computeMdrr.html
index 9b8d26bd..b3a86948 100644
--- a/docs/reference/computeMdrr.html
+++ b/docs/reference/computeMdrr.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/computePsAuc.html b/docs/reference/computePsAuc.html
index 0c05a816..3e34f0e9 100644
--- a/docs/reference/computePsAuc.html
+++ b/docs/reference/computePsAuc.html
@@ -17,7 +17,7 @@
@@ -108,7 +108,7 @@ Examples
data <- data.frame(treatment = treatment, propensityScore = propensityScore)
data <- data[data$propensityScore > 0 & data$propensityScore < 1, ]
computePsAuc(data)
-#> [1] 0.6879579
+#> [1] 0.7343716
diff --git a/docs/reference/createCmAnalysis.html b/docs/reference/createCmAnalysis.html
index c8252a84..503cc5e7 100644
--- a/docs/reference/createCmAnalysis.html
+++ b/docs/reference/createCmAnalysis.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createCmDiagnosticThresholds.html b/docs/reference/createCmDiagnosticThresholds.html
index 74464395..9a0a699b 100644
--- a/docs/reference/createCmDiagnosticThresholds.html
+++ b/docs/reference/createCmDiagnosticThresholds.html
@@ -17,7 +17,7 @@
@@ -74,7 +74,8 @@ Create CohortMethod diagnostics thresholds
easeThreshold = 0.25,
sdmThreshold = 0.1,
equipoiseThreshold = 0.2,
- attritionFractionThreshold = 1
+ attritionFractionThreshold = NULL,
+ generalizabilitySdmThreshold = 1
)
@@ -101,10 +102,14 @@ Arguments
attritionFractionThreshold
-What is the maximum allowed attrition fraction? If the attrition
-between the input target cohort and the target cohort entering the
-outcome model is greater than this fraction, the diagnostic will
-fail.
+DEPRECATED. See generalizabilitySdmThreshold
instead.
+
+
+generalizabilitySdmThreshold
+What is the maximum allowed standardized difference of mean
+(SDM) when comparing the population before and after PS
+adjustments? If the SDM is greater than this value, the diagnostic
+will fail.
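A minimal sketch of setting the new threshold explicitly (all other arguments keep the defaults shown in the usage above):

thresholds <- createCmDiagnosticThresholds(generalizabilitySdmThreshold = 1)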
diff --git a/docs/reference/createCmTable1.html b/docs/reference/createCmTable1.html
index 48ed5d1c..10175a7e 100644
--- a/docs/reference/createCmTable1.html
+++ b/docs/reference/createCmTable1.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createCohortMethodDataSimulationProfile.html b/docs/reference/createCohortMethodDataSimulationProfile.html
index e4d26e8f..13fe5e01 100644
--- a/docs/reference/createCohortMethodDataSimulationProfile.html
+++ b/docs/reference/createCohortMethodDataSimulationProfile.html
@@ -19,7 +19,7 @@
diff --git a/docs/reference/createComputeCovariateBalanceArgs.html b/docs/reference/createComputeCovariateBalanceArgs.html
index 9501a214..5dcd574d 100644
--- a/docs/reference/createComputeCovariateBalanceArgs.html
+++ b/docs/reference/createComputeCovariateBalanceArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createCreatePsArgs.html b/docs/reference/createCreatePsArgs.html
index 13907d3d..e0109fe9 100644
--- a/docs/reference/createCreatePsArgs.html
+++ b/docs/reference/createCreatePsArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createCreateStudyPopulationArgs.html b/docs/reference/createCreateStudyPopulationArgs.html
index 141e6159..a7f0f7af 100644
--- a/docs/reference/createCreateStudyPopulationArgs.html
+++ b/docs/reference/createCreateStudyPopulationArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createDefaultMultiThreadingSettings.html b/docs/reference/createDefaultMultiThreadingSettings.html
index 2498eb4e..d700c232 100644
--- a/docs/reference/createDefaultMultiThreadingSettings.html
+++ b/docs/reference/createDefaultMultiThreadingSettings.html
@@ -18,7 +18,7 @@
diff --git a/docs/reference/createFitOutcomeModelArgs.html b/docs/reference/createFitOutcomeModelArgs.html
index 7b9aeaba..95a0e609 100644
--- a/docs/reference/createFitOutcomeModelArgs.html
+++ b/docs/reference/createFitOutcomeModelArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createGetDbCohortMethodDataArgs.html b/docs/reference/createGetDbCohortMethodDataArgs.html
index 0c4173e0..f7e33227 100644
--- a/docs/reference/createGetDbCohortMethodDataArgs.html
+++ b/docs/reference/createGetDbCohortMethodDataArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createMatchOnPsAndCovariatesArgs.html b/docs/reference/createMatchOnPsAndCovariatesArgs.html
index 18035581..47c14f02 100644
--- a/docs/reference/createMatchOnPsAndCovariatesArgs.html
+++ b/docs/reference/createMatchOnPsAndCovariatesArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createMatchOnPsArgs.html b/docs/reference/createMatchOnPsArgs.html
index 0c27fd97..e6ae22ad 100644
--- a/docs/reference/createMatchOnPsArgs.html
+++ b/docs/reference/createMatchOnPsArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createMultiThreadingSettings.html b/docs/reference/createMultiThreadingSettings.html
index 1a6faf50..96f1e04b 100644
--- a/docs/reference/createMultiThreadingSettings.html
+++ b/docs/reference/createMultiThreadingSettings.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createOutcome.html b/docs/reference/createOutcome.html
index 2cc5d955..1cc48d4a 100644
--- a/docs/reference/createOutcome.html
+++ b/docs/reference/createOutcome.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createPs.html b/docs/reference/createPs.html
index 1d781739..d80b3c6d 100644
--- a/docs/reference/createPs.html
+++ b/docs/reference/createPs.html
@@ -17,7 +17,7 @@
@@ -171,10 +171,10 @@ Examples
#> Removing 0 redundant covariates
#> Removing 0 infrequent covariates
#> Normalizing covariates
-#> Tidying covariates took 1.25 secs
+#> Tidying covariates took 0.494 secs
#> Warning: All coefficients (except maybe the intercept) are zero. Either the covariates are completely uninformative or completely predictive of the treatment. Did you remember to exclude the treatment variables from the covariates?
#> Propensity model fitting finished with status OK
-#> Creating propensity scores took 4.71 secs
+#> Creating propensity scores took 1.69 secs
diff --git a/docs/reference/createResultsDataModel.html b/docs/reference/createResultsDataModel.html
new file mode 100644
index 00000000..8a1920bf
--- /dev/null
+++ b/docs/reference/createResultsDataModel.html
@@ -0,0 +1,121 @@
+
+Create the results data model tables on a database server. — createResultsDataModel • CohortMethod
+
+
+
+
+
+
+
+
+
+
+ Create the results data model tables on a database server.
+ Source: R/ResultsDataModel.R
+ createResultsDataModel.Rd
+
+
+
+ Create the results data model tables on a database server.
+
+
+
+ createResultsDataModel(
+ connectionDetails = NULL,
+ databaseSchema,
+ tablePrefix = ""
+)
+
+
+
+ Arguments
+ - connectionDetails
+DatabaseConnector connectionDetails instance; see DatabaseConnector::createConnectionDetails
+
+
+- databaseSchema
+The schema on the server where the tables will be created.
+
+
+- tablePrefix
+(Optional) string to insert before table names for database table names
+
+
+
+ Details
+ Only PostgreSQL and SQLite servers are supported.
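A minimal sketch, assuming a local SQLite results file (the file name and table prefix are hypothetical):

connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "sqlite",
                                                                server = "results.sqlite")
createResultsDataModel(connectionDetails = connectionDetails,
                       databaseSchema = "main",
                       tablePrefix = "cm_")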
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/reference/createStratifyByPsAndCovariatesArgs.html b/docs/reference/createStratifyByPsAndCovariatesArgs.html
index 366ad2ac..ab838557 100644
--- a/docs/reference/createStratifyByPsAndCovariatesArgs.html
+++ b/docs/reference/createStratifyByPsAndCovariatesArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createStratifyByPsArgs.html b/docs/reference/createStratifyByPsArgs.html
index c002b77d..163a41e8 100644
--- a/docs/reference/createStratifyByPsArgs.html
+++ b/docs/reference/createStratifyByPsArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createStudyPopulation.html b/docs/reference/createStudyPopulation.html
index 9e712693..066f3bfc 100644
--- a/docs/reference/createStudyPopulation.html
+++ b/docs/reference/createStudyPopulation.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createTargetComparatorOutcomes.html b/docs/reference/createTargetComparatorOutcomes.html
index 48d9d724..5bee64d9 100644
--- a/docs/reference/createTargetComparatorOutcomes.html
+++ b/docs/reference/createTargetComparatorOutcomes.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createTrimByIptwArgs.html b/docs/reference/createTrimByIptwArgs.html
index f734c078..0059a4a6 100644
--- a/docs/reference/createTrimByIptwArgs.html
+++ b/docs/reference/createTrimByIptwArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createTrimByPsArgs.html b/docs/reference/createTrimByPsArgs.html
index e354248e..9f5a8659 100644
--- a/docs/reference/createTrimByPsArgs.html
+++ b/docs/reference/createTrimByPsArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createTrimByPsToEquipoiseArgs.html b/docs/reference/createTrimByPsToEquipoiseArgs.html
index a9683246..ddae52e6 100644
--- a/docs/reference/createTrimByPsToEquipoiseArgs.html
+++ b/docs/reference/createTrimByPsToEquipoiseArgs.html
@@ -17,7 +17,7 @@
diff --git a/docs/reference/createTruncateIptwArgs.html b/docs/reference/createTruncateIptwArgs.html
index ddc388f9..10b30d97 100644
--- a/docs/reference/createTruncateIptwArgs.html
+++ b/docs/reference/createTruncateIptwArgs.html
@@ -17,7 +17,7 @@
Returns a ResultModelManager DataMigrationManager instance.
+getDataMigrator(connectionDetails, databaseSchema, tablePrefix = "")
DatabaseConnector connection details object
String schema where database schema lives
(Optional) Use if a table prefix is used before table names (e.g. "cd_")
Instance of ResultModelManager::DataMigrationManager that has an interface for converting existing data models
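A minimal sketch of obtaining the migrator instance (reusing the hypothetical connectionDetails, schema, and table prefix from the earlier sketch):

migrator <- getDataMigrator(connectionDetails = connectionDetails,
                            databaseSchema = "main",
                            tablePrefix = "cm_")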
+To assess generalizability we compare the distribution of covariates before and after
+any (propensity score) adjustments. We compute the standardized difference of mean as
+our metric of generalizability. (Tipton et al., 2017)
+Depending on our target estimand, we need to consider a different base population for
+generalizability. For example, if we aim to estimate the average treatment effect in
+the treated (ATT), our base population should be the target population, meaning we
+should consider the covariate distribution before and after PS adjustment in the target
+population only. By default this function will attempt to select the right base
+population based on what operations have been performed on the population. For example,
+if PS matching has been performed we assume the target estimand is the ATT, and the
+target population is selected as the base.
+Requires running computeCovariateBalance()
first.
getGeneralizabilityTable(balance, baseSelection = "auto")
A data frame created by the computeCovariateBalance
function.
The selection of the population to consider for generalizability.
+Options are "auto", "target", "comparator", and "both". The "auto"
+option will attempt to use the balance meta-data to pick the most
+appropriate population based on the target estimator.
A tibble with the following columns:
covariateId: The ID of the covariate. Can be linked to the covariates
and covariateRef
+tables in the CohortMethodData
object.
covariateName: The name of the covariate.
beforeMatchingMean: The mean covariate value before any (propensity score) adjustment.
afterMatchingMean: The mean covariate value after any (propensity score) adjustment.
stdDiff: The standardized difference of means between before and after adjustment.
The tibble also has a 'baseSelection' attribute, documenting the base population used
+to assess generalizability.
+Tipton E, Hallberg K, Hedges LV, Chan W (2017) Implications of Small Samples
+for Generalization: Adjustments and Rules of Thumb, Eval Rev. Oct;41(5):472-505.
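A minimal sketch of the intended workflow, assuming the matchedPop and cohortMethodData objects from the vignette:

balance <- computeCovariateBalance(matchedPop, cohortMethodData)
# Let the function pick the base population from the balance meta-data:
generalizability <- getGeneralizabilityTable(balance, baseSelection = "auto")
attr(generalizability, "baseSelection")  # which base population was selected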
+R/ResultsDataModel.R
+ getResultsDataModelSpecifications.Rd
Get specifications for CohortMethod results data model
+getResultsDataModelSpecifications()
A tibble data frame object with specifications
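A minimal sketch (the exact layout of the returned specifications is an assumption):

specs <- getResultsDataModelSpecifications()
head(specs)  # inspect the first few specification rows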
+Compute covariate balance before and after matching and trimming
Compute covariate balance before and after PS adjustment
plotTimeToEvent()
Plot time-to-event
Get information on generalizability
Functions for running multiple analyses in an efficient way.
@@ -339,9 +343,9 @@Get results data model
Get specifications for CohortMethod results data model
Launch Shiny app using a SQLite database
Create the results data model tables on a database server.
Migrate Data model
Get database migrations instance
Upload exported results to a database
Upload results to the database server.
Migrate data from current state to next state
+It is strongly advised that you have a backup of all data (either the SQLite files, a backup of the
+database if you are using a PostgreSQL backend, or the csv/zip files from your data generation) before
+migrating.
+migrateDataModel(connectionDetails, databaseSchema, tablePrefix = "")
DatabaseConnector connection details object
String schema where database schema lives
(Optional) Use if a table prefix is used before table names (e.g. "cd_")
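A minimal sketch (reusing the hypothetical connection details and table prefix from above, and assuming a backup has been made):

migrateDataModel(connectionDetails = connectionDetails,
                 databaseSchema = "main",
                 tablePrefix = "cm_")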
Requires that the results data model tables have been created using the createResultsDataModel
function.
uploadResults(
+ connectionDetails,
+ schema,
+ zipFileName,
+ forceOverWriteOfSpecifications = FALSE,
+ purgeSiteDataBeforeUploading = TRUE,
+ tempFolder = tempdir(),
+ tablePrefix = "",
+ ...
+)
An object of type connectionDetails
as created using the
+createConnectionDetails
function in the
+DatabaseConnector package.
The schema on the server where the tables have been created.
The name of the zip file.
If TRUE, specifications of the phenotypes, cohort definitions, and analysis
+will be overwritten if they already exist on the database. Only use this if these specifications
+have changed since the last upload.
If TRUE, before inserting data for a specific databaseId all the data for
+that site will be dropped. This assumes the input zip file contains the full data for that
+data site.
A folder on the local file system where the zip files are extracted to. Will be cleaned
+up when the function is finished. Can be used to specify a temp folder on a drive that
+has sufficient space if the default system temp space is too limited.
(Optional) string to insert before table names for database table names
See ResultModelManager::uploadResults
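A minimal sketch of uploading one exported results file (the zip file name is hypothetical; connection details, schema, and table prefix as in the earlier sketches):

uploadResults(connectionDetails = connectionDetails,
              schema = "main",
              zipFileName = "Results_MyStudy.zip",
              purgeSiteDataBeforeUploading = TRUE,
              tablePrefix = "cm_")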