diff --git a/_projects/uclh_ngtube_s0/_01_detail.md b/_projects/uclh_ngtube_s0/_01_detail.md
deleted file mode 100644
index 13d15c5..0000000
--- a/_projects/uclh_ngtube_s0/_01_detail.md
+++ /dev/null
@@ -1,27 +0,0 @@
-
-##### Theme :
-NgTube
-
-##### Supporting Document / Problem Statement:
-N/A
-
-##### Correlated Data Source:
-Ngtube vocab data
-
-##### Generation Rules
-* The patient's age should be between 18 and 100 at the moment of the visit.
-* Ethnicity data is using 2021 census data in England and Wales ([Census in England and Wales 2021](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/bulletins/ethnicgroupenglandandwales/census2021)) .
-* Gender is equally distributed between Male and Female (50% each).
-* Every person in the record has a link in procedure_occurrence with the concept "Checking the position of nasogastric tube using X-ray"
-* 2% of person records have a link in procedure_occurrence with the concept of "Plain chest X-ray"
-* 60% of visit_occurrence has visit concept "Inpatient Visit", while 40% have "Emergency Room Visit"
-
-##### Table
-1. person
-2. procedure_occurrence
-3. visit_occurrence
-
-##### Remark:
-* Version 0
-* Generated by man-made rule/story generator
-* Structural correct, all tables linked with the relationship
diff --git a/_projects/uclh_ngtube_s0/_02_appendix.md b/_projects/uclh_ngtube_s0/_02_appendix.md
deleted file mode 100644
index 87c392a..0000000
--- a/_projects/uclh_ngtube_s0/_02_appendix.md
+++ /dev/null
@@ -1,23 +0,0 @@
-
-##### 2011 Race Census figure in England and Wales
-
-| Ethnic Group | Population(%) |
-|------------------------------------------------------------------------------------------------|---------------|
-| Asian or Asian British: Bangladeshi | 1.1 |
-| Asian or Asian British: Chinese | 0.7 |
-| Asian or Asian British: Indian | 3.1 |
-| Asian or Asian British: Pakistani | 2.7 |
-| Asian or Asian British: any other Asian background | 1.6 |
-| Black or African or Caribbean or Black British: African | 2.5 |
-| Black or African or Caribbean or Black British: Caribbean | 1 |
-| Black or African or Caribbean or Black British: other Black or African or Caribbean background | 0.5 |
-| Mixed multiple ethnic groups: White and Asian | 0.8 |
-| Mixed multiple ethnic groups: White and Black African | 0.4 |
-| Mixed multiple ethnic groups: White and Black Caribbean | 0.9 |
-| Mixed multiple ethnic groups: any other Mixed or multiple ethnic background | 0.8 |
-| White: English or Welsh or Scottish or Northern Irish or British | 74.4 |
-| White: Irish | 0.9 |
-| White: Gypsy or Irish Traveller | 0.1 |
-| White: any other White background | 6.4 |
-| Other ethnic group: any other ethnic group | 1.6 |
-| Other ethnic group: Arab | 0.6 |
diff --git a/_projects/uclh_ngtube_s0/_03_data.html b/_projects/uclh_ngtube_s0/_03_data.html
deleted file mode 100644
index 97a7fcf..0000000
--- a/_projects/uclh_ngtube_s0/_03_data.html
+++ /dev/null
@@ -1,12 +0,0 @@
-
-
Synthetic Data CSV file
-
- Warning! We will publish here our synthetic data once it's approved for publication by the people at UCL Research Data Repository.
-
-
diff --git a/_projects/uclh_ngtube_s0/_03_data.md b/_projects/uclh_ngtube_s0/_03_data.md
new file mode 100644
index 0000000..3d0ab4b
--- /dev/null
+++ b/_projects/uclh_ngtube_s0/_03_data.md
@@ -0,0 +1,56 @@
+# Example (synthetic) Electronic Health Record data
+
+These data are modelled using the OMOP Common Data Model v5.3.
+
+## CSV files
+
+Each file is named after the corresponding table in the OMOP CDM.
+
+- [person](data/person.csv)
+- [procedure_occurrence](data/procedure_occurrence.csv)
+- [visit_occurrence](data/visit_occurrence.csv)
+
+## Correlated Data Source
+
+- NG tube vocabularies
+
+## Generation Rules
+
+- The patient's age should be between 18 and 100 at the moment of the visit.
+- Ethnicity follows the 2021 census distribution for England and Wales ([Census in England and Wales 2021](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/bulletins/ethnicgroupenglandandwales/census2021)).
+- Gender is equally distributed between Male and Female (50% each).
+- Every person has a linked record in procedure_occurrence with the concept "Checking the position of nasogastric tube using X-ray".
+- 2% of persons have a linked record in procedure_occurrence with the concept "Plain chest X-ray".
+- 60% of visit_occurrence records have the visit concept "Inpatient Visit", while 40% have "Emergency Room Visit".
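The generation rules above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the project's generator: the cohort size, the uniform age distribution, one visit per person, the plain-text concept labels in place of OMOP concept IDs, and every variable name are hypothetical. Only three ethnicity groups are spelled out, with the remaining rows of the census table lumped together.

```r
# Minimal sketch of the generation rules (base R only; all names are hypothetical)
set.seed(42)
n_persons <- 1000

# Sampling weights in percent; sample() normalises them, so they need not sum to 100
ethnicity_pct <- c(
  "White: English or Welsh or Scottish or Northern Irish or British" = 74.4,
  "Asian or Asian British: Indian" = 3.1,
  "All other groups (lumped for brevity)" = 22.6
)

person <- data.frame(
  person_id = seq_len(n_persons),
  # Exactly half Male and half Female, in random order
  gender = sample(rep(c("Male", "Female"), length.out = n_persons)),
  ethnicity = sample(names(ethnicity_pct), n_persons, replace = TRUE, prob = ethnicity_pct),
  # Age at the moment of the visit, 18 to 100 inclusive (uniform is an assumption)
  age_at_visit = sample(18:100, n_persons, replace = TRUE)
)

# One visit per person (assumption): 60% inpatient, 40% emergency room
n_inpatient <- round(0.6 * n_persons)
visit_occurrence <- data.frame(
  visit_occurrence_id = seq_len(n_persons),
  person_id = person$person_id,
  visit_concept = sample(rep(c("Inpatient Visit", "Emergency Room Visit"),
                             times = c(n_inpatient, n_persons - n_inpatient)))
)

# Every person gets the NG tube check; a random 2% also get a plain chest X-ray
ng_check <- "Checking the position of nasogastric tube using X-ray"
cxr_ids <- sample(person$person_id, round(0.02 * n_persons))
procedure_occurrence <- data.frame(
  person_id = c(person$person_id, cxr_ids),
  procedure_concept = c(rep(ng_check, n_persons),
                        rep("Plain chest X-ray", length(cxr_ids)))
)
procedure_occurrence$procedure_occurrence_id <- seq_len(nrow(procedure_occurrence))
```

Fixing the exact counts with `rep()` and then shuffling keeps the 50/50 and 60/40 splits exact, whereas drawing each row independently would only match the targets on average.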
+
+## Notes
+
+- Version 0
+- Generated by a hand-written rule/story generator
+- Structurally correct: all tables are linked through their relationships
+- We used national ethnicity data to generate a realistic distribution (see below)
+
+
+
+### 2021 Census ethnic group figures for England and Wales
+
+| Ethnic Group | Population(%) |
+|------------------------------------------------------------------------------------------------|---------------|
+| Asian or Asian British: Bangladeshi | 1.1 |
+| Asian or Asian British: Chinese | 0.7 |
+| Asian or Asian British: Indian | 3.1 |
+| Asian or Asian British: Pakistani | 2.7 |
+| Asian or Asian British: any other Asian background | 1.6 |
+| Black or African or Caribbean or Black British: African | 2.5 |
+| Black or African or Caribbean or Black British: Caribbean | 1 |
+| Black or African or Caribbean or Black British: other Black or African or Caribbean background | 0.5 |
+| Mixed multiple ethnic groups: White and Asian | 0.8 |
+| Mixed multiple ethnic groups: White and Black African | 0.4 |
+| Mixed multiple ethnic groups: White and Black Caribbean | 0.9 |
+| Mixed multiple ethnic groups: any other Mixed or multiple ethnic background | 0.8 |
+| White: English or Welsh or Scottish or Northern Irish or British | 74.4 |
+| White: Irish | 0.9 |
+| White: Gypsy or Irish Traveller | 0.1 |
+| White: any other White background | 6.4 |
+| Other ethnic group: any other ethnic group | 1.6 |
+| Other ethnic group: Arab | 0.6 |
diff --git a/_projects/uclh_ngtube_s0/_04_image.md b/_projects/uclh_ngtube_s0/_04_image.md
new file mode 100644
index 0000000..c0df4cb
--- /dev/null
+++ b/_projects/uclh_ngtube_s0/_04_image.md
@@ -0,0 +1,29 @@
+# Example (synthetic) images
+
+## Model
+
+We trained an unconditional image-generation diffusion model using Hugging Face Diffusers [1]. Unconditional image generation models are not conditioned on text or images during training; they only generate images that resemble the training data distribution. Generation starts from a seed that produces a random noise vector, which the model then turns into an output image similar to the images it was trained on.
+The training script initialises a UNet2DModel as the network to be trained [2]. The training loop adds noise to the images, predicts the noise residual, calculates the loss, saves checkpoints at specified steps, and saves the trained model.
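A single step of the training loop described above can be illustrated with a minimal base-R sketch: noise is added to a toy image, a placeholder stands in for the UNet's noise prediction, and the loss is the mean squared error between the true and the predicted noise. The linear beta schedule with 1000 steps, the 8x8 image, and the names `add_noise` and `predict_noise` are assumptions for illustration; they are neither the project's configuration nor part of the Diffusers API.

```r
set.seed(1)

# Assumed noise schedule: linear betas over 1000 steps
n_steps <- 1000
betas <- seq(1e-4, 0.02, length.out = n_steps)
alphas_cumprod <- cumprod(1 - betas)

# Forward (noising) process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise
add_noise <- function(x0, noise, t) {
  sqrt(alphas_cumprod[t]) * x0 + sqrt(1 - alphas_cumprod[t]) * noise
}

# Toy 8x8 "image" scaled to [-1, 1] and Gaussian noise of the same shape
x0 <- matrix(runif(8 * 8, min = -1, max = 1), nrow = 8)
noise <- matrix(rnorm(8 * 8), nrow = 8)

# Pick a random timestep and build the noisy image
t <- sample(n_steps, 1)
x_t <- add_noise(x0, noise, t)

# Placeholder for the network: an untrained model that always predicts zero noise
predict_noise <- function(x_t, t) {
  matrix(0, nrow = nrow(x_t), ncol = ncol(x_t))
}

# Training loss: mean squared error between the true and the predicted noise
loss <- mean((noise - predict_noise(x_t, t))^2)
```

In real training the placeholder is replaced by the network, and the loss is backpropagated to update its weights; a perfect predictor would drive this loss to zero.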
+
+## Training Dataset
+
+The RANZCR CLiP dataset was used to train the model [3]. It was created by the Royal Australian and New Zealand College of Radiologists (RANZCR), a not-for-profit professional organisation for clinical radiologists and radiation oncologists. The images are labelled according to a set of definitions to keep the labelling consistent. The normal category includes lines that were appropriately positioned and did not require repositioning. The borderline category includes lines that would ideally require some repositioning but would in most cases still function adequately in their current position. The abnormal category includes lines that required immediate repositioning. 30,000 images were used during training, all 512x512 pixels in size.
+## Computational Information
+Training was run on RTX 6000 cards with 24GB of graphics memory. A checkpoint is saved after each epoch, and 220 checkpoints have been generated so far. Each checkpoint takes up 1GB of space. Each epoch takes around 6 hours to train. The training script [2] is built on PyTorch, and additional libraries are used for data preprocessing, visualization, or deployment.
+
+
+
+
+
+
+
+
+
+
+
+
+## References
+
+1. https://huggingface.co/docs/diffusers/en/training/unconditional_training#unconditional-image-generation
+2. https://github.com/huggingface/diffusers/blob/096f84b05f9514fae9f185cbec0a4d38fbad9919/examples/unconditional_image_generation/train_unconditional.py#L356
+3. https://www.kaggle.com/competitions/ranzcr-clip-catheter-line-classification/data
diff --git a/_projects/uclh_ngtube_s0/images/1.png b/_projects/uclh_ngtube_s0/images/1.png
new file mode 100644
index 0000000..3dce665
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/1.png differ
diff --git a/_projects/uclh_ngtube_s0/images/10.png b/_projects/uclh_ngtube_s0/images/10.png
new file mode 100644
index 0000000..ed2bb95
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/10.png differ
diff --git a/_projects/uclh_ngtube_s0/images/20.png b/_projects/uclh_ngtube_s0/images/20.png
new file mode 100644
index 0000000..b590898
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/20.png differ
diff --git a/_projects/uclh_ngtube_s0/images/2_2.png b/_projects/uclh_ngtube_s0/images/2_2.png
new file mode 100644
index 0000000..ee80e6b
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/2_2.png differ
diff --git a/_projects/uclh_ngtube_s0/images/36_2.png b/_projects/uclh_ngtube_s0/images/36_2.png
new file mode 100644
index 0000000..c289f34
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/36_2.png differ
diff --git a/_projects/uclh_ngtube_s0/images/37.png b/_projects/uclh_ngtube_s0/images/37.png
new file mode 100644
index 0000000..0e00234
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/37.png differ
diff --git a/_projects/uclh_ngtube_s0/images/39.png b/_projects/uclh_ngtube_s0/images/39.png
new file mode 100644
index 0000000..928e47c
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/39.png differ
diff --git a/_projects/uclh_ngtube_s0/images/44.png b/_projects/uclh_ngtube_s0/images/44.png
new file mode 100644
index 0000000..c23490e
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/44.png differ
diff --git a/_projects/uclh_ngtube_s0/images/44_2.png b/_projects/uclh_ngtube_s0/images/44_2.png
new file mode 100644
index 0000000..dd21755
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/44_2.png differ
diff --git a/_projects/uclh_ngtube_s0/images/45.png b/_projects/uclh_ngtube_s0/images/45.png
new file mode 100644
index 0000000..23eb5b1
Binary files /dev/null and b/_projects/uclh_ngtube_s0/images/45.png differ
diff --git a/_projects/uclh_ngtube_s0/index.md b/_projects/uclh_ngtube_s0/index.md
index 2807b27..eb4cd3d 100644
--- a/_projects/uclh_ngtube_s0/index.md
+++ b/_projects/uclh_ngtube_s0/index.md
@@ -1,28 +1,29 @@
---
layout: project
-title: UCLH Ngtube Synthetic Data
+title: Detecting misplaced NG tubes
status: ongoing
+tags: imaging, safety
authors:
-- NIHR CRIU SAFEHR Team
+- SAFEHR team
tabs:
- {
- name: uclh-ngt-s0-detail,
- type: md,
- source: _01_detail.md,
- label: Detail
+ name: uclh-ngt-s0-quarto,
+ type: html,
+ source: omop_concepts_frequency_table.html,
+ label: Data summary
}
- {
- name: uclh-ngt-s0-appx,
- type: md,
- source: _02_appendix.md,
- label: Appendix
+ name: uclh-ngt-s0-ehr-data,
+ type: md,
+ source: _03_data.md,
+ label: Synthetic EHR
}
- {
- name: uclh-ngt-s0-data,
- type: html,
- source: _03_data.html,
- label: Data
+ name: uclh-ngt-s0-image-data,
+ type: md,
+ source: _04_image.md,
+ label: Synthetic CXR
}
---
diff --git a/_projects/uclh_ngtube_s0/omop_concepts_frequency_table.html b/_projects/uclh_ngtube_s0/omop_concepts_frequency_table.html
new file mode 100644
index 0000000..e2d349f
--- /dev/null
+++ b/_projects/uclh_ngtube_s0/omop_concepts_frequency_table.html
@@ -0,0 +1,1395 @@
+
+
+
+
+
+
+
+
+
+Frequency of OMOP concepts in clinical tables
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Frequency of OMOP concepts in clinical tables
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Code
+
# PROJECT SETUP
+
+# #################### Libraries #################### #
+library(here)
+library(tidyverse)
+library(dbplyr, warn.conflicts =FALSE)
+library(rlang, warn.conflicts =FALSE)
+library(odbc)
+
+
+# #################### Constants #################### #
+CONFIG_PATH <-"config/db_config.yml"
+
+OMOP_TABLES_DIR <-"res/tables/"
+OMOP_COLUMNS_DIR <-"res/columns/"
+
+INPUT_TABLES_FILE <-"clinical.txt"
+OUTPUT_PATH <-"out/concept_frequency.csv"
+
+
+# #################### Functions: Data #################### #
+
+# Higher order function to conditionally apply a pipe
+# Note that the cond is not vectorised (should be a single logical)
+pipe_if <-function(df, cond, func) {
+if (cond) func(df)
+else df
+}
+
+# Load the list of OMOP clinical tables
+load_table_list <-function(filename) {
+read_delim(
+file =here(paste0(OMOP_TABLES_DIR, filename)),
+delim =",",
+col_names =FALSE,
+col_types ="c",
+show_col_types =FALSE)$X1 |>
+map(function(t) { tolower(t) })
+}
+
+# Load OMOP column specs for the given table
+load_column_metadata <-function(table) {
+read_csv(
+file =here(paste0(OMOP_COLUMNS_DIR, table, "_column_spec.csv")),
+col_types ="cIlcccllcIc",
+show_col_types =FALSE)
+}
+
+# Load OMOP column specs extracting Concept columns
+load_concept_columns <-function(table) {
+ OMOP_VER <-53
+ OMOP_CONCEPT_TYPE <-"concept"
+
+load_column_metadata(table) |>
+filter(version <= OMOP_VER, type == OMOP_CONCEPT_TYPE) |>
+pull(column)
+}
+
+# Debugging output
+#
+#clinical_tables <- load_table_list("clinical.txt")
+#clinical_tables
+#
+# concepts_by_table <- clinical_tables |>
+# # keep(function(t) { t == "person" || t == "death" }) |>
+# map(function(t) { l <- list(); l[[t]] <- load_concept_columns(t); l }) |>
+# list_flatten()
+# concepts_by_table
+# map(ls(concepts_by_table), function(t) { list(t, concepts_by_table[[t]] |> as.list()) })
+
+
+# Load DB configuration
+db_load_config <-function(filepath) {
+ config::get(file = here(filepath))
+}
+
+# Connect to a database from the given config
+db_connect <-function(config) {
+# Load DB connection
+dbConnect(
+odbc(),
+driver =as.character(config["odbc_driver"]),
+database =as.character(config["odbc_database"]),
+server =as.character(config["odbc_server"]),
+port =as.integer(config["odbc_port"]),
+uid =as.character(config["odbc_uid"]),
+pwd =as.character(config["odbc_pwd"]))
+}
+
+# Load table from DB
+db_omop_table <-function(tablename, config, conn, cols=NULL) {
+tbl(conn, in_schema(as.character(config["odbc_schema"]), tablename)) |>
+pipe_if(!is.null(cols), \(df) df |> select(all_of(cols))) |>
+# head() |> # Head of all tables
+# pipe_if(tablename != "concept", \(df) df |> head()) |> # Head of non Concept tables
+collect()
+}
+
+# Enrich given data frame with the count by column, and include table and column names as metadata
+count_with_metadata <-function(df, tablename, colname) {
+ df |>
+rename(concept =all_of({{colname}})) |>
+count(concept, name="count") |>
+mutate(
+table=tablename,
+column=colname,
+.before=concept)
+}
+
+# Enrich data frame adding concept names
+join_with_concept <-function(df, concepts_df) {
+ df |>
+left_join(
+ concepts_df,
+by=join_by(concept == concept_id))
+}
+
+
+# #################### Functions: Plots #################### #
+
+# Arranges rows by "count" and updates the factor levels (so that plots respect the ordering)
+arrange_by_count <-function(.df) {
+ .df |>
+# Arrange by count, which sorts the dataframe but NOT the factor levels
+arrange(desc(count)) |>
+# Update the factor levels
+mutate(concept_name=fct_reorder(concept_name, count))
+}
+
+# Groups rows by concept name, summarising the counts
+group_by_name <-function(.df) {
+ .df |>
+group_by(concept_name) |>
+summarise(count=sum(count))
+}
+
+# Returns a vector of N colours (N <= 12) to use as palette
+get_palette <-function(n) {
+c("#009cdb", "#00a3c0", "#00a599", "#33a46f", "#6e9e4c", "#9a933c",
+"#bd8445", "#d57562", "#db6d8a", "#cc72b2", "#a881d3", "#7090e2") |>
+head(n)
+}
+
+# Returns a bar plot of frequency counts
+freq_bar_plot <-function(.df, head=30, title="", fill="#999999") {
+ .df |>
+group_by_name() |>
+arrange_by_count() |>
+head(head) |>
+ggplot(aes(x=concept_name, y=count)) +
+geom_bar(stat="identity", fill=fill, width=.6) +
+coord_flip() +
+xlab("") +
+scale_x_discrete(label=function(x) { stringr::str_trunc(x, 50) }) +
+ggtitle(label=title) +
+theme_bw()
+}
+
+# Returns a pie plot of concept distribution with percentages
+dist_pie_plot <-function(.df, head=10, title="", fill=c(), border="white") {
+ .df |>
+group_by_name() |>
+arrange_by_count() |>
+head(head) |>
+# Calculate count %
+mutate(percent =round(count /sum(.df$count) *100)) |>
+# Plot
+ggplot(aes(x="", y=count, fill=concept_name)) +
+geom_bar(stat="identity", width=1, colour="white") +
+coord_polar("y", start=0) +
+# Remove background, grid, numeric labels
+theme_void() +
+# Embed count %
+geom_text(
+aes(label=paste0(percent, "%")),
+position=position_stack(vjust=0.5),
+colour="white", fontface ="bold", size=6) +
+# Title and colour
+ggtitle(label=title) +
+scale_fill_manual(values=fill)
+}
+
+
+
+
Data Processing
+
+
Frequency table
+
Frequency table for OMOP concepts in clinical tables.
+
Clinical tables are:
+
+
CARE_SITE
+
CONDITION_OCCURRENCE
+
DEATH
+
DEVICE_EXPOSURE
+
DRUG_EXPOSURE
+
FACT_RELATIONSHIP
+
LOCATION
+
MEASUREMENT
+
OBSERVATION_PERIOD
+
OBSERVATION
+
PERSON
+
PROCEDURE_OCCURRENCE
+
SPECIMEN
+
VISIT_DETAIL
+
VISIT_OCCURRENCE
+
+
+
+Code
+
# DATA PROCESSING
+
+# Generate frequency table for OMOP concepts in clinical tables
+
+db_config <-db_load_config(CONFIG_PATH)
+db_conn =db_connect(db_config)
+
+start_time <-Sys.time()
+
+# Load all Concepts to find names
+concepts_df <-db_omop_table("concept", db_config, db_conn, cols=c("concept_id", "concept_name"))
+
+concept_freq <-tibble()
+for (tablename in load_table_list(INPUT_TABLES_FILE)) {
+# Table from DB
+ df <-db_omop_table(tablename, db_config, db_conn)
+
+# Add to metadata
+for (colname in load_concept_columns(tablename)) {
+#message("count_with_metadata: Processing ", tablename, ".", colname)
+ concept_freq <-bind_rows(
+ concept_freq,
+count_with_metadata(df, tablename, colname))
+ }
+}
+
+concept_freq <- concept_freq |>
+
+# Remove lines with count < 5
+filter(count >=5) |>
+
+# Sort by concept
+arrange(table, column, concept) |>
+
+# Join with Concept to include names
+join_with_concept(concepts_df)
+
+# Calculate processing time
+end_time <-Sys.time()
+message("Generated in ", sprintf("%.2f", as.numeric(end_time - start_time, units="mins")), " minutes")
+
+# Export and print result
+concept_freq |>write_csv(OUTPUT_PATH)
+concept_freq
+
+
+# [WIP] Attempts to generate frequency table with functional programming
+#
+# load_table_list("clinical.txt") |>
+# # keep(function(t) { t == "person" || t == "death" }) |>
+# map(function(t) {
+# load_concept_columns(t) |>
+# map(function(c) {
+# count_with_metadata(
+# db_omop_table(schema, t, conn=conn),
+# t, c)
+# })
+# }) |>
+# bind_rows()
+
+dbDisconnect(db_conn)
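The counting and small-count suppression performed above can be tried without a database connection. The sketch below uses base R on a made-up data frame; the toy values, `concept_columns`, and the helper `count_column` are hypothetical stand-ins for the database table, `load_concept_columns()`, and `count_with_metadata()`.

```r
# Toy stand-in for one clinical table; the values are made up for illustration
visit_occurrence <- data.frame(
  visit_concept_id = c(rep(9201, 6), rep(9203, 5), rep(262, 2)),
  visit_type_concept_id = rep(32817, 13)
)
concept_columns <- c("visit_concept_id", "visit_type_concept_id")

# Count the occurrences of each concept in one column and attach table/column metadata
count_column <- function(df, tablename, colname) {
  counts <- as.data.frame(table(concept = df[[colname]]),
                          responseName = "count", stringsAsFactors = FALSE)
  data.frame(table = tablename, column = colname,
             concept = as.integer(counts$concept), count = counts$count)
}

# Stack the per-column counts into one long frequency table
concept_freq <- do.call(rbind, lapply(concept_columns, function(col) {
  count_column(visit_occurrence, "visit_occurrence", col)
}))

# Suppress small counts, mirroring the filter(count >= 5) step
concept_freq <- concept_freq[concept_freq$count >= 5, ]
```

Here concept 262 occurs only twice, so it is removed by the threshold, while the other three rows are kept.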
+
+
+
+
A tibble: 14272 × 5
+
+
+
table
+
column
+
concept
+
count
+
concept_name
+
+
+
<chr>
+
<chr>
+
<int>
+
<int>
+
<chr>
+
+
+
+
+
care_site
+
place_of_service_concept_id
+
8717
+
23
+
Inpatient Hospital
+
+
+
condition_occurrence
+
condition_concept_id
+
22274
+
33
+
Neoplasm of uncertain behavior of larynx
+
+
+
condition_occurrence
+
condition_concept_id
+
22281
+
212
+
Sickle cell-hemoglobin SS disease
+
+
+
condition_occurrence
+
condition_concept_id
+
22350
+
5
+
Edema of larynx
+
+
+
condition_occurrence
+
condition_concept_id
+
22492
+
5
+
Foreign body in pharynx
+
+
+
condition_occurrence
+
condition_concept_id
+
22557
+
13
+
Malignant tumor of submandibular gland
+
+
+
condition_occurrence
+
condition_concept_id
+
22955
+
28
+
Perforation of esophagus
+
+
+
condition_occurrence
+
condition_concept_id
+
23034
+
153
+
Neonatal hypoglycemia
+
+
+
condition_occurrence
+
condition_concept_id
+
23220
+
28
+
Chronic tonsillitis
+
+
+
condition_occurrence
+
condition_concept_id
+
23325
+
58
+
Heartburn
+
+
+
condition_occurrence
+
condition_concept_id
+
23986
+
39
+
Disorder of pituitary gland
+
+
+
condition_occurrence
+
condition_concept_id
+
24006
+
42
+
Sickle cell-hemoglobin C disease
+
+
+
condition_occurrence
+
condition_concept_id
+
24134
+
150
+
Neck pain
+
+
+
condition_occurrence
+
condition_concept_id
+
24148
+
33
+
Congenital diverticulum of pharynx
+
+
+
condition_occurrence
+
condition_concept_id
+
24609
+
226
+
Hypoglycemia
+
+
+
condition_occurrence
+
condition_concept_id
+
24660
+
28
+
Acute tonsillitis
+
+
+
condition_occurrence
+
condition_concept_id
+
24818
+
7
+
Injury of neck
+
+
+
condition_occurrence
+
condition_concept_id
+
24909
+
17
+
Hereditary spherocytosis
+
+
+
condition_occurrence
+
condition_concept_id
+
24966
+
49
+
Esophageal varices
+
+
+
condition_occurrence
+
condition_concept_id
+
24974
+
5
+
Stenosis of larynx
+
+
+
condition_occurrence
+
condition_concept_id
+
25189
+
27
+
Malignant tumor of oral cavity
+
+
+
condition_occurrence
+
condition_concept_id
+
25518
+
231
+
Sickle cell trait
+
+
+
condition_occurrence
+
condition_concept_id
+
25572
+
5
+
Disorder of salivary gland
+
+
+
condition_occurrence
+
condition_concept_id
+
25582
+
24
+
Tracheoesophageal fistula
+
+
+
condition_occurrence
+
condition_concept_id
+
25844
+
8
+
Ulcer of esophagus
+
+
+
condition_occurrence
+
condition_concept_id
+
26052
+
28
+
Primary malignant neoplasm of larynx
+
+
+
condition_occurrence
+
condition_concept_id
+
26141
+
5
+
Barrett's esophagus with esophagitis
+
+
+
condition_occurrence
+
condition_concept_id
+
26727
+
46
+
Hematemesis
+
+
+
condition_occurrence
+
condition_concept_id
+
26942
+
83
+
Hemoglobin SS disease with crisis
+
+
+
condition_occurrence
+
condition_concept_id
+
27674
+
183
+
Nausea and vomiting
+
+
+
⋮
+
⋮
+
⋮
+
⋮
+
⋮
+
+
+
specimen
+
specimen_concept_id
+
40490358
+
21
+
Specimen from skin obtained by scraping
+
+
+
specimen
+
specimen_concept_id
+
40490923
+
10
+
Foreign body submitted as specimen
+
+
+
specimen
+
specimen_concept_id
+
40490924
+
11
+
Urine specimen from urinary conduit
+
+
+
specimen
+
specimen_concept_id
+
43021080
+
5
+
Swab from lower limb
+
+
+
specimen
+
specimen_concept_id
+
43021097
+
12
+
Swab from pharynx
+
+
+
specimen
+
specimen_concept_id
+
43021144
+
14
+
Central venous catheter tip submitted as specimen
+
+
+
specimen
+
specimen_concept_id
+
43021146
+
22
+
Arterial line tip submitted as specimen
+
+
+
specimen
+
specimen_concept_id
+
44783230
+
14
+
Urine specimen obtained via suprapubic indwelling urinary catheter
+
+
+
specimen
+
specimen_concept_id
+
44784239
+
22
+
First stream urine sample
+
+
+
specimen
+
specimen_concept_id
+
45766301
+
16
+
Arterial cord blood specimen
+
+
+
specimen
+
specimen_concept_id
+
45766302
+
13
+
Venous cord blood specimen
+
+
+
specimen
+
specimen_concept_id
+
46270252
+
69
+
Specimen from bronchus obtained by endobronchial biopsy
+
+
+
specimen
+
specimen_concept_id
+
46273457
+
5
+
Brain cyst fluid sample
+
+
+
specimen
+
specimen_type_concept_id
+
32817
+
182136
+
EHR
+
+
+
specimen
+
unit_concept_id
+
0
+
182136
+
No matching concept
+
+
+
visit_occurrence
+
admitting_source_concept_id
+
0
+
9795
+
No matching concept
+
+
+
visit_occurrence
+
admitting_source_concept_id
+
8602
+
26
+
Temporary Lodging
+
+
+
visit_occurrence
+
admitting_source_concept_id
+
8717
+
94
+
Inpatient Hospital
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
0
+
164
+
No matching concept
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8536
+
9543
+
Home
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8602
+
37
+
Temporary Lodging
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8615
+
16
+
Assisted Living Facility
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8717
+
128
+
Inpatient Hospital
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8882
+
14
+
Adult Living Care Facility
+
+
+
visit_occurrence
+
discharge_to_concept_id
+
8971
+
12
+
Inpatient Psychiatric Facility
+
+
+
visit_occurrence
+
visit_concept_id
+
262
+
918
+
Emergency Room and Inpatient Visit
+
+
+
visit_occurrence
+
visit_concept_id
+
9201
+
5525
+
Inpatient Visit
+
+
+
visit_occurrence
+
visit_concept_id
+
9203
+
3472
+
Emergency Room Visit
+
+
+
visit_occurrence
+
visit_source_concept_id
+
NA
+
9915
+
NA
+
+
+
visit_occurrence
+
visit_type_concept_id
+
32817
+
9915
+
EHR
+
+
+
+
+
+
+
+
Figures
+
The plots below are based on the frequencies of concepts in clinical tables.
+
Null values and the following special concepts have been ignored:
- 0: used when there is no matching concept between the source value and the standard defined by OMOP.
- 32817: EHR, indicating that the source of the information is the EHR system.
+
+
+Code
+
# FIGURES
+
+# General options
+options(repr.plot.width=12)
+
+# Load data (or reuse data frame)
+#plot_df_all <- concept_freq
+plot_df_all <-read_csv(file =here(OUTPUT_PATH), col_types ="cciic")
+
+# Ignore null, 0, and EHR (32817)
+plot_df <- plot_df_all |>
+filter(concept >0) |># Ignore nulls and No matching concept
+filter(concept !=32817) # Ignore concept "EHR"
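The effect of this filter can be checked on a few rows taken from the frequency table above. The sketch uses base R rather than dplyr, so the null check has to be written out explicitly: `dplyr::filter()` drops rows where the condition evaluates to `NA`, which is why `concept > 0` alone is enough in the code above. The data frame here is a hand-copied excerpt, and the variable `keep` is hypothetical.

```r
# A few rows from the visit_occurrence part of the frequency table
plot_df_all <- data.frame(
  concept = c(NA, 0, 32817, 9201, 9203),
  count = c(9915, 164, 9915, 5525, 3472),
  concept_name = c(NA, "No matching concept", "EHR",
                   "Inpatient Visit", "Emergency Room Visit")
)

# In base R, NA > 0 is NA, so nulls must be excluded explicitly
keep <- !is.na(plot_df_all$concept) &
  plot_df_all$concept > 0 &       # drops "No matching concept" (0)
  plot_df_all$concept != 32817    # drops "EHR"
plot_df <- plot_df_all[keep, ]
```

Only the "Inpatient Visit" and "Emergency Room Visit" rows remain in `plot_df`.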
+
+
+
+
Top concepts
+
See Figure 1 for concepts appearing the most often in all clinical tables.
+
Since the measurement table contains far more records than the other clinical tables, most of the highest-frequency concepts come from it.
+
+
+Code
+
plot_df_all |>
+freq_bar_plot(fill="#009CDB")
+
+
+
+
+
+
+
+
+Figure 1: Top 30 concepts with the highest frequency
+
+
+
+
+
+
+
+
Top measurements
+
See Figure 2 for the measurements recorded the most often.
+
This information is taken from table measurement, column measurement_concept_id.