- Set up an i2b2 project with the ACT ontology.
Option A. Download the Synthea-i2b2 63k sample and load it into your i2b2 project.
Option B. Load a Synthea dataset and convert it into i2b2 format.
- All data sets (1k, COVID 10k, COVID 100k) have been verified to work EXCEPT the 100k patients in the large SyntheticMass Version 2 download. This version needs an extra step to delete invalid records before import. (Details coming soon.)
- Run `create_synthea_table_[dbtype].sql` for your database platform in your project to create the Synthea tables.
- Import the Synthea data you downloaded in step one into the Synthea tables in your project.
- Load the i2b2-to-SNOMED table in this repository into your project.
- Download the SNOMED-CT to ICD-10-CM mapping from https://www.nlm.nih.gov/healthit/snomedct/us_edition.html:
- Click on the "Download SNOMED-CT to ICD-10-CM Mapping Resources" link to download. (You will need a UMLS account.)
- Unzip the downloaded file.
- Import the TSV file into a table called SNOMED_to_ICD10 in your database.
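
As a rough illustration of the TSV import step, the sketch below loads a tab-separated file into a table named `SNOMED_to_ICD10`. It uses Python's built-in `sqlite3` module only as a stand-in so the example is self-contained; in a real i2b2 project you would use your own platform's import tool or driver. The helper name `load_tsv`, the all-text column types, and the sample column names and values are assumptions for illustration, not the actual layout of the NLM mapping file.

```python
import csv
import io
import sqlite3


def load_tsv(conn, tsv_file, table="SNOMED_to_ICD10"):
    """Create `table` from the TSV header row and insert the data rows.

    Hypothetical helper: every column is stored as TEXT, which is a
    simplification and may not match what the conversion script expects.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # Rows whose column count does not match the header are skipped.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Tiny placeholder sample; the real file has many more columns and rows.
SAMPLE_TSV = "referencedComponentId\tmapTarget\n111\tA00.0\n222\tB01.9\n"


def demo():
    """Load the sample into an in-memory database and return (conn, row count)."""
    conn = sqlite3.connect(":memory:")
    count = load_tsv(conn, io.StringIO(SAMPLE_TSV))
    return conn, count
```

To load a file from disk instead of the sample string, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_tsv`.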
- In Postgres and Oracle, follow the additional instructions in the comments at the top of `synthea_to_i2b2_[dbtype].sql` to clean up the date formatting.
- Run `synthea_to_i2b2_[dbtype].sql` to convert the Synthea data into i2b2 tables. (This will truncate your existing fact and dimension tables!)
- In MSSQL, replace references to `i2b2metadata.dbo` in the script with the database and schema where your ACT ontology tables are.
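
The MSSQL substitution can be done by hand in an editor, or with a small script. The sketch below is one possible way, assuming a plain, case-sensitive text replacement is enough. The function name `retarget_metadata_schema`, the example SQL line, and the target `my_ontology_db.dbo` are all hypothetical.

```python
def retarget_metadata_schema(sql_text, target, placeholder="i2b2metadata.dbo"):
    """Return sql_text with every occurrence of placeholder replaced by target."""
    return sql_text.replace(placeholder, target)


# Hypothetical line standing in for the real script contents.
EXAMPLE_SQL = "SELECT c_basecode FROM i2b2metadata.dbo.SOME_ONTOLOGY_TABLE;"

# "my_ontology_db.dbo" is a made-up database and schema name.
print(retarget_metadata_schema(EXAMPLE_SQL, "my_ontology_db.dbo"))
```

To apply this to the actual script, read it with `pathlib.Path.read_text()`, pass the text through the function, and write the result to a copy so the original file stays unchanged. Because the replacement is case-sensitive, references written with different capitalization would need to be handled separately.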
- In MSSQL, replace references to