Adds successful dbt run
ZackLarsen committed Aug 26, 2023
1 parent bec783f commit e870810
Showing 11 changed files with 548 additions and 136 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -7,6 +7,13 @@
*.duckdb
*.parquet
*.tar.gz
*.zip
synthea_1m_fhir_3_0_May_24/
seeds/

*.open
*.open.wal
*.user.yml

# Byte-compiled / optimized / DLL files
__pycache__/
66 changes: 65 additions & 1 deletion README.md
@@ -24,7 +24,51 @@ ref() is, under the hood, actually doing two important things. First, it is inte

## Data transformation best practices

dbt has some recommended best practices for transforming data. The data model is divided into different layers, each of which has certain operations applied to it. The layers and their associated operations are:
dbt has some recommended best practices for transforming data. The data model is divided into different layers, each of which has certain operations applied to it. The layers and their associated operations are shown below as a directory tree and in more detail in the following sections.

```
jaffle_shop
├── README.md
├── analyses
├── seeds
│   └── employees.csv
├── dbt_project.yml
├── macros
│   └── cents_to_dollars.sql
├── models
│   ├── intermediate
│   │   └── finance
│   │       ├── _int_finance__models.yml
│   │       └── int_payments_pivoted_to_orders.sql
│   ├── marts
│   │   ├── finance
│   │   │   ├── _finance__models.yml
│   │   │   ├── orders.sql
│   │   │   └── payments.sql
│   │   └── marketing
│   │       ├── _marketing__models.yml
│   │       └── customers.sql
│   ├── staging
│   │   ├── jaffle_shop
│   │   │   ├── _jaffle_shop__docs.md
│   │   │   ├── _jaffle_shop__models.yml
│   │   │   ├── _jaffle_shop__sources.yml
│   │   │   ├── base
│   │   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   │   ├── stg_jaffle_shop__customers.sql
│   │   │   └── stg_jaffle_shop__orders.sql
│   │   └── stripe
│   │       ├── _stripe__models.yml
│   │       ├── _stripe__sources.yml
│   │       └── stg_stripe__payments.sql
│   └── utilities
│       └── all_dates.sql
├── packages.yml
├── snapshots
└── tests
    └── assert_positive_value_for_total_amount.sql
```
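
The naming pattern visible in the tree (base models prefixed `base_`, staging models `stg_`, intermediate models `int_`) can be checked mechanically. The following is a minimal sketch, not part of dbt: the function name `check_model_name` and the `LAYER_PREFIXES` table are assumptions made for illustration, inferred only from the example tree.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from layer directory to required file-name prefix,
# inferred from the example tree (this is not a dbt API).
LAYER_PREFIXES = {
    "base": "base_",
    "staging": "stg_",
    "intermediate": "int_",
}


def check_model_name(path: str) -> bool:
    """Return True if a .sql model follows its layer's prefix convention."""
    p = PurePosixPath(path)
    if p.suffix != ".sql":
        return True  # only SQL models are checked
    # Walk from the innermost directory outward so that a `base` folder
    # nested inside `staging` takes precedence over `staging` itself.
    for part in reversed(p.parts[:-1]):
        prefix = LAYER_PREFIXES.get(part)
        if prefix is not None:
            return p.name.startswith(prefix)
    return True  # marts, utilities, etc. have no required prefix in this sketch
```

Under these assumptions, `check_model_name("models/staging/stripe/payments.sql")` would flag the file, while models under `marts` pass regardless of name.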

- base
- Optional
@@ -126,6 +170,22 @@ Load the CSVs with the demo data set. This materializes the CSVs as tables in yo
dbt seed
```

## Note: `dbt seed` only accepts .csv files, and I ran into parsing errors caused by the data types. Instead of using the seed command, I ran the following Python code in the eda.ipynb notebook to load the .parquet files into DuckDB as tables:

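```python
from pathlib import Path

import duckdb

# Open (or create) the DuckDB database file used by the project
con = duckdb.connect('synthea.duckdb')

seed_path = Path('./seeds/')

# Create one table per Parquet file, named after the file stem
for parquet_file in seed_path.glob('*.parquet'):
    con.sql(
        f"""
        CREATE TABLE IF NOT EXISTS {parquet_file.stem} AS
        SELECT * FROM read_parquet('{parquet_file}');
        """
    )
```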
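
Because the loop above interpolates file names directly into SQL, a file stem containing spaces or a path containing a quote would break the statement. A minimal sketch of a more defensive statement builder follows. `build_load_statement` is a hypothetical helper, not part of DuckDB, and it relies only on standard SQL quoting rules (double quotes around identifiers, doubled single quotes inside string literals).

```python
from pathlib import Path


def build_load_statement(parquet_file: Path) -> str:
    """Build a CREATE TABLE statement for one Parquet file (hypothetical helper)."""
    # Quote the table name as an identifier; embedded double quotes are doubled.
    table = '"' + parquet_file.stem.replace('"', '""') + '"'
    # Escape single quotes in the path so it stays a valid string literal.
    source = parquet_file.as_posix().replace("'", "''")
    return (
        f"CREATE TABLE IF NOT EXISTS {table} AS "
        f"SELECT * FROM read_parquet('{source}');"
    )
```

Each generated statement could then be passed to `con.sql(...)` in place of the inline f-string.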

Run dbt run in your terminal to compile and run your dbt project. This will create a compiled SQL file for your example_model and execute it against your DuckDB database.

```bash
@@ -179,6 +239,10 @@ LIMIT 10;
- https://docs.getdbt.com/docs/core/connect-data-platform/duckdb-setup
- https://www.getdbt.com/analytics-engineering/modular-data-modeling-technique/
- https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview
- https://docs.getdbt.com/blog/stakeholder-friendly-model-names
- https://docs.getdbt.com/docs/build/sql-models
- https://docs.getdbt.com/reference/dbt-jinja-functions/ref
- https://github.com/dbt-labs/dbt-learn-demo

## Lineage Graph
6 changes: 6 additions & 0 deletions dbt_project.yml
@@ -24,3 +24,9 @@ models:
materialized: table
staging:
materialized: view

seeds:
  synthea: # you must include the project name
    raw_careplans:
      +column_types:
        CODE: varchar(25)
567 changes: 459 additions & 108 deletions eda.ipynb

Large diffs are not rendered by default.

14 changes: 0 additions & 14 deletions models/docs.md

This file was deleted.

10 changes: 5 additions & 5 deletions models/patients.sql
@@ -34,7 +34,7 @@ patient_encounters as (
encounters.base_encounter_cost,
encounters.total_encounter_cost,
encounters.encounter_payer,
- encounters.payer_coverage,
+ encounters.encounter_payer_coverage,
patients.healthcare_expenses,
patients.healthcare_coverage,

@@ -61,8 +61,8 @@ patient_medications as (
medications.medication_diag_code,
medications.medication_diag_description,
medications.dispenses,
- medications.payer,
- medications.payer_coverage,
+ medications.medication_payer,
+ medications.medication_payer_coverage,
medications.base_medication_cost,
medications.total_medication_cost,
patients.healthcare_expenses,
@@ -93,7 +93,7 @@ final as (
patient_encounters.base_encounter_cost,
patient_encounters.total_encounter_cost,
patient_encounters.encounter_payer,
- patient_encounters.payer_encounter_coverage,
+ patient_encounters.encounter_payer_coverage,
patient_medications.medication_start_time,
patient_medications.medication_end_time,
patient_medications.medication_code,
@@ -102,7 +102,7 @@ final as (
patient_medications.medication_diag_description,
patient_medications.dispenses,
patient_medications.medication_payer,
- patient_medications.payer_medication_coverage,
+ patient_medications.medication_payer_coverage,
patient_medications.base_medication_cost,
patient_medications.total_medication_cost,
patients.healthcare_expenses,
1 change: 0 additions & 1 deletion models/schema.yml
@@ -66,7 +66,6 @@ models:
description: Healthcare payer for the encounter

- name: encounter_class
- description: '{{ doc("encounter_class") }}'
tests:
- accepted_values:
values: ['ambulatory', 'wellness', 'outpatient', 'urgentcare', 'emergency', 'inpatient']
7 changes: 3 additions & 4 deletions models/staging/schema.yml
@@ -95,7 +95,7 @@ models:
tests:
- not_null
- relationships:
- to: ref('patients')
+ to: ref('stg_patients')
field: patient

- name: organization
@@ -108,7 +108,6 @@ models:
description: Healthcare payer for the encounter

- name: encounter_class
- description: '{{ doc("encounter_class") }}'
tests:
- accepted_values:
values: ['ambulatory', 'wellness', 'outpatient', 'urgentcare', 'emergency', 'inpatient']
@@ -149,7 +148,7 @@ models:
tests:
- not_null
- relationships:
- to: ref('patients')
+ to: ref('stg_patients')
field: patient

- name: medication_payer
@@ -160,7 +159,7 @@ models:
- unique
- not_null
- relationships:
- to: ref('encounters')
+ to: ref('stg_encounters')
field: encounter
description: This is a unique identifier for an encounter
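
The `relationships` tests in this file assert referential integrity: every non-null value in the child column should also appear in the referenced column of the parent model. A rough Python sketch of that check follows, assuming null child values are ignored. `relationship_failures` is an illustrative name, and this is not how dbt implements the test (dbt compiles it to SQL and runs it in the warehouse).

```python
from typing import Iterable, List, Optional


def relationship_failures(
    child_values: Iterable[Optional[str]],
    parent_values: Iterable[Optional[str]],
) -> List[str]:
    """Return child values that have no matching parent value."""
    parents = {v for v in parent_values if v is not None}
    # Null child values are skipped (an assumption of this sketch).
    return [v for v in child_values if v is not None and v not in parents]
```

An empty result corresponds to a passing test. Any returned values are orphaned keys, for example encounters whose `patient` does not exist in `stg_patients`.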

2 changes: 1 addition & 1 deletion models/staging/stg_encounters.sql
@@ -1,6 +1,6 @@
with source as (

- select * from {{ ref('raw_encounters') }}
+ select * from raw_encounters

),

2 changes: 1 addition & 1 deletion models/staging/stg_medications.sql
@@ -1,6 +1,6 @@
with source as (

- select * from {{ ref('raw_medications') }}
+ select * from raw_medications

),

2 changes: 1 addition & 1 deletion models/staging/stg_patients.sql
@@ -1,6 +1,6 @@
with source as (

- select * from {{ ref('raw_patients') }}
+ select * from raw_patients

),
