Is it possible for the `fhir-data-pipes` to sink directly into a Data Warehouse e.g. Google BigQuery? #1191

muhammad-levi · 2024-09-19T08:57:05Z

Instead of `fhir-data-pipes -> Google Healthcare API FHIR Store -> Google BigQuery`, the flow would be `fhir-data-pipes -> Google BigQuery`.

This is also suggested in this diagram, where "Data Loaders" includes fhir-data-pipes.


bashir2 · 2024-09-19T18:26:19Z

Actually, this feature is the long-standing issue #455, i.e., adding BigQuery as a sink option. It should not be too hard to add, and I think it is a useful feature. The main reason we have not implemented it yet is that we have not heard much demand for it from our partners. If this feature is useful for you and you can contribute to implementing it, I am willing to help.

Side note 1: We have actually done some work in #454 to make the resulting schema similar to the BigQuery schema of the GCP FHIR store -> BigQuery flow.

Side note 2: You can import Parquet files into BigQuery; that's how the comparisons in #454 were done.
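As a rough illustration of side note 2, here is a minimal Java sketch that builds the `bq load` command for loading one resource type's Parquet files from Cloud Storage into a BigQuery table. The class and method names, the bucket, the dataset name and the `<prefix>/<ResourceType>/*.parquet` layout are all assumptions for illustration, not the pipeline's documented output layout.

```java
import java.util.List;

// Hypothetical helper (not part of fhir-data-pipes).
final class ParquetLoadSketch {

  // Assumed layout: gcsPrefix/ResourceType/*.parquet, loaded into dataset.ResourceType.
  static List<String> bqLoadCommand(String dataset, String resourceType, String gcsPrefix) {
    String sourceUris = gcsPrefix + "/" + resourceType + "/*.parquet";
    // Flags go before the positional table and source arguments.
    return List.of(
        "bq", "load", "--source_format=PARQUET", dataset + "." + resourceType, sourceUris);
  }

  public static void main(String[] args) {
    List<String> cmd = bqLoadCommand("fhir_dataset", "Patient", "gs://my-bucket/parquet");
    // Only prints the command; new ProcessBuilder(cmd).inheritIO().start() would run it.
    System.out.println(String.join(" ", cmd));
  }
}
```

The `*` wildcard is interpreted by BigQuery against Cloud Storage, not by a shell, so the argument list can be handed to `ProcessBuilder` without quoting.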

muhammad-levi · 2024-09-20T02:05:57Z

@bashir2 I see. Initially I was thinking of using the JDBC driver for BigQuery and trying to create a sample JDBC URL config for BigQuery in `DatabaseConfiguration`:

fhir-data-pipes/pipelines/common/src/main/java/com/google/fhir/analytics/model/DatabaseConfiguration.java (lines 58 to 62 in dc70755):

```java
    return String.format(
        // For the TLS see: https://stackoverflow.com/questions/67332909
        "jdbc:%s://%s:%s/%s?enabledTLSProtocols=TLSv1.2",
        getDatabaseService(), getDatabaseHostName(), getDatabasePort(), getDatabaseName());
  }
```

and then make use of the `sinkDbConfigPath` config property:

fhir-data-pipes/pipelines/controller/config/application.yaml (lines 168 to 173 in dc70755):

```yaml
# The configuration file for the sink database. If `viewDefinitionsDir` is set
# then the generated views are materialized and written to this DB. If not,
# then the raw FHIR JSON resources are written to this DB. Note enabling this
# feature can have a noticeable impact on pipelines performance. The default
# empty string disables this feature.
sinkDbConfigPath: "config/hapi-postgres-config_local_views.json"
```
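One caveat with the JDBC idea: the format string in `DatabaseConfiguration` assumes a `host:port/database` URL, while the BigQuery JDBC driver Google distributes (Simba-based) expects an API endpoint followed by semicolon-separated properties. The hypothetical helper below sketches branching on the service name. The property names `ProjectId`, `DefaultDataset` and `OAuthType=3` (Application Default Credentials) are assumed from that driver's URL format and should be verified against the driver documentation.

```java
// Hypothetical helper (not part of fhir-data-pipes).
final class JdbcUrlSketch {

  static String buildJdbcUrl(
      String service, String hostName, String port, String databaseName, String projectId) {
    if ("bigquery".equalsIgnoreCase(service)) {
      // No host:port/database triple: endpoint plus properties. OAuthType=3 is
      // assumed to select Application Default Credentials.
      return String.format(
          "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443"
              + ";ProjectId=%s;DefaultDataset=%s;OAuthType=3",
          projectId, databaseName);
    }
    // Same shape as the existing DatabaseConfiguration format string.
    return String.format(
        "jdbc:%s://%s:%s/%s?enabledTLSProtocols=TLSv1.2", service, hostName, port, databaseName);
  }

  public static void main(String[] args) {
    System.out.println(buildJdbcUrl("postgresql", "localhost", "5432", "views", null));
    System.out.println(buildJdbcUrl("bigquery", null, null, "fhir_views", "my-project"));
  }
}
```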


bashir2 · 2024-09-20T18:05:23Z

@muhammad-levi your JDBC-based idea can work, but since we use Beam for our pipeline, I would first consider BigQueryIO; it is usually better to rely on Beam IOs when possible. That said, there are reasons not to use them; for example, in some places we don't use ParquetIO for creating Parquet files (mostly because of Flink's memory overhead in the single-machine mode).
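With `BigQueryIO`, a write is pointed at a table spec string of the form `project:dataset.table`, e.g. via `BigQueryIO.writeTableRows().to(...)`. The hypothetical helper below derives one spec per FHIR resource type, mirroring the one-table-per-resource-type layout of the FHIR store export to BigQuery. The class name, method name and validation rule are assumptions for illustration, and only plain Java is used so the sketch runs without Beam on the classpath.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of fhir-data-pipes or Beam).
final class TableSpecSketch {

  // BigQuery dataset IDs may contain only letters, digits and underscores.
  private static final Pattern DATASET_ID = Pattern.compile("[A-Za-z0-9_]+");

  static String tableSpec(String projectId, String datasetId, String resourceType) {
    if (!DATASET_ID.matcher(datasetId).matches()) {
      throw new IllegalArgumentException("Invalid BigQuery dataset ID: " + datasetId);
    }
    // One table per resource type, e.g. my-project:fhir_views.Patient
    return projectId + ":" + datasetId + "." + resourceType;
  }

  public static void main(String[] args) {
    for (String type : new String[] {"Patient", "Observation", "Encounter"}) {
      System.out.println(tableSpec("my-project", "fhir_views", type));
    }
  }
}
```

Routing several resource types through a single write would instead use one of the `to(...)` overloads that choose the destination per element, such as `DynamicDestinations`.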
