Is it possible for the fhir-data-pipes to sink directly into a Data Warehouse e.g. Google BigQuery? #1191

Open
muhammad-levi opened this issue Sep 19, 2024 · 3 comments

@muhammad-levi
Collaborator

Instead of:
fhir-data-pipes -> Google Healthcare API FHIR Store -> Google BigQuery

the flow would be:
fhir-data-pipes -> Google BigQuery

This is also suggested in this diagram:
[image]

In the diagram, "Data Loaders" includes fhir-data-pipes.

@bashir2
Collaborator

bashir2 commented Sep 19, 2024

Actually, this feature is the long-standing issue #455, i.e., adding BigQuery as a sink option. It should not be too hard to add, and I think it is a useful feature. The main reason we have not implemented it yet is that we have not heard much demand for it from our partners. If this is a useful feature for you and you can contribute to implementing it, I am willing to help.

Side note 1: We have actually done some work in #454 to make the resulting schema similar to the BigQuery schema of GCP FHIR store -> BigQuery flow.

Side note 2: You can import Parquet files into BigQuery; that's how the comparisons in #454 were done.
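
As an illustration of the Parquet import mentioned in side note 2, here is a minimal sketch that assembles a `bq load` command line for one Parquet output. The class name, dataset, table, and bucket path are hypothetical, and the sketch only builds the argument list (it could be handed to `ProcessBuilder`); it does not reflect how the comparisons in #454 were actually run.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of fhir-data-pipes. */
final class ParquetLoadCommand {

  private ParquetLoadCommand() {}

  /**
   * Builds the arguments for loading Parquet files into a BigQuery table
   * with the bq command-line tool.
   *
   * @param dataset target BigQuery dataset (assumed to exist already)
   * @param table target table name, e.g. one table per FHIR resource type
   * @param sourceUri a local path or gs:// URI pointing at Parquet files
   */
  static List<String> build(String dataset, String table, String sourceUri) {
    return Arrays.asList(
        "bq",
        "load",
        "--source_format=PARQUET", // the schema is read from the Parquet files
        dataset + "." + table,
        sourceUri);
  }
}
```

For example, `ParquetLoadCommand.build("fhir_dwh", "Patient", "gs://my-bucket/dwh/Patient/*.parquet")` yields the arguments of `bq load --source_format=PARQUET fhir_dwh.Patient gs://my-bucket/dwh/Patient/*.parquet`.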

@muhammad-levi
Collaborator Author

muhammad-levi commented Sep 20, 2024

@bashir2 I see. Initially I was thinking of using the JDBC driver for BigQuery and creating a sample JDBC URL config for BigQuery in the DatabaseConfiguration:

return String.format(
// For the TLS see: https://stackoverflow.com/questions/67332909
"jdbc:%s://%s:%s/%s?enabledTLSProtocols=TLSv1.2",
getDatabaseService(), getDatabaseHostName(), getDatabasePort(), getDatabaseName());
}

and then make use of the sinkDbConfigPath config property.

# The configuration file for the sink database. If `viewDefinitionsDir` is set
# then the generated views are materialized and written to this DB. If not,
# then the raw FHIR JSON resources are written to this DB. Note enabling this
# feature can have a noticeable impact on pipelines performance. The default
# empty string disables this feature.
sinkDbConfigPath: "config/hapi-postgres-config_local_views.json"
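
One caveat for the JDBC idea: a BigQuery JDBC URL does not follow the `jdbc:service://host:port/db` pattern quoted above. The sketch below shows one way such a URL might be assembled, assuming the URL layout of the Simba-based BigQuery JDBC driver (semicolon-separated properties such as `ProjectId`, `DefaultDataset`, and `OAuthType`). The class and method names are hypothetical and not part of fhir-data-pipes, and the property names should be checked against the driver version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of fhir-data-pipes. */
final class BigQueryJdbcUrl {

  // Assumption: endpoint and port as used by the Simba-based BigQuery JDBC driver.
  private static final String ENDPOINT = "https://www.googleapis.com/bigquery/v2";
  private static final int PORT = 443;

  private BigQueryJdbcUrl() {}

  /** Builds a JDBC URL for the given GCP project and default dataset. */
  static String build(String projectId, String defaultDataset) {
    // LinkedHashMap keeps the properties in insertion order.
    Map<String, String> props = new LinkedHashMap<>();
    props.put("ProjectId", projectId);
    props.put("DefaultDataset", defaultDataset);
    // Assumption: OAuthType=3 selects Application Default Credentials.
    props.put("OAuthType", "3");

    String joined =
        props.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(";"));
    return String.format("jdbc:bigquery://%s:%d;%s;", ENDPOINT, PORT, joined);
  }
}
```

Because there is no `/databaseName` path segment and no TLS query parameter, the existing `String.format` template would likely need a separate branch for BigQuery rather than just a new config file.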

@bashir2
Collaborator

bashir2 commented Sep 20, 2024

@muhammad-levi your JDBC-based idea can work, but since we use Beam for our pipeline, I would first consider BigQueryIO; it is usually better to rely on Beam IOs when possible. That said, there are reasons not to use them; for example, in some places we don't use ParquetIO for creating Parquet files (mostly because of Flink's memory overhead in single-machine mode).
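
If the BigQueryIO route is taken, each write needs a destination table, which BigQueryIO accepts as a table spec string of the form `project:dataset.table`. The sketch below derives such a spec per FHIR resource type. It assumes one table per resource type, named after the type; the class and method names are hypothetical and nothing here is existing fhir-data-pipes code.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of fhir-data-pipes. */
final class BigQueryTableSpec {

  // Dataset IDs may contain only letters, digits, and underscores; the same
  // check is applied to the table name, which FHIR resource types satisfy.
  private static final Pattern VALID_ID = Pattern.compile("[A-Za-z0-9_]+");

  private BigQueryTableSpec() {}

  /**
   * Returns a table spec such as "my-project:fhir_dwh.Patient".
   * Assumption: one table per resource type, named after the type.
   */
  static String forResourceType(String project, String dataset, String resourceType) {
    if (!VALID_ID.matcher(dataset).matches()) {
      throw new IllegalArgumentException("Invalid dataset ID: " + dataset);
    }
    if (!VALID_ID.matcher(resourceType).matches()) {
      throw new IllegalArgumentException("Invalid table name: " + resourceType);
    }
    return String.format("%s:%s.%s", project, dataset, resourceType);
  }
}
```

The resulting string is what would be passed to the write transform's destination, for example `BigQueryTableSpec.forResourceType("my-project", "fhir_dwh", "Patient")` for Patient resources.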
