Feature: Update data access tooling to better support distributed querying of big data #475
Closed
8 tasks done
Comments
carter-cundiff changed the title from "Feature: Update data access tooling to better support querying outside of a pipeline and support Hive Metastore v4.0.0" to "Feature: Update data access tooling to better support querying outside of a pipeline" on Nov 19, 2024
carter-cundiff changed the title from "Feature: Update data access tooling to better support querying outside of a pipeline" to "Feature: Update data access tooling to better support distributed querying of big data" on Nov 19, 2024
DOD with @ewilkins-csi, @csun-cpointe
OTS with @nartieri, @ewilkins-csi
carter-cundiff added a commit that referenced this issue on Nov 22, 2024
carter-cundiff added a commit that referenced this issue on Nov 22, 2024
carter-cundiff added a commit that referenced this issue on Nov 22, 2024
carter-cundiff added a commit that referenced this issue on Nov 22, 2024
carter-cundiff added a commit that referenced this issue on Nov 25, 2024
carter-cundiff added a commit that referenced this issue on Nov 25, 2024
Confirmed deprecation warnings were received. Testing has passed.
Description
Data access currently relies on a GraphQL Quarkus app to access data outside of your Spark pipeline. GraphQL is not optimized for querying large datasets stored in data lakes. For better performance when accessing data lake data, GraphQL should be replaced with a tool designed specifically for querying large data lakes (e.g., Trino).
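For illustration only, here is a minimal sketch of what querying through Trino outside of a pipeline could look like, using Trino's HTTP client protocol (submit the SQL text to `/v1/statement`, then follow `nextUri` until it is no longer returned). The server address matches the one used in the test steps of this issue. The user name, the function names, and the injectable `send` parameter are assumptions made for this example and are not part of the generated project.

```python
import json
import urllib.request


def http_json(method, url, body=None, headers=None):
    """Send one HTTP request and decode the JSON response body."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_query(sql, server="http://localhost:8084", user="test", send=http_json):
    """Submit SQL to Trino and collect every row by following nextUri.

    `user` is a placeholder value (assumption); `send` can be swapped out
    for a stub so the paging logic can be exercised without a cluster.
    """
    headers = {"X-Trino-User": user}
    page = send("POST", f"{server}/v1/statement", sql, headers)
    rows = []
    while True:
        # A failed query reports an "error" object in the response page.
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Result rows arrive in chunks; pages without rows omit "data".
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            return rows
        page = send("GET", next_uri, None, headers)
```

With the deployment from the test steps running, `run_query("SHOW CATALOGS")` would be one way to check that the server responds. The catalogs and tables actually available depend on the deployed Trino configuration.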
DOD
Test Strategy/Script
OTS Only:
Create a downstream project:
test-475-pipeline-models/src/main/resources/pipelines/ directory
test-475-pipeline-models/src/main/resources/dictionaries/ directory
test-475-pipeline-models/src/main/resources/records/ directory
mvn clean install until all the manual actions are complete
test-475-deploy/pom.xml:
test-475-pipelines/spark-pipeline/src/main/java/com/test/TestSyncStep.java:
mvn clean install -Dmaven.build.cache.skipCache to get any remaining manual actions
Update the test-475-deploy/src/main/resources/apps/trino/Chart.yaml with the following:
Continue the build with mvn clean install -Dmaven.build.cache.skipCache -rf :test-475-deploy
tilt up
spark-pipeline resource
./trino --server http://localhost:8084
tilt down
test-475-pipeline-models/src/main/resources/records/Person.json on lines 5-7:
mvn clean install -Dmaven.build.cache.skipCache and complete the manual actions
mvn clean install and verify you see the following warnings about data-access deprecation:
References/Additional Context