# Add regtests for Spark client to test built jars #1402
**Status:** Merged
Changes from all commits — 25 commits, all by gh-yzou:

- `243b79a` ad retest setup
- `8f9d3ac` add change
- `177605c` fix creadential
- `d7e469e` add docker
- `802d03e` add regtests for built jars
- `076a156` add comments
- `d43e8bc` remove unncessary build
- `ad76d5b` add back
- `22e7594` simplify jars
- `a6f1374` remove dead code
- `a7f52b3` address feedback
- `e12453c` update CI name to Spark Client Regression Test
- `59aff04` address feedback
- `3463811` update comment
- `b517f30` udpate grammer
- `075b4c0` address comments
- `5dc9a1e` remove credentials
- `9c760eb` remove cloud specific config
- `0b752cf` remove unused credential
- `fd1d61e` remove aws env var
- `19360ea` add change
- `e19d178` remove credential from dcker
- `652a2db` remove aws setting
- `8f3074c` add comment
- `f538f95` fix comment
## New file: GitHub Actions workflow "Spark Client Regression Tests" (+62 lines)

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Spark Client Regression Tests
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  regtest:

    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'

      - name: Fix permissions
        run: mkdir -p regtests/output && chmod 777 regtests/output && chmod 777 regtests/t_*/ref/*

      - name: Project build without testing
        run: ./gradlew assemble

      - name: Image build
        run: |
          ./gradlew \
            :polaris-quarkus-server:assemble \
            :polaris-quarkus-server:quarkusAppPartsBuild --rerun \
            -Dquarkus.container-image.build=true

      # NOTE: the regression test runs with Spark 3.5.5 and Scala 2.12 on Java 17. We also have
      # integration tests that run with the existing gradle.yml, which only runs on Java 21. Since
      # Spark 3.5's Java compatibility covers 8, 11, and 17, we should run the Spark client with
      # those compatible Java versions.
      # TODO: add a separate Spark client CI and run with Java 8, 11, and 17.
      - name: Regression Test
        run: |
          docker compose -f plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
```
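The TODO about exercising the Spark client on multiple Java versions could be addressed with a build matrix. A minimal sketch, assuming the same job layout as above — the `matrix` values and the trimmed step list here are illustrative, not part of this PR:

```yaml
jobs:
  regtest:
    runs-on: ubuntu-latest
    strategy:
      # Hypothetical matrix; Spark 3.5 supports Java 8, 11, and 17.
      matrix:
        java-version: ['11', '17']
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK ${{ matrix.java-version }}
        uses: actions/setup-java@v4
        with:
          java-version: ${{ matrix.java-version }}
          distribution: 'temurin'
      # ... remaining build and regtest steps as in the workflow above ...
```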
## New file: `Dockerfile` for the Spark regtest image (+48 lines)

```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

FROM docker.io/apache/spark:3.5.5-java17
ARG POLARIS_HOST=polaris
ENV POLARIS_HOST=$POLARIS_HOST
ENV SPARK_HOME=/opt/spark
ENV CURRENT_SCALA_VERSION='2.12'
ENV LANGUAGE='en_US:en'

USER root
RUN apt update
RUN apt-get install -y diffutils wget curl
RUN mkdir -p /home/spark && \
    chown -R spark /home/spark && \
    mkdir -p /tmp/polaris-regtests && \
    chown -R spark /tmp/polaris-regtests
RUN mkdir /opt/spark/conf && chmod -R 777 /opt/spark/conf

USER spark

WORKDIR /home/spark/polaris

COPY --chown=spark ./v3.5 /home/spark/polaris/v3.5

# /home/spark/regtests might not be writable in all situations, see https://github.com/apache/polaris/pull/205
USER root
RUN chmod -R go+rwx /home/spark/polaris
RUN chmod -R 777 ./v3.5/regtests
USER spark

ENTRYPOINT ["./v3.5/regtests/run.sh"]
```
## New file: regtests `README.md` (+86 lines)

````markdown
<!--

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

-->

# End-to-end regression tests

regtests provides basic end-to-end tests for `spark_sql` using the Spark client jars.

Regression tests are either run in Docker, using docker-compose to orchestrate the tests, or
locally.

**NOTE** regtests are intended as lightweight tests that ensure the built jars can be used to
start Spark and run basic SQL commands. Please use the integration tests for detailed testing.

## Prerequisites

It is recommended to clean the `regtests/output` directory before running tests. This can be done by
running:

```shell
rm -rf ./plugins/spark/v3.5/regtests/output && mkdir -p ./plugins/spark/v3.5/regtests/output && chmod -R 777 ./plugins/spark/v3.5/regtests/output
```

## Run Tests With Docker Compose

Tests can be run with docker-compose using the provided `./plugins/spark/v3.5/regtests/docker-compose.yml` file, as
follows:

```shell
./gradlew build
./gradlew \
  :polaris-quarkus-server:assemble \
  :polaris-quarkus-server:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
docker compose -f ./plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
```

In this setup, a Polaris container will be started in a docker-compose group, using the image
previously built by the Gradle build. Then another container, including a Spark SQL shell, will run
the tests. The exit code will be the same as the exit code of the Spark container.

**NOTE** Docker compose only supports testing with Scala 2.12, because no Scala 2.13 image is
available for Spark 3.5. Scala 2.13 will be supported with Spark 4.0.

This is the flow used in CI and should be done locally before pushing to GitHub to ensure that no
environmental factors contribute to the outcome of the tests.

**Important**: if you are also using minikube, for example to test the Helm chart, you may need to
_unset_ the Docker environment that was pointing to the Minikube Docker daemon, otherwise the image
will be built by the Minikube Docker daemon and will not be available to the local Docker daemon.
This can be done by running, _before_ building the image and running the tests:

```shell
eval $(minikube -p minikube docker-env --unset)
```

## Run Tests Locally

Regression tests can be run locally as well, using the test harness. For local testing, both
Scala 2.12 and Scala 2.13 are supported.

To run regression tests locally, run the following:
- `./gradlew build` -- build the Polaris project and Spark Client jars.
- `./gradlew run` -- start a Polaris server on localhost:8181.
- `env POLARIS_HOST=localhost ./plugins/spark/v3.5/regtests/run.sh` -- run regtests.

Note: the regression tests expect Polaris to run with certain options, e.g. with support for `FILE`
storage, default realm `POLARIS` and root credentials `root:secret`; if you run the above command,
this will be the case. If you run Polaris in a different way, make sure that Polaris is configured
appropriately.
````
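The harness obtains its root bearer token from the Polaris OAuth endpoint and extracts it by splitting the JSON response on double quotes. A minimal sketch of that parsing step — the sample response body below is illustrative, and the extraction assumes `access_token` is the first field, which is what the harness relies on:

```shell
# Illustrative OAuth token response; not a real server reply.
output='{"access_token":"abc123","token_type":"bearer","expires_in":3600}'
# awk -F\" splits on double quotes, so field 4 is the first string value,
# i.e. the access token when it is the leading field.
token=$(echo "$output" | awk -F\" '{print $4}')
echo "$token"   # abc123
```

Note this extraction is positional, not a JSON parse: it only works while `access_token` stays the first key in the response.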
## New file: `docker-compose.yml` (+46 lines)

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

services:
  polaris:
    image: apache/polaris:latest
    ports:
      - "8181"
      - "8182"
    environment:
      AWS_REGION: us-west-2
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,secret
      quarkus.log.file.enable: "false"
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 10s
      timeout: 10s
      retries: 5
  regtest:
    build:
      context: ../..
      dockerfile: v3.5/regtests/Dockerfile
      args:
        POLARIS_HOST: polaris
    depends_on:
      polaris:
        condition: service_healthy
    volumes:
      - ./output:/tmp/polaris-regtests/
```
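`POLARIS_BOOTSTRAP_CREDENTIALS` packs the realm, client id, and client secret into a single comma-separated value. A minimal sketch of how such a triple decomposes, using plain POSIX parameter expansion — the variable names are illustrative, not taken from the Polaris server code:

```shell
# Decompose the realm,client-id,client-secret triple used above.
creds="POLARIS,root,secret"
realm=${creds%%,*}            # strip everything from the first comma on
rest=${creds#*,}              # drop the realm and its comma
client_id=${rest%%,*}
client_secret=${rest#*,}
echo "realm=$realm client_id=$client_id client_secret=$client_secret"
# realm=POLARIS client_id=root client_secret=secret
```

These are the same `root`/`secret` credentials that run.sh sends to the OAuth endpoint, scoped to the `POLARIS` realm.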
## New file: `run.sh` test harness (+132 lines)

```shell
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Run without args to run all tests.
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
SPARK_ROOT_DIR=$(dirname ${SCRIPT_DIR})
export SPARK_LOCAL_HOSTNAME=localhost # avoid VPN messing up driver local IP address binding

FMT_RED='\033[0;31m'
FMT_GREEN='\033[0;32m'
FMT_NC='\033[0m'

function loginfo() {
  echo "$(date): ${@}"
}
function loggreen() {
  echo -e "${FMT_GREEN}$(date): ${@}${FMT_NC}"
}
function logred() {
  echo -e "${FMT_RED}$(date): ${@}${FMT_NC}"
}

# Allow bearer token to be provided if desired
if [[ -z "$REGTEST_ROOT_BEARER_TOKEN" ]]; then
  if ! output=$(curl -X POST -H "Polaris-Realm: POLARIS" "http://${POLARIS_HOST:-localhost}:8181/api/catalog/v1/oauth/tokens" \
    -d "grant_type=client_credentials" \
    -d "client_id=root" \
    -d "client_secret=secret" \
    -d "scope=PRINCIPAL_ROLE:ALL"); then
    logred "Error: Failed to retrieve bearer token"
    exit 1
  fi

  token=$(echo "$output" | awk -F\" '{print $4}')

  if [ "$token" == "unauthorized_client" ]; then
    logred "Error: Failed to retrieve bearer token"
    exit 1
  fi

  export REGTEST_ROOT_BEARER_TOKEN=$token
fi

echo "Root bearer token: ${REGTEST_ROOT_BEARER_TOKEN}"

NUM_FAILURES=0

SCALA_VERSIONS=("2.12" "2.13")
if [[ -n "$CURRENT_SCALA_VERSION" ]]; then
  SCALA_VERSIONS=("${CURRENT_SCALA_VERSION}")
fi
SPARK_MAJOR_VERSION="3.5"
SPARK_VERSION="3.5.5"

for SCALA_VERSION in "${SCALA_VERSIONS[@]}"; do
  echo "RUN REGRESSION TEST FOR SPARK_MAJOR_VERSION=${SPARK_MAJOR_VERSION}, SPARK_VERSION=${SPARK_VERSION}, SCALA_VERSION=${SCALA_VERSION}"
  # find the project jar
  SPARK_DIR=${SPARK_ROOT_DIR}/spark
  JAR_PATH=$(find ${SPARK_DIR} -name "polaris-iceberg-*.*-spark-runtime-${SPARK_MAJOR_VERSION}_${SCALA_VERSION}-*.jar" -print -quit)
  echo "find jar ${JAR_PATH}"

  SPARK_EXISTS="TRUE"
  if [ -z "${SPARK_HOME}" ]; then
    SPARK_EXISTS="FALSE"
  fi

  source ${SCRIPT_DIR}/setup.sh --sparkVersion ${SPARK_VERSION} --scalaVersion ${SCALA_VERSION} --jar ${JAR_PATH}

  # run the spark_sql test
  loginfo "Starting test spark_sql.sh"

  TEST_FILE="spark_sql.sh"
  TEST_SHORTNAME="spark_sql"
  TEST_TMPDIR="/tmp/polaris-spark-regtests/${TEST_SHORTNAME}_${SPARK_MAJOR_VERSION}_${SCALA_VERSION}"
  TEST_STDERR="${TEST_TMPDIR}/${TEST_SHORTNAME}.stderr"
  TEST_STDOUT="${TEST_TMPDIR}/${TEST_SHORTNAME}.stdout"

  mkdir -p ${TEST_TMPDIR}
  if (( ${VERBOSE} )); then
    ${SCRIPT_DIR}/${TEST_FILE} 2>${TEST_STDERR} | grep -v 'loading settings' | tee ${TEST_STDOUT}
  else
    ${SCRIPT_DIR}/${TEST_FILE} 2>${TEST_STDERR} | grep -v 'loading settings' > ${TEST_STDOUT}
  fi
  loginfo "Test run concluded for ${TEST_SUITE}:${TEST_SHORTNAME}"

  TEST_REF="$(realpath ${SCRIPT_DIR})/${TEST_SHORTNAME}.ref"
  if cmp --silent ${TEST_STDOUT} ${TEST_REF}; then
    loggreen "Test SUCCEEDED: ${TEST_SUITE}:${TEST_SHORTNAME}"
  else
    logred "Test FAILED: ${TEST_SUITE}:${TEST_SHORTNAME}"
    echo '#!/bin/bash' > ${TEST_TMPDIR}/${TEST_SHORTNAME}.fixdiffs.sh
    echo "meld ${TEST_STDOUT} ${TEST_REF}" >> ${TEST_TMPDIR}/${TEST_SHORTNAME}.fixdiffs.sh
    chmod 750 ${TEST_TMPDIR}/${TEST_SHORTNAME}.fixdiffs.sh
    logred "To compare and fix diffs (if 'meld' installed): ${TEST_TMPDIR}/${TEST_SHORTNAME}.fixdiffs.sh"
    logred "Or manually diff: diff ${TEST_STDOUT} ${TEST_REF}"
    logred "See stderr from test run for additional diagnostics: ${TEST_STDERR}"
    diff ${TEST_STDOUT} ${TEST_REF}
    NUM_FAILURES=$(( NUM_FAILURES + 1 ))
  fi

  # clean up
  if [ "${SPARK_EXISTS}" = "FALSE" ]; then
    rm -rf ${SPARK_HOME}
    export SPARK_HOME=""
  fi
done

# clean the output dir
rm -rf ${SCRIPT_DIR}/output

loginfo "Tests completed with ${NUM_FAILURES} failures"
if (( ${NUM_FAILURES} > 0 )); then
  exit 1
else
  exit 0
fi
```
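The harness decides pass/fail with a byte-for-byte comparison of the captured stdout against a checked-in `.ref` file, falling back to `diff` output on mismatch. A minimal self-contained sketch of that pattern — the file contents and `mktemp` paths are stand-ins, not the real regtest files, and `cmp -s` is the portable spelling of `cmp --silent`:

```shell
# Stand-in reference and output files for the comparison pattern.
ref=$(mktemp); out=$(mktemp)
printf 'SELECT 1;\n1\n' > "$ref"
printf 'SELECT 1;\n1\n' > "$out"

# cmp -s is silent: exit status 0 means the files are byte-identical.
if cmp -s "$out" "$ref"; then
  echo "Test SUCCEEDED"
else
  echo "Test FAILED"
  diff "$out" "$ref"   # show the mismatch for diagnosis
fi
rm -f "$ref" "$out"
# prints: Test SUCCEEDED
```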
## Review discussion

**Reviewer:** We are running Java 21 in CI; why are we pulling Java 17 here then?

**Author (gh-yzou):** This mainly follows how the existing regtests work today: it pulls an existing Spark image from the published Spark Docker images at https://hub.docker.com/r/apache/spark/tags, and it seems only Java 11 and Java 17 images are available for Spark 3.5.5.

**Reviewer:** Let's then make the CI use 17?

**Author:** Does it matter what the CI uses, since we are starting Spark in Docker? The CI does more than just run the Spark container: it also starts the server container, which builds the Polaris project and starts the Polaris server, and the Polaris server will use Java 21. Furthermore, I would like to be consistent with how the current regtests CI looks today.

**Reviewer:** Generally the JVM versions in CI are a stamp of approval that this works on that JVM version, both runtime- and compile-wise; building for a given JVM requires special flags per JVM version. Saying this from my past experience of upgrading Iceberg to JDK 17: apache/iceberg#7391. If we can only assert the Polaris Spark client with JDK 17, then we should have only JDK 17. If we are saying the polaris-spark client will work on Java 21 (both build and runtime), then I think it is fine to have JDK 21 (though I would like to understand more about how this is guaranteed, when Spark doesn't do that). The previous tests were not building a Spark client of their own; they were using it, just asserting that Polaris is OK to work with 21, so 21 in CI made sense. At least that's my read of it.

**Author:** This regression test starts two containers, one for the Polaris server and another for Spark. The Polaris server build and run are all on Java 21; the CI does the server image build, so it needs Java 21 since that is what the server supports. The Spark image is downloaded directly from Docker Hub, and Docker can manage the Java version and dependencies separately, so it doesn't need to be consistent with the Java version used by the CI. So we have our Polaris server container running with Java 21 and our Spark container running with Java 17, which is actually the closest setup to how the server/client environment and jar release happen for a user. The Iceberg CI setup is different: it is mostly integration tests, which build and run directly in the CI environment, not in Docker containers. Our integration tests all run with Java 21 in another CI today, so we have tests covering the case that our Spark client works with both Java 17 and Java 21. We will need to add better CI support for different Java versions with integration tests later.

**Author:** Discussed offline; will add CI to run regression tests on different Java versions, at least 11 and 17.