[SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory

### What changes were proposed in this pull request?

This PR proposes to copy several data files used for PySpark artifact tests from `connector/connect/common/src/test/resources/artifact-tests`, added in apache#40368, to the `data` directory.

### Why are the changes needed?

PySpark tests should not depend on Spark's test package build. This PR decouples them.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should verify it.

Closes apache#41510 from HyukjinKwon/SPARK-43612-followup.

Authored-by: Hyukjin Kwon
Signed-off-by: Ruifeng Zheng
HyukjinKwon authored and zhengruifeng committed Jun 8, 2023
1 parent f1cca85 commit 3cae38b
Showing 6 changed files with 19 additions and 3 deletions.
5 changes: 5 additions & 0 deletions data/artifact-tests/crc/README.md
@@ -0,0 +1,5 @@
The CRCs for a specific file are stored in a text file with the same name (excluding the original extension).

The CRCs are calculated for data chunks of `32768 bytes` (individual CRCs) and are newline delimited.

The CRCs were calculated using https://simplycalc.com/crc32-file.php
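
The chunking scheme described in this README can be sketched in Python as follows. This is an illustration only: it assumes the CRCs are standard CRC-32 values (the variant implemented by `zlib.crc32`) written as unsigned decimal integers, which matches the look of the stored values but has not been checked against the jar files. The names `chunk_crcs` and `format_crcs` are hypothetical and do not appear in Spark.

```python
import zlib
from typing import List

# Chunk size stated in the README above.
CHUNK_SIZE = 32768


def chunk_crcs(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return one CRC-32 per consecutive chunk of `data` (hypothetical helper).

    Assumes the standard CRC-32 implemented by zlib; the last chunk may be
    shorter than `chunk_size`.
    """
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def format_crcs(crcs: List[int]) -> str:
    """Render CRCs as newline-delimited decimal text, one CRC per line."""
    return "\n".join(str(crc) for crc in crcs)
```

Under these assumptions, `format_crcs(chunk_crcs(open(path, "rb").read()))` would produce text in the same shape as the `.txt` files in this directory: one line per 32768-byte chunk.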
12 changes: 12 additions & 0 deletions data/artifact-tests/crc/junitLargeJar.txt
@@ -0,0 +1,12 @@
902183889
2415704507
1084811487
1951510
1158852476
2003120166
3026803842
3850244775
3409267044
652109216
104029242
3019434266
1 change: 1 addition & 0 deletions data/artifact-tests/crc/smallJar.txt
@@ -0,0 +1 @@
1631702900
Binary file added data/artifact-tests/junitLargeJar.jar
Binary file not shown.
Binary file added data/artifact-tests/smallJar.jar
Binary file not shown.
4 changes: 1 addition & 3 deletions python/pyspark/sql/tests/connect/client/test_artifact.py
@@ -33,9 +33,7 @@ class ArtifactTests(ReusedConnectTestCase):
     def setUpClass(cls):
         super(ArtifactTests, cls).setUpClass()
         cls.artifact_manager: ArtifactManager = cls.spark._client._artifact_manager
-        cls.base_resource_dir = os.path.join(
-            SPARK_HOME, "connector", "connect", "common", "src", "test", "resources"
-        )
+        cls.base_resource_dir = os.path.join(SPARK_HOME, "data")
         cls.artifact_file_path = os.path.join(
             cls.base_resource_dir,
             "artifact-tests",
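
With the files under `data/artifact-tests`, the relation between an artifact and its CRC file (same name without the original extension, stored in the `crc` subdirectory) can be sketched as below. The helpers `crc_path_for` and `read_crcs` are hypothetical names used for illustration; the rest of the test module is not shown in this diff, so this is not necessarily how the tests locate or read the files.

```python
import os
from typing import List


def crc_path_for(artifact_path: str) -> str:
    """Map <dir>/smallJar.jar to <dir>/crc/smallJar.txt (hypothetical helper)."""
    directory, file_name = os.path.split(artifact_path)
    stem, _extension = os.path.splitext(file_name)
    return os.path.join(directory, "crc", stem + ".txt")


def read_crcs(crc_file: str) -> List[int]:
    """Parse a newline-delimited CRC file into a list of integers."""
    with open(crc_file) as f:
        return [int(line) for line in f.read().split()]
```

For example, `crc_path_for(os.path.join(SPARK_HOME, "data", "artifact-tests", "smallJar.jar"))` would point at `data/artifact-tests/crc/smallJar.txt` under `SPARK_HOME`.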
