Bump Spark to 3.5.4
msmygit committed Dec 26, 2024
1 parent 3a96d3b commit 2818d69
Showing 3 changed files with 11 additions and 8 deletions.
Dockerfile: 8 changes (4 additions, 4 deletions)

@@ -9,9 +9,9 @@ RUN mkdir -p /assets/ && cd /assets && \
curl -OL https://downloads.datastax.com/enterprise/cqlsh-astra.tar.gz && \
tar -xzf ./cqlsh-astra.tar.gz && \
rm ./cqlsh-astra.tar.gz && \
- curl -OL https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz && \
- tar -xzf ./spark-3.5.3-bin-hadoop3-scala2.13.tgz && \
- rm ./spark-3.5.3-bin-hadoop3-scala2.13.tgz
+ curl -OL https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz && \
+ tar -xzf ./spark-3.5.4-bin-hadoop3-scala2.13.tgz && \
+ rm ./spark-3.5.4-bin-hadoop3-scala2.13.tgz

RUN apt-get update && apt-get install -y openssh-server vim python3 --no-install-recommends && \
rm -rf /var/lib/apt/lists/* && \
@@ -44,7 +44,7 @@ RUN chmod +x ./get-latest-maven-version.sh && \
rm -rf "$USER_HOME_DIR/.m2"

# Add all migration tools to path
ENV PATH="${PATH}:/assets/dsbulk/bin/:/assets/cqlsh-astra/bin/:/assets/spark-3.5.3-bin-hadoop3-scala2.13/bin/"
ENV PATH="${PATH}:/assets/dsbulk/bin/:/assets/cqlsh-astra/bin/:/assets/spark-3.5.4-bin-hadoop3-scala2.13/bin/"

EXPOSE 22

README.md: 8 changes (4 additions, 4 deletions)

@@ -8,7 +8,7 @@
Migrate and Validate Tables between Origin and Target Cassandra Clusters.

> [!IMPORTANT]
- > Please note this job has been tested with spark version [3.5.3](https://archive.apache.org/dist/spark/spark-3.5.3/)
+ > Please note this job has been tested with spark version [3.5.4](https://archive.apache.org/dist/spark/spark-3.5.4/)
## Install as a Container
- Get the latest image that includes all dependencies from [DockerHub](https://hub.docker.com/r/datastax/cassandra-data-migrator)
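For reference, pulling the image and opening a shell inside it typically looks like the sketch below; the `latest` tag and the `bash` entrypoint override are assumptions, not something this commit specifies.

```
docker pull datastax/cassandra-data-migrator:latest
# Open an interactive shell in the container (its default entrypoint may differ):
docker run --rm -it --entrypoint bash datastax/cassandra-data-migrator:latest
```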
@@ -20,14 +20,14 @@ Migrate and Validate Tables between Origin and Target Cassandra Clusters.
### Prerequisite
- **Java11** (minimum) as Spark binaries are compiled with it.
- **Spark `3.5.x` with Scala `2.13` and Hadoop `3.3`**
- - Typically installed using [this binary](https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz) on a single VM (no cluster necessary) where you want to run this job. This simple setup is recommended for most one-time migrations.
+ - Typically installed using [this binary](https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz) on a single VM (no cluster necessary) where you want to run this job. This simple setup is recommended for most one-time migrations.
- However we recommend using a Spark Cluster or a Spark Serverless platform like `Databricks` or `Google Dataproc` (that supports the above mentioned versions) for large (e.g. several terabytes) complex migrations OR when CDM is used as a long-term data-transfer utility and not a one-time job.

Spark can be installed by running the following: -

```
- wget https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz
- tar -xvzf spark-3.5.3-bin-hadoop3-scala2.13.tgz
+ wget https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz
+ tar -xvzf spark-3.5.4-bin-hadoop3-scala2.13.tgz
```
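Once extracted, the Spark `bin/` directory can be put on the `PATH`, mirroring what the Dockerfile above does inside the container image; the `SPARK_HOME` variable below is just a convenience for this sketch, not something the project requires.

```
export SPARK_HOME="$(pwd)/spark-3.5.4-bin-hadoop3-scala2.13"
export PATH="${PATH}:${SPARK_HOME}/bin"
# Sanity check: this should report Spark 3.5.4 built for Scala 2.13
spark-submit --version
```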

> [!CAUTION]
RELEASE.md: 3 changes (3 additions, 0 deletions)

@@ -1,5 +1,8 @@
# Release Notes

+ ## [5.2.0] - 2025-xx-xx
+ - Upgraded to use Spark `3.5.4`.

## [5.1.4] - 2024-12-04
- Bug fix: Any run started with a `previousRunId` that is not found in the `cdm_run_info` table (for whatever reason), will be executed as a fresh new run instead of doing nothing.

