Scylla Migrator

Migrate data to Scylla using Spark, typically from Cassandra.

Ansible deployment

An Ansible playbook is provided in the ansible folder. The playbook installs the prerequisites (including Spark) on the master and worker instances listed in the ansible/inventory/hosts file. The Scylla Migrator itself is installed on the Spark master node.

  1. Update the ansible/inventory/hosts file with your master and worker instances.
  2. Update ansible/ansible.cfg with the location of your private key, if necessary.
  3. Review ansible/template/spark-env-master-sample and ansible/template/spark-env-worker-sample; they contain the environment variables that determine the number of workers, CPUs per worker, and memory allocations, as well as considerations for setting them.
  4. Run ansible-playbook scylla-migrator.yml.
  5. On the Spark master node:
     cd scylla-migrator
     ./start-spark.sh
  6. On each Spark worker node: ./start-slave.sh
  7. Open the Spark web console:
  • Ensure networking is configured to allow you to access the Spark master node on ports 8080 and 4040.
  • Visit http://<spark-master-hostname>:8080.
  8. Review and modify config.yaml based on whether you're performing a migration to CQL or Alternator:
  • If you're migrating to the Scylla CQL interface (from Cassandra, Scylla, or another CQL source), make a copy of config.yaml.example, review its comments, and edit as directed.
  • If you're migrating to Alternator (from DynamoDB or another Scylla Alternator deployment), make a copy of config.dynamodb.yml, review its comments, and edit as directed.
  9. As part of the Ansible deployment, sample submit jobs were created; you may edit and use them (see the sketch after this list):
  • For a CQL migration: edit scylla-migrator/submit-cql-job.sh and change the line --conf spark.scylla.config=config.yaml \ to point to whatever you named your config file in the previous step.
  • For an Alternator migration: edit scylla-migrator/submit-alternator-job.sh and change the line --conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \ to reference the config file you created and modified in the previous step.
  10. Ensure the table has been created in the target environment.
  11. Submit the migration by running the appropriate job:
  • CQL migration: ./submit-cql-job.sh
  • Alternator migration: ./submit-alternator-job.sh
  12. Monitor progress in the Spark web console you opened in step 7. Additionally, after the job has started, you can track progress via http://<spark-master-hostname>:4040.
     Note: when no Spark job is actively running, the progress page on port 4040 is unavailable; it only renders while a job is in progress.
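For reference, the step 9 edit amounts to pointing the spark.scylla.config line of the generated submit script at your configuration file. A minimal sketch of how the relevant part of submit-cql-job.sh might look afterwards — the paths and the exact set of options here are illustrative, and your generated script may differ:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=/home/ubuntu/scylla-migrator/my-config.yaml \
  /home/ubuntu/scylla-migrator/migrator/target/scala-2.13/scylla-migrator-assembly-0.0.1.jar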

Building

  1. Make sure the Java 8+ JDK and sbt are installed on your machine.
  2. Export the JAVA_HOME environment variable with the path to the JDK installation.
  3. Run build.sh.
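For example, on a Linux machine with OpenJDK 8 installed (the JDK path below is illustrative; adjust it to your installation):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
./build.sh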

Configuring the Migrator

Create a config.yaml for your migration using the template config.yaml.example in the repository root. Read the comments throughout carefully.
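For orientation, the configuration broadly describes a source, a target, and savepoint settings. The sketch below is an abridged illustration only — the key names shown are assumptions modeled on a typical config.yaml.example, and the comments in the actual template are authoritative:

source:
  type: cassandra              # assumption: a CQL source to read from
  host: cassandra-server-01    # hypothetical hostname
  port: 9042
  keyspace: my_keyspace        # hypothetical keyspace
  table: my_table              # hypothetical table

target:
  type: scylla                 # assumption: the Scylla cluster to write to
  host: scylla-server-01       # hypothetical hostname
  port: 9042
  keyspace: my_keyspace
  table: my_table

savepoints:
  path: /app/savepoints        # where resume points are written
  intervalSeconds: 300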

Running on a live Spark cluster

The Scylla Migrator is built against Spark 3.5.1, so you'll need to run that version on your cluster.

If you didn't build the Scylla Migrator on the master node, then after running build.sh, copy the jar from ./migrator/target/scala-2.13/scylla-migrator-assembly-0.0.1.jar, along with the config.yaml you've created, to the Spark master server.
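For example, with scp (the hostname and destination directory are illustrative):

scp ./migrator/target/scala-2.13/scylla-migrator-assembly-0.0.1.jar \
    config.yaml \
    ubuntu@<spark-master-hostname>:~/scylla-migrator/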

Start the Spark master and workers. On the master instance:

cd scylla-migrator
./start-spark.sh

On each worker instance:

./start-slave.sh

Configure and confirm networking between:

  • source and spark servers
  • target and spark servers

Create the schema in the target server.

Then, run this command on the Spark master server:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly-0.0.1.jar>

If you need to pass a truststore or other SSL-related files, use the --files option:

spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  --files truststorefilename \
  <path to scylla-migrator-assembly-0.0.1.jar>
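Spark copies files passed with --files into the working directory of each executor, so the configuration can then typically reference the truststore by its bare file name rather than by an absolute path.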

Running the validator

This project also includes an entrypoint for comparing the source table and the target table. You can launch it like so (after performing the previous steps):

spark-submit --class com.scylladb.migrator.Validator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=<path to config.yaml> \
  <path to scylla-migrator-assembly-0.0.1.jar>

Running locally

To run in the local Docker-based setup:

  1. First start the environment:
docker compose up -d
  2. Launch cqlsh in Cassandra's container and create a keyspace and a table with some data (see the example CQL sketch at the end of this section):
docker compose exec cassandra cqlsh
<create stuff>
  3. Launch cqlsh in Scylla's container and create the destination keyspace and table with the same schema as the source table:
docker compose exec scylla cqlsh
<create stuff>
  4. Edit the config.yaml file; note the comments throughout.
  5. Run build.sh.
  6. Then, launch spark-submit in the master's container to run the job:

docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=/app/config.yaml \
  /jars/scylla-migrator-assembly-0.0.1.jar

The spark-master container mounts the ./migrator/target/scala-2.13 dir on /jars and the repository root on /app. To update the jar with new code, just run build.sh and then run spark-submit again.
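As an illustration of steps 2 and 3 above, the CQL below is the kind of thing you might run in both containers; the keyspace, table, and data are hypothetical, and the destination table created in Scylla must match the source schema:

CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE demo.items (id uuid PRIMARY KEY, name text, price decimal);
-- On the Cassandra (source) side only, insert a few rows to migrate:
INSERT INTO demo.items (id, name, price) VALUES (uuid(), 'example', 9.99);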
