diff --git a/antora.yml b/antora.yml index ec335a33..e42ef0ee 100644 --- a/antora.yml +++ b/antora.yml @@ -1,5 +1,5 @@ -name: zero-downtime-migration -title: Zero Downtime Migration +name: data-migration +title: Data Migration version: ~ start_page: introduction.adoc @@ -21,4 +21,5 @@ asciidoc: db-classic: 'Classic' astra-cli: 'Astra CLI' url-astra: 'https://astra.datastax.com' - link-astra-portal: '{url-astra}[{astra_ui}^]' + link-astra-portal: '{url-astra}[{astra_ui}]' + astra-db-serverless: 'Astra DB Serverless' diff --git a/local-preview-playbook.yml b/local-preview-playbook.yml index 0e8de8f1..fcbf1e63 100644 --- a/local-preview-playbook.yml +++ b/local-preview-playbook.yml @@ -7,7 +7,7 @@ git: site: title: DataStax Docs - start_page: zero-downtime-migration::introduction.adoc + start_page: data-migration::index.adoc robots: disallow content: diff --git a/modules/ROOT/images/zdm-ansible-container-ls3.png b/modules/ROOT/images/zdm-ansible-container-ls3.png index e7efa6e8..2fd59a1b 100644 Binary files a/modules/ROOT/images/zdm-ansible-container-ls3.png and b/modules/ROOT/images/zdm-ansible-container-ls3.png differ diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index ed391cab..f6cb01fd 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -1,32 +1,34 @@ .{product} * xref:introduction.adoc[] * xref:components.adoc[] -* xref:faqs.adoc[] -* Preliminary steps -** xref:preliminary-steps.adoc[] +* xref:preliminary-steps.adoc[] ** xref:feasibility-checklists.adoc[] ** xref:deployment-infrastructure.adoc[] ** xref:create-target.adoc[] ** xref:rollback.adoc[] -* Phase 1: Deploy {zdm-proxy} and connect client applications -** xref:phase1.adoc[] +//phase 1 +* xref:phase1.adoc[] ** xref:setup-ansible-playbooks.adoc[] ** xref:deploy-proxy-monitoring.adoc[] ** xref:tls.adoc[] ** xref:connect-clients-to-proxy.adoc[] ** xref:metrics.adoc[] ** xref:manage-proxy-instances.adoc[] -* Phase 2: Migrate and validate data -** xref:migrate-and-validate-data.adoc[] +//phase 2 +* xref:migrate-and-validate-data.adoc[] ** xref:cassandra-data-migrator.adoc[] ** xref:dsbulk-migrator.adoc[] +//phase 3 * xref:enable-async-dual-reads.adoc[] +//phase 4 * xref:change-read-routing.adoc[] +//phase 5 * xref:connect-clients-to-target.adoc[] * Troubleshooting ** xref:troubleshooting.adoc[] ** xref:troubleshooting-tips.adoc[] ** xref:troubleshooting-scenarios.adoc[] +* xref:faqs.adoc[] * xref:glossary.adoc[] * xref:contributions.adoc[] -* xref:release-notes.adoc[] +* xref:release-notes.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/cassandra-data-migrator.adoc b/modules/ROOT/pages/cassandra-data-migrator.adoc index 9729d0c6..3f8cba33 100644 --- a/modules/ROOT/pages/cassandra-data-migrator.adoc +++ b/modules/ROOT/pages/cassandra-data-migrator.adoc @@ -5,9 +5,10 @@ Use {cstar-data-migrator} to migrate and validate tables between Origin and Targ [[cdm-prereqs]] == {cstar-data-migrator} prerequisites -* Install or switch to Java 11. The Spark binaries are compiled with this version of Java. -* Install https://archive.apache.org/dist/spark/spark-3.5.1/[Spark 3.5.1^] on a single VM (no cluster necessary) where you want to run this job. -* Optionally, install https://maven.apache.org/download.cgi[Maven^] 3.9.x if you want to build the JAR for local development. +* Install or switch to Java 11. +The Spark binaries are compiled with this version of Java. 
+* Install https://archive.apache.org/dist/spark/spark-3.5.1/[Spark 3.5.1] on a single VM (no cluster necessary) where you want to run this job.
+* Optionally, install https://maven.apache.org/download.cgi[Maven] 3.9.x if you want to build the JAR for local development.

You can install Apache Spark by running the following commands:

@@ -21,24 +22,26 @@ tar -xvzf spark-3.5.1-bin-hadoop3-scala2.13.tgz
[[cdm-install-as-container]]
== Install {cstar-data-migrator} as a Container

-Get the latest image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub^].
+Get the latest image that includes all dependencies from https://hub.docker.com/r/datastax/cassandra-data-migrator[DockerHub].

All migration tools (`cassandra-data-migrator` + `dsbulk` + `cqlsh`) are available in the `/assets/` folder of the container.

[[cdm-install-as-jar]]
== Install {cstar-data-migrator} as a JAR file

-Download the *latest* JAR file from the {cstar-data-migrator} https://github.com/datastax/cassandra-data-migrator/packages/1832128[GitHub repo^]. image:https://img.shields.io/github/v/release/datastax/cassandra-data-migrator?color=green[Latest release]
+Download the *latest* JAR file from the {cstar-data-migrator} https://github.com/datastax/cassandra-data-migrator/packages/1832128[GitHub repo].
+image:https://img.shields.io/github/v/release/datastax/cassandra-data-migrator?color=green[Latest release]

[NOTE]
====
-Version 4.x of {cstar-data-migrator} is not backward-compatible with `*.properties` files created in previous versions, and package names have changed. If you're starting new, we recommended that you use the latest released version.
+Version 4.x of {cstar-data-migrator} is not backward-compatible with `*.properties` files created in previous versions, and package names have changed.
+If you're starting fresh, we recommend that you use the latest released version.
====

[[cdm-build-jar-local]]
== Build {cstar-data-migrator} JAR for local development (optional)

-Optionally, you can build the {cstar-data-migrator} JAR for local development. (You'll need https://maven.apache.org/download.cgi[Maven^] 3.9.x.)
+Optionally, you can build the {cstar-data-migrator} JAR for local development. (You'll need https://maven.apache.org/download.cgi[Maven] 3.9.x.)

Example:

@@ -55,9 +58,16 @@ The fat jar (`cassandra-data-migrator-x.y.z.jar`) file should be present now in
[[cdm-steps]]
== {cstar-data-migrator} steps

-1. Configure for your environment the `cdm*.properties` file that's provided in the {cstar-data-migrator} https://github.com/datastax/cassandra-data-migrator/tree/main/src/resources[GitHub repo^]. The file can have any name. It does not need to be `cdm.properties` or `cdm-detailed.properties`. In both versions, only the parameters that aren't commented out will be processed by the `spark-submit` job. Other parameter values use defaults or are ignored. See the descriptions and defaults in each file. Refer to:
- * The simplified sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm.properties[cdm.properties^]. This file contains only those parameters that are commonly configured.
- * The complete sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm-detailed.properties[cdm-detailed.properties^], for the full set of configurable settings.
+1. 
Configure for your environment the `cdm*.properties` file that's provided in the {cstar-data-migrator} https://github.com/datastax/cassandra-data-migrator/tree/main/src/resources[GitHub repo]. +The file can have any name. +It does not need to be `cdm.properties` or `cdm-detailed.properties`. +In both versions, only the parameters that aren't commented out will be processed by the `spark-submit` job. +Other parameter values use defaults or are ignored. +See the descriptions and defaults in each file. +Refer to: + * The simplified sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm.properties[cdm.properties]. + This file contains only those parameters that are commonly configured. + * The complete sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm-detailed.properties[cdm-detailed.properties], for the full set of configurable settings. 2. Place the properties file that you elected to use and customize where it can be accessed while running the job via `spark-submit`. @@ -104,7 +114,8 @@ Example: [TIP] ==== -To get the list of missing or mismatched records, grep for all `ERROR` entries in the log files. Differences noted in the log file are listed by primary-key values. +To get the list of missing or mismatched records, grep for all `ERROR` entries in the log files. +Differences noted in the log file are listed by primary-key values. ==== You can also run the {cstar-data-migrator} validation job in an **AutoCorrect** mode. This mode can: @@ -112,7 +123,7 @@ You can also run the {cstar-data-migrator} validation job in an **AutoCorrect** * Add any missing records from Origin to Target. * Update any mismatched records between Origin and Target; this action makes Target the same as Origin. -To enable or disable this feature, use one or both of the following settings in your *.properties configuration file. +To enable or disable this feature, use one or both of the following settings in your `*.properties` configuration file. [source,properties] ---- @@ -122,13 +133,15 @@ spark.cdm.autocorrect.mismatch false|true [IMPORTANT] ==== -The {cstar-data-migrator} validation job will never delete records from Target. The job only adds or updates data on Target. +The {cstar-data-migrator} validation job will never delete records from Target. +The job only adds or updates data on Target. ==== [[cdm--partition-ranges]] == Migrating or validating specific partition ranges -You can also use {cstar-data-migrator} to migrate or validate specific partition ranges, by using a **partition-file** with the name `./._partitions.csv`. Use the following format in the CSV file, in the current folder as input. +You can also use {cstar-data-migrator} to migrate or validate specific partition ranges, by using a **partition-file** with the name `./._partitions.csv`. +Use the following format in the CSV file, in the current folder as input. Example: [source,csv] @@ -157,13 +170,19 @@ This mode is specifically useful to processes a subset of partition-ranges that [NOTE] ==== -A file named `./._partitions.csv` is auto-generated by the migration & validation jobs, in the format shown above. The file contains any failed partition ranges. No file is created if there were no failed partitions. You can use the CSV as input to process any failed partition in a subsequent run. +A file named `./._partitions.csv` is auto-generated by the migration & validation jobs, in the format shown above. 
+The file contains any failed partition ranges. +No file is created if there were no failed partitions. +You can use the CSV as input to process any failed partition in a subsequent run. ==== [[cdm-guardrail-checks]] == Perform large-field guardrail violation checks -Use {cstar-data-migrator} to identify large fields from a table that may break your cluster guardrails. For example, {astra_db} has a 10MB limit for a single large field. Specify `--class com.datastax.cdm.job.GuardrailCheck` on the command. Example: +Use {cstar-data-migrator} to identify large fields from a table that may break your cluster guardrails. +For example, {astra_db} has a 10MB limit for a single large field. +Specify `--class com.datastax.cdm.job.GuardrailCheck` on the command. +Example: [source,bash] ---- @@ -193,13 +212,14 @@ Use {cstar-data-migrator} to identify large fields from a table that may break y [[cdm-connection-params]] === Common connection parameters for Origin and Target -[cols="3,1,3"] +[cols="5,2,4"] |=== |Property | Default | Notes | `spark.cdm.connect.origin.host` | `localhost` -| Hostname/IP address of the cluster. May be a comma-separated list, and can follow the `:` convention. +| Hostname/IP address of the cluster. +May be a comma-separated list, and can follow the `:` convention. | `spark.cdm.connect.origin.port` | `9042` @@ -207,7 +227,8 @@ Use {cstar-data-migrator} to identify large fields from a table that may break y | `spark.cdm.connect.origin.scb` | (Not set) -| Secure Connect Bundle, used to connect to an Astra DB database. Example: `file:///aaa/bbb/scb-enterprise.zip`. +| Secure Connect Bundle, used to connect to an Astra DB database. +Example: `file:///aaa/bbb/scb-enterprise.zip`. | `spark.cdm.connect.origin.username` | `cassandra` @@ -219,7 +240,8 @@ Use {cstar-data-migrator} to identify large fields from a table that may break y | `spark.cdm.connect.target.host` | `localhost` -| Hostname/IP address of the cluster. May be a comma-separated list, and can follow the `:` convention. +| Hostname/IP address of the cluster. +May be a comma-separated list, and can follow the `:` convention. | `spark.cdm.connect.target.port` | `9042` @@ -227,7 +249,9 @@ Use {cstar-data-migrator} to identify large fields from a table that may break y | `spark.cdm.connect.target.scb` | (Not set) -| Secure Connect Bundle, used to connect to an Astra DB database. Default is not set. Example if set: `file:///aaa/bbb/my-scb.zip`. +| Secure Connect Bundle, used to connect to an Astra DB database. +Default is not set. +Example if set: `file:///aaa/bbb/my-scb.zip`. | `spark.cdm.connect.target.username` | `cassandra` @@ -243,25 +267,31 @@ Use {cstar-data-migrator} to identify large fields from a table that may break y [[cdm-origin-schema-params]] === Origin schema parameters -[cols="3,1,3a"] +[cols="3,1,5a"] |=== |Property | Default | Notes | `spark.cdm.schema.origin.keyspaceTable` | -| Required - the `.` of the table to be migrated. Table must exist in Origin. +| Required - the `.` of the table to be migrated. +Table must exist in Origin. | `spark.cdm.schema.origin.column.ttl.automatic` | `true` -| Default is `true`, unless `spark.cdm.schema.origin.column.ttl.names` is specified. When `true`, the Time To Live (TTL) of the Target record will be determined by finding the maximum TTL of all Origin columns that can have TTL set (which excludes partition key, clustering key, collections/UDT/tuple, and frozen columns). 
When `false`, and `spark.cdm.schema.origin.column.ttl.names` is not set, the Target record will have the TTL determined by the Target table configuration. +| Default is `true`, unless `spark.cdm.schema.origin.column.ttl.names` is specified. +When `true`, the Time To Live (TTL) of the Target record will be determined by finding the maximum TTL of all Origin columns that can have TTL set (which excludes partition key, clustering key, collections/UDT/tuple, and frozen columns). +When `false`, and `spark.cdm.schema.origin.column.ttl.names` is not set, the Target record will have the TTL determined by the Target table configuration. | `spark.cdm.schema.origin.column.ttl.names` | -| Default is empty, meaning the names will be determined automatically if `spark.cdm.schema.origin.column.ttl.automatic` is set. Specify a subset of eligible columns that are used to calculate the TTL of the Target record. +| Default is empty, meaning the names will be determined automatically if `spark.cdm.schema.origin.column.ttl.automatic` is set. +Specify a subset of eligible columns that are used to calculate the TTL of the Target record. | `spark.cdm.schema.origin.column.writetime.automatic` | `true` -| Default is `true`, unless `spark.cdm.schema.origin.column.writetime.names` is specified. When `true`, the `WRITETIME` of the Target record will be determined by finding the maximum `WRITETIME` of all Origin columns that can have `WRITETIME` set (which excludes partition key, clustering key, collections/UDT/tuple, and frozen columns). When `false`, and `spark.cdm.schema.origin.column.writetime.names` is not set, the Target record will have the `WRITETIME` determined by the Target table configuration. +| Default is `true`, unless `spark.cdm.schema.origin.column.writetime.names` is specified. +When `true`, the `WRITETIME` of the Target record will be determined by finding the maximum `WRITETIME` of all Origin columns that can have `WRITETIME` set (which excludes partition key, clustering key, collections/UDT/tuple, and frozen columns). +When `false`, and `spark.cdm.schema.origin.column.writetime.names` is not set, the Target record will have the `WRITETIME` determined by the Target table configuration. [NOTE] ==== The `spark.cdm.transform.custom.writetime` property, if set, would override `spark.cdm.schema.origin.column.writetime`. @@ -269,30 +299,38 @@ The `spark.cdm.transform.custom.writetime` property, if set, would override `spa | `spark.cdm.schema.origin.column.writetime.names` | -| Default is empty, meaning the names will be determined automatically if `spark.cdm.schema.origin.column.writetime.automatic` is set. Otherwise, specify a subset of eligible columns that are used to calculate the WRITETIME of the Target record. Example: `data_col1,data_col2,...` +| Default is empty, meaning the names will be determined automatically if `spark.cdm.schema.origin.column.writetime.automatic` is set. +Otherwise, specify a subset of eligible columns that are used to calculate the WRITETIME of the Target record. +Example: `data_col1,data_col2,...` | `spark.cdm.schema.origin.column.names.to.target` | -| Default is empty. If column names are changed between Origin and Target, then this mapped list provides a mechanism to associate the two. The format is `:`. The list is comma-separated. You only need to list renamed columns. +| Default is empty. +If column names are changed between Origin and Target, then this mapped list provides a mechanism to associate the two. +The format is `:`. +The list is comma-separated. 
+You only need to list renamed columns. |=== [NOTE] ==== -For optimization reasons, {cstar-data-migrator} does not migrate TTL and writetime at the field-level. Instead, {cstar-data-migrator} finds the field with the highest TTL, and the field with the highest writetime within an Origin table row, and uses those values on the entire Target table row. +For optimization reasons, {cstar-data-migrator} does not migrate TTL and writetime at the field-level. +Instead, {cstar-data-migrator} finds the field with the highest TTL, and the field with the highest writetime within an Origin table row, and uses those values on the entire Target table row. ==== - [[cdm-target-schema-params]] === Target schema parameter -[cols="3,1,3"] +[cols="3,1,2"] |=== |Property | Default | Notes | `spark.cdm.schema.target.keyspaceTable` -| -| This parameter is commented out. It's the `.` of the table to be migrated into the Target. Table must exist in Target. Default is the value of `spark.cdm.schema.origin.keyspaceTable`. +| Equals the value of `spark.cdm.schema.origin.keyspaceTable` +| This parameter is commented out. +It's the `.` of the table to be migrated into the Target. +Table must exist in Target. |=== @@ -300,11 +338,13 @@ For optimization reasons, {cstar-data-migrator} does not migrate TTL and writeti [[cdm-auto-correction-params]] === Auto-correction parameters -Auto-correction parameters allow {cstar-data-migrator} to correct data differences found between Origin and Target when you run the `DiffData` program. Typically, these are run disabled (for "what if" migration testing), which will generate a list of data discrepancies. The reasons for these discrepancies can then be investigated, and if necessary the parameters below can be enabled. +Auto-correction parameters allow {cstar-data-migrator} to correct data differences found between Origin and Target when you run the `DiffData` program. +Typically, these are run disabled (for "what if" migration testing), which will generate a list of data discrepancies. +The reasons for these discrepancies can then be investigated, and if necessary the parameters below can be enabled. For information about invoking `DiffData` in a {cstar-data-migrator} command, see xref:#cdm-validation-steps[{cstar-data-migrator} steps in validation mode] in this topic. -[cols="3,1,3a"] +[cols="2,2,3a"] |=== |Property | Default | Notes @@ -317,16 +357,22 @@ For information about invoking `DiffData` in a {cstar-data-migrator} command, se | When `true`, data that is different between Origin and Target will be reconciled. [NOTE] ==== -The `TIMESTAMP` of records may have an effect. If the `WRITETIME` of the Origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the Target record, the change will not appear in Target. This comparative state may be particularly challenging to troubleshoot if individual columns (cells) have been modified in Target. +The `TIMESTAMP` of records may have an effect. +If the `WRITETIME` of the Origin record (determined with `.writetime.names`) is earlier than the `WRITETIME` of the Target record, the change will not appear in Target. +This comparative state may be particularly challenging to troubleshoot if individual columns (cells) have been modified in Target. ==== | `spark.cdm.autocorrect.missing.counter` | `false` -| Commented out. By default, Counter tables are not copied when missing, unless explicitly set. +| Commented out. +By default, Counter tables are not copied when missing, unless explicitly set. 
| `spark.tokenrange.partitionFile` | `./._partitions.csv` -| Commented out. This CSV file is used as input, as well as output when applicable. If the file exists, only the partition ranges in this file will be migrated or validated. Similarly, if exceptions occur while migrating or validating, partition ranges with exceptions will be logged to this file. +| Commented out. +This CSV file is used as input, as well as output when applicable. +If the file exists, only the partition ranges in this file will be migrated or validated. +Similarly, if exceptions occur while migrating or validating, partition ranges with exceptions will be logged to this file. |=== @@ -336,45 +382,61 @@ The `TIMESTAMP` of records may have an effect. If the `WRITETIME` of the Origin Performance and operations parameters that can affect migration throughput, error handling, and similar concerns. -[cols="3,1,3"] +[cols="4,1,3"] |=== |Property | Default | Notes | `spark.cdm.perfops.numParts` | `10000` -| In standard operation, the full token range (-2^63 .. 2^63-1) is divided into a number of parts, which will be parallel-processed. You should aim for each part to comprise a total of ≈1-10GB of data to migrate. During initial testing, you may want this to be a small number (such as `1`). +| In standard operation, the full token range (-2^63 .. 2^63-1) is divided into a number of parts, which will be parallel-processed. +You should aim for each part to comprise a total of ≈1-10GB of data to migrate. +During initial testing, you may want this to be a small number (such as `1`). | `spark.cdm.perfops.batchSize` | `5` -| When writing to Target, this comprises the number of records that will be put into an `UNLOGGED` batch. {cstar-data-migrator} will tend to work on the same partition at a time. Thus if your partition sizes are larger, this number may be increased. If the `spark.cdm.perfops.batchSize` would mean that more than 1 partition is often contained in a batch, reduce this parameter's value. Ideally < 1% of batches have more than 1 partition. +| When writing to Target, this comprises the number of records that will be put into an `UNLOGGED` batch. +{cstar-data-migrator} will tend to work on the same partition at a time. +Thus if your partition sizes are larger, this number may be increased. +If the `spark.cdm.perfops.batchSize` would mean that more than 1 partition is often contained in a batch, reduce this parameter's value. +Ideally < 1% of batches have more than 1 partition. | `spark.cdm.perfops.ratelimit.origin` | `20000` -| Concurrent number of operations across all parallel threads from Origin. This value may be adjusted up (or down), depending on the amount of data and the processing capacity of the Origin cluster. +| Concurrent number of operations across all parallel threads from Origin. +This value may be adjusted up (or down), depending on the amount of data and the processing capacity of the Origin cluster. | `spark.cdm.perfops.ratelimit.target` | `40000` -| Concurrent number of operations across all parallel threads from Target. This may be adjusted up (or down), depending on the amount of data and the processing capacity of the Target cluster. +| Concurrent number of operations across all parallel threads from Target. +This may be adjusted up (or down), depending on the amount of data and the processing capacity of the Target cluster. | `spark.cdm.perfops.consistency.read` | `LOCAL_QUORUM` -| Commented out. Read consistency from Origin, and also from Target when records are read for comparison purposes. 
The consistency parameters may be one of: `ANY`, `ONE`, `TWO`, `THREE`, `QUORUM`, `LOCAL_ONE`, `EACH_QUORUM`, `LOCAL_QUORUM`, `SERIAL`, `LOCAL_SERIAL`, `ALL`. +| Commented out. +Read consistency from Origin, and also from Target when records are read for comparison purposes. +The consistency parameters may be one of: `ANY`, `ONE`, `TWO`, `THREE`, `QUORUM`, `LOCAL_ONE`, `EACH_QUORUM`, `LOCAL_QUORUM`, `SERIAL`, `LOCAL_SERIAL`, `ALL`. | `spark.cdm.perfops.consistency.write` | `LOCAL_QUORUM` -| Commented out. Write consistency to Target. The consistency parameters may be one of: `ANY`, `ONE`, `TWO`, `THREE`, `QUORUM`, `LOCAL_ONE`, `EACH_QUORUM`, `LOCAL_QUORUM`, `SERIAL`, `LOCAL_SERIAL`, `ALL`. +| Commented out. +Write consistency to Target. +The consistency parameters may be one of: `ANY`, `ONE`, `TWO`, `THREE`, `QUORUM`, `LOCAL_ONE`, `EACH_QUORUM`, `LOCAL_QUORUM`, `SERIAL`, `LOCAL_SERIAL`, `ALL`. | `spark.cdm.perfops.printStatsAfter` | `100000` -| Commented out. Number of rows of processing after which a progress log entry will be made. +| Commented out. +Number of rows of processing after which a progress log entry will be made. | `spark.cdm.perfops.fetchSizeInRows` | `1000` -| Commented out. This parameter affects the frequency of reads from Origin, and also the frequency of flushes to Target. +| Commented out. +This parameter affects the frequency of reads from Origin, and also the frequency of flushes to Target. | `spark.cdm.perfops.errorLimit` | `0` -| Commented out. Controls how many errors a thread may encounter during `MigrateData` and `DiffData` operations before failing. Recommendation: set this parameter to a non-zero value **only when not doing** a mutation-type operation, such as when you're running `DiffData` without `.autocorrect`. +| Commented out. +Controls how many errors a thread may encounter during `MigrateData` and `DiffData` operations before failing. +Recommendation: set this parameter to a non-zero value **only when not doing** a mutation-type operation, such as when you're running `DiffData` without `.autocorrect`. |=== @@ -386,26 +448,34 @@ Parameters to perform schema transformations between Origin and Target. By default, these parameters are commented out. -[cols="3,1,3a"] +[cols="2,1,4a"] |=== |Property | Default | Notes | `spark.cdm.transform.missing.key.ts.replace.value` | `1685577600000` | Timestamp value in milliseconds. -Partition and clustering columns cannot have null values, but if these are added as part of a schema transformation between Origin and Target, it is possible that the Origin side is null. In this case, the `Migrate` data operation would fail. This parameter allows a crude constant value to be used in its place, separate from the Constant values feature. +Partition and clustering columns cannot have null values, but if these are added as part of a schema transformation between Origin and Target, it is possible that the Origin side is null. +In this case, the `Migrate` data operation would fail. +This parameter allows a crude constant value to be used in its place, separate from the Constant values feature. | `spark.cdm.transform.custom.writetime` | `0` -| Default is 0 (disabled). Timestamp value in microseconds to use as the `WRITETIME` for the Target record. This is useful when the `WRITETIME` of the record in Origin cannot be determined (such as when the only non-key columns are collections). This parameter allows a crude constant value to be used in its place, and overrides `spark.cdm.schema.origin.column.writetime.names`. 
+| Default is 0 (disabled). +Timestamp value in microseconds to use as the `WRITETIME` for the Target record. +This is useful when the `WRITETIME` of the record in Origin cannot be determined (such as when the only non-key columns are collections). +This parameter allows a crude constant value to be used in its place, and overrides `spark.cdm.schema.origin.column.writetime.names`. | `spark.cdm.transform.custom.writetime.incrementBy` | `0` -| Default is `0`. This is useful when you have a List that is not frozen, and you are updating this via the autocorrect feature. Lists are not idempotent, and subsequent UPSERTs would add duplicates to the list. +| Default is `0`. +This is useful when you have a List that is not frozen, and you are updating this via the autocorrect feature. +Lists are not idempotent, and subsequent UPSERTs would add duplicates to the list. | `spark.cdm.transform.codecs` | -| Default is empty. A comma-separated list of additional codecs to enable. +| Default is empty. +A comma-separated list of additional codecs to enable. * `INT_STRING` : int stored in a String. * `DOUBLE_STRING` : double stored in a String. @@ -421,12 +491,14 @@ Where there are multiple type pair options, such as with `TIMESTAMP_STRING_*`, o | `spark.cdm.transform.codecs.timestamp.string.format` | `yyyyMMddHHmmss` -| Configuration for `CQL_TIMESTAMP_TO_STRING_FORMAT` codec. Default format is `yyyyMMddHHmmss`; `DateTimeFormatter.ofPattern(formatString)` +| Configuration for `CQL_TIMESTAMP_TO_STRING_FORMAT` codec. +Default format is `yyyyMMddHHmmss`; `DateTimeFormatter.ofPattern(formatString)` | `spark.cdm.transform.codecs.timestamp.string.zone` | `UTC` -| Default is `UTC`. Must be in `ZoneRulesProvider.getAvailableZoneIds()`. +| Default is `UTC`. +Must be in `ZoneRulesProvider.getAvailableZoneIds()`. |=== @@ -434,7 +506,8 @@ Where there are multiple type pair options, such as with `TIMESTAMP_STRING_*`, o [[cdm-cassandra-filter-params]] === Cassandra filter parameters -Cassandra filters are applied on the coordinator node. Note that, depending on the filter, the coordinator node may need to do a lot more work than is normal, notably because {cstar-data-migrator} specifies `ALLOW FILTERING`. +Cassandra filters are applied on the coordinator node. +Note that, depending on the filter, the coordinator node may need to do a lot more work than is normal, notably because {cstar-data-migrator} specifies `ALLOW FILTERING`. By default, these parameters are commented out. @@ -444,15 +517,17 @@ By default, these parameters are commented out. | `spark.cdm.filter.cassandra.partition.min` | `-9223372036854775808` -| Default is `0` (when using `RandomPartitioner`) and `-9223372036854775808` (-2^63) otherwise. Lower partition bound (inclusive). +| Default is `0` (when using `RandomPartitioner`) and `-9223372036854775808` (-2^63) otherwise. +Lower partition bound (inclusive). | `spark.cdm.filter.cassandra.partition.max` | `9223372036854775807` -| Default is `2^127-1` (when using `RandomPartitioner`) and `9223372036854775807` (2^63-1) otherwise. Upper partition bound (inclusive). +| Default is `2^127-1` (when using `RandomPartitioner`) and `9223372036854775807` (2^63-1) otherwise. +Upper partition bound (inclusive). | `spark.cdm.filter.cassandra.whereCondition` | -| CQL added to the `WHERE` clause of `SELECT` statements from Origin +| CQL added to the `WHERE` clause of `SELECT` statements from Origin. |=== @@ -460,36 +535,49 @@ By default, these parameters are commented out. 
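+
+For example, here is a minimal sketch of how these filters might look when uncommented in your `cdm*.properties` file; the token bounds and the `status` column in the `WHERE` clause are purely illustrative:
+
+[source,properties]
+----
+# Process only the lower half of the token ring (bounds are inclusive)
+spark.cdm.filter.cassandra.partition.min -9223372036854775808
+spark.cdm.filter.cassandra.partition.max 0
+# Extra CQL appended to the WHERE clause of SELECT statements from Origin
+spark.cdm.filter.cassandra.whereCondition status = 'active'
+----
+
+Because these filters rely on `ALLOW FILTERING`, keep an eye on the extra work they create for the coordinator nodes.
+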
[[cdm-java-filter-params]]
=== Java filter parameters

-Java filters are applied on the client node. Data must be pulled from the Origin cluster and then filtered. However, this option may have a lower impact on the production cluster than xref:#cdm-cassandra-filter-params[Cassandra filters]. Java filters put load onto the {cstar-data-migrator} processing node, by sending more data from Cassandra. Cassandra filters put load on the Cassandra nodes, notably because {cstar-data-migrator} specifies `ALLOW FILTERING`, which could cause the coordinator node to perform a lot more work.
+Java filters are applied on the client node.
+Data must be pulled from the Origin cluster and then filtered.
+However, this option may have a lower impact on the production cluster than xref:cdm-cassandra-filter-params[Cassandra filters].
+Java filters put load onto the {cstar-data-migrator} processing node, by sending more data from Cassandra.
+Cassandra filters put load on the Cassandra nodes, notably because {cstar-data-migrator} specifies `ALLOW FILTERING`, which could cause the coordinator node to perform a lot more work.

By default, these parameters are commented out.

-[cols="3,1,3"]
+[cols="2,1,4"]
|===
|Property | Default | Notes

| `spark.cdm.filter.java.token.percent`
| `100`
| Percent (between 1 and 100) of the token in each Split that will be migrated.
-This property is used to do a wide and random sampling of the data. The percentage value is applied to each split. Invalid percentages will be treated as 100.
+This property is used to do a wide and random sampling of the data.
+The percentage value is applied to each split.
+Invalid percentages will be treated as 100.

| `spark.cdm.filter.java.writetime.min`
| `0`
-| The lowest (inclusive) writetime values to be migrated. Using the `spark.cdm.filter.java.writetime.min` and `spark.cdm.filter.java.writetime.max` thresholds, {cstar-data-migrator} can filter records based on their writetimes. The maximum writetime of the columns configured at `spark.cdm.schema.origin.column.writetime.names` will be compared to the `.min` and `.max` thresholds, which must be in **microseconds since the epoch**. If the `spark.cdm..schema.origin.column.writetime.names` are not specified, or the thresholds are null or otherwise invalid, the filter will be ignored. Note that `spark.cdm.s.perfops.batchSize` will be ignored when this filter is in place; a value of 1 will be used instead.
+| The lowest (inclusive) writetime values to be migrated.
+Using the `spark.cdm.filter.java.writetime.min` and `spark.cdm.filter.java.writetime.max` thresholds, {cstar-data-migrator} can filter records based on their writetimes.
+The maximum writetime of the columns configured at `spark.cdm.schema.origin.column.writetime.names` will be compared to the `.min` and `.max` thresholds, which must be in **microseconds since the epoch**.
+If the `spark.cdm.schema.origin.column.writetime.names` are not specified, or the thresholds are null or otherwise invalid, the filter will be ignored.
+Note that `spark.cdm.perfops.batchSize` will be ignored when this filter is in place; a value of 1 will be used instead.

| `spark.cdm.filter.java.writetime.max`
| `9223372036854775807`
-| The highest (inclusive) writetime values to be migrated. Maximum timestamp of the columns specified by `spark.cdm.schema.origin.column.writetime.names`; if that property is not specified, or is for some reason null, the filter is ignored.
+| The highest (inclusive) writetime values to be migrated. 
+Maximum timestamp of the columns specified by `spark.cdm.schema.origin.column.writetime.names`; if that property is not specified, or is for some reason null, the filter is ignored. | `spark.cdm.filter.java.column.name` | -| Filter rows based on matching a configured value. With `spark.cdm.filter.java.column.name`, specify the column name against which the `spark.cdm.filter.java.column.value` is compared. Must be on the column list specified at `spark.cdm.schema.origin.column.names`. The column value will be converted to a String, trimmed of whitespace on both ends, and compared. +| Filter rows based on matching a configured value. +With `spark.cdm.filter.java.column.name`, specify the column name against which the `spark.cdm.filter.java.column.value` is compared. +Must be on the column list specified at `spark.cdm.schema.origin.column.names`. +The column value will be converted to a String, trimmed of whitespace on both ends, and compared. | `spark.cdm.filter.java.column.value` | -| String value to use as comparison. Whitespace on the ends of `spark.cdm.filter.java.column.value` will be trimmed. - - +| String value to use as comparison. +Whitespace on the ends of `spark.cdm.filter.java.column.value` will be trimmed. |=== @@ -501,7 +589,7 @@ If used, the `spark.cdm.feature.constantColumns.names`, `spark.cdm.feature.const By default, these parameters are commented out. -[cols="3,1,3"] +[cols="2,1,3"] |=== |Property | Default | Notes @@ -515,7 +603,9 @@ By default, these parameters are commented out. | `spark.cdm.feature.constantColumns.values` | -| A comma-separated list of hard-coded values. Each value should be provided as you would use on the `CQLSH` command line. Examples: `'abcd'` for a string; `1234` for an int, and so on. +| A comma-separated list of hard-coded values. +Each value should be provided as you would use on the `CQLSH` command line. +Examples: `'abcd'` for a string; `1234` for an int, and so on. | `spark.cdm.feature.constantColumns.splitRegex` | `,` @@ -531,22 +621,20 @@ The explode map feature allows you convert an Origin table Map into multiple Tar By default, these parameters are commented out. -[cols="3,1,3"] +[cols="3,3"] |=== -|Property | Default | Notes +|Property | Notes | `spark.cdm.feature.explodeMap.origin.name` -| -| The name of the map column, such as `my_map`. Must be defined on `spark.cdm.schema.origin.column.names`, and the corresponding type on `spark.cdm.schema.origin.column.types` must be a map. +| The name of the map column, such as `my_map`. +Must be defined on `spark.cdm.schema.origin.column.names`, and the corresponding type on `spark.cdm.schema.origin.column.types` must be a map. | `spark.cdm.feature.explodeMap.origin.name.key` -| -| The name of the column on the Target table that will hold the map key, such as `my_map_key`. This key must be present on the Target primary key `spark.cdm.schema.target.column.id.names`. +| The name of the column on the Target table that will hold the map key, such as `my_map_key`. +This key must be present on the Target primary key `spark.cdm.schema.target.column.id.names`. | `spark.cdm.feature.explodeMap.origin.value` -| | The name of the column on the Target table that will hold the map value, such as `my_map_value`. - |=== @@ -564,7 +652,9 @@ By default, these parameters are commented out. | `spark.cdm.feature.guardrail.colSizeInKB` | `0` -| The `0` default means the guardrail check is not done. If set, table records with one or more fields that exceed the column size in kB will be flagged. 
Note this is kB (base 10), not kiB (base 2). +| The `0` default means the guardrail check is not done. +If set, table records with one or more fields that exceed the column size in kB will be flagged. +Note this is kB (base 10), not kiB (base 2). |=== @@ -577,7 +667,7 @@ Note that a secure connect bundle (SCB) embeds these details. By default, these parameters are commented out. -[cols="3,1,3"] +[cols="3,3,3"] |=== |Property | Default | Notes @@ -593,7 +683,7 @@ By default, these parameters are commented out. | | Password needed to open the truststore. -| `spark.cdm.connect.origin.tls.trustStore.type ` +| `spark.cdm.connect.origin.tls.trustStore.type` | `JKS` | @@ -621,7 +711,7 @@ By default, these parameters are commented out. | | Password needed to open the truststore. -| `spark.cdm.connect.target.tls.trustStore.type ` +| `spark.cdm.connect.target.tls.trustStore.type` | `JKS` | @@ -637,4 +727,4 @@ By default, these parameters are commented out. | `TLS_RSA_WITH_AES_128_CBC_SHA`,`TLS_RSA_WITH_AES_256_CBC_SHA` | -|=== +|=== \ No newline at end of file diff --git a/modules/ROOT/pages/change-read-routing.adoc b/modules/ROOT/pages/change-read-routing.adoc index 41830b4c..d30a6c7f 100644 --- a/modules/ROOT/pages/change-read-routing.adoc +++ b/modules/ROOT/pages/change-read-routing.adoc @@ -5,7 +5,7 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] This topic explains how you can configure the {zdm-proxy} to route all reads to Target instead of Origin. -include::partial$lightbox-tip.adoc[] +//include::partial$lightbox-tip.adoc[] image::{imagesprefix}migration-phase4ra9.png["Phase 4 diagram shows read routing on ZDM Proxy was switched to Target."] @@ -19,7 +19,9 @@ This operation is a configuration change that can be carried out as explained xr [TIP] ==== -If you performed the optional steps described in the prior topic, xref:enable-async-dual-reads.adoc[] -- to verify that your Target cluster was ready and tuned appropriately to handle the production read load -- be sure to disable async dual reads when you're done testing. If you haven't already, revert `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY` when switching sync reads to Target. Example: +If you performed the optional steps described in the prior topic, xref:enable-async-dual-reads.adoc[] -- to verify that your Target cluster was ready and tuned appropriately to handle the production read load -- be sure to disable async dual reads when you're done testing. +If you haven't already, revert `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY` when switching sync reads to Target. +Example: [source,yml] ---- @@ -30,6 +32,7 @@ Otherwise, if you don't disable async dual reads, {zdm-proxy} instances would co ==== == Changing the read routing configuration + If you're not there already, `ssh` back into the jumphost: [source,bash] @@ -60,11 +63,14 @@ Run the playbook that changes the configuration of the existing {zdm-proxy} depl ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ---- -Wait for the {zdm-proxy} instances to be restarted by Ansible, one by one. All instances will now send all reads to Target instead of Origin. In other words, Target is now the primary cluster, but the {zdm-proxy} is still keeping Origin up-to-date via dual writes. +Wait for the {zdm-proxy} instances to be restarted by Ansible, one by one. +All instances will now send all reads to Target instead of Origin. 
+In other words, Target is now the primary cluster, but the {zdm-proxy} is still keeping Origin up-to-date via dual writes. == Verifying the read routing change -Once the read routing configuration change has been rolled out, you may want to verify that reads are correctly sent to Target as expected. This is not a required step, but you may wish to do it for peace of mind. +Once the read routing configuration change has been rolled out, you may want to verify that reads are correctly sent to Target as expected. +This is not a required step, but you may wish to do it for peace of mind. [TIP] ==== @@ -80,11 +86,18 @@ Although `DESCRIBE` requests are not system requests, they are also generally re Verifying that the correct routing is taking place is a slightly cumbersome operation, due to the fact that the purpose of the ZDM process is to align the clusters and therefore, by definition, the data will be identical on both sides. -For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters. To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application]. This client application connects directly to Origin, Target and the {zdm-proxy}, inserts some test data in its own table and allows you to view the results of reads from each source. Please refer to its README for more information. +For this reason, the only way to do a manual verification test is to force a discrepancy of some test data between the clusters. +To do this, you could consider using the xref:connect-clients-to-proxy.adoc#_themis_client[Themis sample client application]. +This client application connects directly to Origin, Target and the {zdm-proxy}, inserts some test data in its own table and allows you to view the results of reads from each source. +Please refer to its README for more information. Alternatively, you could follow this manual procedure: -* Create a small test table on both clusters, for example a simple key/value table (it could be in an existing keyspace, or in one that you create specifically for this test). For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);`. -* Use `cqlsh` to connect *directly to Origin*. Insert a row with any key, and with a value specific to Origin, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from Origin!');`. -* Now, use `cqlsh` to connect *directly to Target*. Insert a row with the same key as above, but with a value specific to Target, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from Target!');`. -* Now, use `cqlsh` to connect to the {zdm-proxy} (see xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[here] for how to do this) and issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`. The result will clearly show you where the read actually comes from. +* Create a small test table on both clusters, for example a simple key/value table (it could be in an existing keyspace, or in one that you create specifically for this test). +For example `CREATE TABLE test_keyspace.test_table(k TEXT PRIMARY KEY, v TEXT);`. +* Use `cqlsh` to connect *directly to Origin*. +Insert a row with any key, and with a value specific to Origin, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from Origin!');`. +* Now, use `cqlsh` to connect *directly to Target*. 
+Insert a row with the same key as above, but with a value specific to Target, for example `INSERT INTO test_keyspace.test_table(k, v) VALUES ('1', 'Hello from Target!');`. +* Now, use `cqlsh` to connect to the {zdm-proxy} (see xref:connect-clients-to-proxy.adoc#_connecting_cqlsh_to_the_zdm_proxy[here] for how to do this) and issue a read request for this test table: `SELECT * FROM test_keyspace.test_table WHERE k = '1';`. +The result will clearly show you where the read actually comes from. diff --git a/modules/ROOT/pages/components.adoc b/modules/ROOT/pages/components.adoc index 52832d99..06cafb5e 100644 --- a/modules/ROOT/pages/components.adoc +++ b/modules/ROOT/pages/components.adoc @@ -3,28 +3,36 @@ ifdef::env-github,env-browser,env-vscode[:imagesprefix: ../images/] ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] -The main component of the {company} {zdm-product} product suite is **{zdm-proxy}**, which by design is a simple and lightweight proxy that handles all the real-time requests generated by your client applications. {zdm-proxy} is open-source software (OSS) and available in its Public GitHub repo, https://github.com/datastax/zdm-proxy. You can view the source files and contribute code for potential inclusion via Pull Requests (PRs) initiated on a fork of the repo. +The main component of the {company} {zdm-product} product suite is **{zdm-proxy}**, which by design is a simple and lightweight proxy that handles all the real-time requests generated by your client applications. + +{zdm-proxy} is open-source software (OSS) and available in its https://github.com/datastax/zdm-proxy[Public GitHub repo]. +You can view the source files and contribute code for potential inclusion via Pull Requests (PRs) initiated on a fork of the repo. The {zdm-proxy} itself doesn't have any capability to migrate data or knowledge that a migration may be ongoing, and it is not coupled to the migration process in any way. * {company} {zdm-product} also provides the **{zdm-utility}** and **{zdm-automation}** to set up and run the Ansible playbooks that deploy and manage the {zdm-proxy} and its monitoring stack. -* Two data migration tools are available -- **{cstar-data-migrator}** and **{dsbulk-migrator}** -- to migrate your data. See the xref:introduction.adoc#_data_migration_tools[summary of features] below. +* Multiple data migration tools such as **{cstar-data-migrator}** and **{dsbulk-migrator}** are available. == Role of {zdm-proxy} -We created {zdm-proxy} to function between the application and both databases (Origin and Target). The databases can be any CQL-compatible data store (e.g. Apache Cassandra, DataStax Enterprise and {astra_db}). The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level: +We created {zdm-proxy} to function between the application and both databases (Origin and Target). +The databases can be any CQL-compatible data store (e.g. Apache Cassandra, DataStax Enterprise and {astra_db}). +The proxy always sends every write operation (Insert, Update, Delete) synchronously to both clusters at the desired Consistency Level: -* If the write is successful in both clusters, it returns a successful acknowledgement to the client application +* If the write is successful in both clusters, it returns a successful acknowledgement to the client application. 
* If the write fails on either cluster, the failure is passed back to the client application so that it can retry it as appropriate, based on its own retry policy.

-This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application. {zdm-proxy} also sends all reads to the primary cluster (initially Origin, and later Target) and returns the result to the client application.
+This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
+{zdm-proxy} also sends all reads to the primary cluster (initially Origin, and later Target) and returns the result to the client application.

-{zdm-proxy} is designed to be highly available. It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers. {zdm-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
+{zdm-proxy} is designed to be highly available.
+It can be scaled horizontally, so typical deployments are made up of a minimum of 3 servers.
+{zdm-proxy} can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.

[TIP]
====
-{zdm-proxy} has been designed to run in a **clustered** fashion so that it is never a single point of failure. Unless it is for a demo or local testing environment, a {zdm-proxy} deployment should always comprise multiple {zdm-proxy} instances.
+{zdm-proxy} has been designed to run in a **clustered** fashion so that it is never a single point of failure.
+Unless it is for a demo or local testing environment, a {zdm-proxy} deployment should always comprise multiple {zdm-proxy} instances.

The term {zdm-proxy} indicates the whole deployment, and {zdm-proxy} instance refers to an individual proxy process in the deployment.
====
@@ -37,25 +45,34 @@ The term {zdm-proxy} indicates the whole deployment, and {zdm-proxy} instance re

* Bifurcates writes synchronously to both clusters during the migration process.

-* Returns (for read operations) the response from the primary cluster, which is its designated source of truth. During a migration, Origin is typically the primary cluster. Near the end of the migration, you'll shift the primary cluster to be Target.
+* Returns (for read operations) the response from the primary cluster, which is its designated source of truth.
+During a migration, Origin is typically the primary cluster.
+Near the end of the migration, you'll shift the primary cluster to be Target.

-* Can be configured to also read asynchronously from Target. This capability is called **Asynchronous Dual Reads** (also known as **Read Mirroring**) and allows you to observe what read latencies and throughput Target can achieve under the actual production load.
+* Can be configured to also read asynchronously from Target.
+This capability is called **Asynchronous Dual Reads** (also known as **Read Mirroring**) and allows you to observe what read latencies and throughput Target can achieve under the actual production load.
** Results from the asynchronous reads executed on Target are not sent back to the client application.
** This design implies that failure on asynchronous reads from Target does not cause an error on the client application.
** Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the {zdm-proxy} instances.
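+
+For illustration, this switch is driven by the `read_mode` variable in `vars/zdm_proxy_core_config.yml` and applied with a rolling restart through the {zdm-automation}; the `DUAL_ASYNC_ON_SECONDARY` value sketched below is an assumption, so check xref:enable-async-dual-reads.adoc[] for the exact setting:
+
+[source,yml]
+----
+# Reads are routed to the primary cluster only (asynchronous dual reads disabled)
+read_mode: PRIMARY_ONLY
+# Assumed value that also mirrors reads asynchronously to the secondary cluster
+#read_mode: DUAL_ASYNC_ON_SECONDARY
+----
+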
[NOTE]
====
-When using Asynchronous Dual Reads, any additional read load on Target may impact its ability to keep up with writes. This behavior is expected and desired. The idea is to mimic the full read and write load on Target so there are no surprises during the last migration phase; that is, after cutting over completely to Target.
+When using Asynchronous Dual Reads, any additional read load on Target may impact its ability to keep up with writes.
+This behavior is expected and desired.
+The idea is to mimic the full read and write load on Target so there are no surprises during the last migration phase; that is, after cutting over completely to Target.
====

== {zdm-utility} and {zdm-automation}

-https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code. It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality.
+https://www.ansible.com/[Ansible] is a suite of software tools that enables infrastructure as code.
+It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality.

-The Ansible automation for {zdm-shortproduct} is organized into playbooks, each implementing a specific operation. The machine from which the playbooks are run is known as the Ansible Control Host. In {zdm-shortproduct}, the Ansible Control Host will run as a Docker container.
+The Ansible automation for {zdm-shortproduct} is organized into playbooks, each implementing a specific operation.
+The machine from which the playbooks are run is known as the Ansible Control Host.
+In {zdm-shortproduct}, the Ansible Control Host will run as a Docker container.

-You will use the **{zdm-utility}** to set up Ansible in a Docker container, and **{zdm-automation}** to run the Ansible playbooks from the Docker container created by {zdm-utility}. In other words,the {zdm-utility} creates the Docker container acting as the **Ansible Control Host**, from which the {zdm-automation} allows you to deploy and manage the {zdm-proxy} instances and the associated monitoring stack - Prometheus metrics and Grafana visualization of the metric data.
+You will use the **{zdm-utility}** to set up Ansible in a Docker container, and **{zdm-automation}** to run the Ansible playbooks from the Docker container created by {zdm-utility}.
+In other words, the {zdm-utility} creates the Docker container acting as the **Ansible Control Host**, from which the {zdm-automation} allows you to deploy and manage the {zdm-proxy} instances and the associated monitoring stack: Prometheus metrics and Grafana visualization of the metric data.

{zdm-utility} and {zdm-automation} expect that you have already provisioned the recommended infrastructure, as outlined in xref:deployment-infrastructure.adoc[].

@@ -68,29 +85,31 @@ For details, see:

== Data migration tools

-As part of the overall migration process, you can use {cstar-data-migrator} and/or {dsbulk-migrator} to migrate your data. Or you can use other technologies, such as Apache Spark™, to write your own custom data migration process.
+As part of the overall migration process, you can use {cstar-data-migrator} and/or {dsbulk-migrator} to migrate your data.
+Other technologies such as Apache Spark™ can be used to write your own custom data migration process.

=== {cstar-data-migrator}

+[TIP]
+====
+An important **prerequisite** for using {cstar-data-migrator} is that you already have the matching schema on Target.
+==== + Use {cstar-data-migrator} to: -* Migrate your data from any CQL-supported Origin to any CQL-supported Target. Examples of databases that support CQL are Apache Cassandra, DataStax Enterprise and {astra_db}. -* Validate migration accuracy and performance using examples that provide a smaller, randomized data set -* Preserve internal `writetime` timestamps and Time To Live (TTL) values -* Take advantage of advanced data types (Sets, Lists, Maps, UDTs) -* Filter records from the Origin data, using Cassandra's internal `writetime` timestamp -* Use SSL Support, including custom cipher algorithms +* Migrate your data from any CQL-supported Origin to any CQL-supported Target. +Examples of databases that support CQL are Apache Cassandra, DataStax Enterprise and {astra_db}. +* Validate migration accuracy and performance using examples that provide a smaller, randomized data set. +* Preserve internal `writetime` timestamps and Time To Live (TTL) values. +* Take advantage of advanced data types (Sets, Lists, Maps, UDTs). +* Filter records from the Origin data, using Cassandra's internal `writetime` timestamp. +* Use SSL Support, including custom cipher algorithms. Cassandra Data Migrator is designed to: -* Connect to and compare your Target database with Origin -* Report differences in a detailed log file -* Optionally reconcile any missing records and fix any data inconsistencies in Target, if you enable `autocorrect` in a config file - -[TIP] -==== -An important **prerequisite** is that you already have the matching schema on Target. -==== +* Connect to and compare your Target database with Origin. +* Report differences in a detailed log file. +* Optionally reconcile any missing records and fix any data inconsistencies in Target by enabling `autocorrect` in a config file. === {dsbulk-migrator} diff --git a/modules/ROOT/pages/connect-clients-to-proxy.adoc b/modules/ROOT/pages/connect-clients-to-proxy.adoc index 8ecf5579..93f382b8 100644 --- a/modules/ROOT/pages/connect-clients-to-proxy.adoc +++ b/modules/ROOT/pages/connect-clients-to-proxy.adoc @@ -4,9 +4,14 @@ ifdef::env-github,env-browser,env-vscode[:imagesprefix: ../images/] ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] -The {zdm-proxy} is designed to look and feel very much like a conventional Cassandra® cluster. You communicate with it using the CQL query language used in your existing client applications. It understands the same messaging protocols used by Cassandra, DataStax Enterprise (DSE) server and {company} Astra DB. As a result, most of your client applications won't be able to distinguish between connecting to {zdm-proxy} and connecting directly to your Cassandra cluster. +The {zdm-proxy} is designed to look and feel very much like a conventional Cassandra® cluster. +You communicate with it using the CQL query language used in your existing client applications. +It understands the same messaging protocols used by Cassandra, DataStax Enterprise (DSE) server and {company} Astra DB. +As a result, most of your client applications won't be able to distinguish between connecting to {zdm-proxy} and connecting directly to your Cassandra cluster. -On this page, we explain how to connect your client applications to a Cassandra cluster. We then move on to discuss how this process changes when connecting to a {zdm-proxy}. We conclude by describing two sample client applications that serve as a real-world examples of how to build a client application that works effectively with {zdm-proxy}. 
+On this page, we explain how to connect your client applications to a Cassandra cluster. +We then move on to discuss how this process changes when connecting to a {zdm-proxy}. +We conclude by describing two sample client applications that serve as real-world examples of how to build a client application that works effectively with {zdm-proxy}. You can use the provided sample client applications, in addition to your own, as a quick way to validate that the deployed {zdm-proxy} is reading and writing data from the expected Origin and Target clusters. @@ -16,17 +21,24 @@ Finally, we will explain how to connect the `cqlsh` command-line client to the { At DataStax, we've developed and maintain a set of drivers for client applications to use when connecting to Cassandra, DSE, or Astra DB: -* https://github.com/datastax/java-driver[{company} Java driver for Apache Cassandra^] -* https://github.com/datastax/python-driver[{company} Python driver for Apache Cassandra^] +* https://github.com/datastax/java-driver[{company} Java driver for Apache Cassandra] +* https://github.com/datastax/python-driver[{company} Python driver for Apache Cassandra] * https://github.com/datastax/csharp-driver[{company} C# driver for Apache Cassandra] * https://github.com/datastax/cpp-driver[{company} C/C++ driver for Apache Cassandra] * https://github.com/datastax/nodejs-driver[{company} Node.js driver for Apache Cassandra] -These drivers provide a native implementation of the messaging protocols used to communicate with a Cassandra or DSE cluster or Astra DB. They allow you to execute queries, iterate through results, access metadata about your cluster, and perform other related activities. +These drivers provide a native implementation of the messaging protocols used to communicate with a Cassandra or DSE cluster or Astra DB. +They allow you to execute queries, iterate through results, access metadata about your cluster, and perform other related activities. +[[_connecting_company_drivers_to_cassandra]] == Connecting {company} drivers to Cassandra -Perhaps the simplest way to demonstrate how to use the {company} drivers to connect your client application to a Cassandra cluster is an example in the form of some sample code. But there's a bit of a problem: the {company} drivers are independent projects implemented natively in the relevant programming language. This approach offers the benefit of allowing each project to provide an API that makes the most sense for the language or platform on which it's implemented. Unfortunately it also means there is some variation between languages. With that in mind, the following pseudocode provides reasonable guidance for understanding how a client application might use one of the drivers. +Perhaps the simplest way to demonstrate how to use the {company} drivers to connect your client application to a Cassandra cluster is an example in the form of some sample code. +But there's a bit of a problem: the {company} drivers are independent projects implemented natively in the relevant programming language. + +This approach offers the benefit of allowing each project to provide an API that makes the most sense for the language or platform on which it's implemented. +Unfortunately it also means there is some variation between languages. +With that in mind, the following pseudocode provides reasonable guidance for understanding how a client application might use one of the drivers. 
[source] ---- @@ -54,38 +66,49 @@ print(release_version) As noted, you'll see some differences in individual drivers: -* New versions of the Java driver no longer define a Cluster object: -** Client programs create a Session directly instead; -* and the Node.js driver has no notion of a Cluster or Session at all, instead using a Client object to represent this functionality. +* New versions of the Java driver no longer define a Cluster object. +Client programs create a Session directly. +* The Node.js driver has no notion of a Cluster or Session at all, instead using a Client object to represent this functionality. The details may vary but you'll still see the same general pattern described in the pseudocode in each of the drivers. -This topic does not describe details or APIs for any of the {company} drivers mentioned above. All the drivers come with a complete set of documentation for exactly this task. The following links provide some good starting points for learning about the interfaces for each specific driver: +This topic does not describe details or APIs for any of the {company} drivers mentioned above. +All the drivers come with a complete set of documentation for exactly this task. +The following links provide some good starting points for learning about the interfaces for each specific driver: -* The https://docs.datastax.com/en/developer/java-driver/latest/manual/core/[core driver section^] of the Java driver manual. -* The https://docs.datastax.com/en/developer/python-driver/latest/getting_started/[getting started guide^] for the Python driver. -* The https://docs.datastax.com/en/developer/csharp-driver/latest/index.html#basic-usage[basic usage section^] of the C# driver documentation. -* The https://docs.datastax.com/en/developer/cpp-driver/latest/topics/[getting started section^] of the C/C++ driver documentation. -* The https://docs.datastax.com/en/developer/nodejs-driver/latest/#basic-usage[basic usage section^] of the Node.js driver documentation. +* The https://docs.datastax.com/en/developer/java-driver/latest/manual/core/[core driver section] of the Java driver manual. +* The https://docs.datastax.com/en/developer/python-driver/latest/getting_started/[getting started guide] for the Python driver. +* The https://docs.datastax.com/en/developer/csharp-driver/latest/index.html#basic-usage[basic usage section] of the C# driver documentation. +* The https://docs.datastax.com/en/developer/cpp-driver/latest/topics/[getting started section] of the C/C++ driver documentation. +* The https://docs.datastax.com/en/developer/nodejs-driver/latest/#basic-usage[basic usage section] of the Node.js driver documentation. [TIP] ==== -The links above lead to the documentation for the most recent version of each driver. You can find the documentation for earlier versions by selecting the appropriate version number from the drop-down menu in the upper right. +The links above lead to the documentation for the most recent version of each driver. +You can find the documentation for earlier versions by selecting the appropriate version number from the drop-down menu in the upper right. ==== == Connecting {company} drivers to {zdm-proxy} -We mentioned above that connecting to a {zdm-proxy} should be almost indistinguishable from connecting directly to your Cassandra cluster. This design decision means there isn't much to say here; everything we discussed in the section above also applies when connecting your {company} driver to a {zdm-proxy}. 
There are a few extra considerations to keep in mind, though, when using the proxy. +We mentioned above that connecting to a {zdm-proxy} should be almost indistinguishable from connecting directly to your Cassandra cluster. +This design decision means there isn't much to say here; everything we discussed in the section above also applies when connecting your {company} driver to a {zdm-proxy}. +There are a few extra considerations to keep in mind, though, when using the proxy. === Client-side compression -Client applications must not enable client-side compression when connecting through the {zdm-proxy}, as this is not currently supported. This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {zdm-proxy}. +Client applications must not enable client-side compression when connecting through the {zdm-proxy}, as this is not currently supported. +This is disabled by default in all drivers, but if it was enabled in your client application configuration it will have to be temporarily disabled when connecting to the {zdm-proxy}. +[[_client_application_credentials]] === Client application credentials -The credentials provided by the client application are used when forwarding its requests. However, the client application has no notion that there are two clusters involved: from its point of view, it talks to just one cluster as usual. For this reason, the {zdm-proxy} will only use the client application credentials when forwarding requests to one cluster (typically Target), and it will resort to using the credentials in its own configuration to forward requests to the other cluster (typically Origin). + +The credentials provided by the client application are used when forwarding its requests. +However, the client application has no notion that there are two clusters involved: from its point of view, it talks to just one cluster as usual. +For this reason, the {zdm-proxy} will only use the client application credentials when forwarding requests to one cluster (typically Target), and it will resort to using the credentials in its own configuration to forward requests to the other cluster (typically Origin). This means that, if your {zdm-proxy} is configured with an Origin or Target cluster with **user authentication enabled**, your client application has to provide credentials when connecting to the proxy: -* If both clusters require authentication, your client application must pass the credentials for Target. This is also the case if only Target requires authentication but Origin does not. +* If both clusters require authentication, your client application must pass the credentials for Target. +This is also the case if only Target requires authentication but Origin does not. * If Origin requires authentication but Target does not, your client application must supply credentials for Origin. * If neither cluster requires authentication, no credentials are needed. @@ -117,38 +140,48 @@ This means that, if your {zdm-proxy} is configured with an Origin or Target clus image::zdm-proxy-credential-usage.png[ZDM proxy credentials usage, 550] === A note on the Astra Secure Connect Bundle -If your {zdm-proxy} is configured to use Astra DB as an Origin or Target, your client application **does not need** to provide an Astra Secure Connect Bundle (SCB) when connecting to the proxy. It will, however, have to supply the Astra client ID and client secret as a username and password (respectively). 
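As a quick, hypothetical illustration, connecting `cqlsh` to one proxy instance in this setup might look as follows; the IP address, Client ID, and Client Secret are placeholders, and the proxy is assumed to listen on the default port 9042.

[source,bash]
----
# Hypothetical example: cqlsh connecting to a ZDM Proxy whose Target is Astra DB.
# No Secure Connect Bundle is passed; the Astra Client ID and Client Secret act as
# the username and password. All values are placeholders.
./cqlsh 172.18.10.34 9042 -u my_astra_client_id -p my_astra_client_secret
----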
+ +If your {zdm-proxy} is configured to use Astra DB as an Origin or Target, your client application **does not need** to provide an Astra Secure Connect Bundle (SCB) when connecting to the proxy. +It will, however, have to supply the Astra client ID and client secret as a username and password (respectively). == Sample client applications -The documentation for the {company} drivers provides information about how to connect these drivers to your Cassandra cluster or {zdm-proxy} and how to use them to issue queries, update data and perform other actions. In addition to the smaller code samples provided in the documentation, we also provide a few sample client applications which demonstrate the use of the {company} Java driver to interact with {zdm-proxy} as well as Origin and Target for that proxy. +The documentation for the {company} drivers provides information about how to connect these drivers to your Cassandra cluster or {zdm-proxy} and how to use them to issue queries, update data and perform other actions. +In addition to the smaller code samples provided in the documentation, we also provide a few sample client applications which demonstrate the use of the {company} Java driver to interact with {zdm-proxy} as well as Origin and Target for that proxy. === ZDM Demo Client -https://github.com/alicel/zdm-demo-client/[ZDM Demo Client^] is a minimal Java web application which provides a simple, stripped-down example of an application built to work with {zdm-proxy}. After updating connection information you can compile and run the application locally and interact with it using HTTP clients such as `curl` or `wget`. +https://github.com/alicel/zdm-demo-client/[ZDM Demo Client] is a minimal Java web application which provides a simple, stripped-down example of an application built to work with {zdm-proxy}. +After updating connection information you can compile and run the application locally and interact with it using HTTP clients such as `curl` or `wget`. -You can find the details of building and running ZDM Demo Client in the https://github.com/alicel/zdm-demo-client/blob/master/README.md[README^]. +You can find the details of building and running ZDM Demo Client in the https://github.com/alicel/zdm-demo-client/blob/master/README.md[README]. +[[_themis_client]] === Themis client -https://github.com/absurdfarce/themis[Themis^] is a Java command-line client application that allows you to insert randomly-generated data into some combination of these three sources: +https://github.com/absurdfarce/themis[Themis] is a Java command-line client application that allows you to insert randomly-generated data into some combination of these three sources: * Directly into Origin * Directly into Target * Into the {zdm-proxy}, and subsequently on to Origin and Target -The client application can then be used to query the inserted data. This allows you to validate that the {zdm-proxy} is reading and writing data from the expected sources. Configuration details for the clusters and/or {zdm-proxy} are defined in a YAML file. Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[README^]. +The client application can then be used to query the inserted data. +This allows you to validate that the {zdm-proxy} is reading and writing data from the expected sources. +Configuration details for the clusters and/or {zdm-proxy} are defined in a YAML file. +Details are in the https://github.com/absurdfarce/themis/blob/main/README.md[README]. 
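As a minimal sketch, you could clone the repository and then follow the README for the authoritative build and run steps.

[source,bash]
----
# Fetch the Themis sample client; build and run it as described in its README,
# after filling in the YAML configuration with your Origin, Target, and ZDM Proxy details.
git clone https://github.com/absurdfarce/themis.git
cd themis
----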
-In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {zdm-proxy} -- as well as directly to Cassandra or Astra DB clusters -- and perform operations. The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand. +In addition to any utility as a validation tool, Themis also serves as an example of a larger client application which uses the Java driver to connect to a {zdm-proxy} -- as well as directly to Cassandra or Astra DB clusters -- and perform operations. +The configuration logic as well as the cluster and session management code have been cleanly separated into distinct packages to make them easy to understand. == Connecting CQLSH to the {zdm-proxy} -https://downloads.datastax.com/#cqlsh[CQLSH^] is a simple, command-line client that is able to connect to any CQL cluster, enabling you to interactively send CQL requests to it. CQLSH comes pre-installed on any Cassandra or DSE node, or it can be downloaded and run as a standalone client on any machine able to connect to the desired cluster. +https://downloads.datastax.com/#cqlsh[CQLSH] is a simple, command-line client that is able to connect to any CQL cluster, enabling you to interactively send CQL requests to it. +CQLSH comes pre-installed on any Cassandra or DSE node, or it can be downloaded and run as a standalone client on any machine able to connect to the desired cluster. Using CQLSH to connect to a {zdm-proxy} instance is very easy: -* Download CQLSH for free from https://downloads.datastax.com/#cqlsh[here^] on a machine that has connectivity to the {zdm-proxy} instances: +* Download CQLSH for free from https://downloads.datastax.com/#cqlsh[here] on a machine that has connectivity to the {zdm-proxy} instances: ** To connect to the {zdm-proxy}, any version is fine. ** The Astra-ready version additionally supports connecting directly to an Astra DB cluster by passing the cluster's Secure Connect Bundle and valid credentials. * Install it by uncompressing the archive: `tar -xvf cqlsh-<...>.tar.gz`. @@ -164,4 +197,5 @@ For example, if one of your {zdm-proxy} instances has IP Address `172.18.10.34` ./cqlsh 172.18.10.34 14002 -u -p ---- -If the {zdm-proxy} listens on port `9042`, you can omit the port from the command above. If credentials are not required, just omit the `-u` and `-p` options. \ No newline at end of file +If the {zdm-proxy} listens on port `9042`, you can omit the port from the command above. +If credentials are not required, just omit the `-u` and `-p` options. \ No newline at end of file diff --git a/modules/ROOT/pages/connect-clients-to-target.adoc b/modules/ROOT/pages/connect-clients-to-target.adoc index 3ce20460..d0b202aa 100644 --- a/modules/ROOT/pages/connect-clients-to-target.adoc +++ b/modules/ROOT/pages/connect-clients-to-target.adoc @@ -14,17 +14,19 @@ At this point in our migration phases, we've completed: * Phase 4: Changed read routing to Target. -Now we're ready to perform Phase 5, in which we will configure our client applications to connect directly to Target. The way you do this varies based on whether your Target is Astra DB or a regular Apache Cassandra® or DataStax Enterprise cluster. +Now we're ready to perform Phase 5, in which we will configure our client applications to connect directly to Target. 
+The way you do this varies based on whether your Target is Astra DB or a regular Apache Cassandra® or DataStax Enterprise cluster. -include::partial$lightbox-tip.adoc[] +//include::partial$lightbox-tip.adoc[] image::{imagesprefix}migration-phase5ra.png[Phase 5 diagram shows apps no longer using proxy and instead connected directly to Target.] -For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. +//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. == Configuring your driver to connect to a generic CQL cluster -If your Target is a generic CQL cluster (such as Apache Cassandra or DataStax Enterprise), you can connect your client application to it in a similar way as you previously connected it to Origin, but with the appropriate contact points and any additional configuration that your Target may require. For further information, please refer to the documentation of the driver language and version that you are using. +If your Target is a generic CQL cluster (such as Apache Cassandra or DataStax Enterprise), you can connect your client application to it in a similar way as you previously connected it to Origin, but with the appropriate contact points and any additional configuration that your Target may require. +For further information, please refer to the documentation of the driver language and version that you are using. == Configuring your driver to connect to Astra DB @@ -32,7 +34,7 @@ To connect to your Astra DB cluster, you will need: * A valid set of credentials (ClientID and Client Secret) for the Astra DB organization to which your cluster belongs: ** Note: You will already have used these credentials when you configured the {zdm-proxy} to connect to your Astra DB cluster as Target. -** For more information on creating credentials (tokens), see https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html[here^]. +** For more information on creating credentials (tokens), see https://docs.datastax.com/en/astra/astra-db-vector/administration/manage-application-tokens.html[here]. * The Secure Connect Bundle (SCB) for your Astra DB cluster: ** This is a zip archive containing connection metadata and files to automatically enable Mutual TLS encryption between your client application and Astra DB. ** There is one SCB for each Astra DB cluster (or one for each region of an Astra DB Multi-region cluster). @@ -40,20 +42,21 @@ To connect to your Astra DB cluster, you will need: include::partial$tip-scb.adoc[] -You will also need to check whether the driver used by your client application has native support for the xref:glossary.adoc#_secure_connect_bundle_scb[Astra DB Secure Connect Bundle]. To do so, please refer to the documentation for your driver language and version, -and check the per-language https://docs.datastax.com/en/driver-matrix/docs/version-compatibility.html[driver compatibility matrix^] for details (look for the support status in the **Astra / Cloud** column for your driver version). +You will also need to check whether the driver used by your client application has native support for the xref:glossary.adoc#_secure_connect_bundle_scb[Astra DB Secure Connect Bundle]. 
+To do so, please refer to the documentation for your driver language and version, +and check the per-language https://docs.datastax.com/en/driver-matrix/docs/version-compatibility.html[driver compatibility matrix] for details (look for the support status in the **Astra / Cloud** column for your driver version). // The SCB support was made available beginning the following versions in the drivers: // -// * https://docs.datastax.com/en/developer/cpp-driver/latest/changelog/#2-14-0[Beginning `2.14.0` of {company} C++ Driver^]. +// * https://docs.datastax.com/en/developer/cpp-driver/latest/changelog/#2-14-0[Beginning `2.14.0` of {company} C++ Driver]. // -// * https://docs.datastax.com/en/developer/csharp-driver/latest/changelog/\#3-12-0[Beginning `3.12.0` of {company} C# Driver^] +// * https://docs.datastax.com/en/developer/csharp-driver/latest/changelog/\#3-12-0[Beginning `3.12.0` of {company} C# Driver] // -// * https://docs.datastax.com/en/developer/java-driver/latest/changelog/#3-8-0[Beginning `3.8.0` & `4.3.0` of {company} Java Driver^]. +// * https://docs.datastax.com/en/developer/java-driver/latest/changelog/#3-8-0[Beginning `3.8.0` & `4.3.0` of {company} Java Driver]. // -// * https://github.com/datastax/nodejs-driver/blob/master/CHANGELOG.md#440[Beginning `4.4.0` of {company} Nodejs Driver^]. +// * https://github.com/datastax/nodejs-driver/blob/master/CHANGELOG.md#440[Beginning `4.4.0` of {company} Nodejs Driver]. // -// * https://docs.datastax.com/en/developer/python-dse-driver/latest/CHANGELOG/#id24[Beginning `2.11.0` & `3.20.0` of {company} Python Driver^]. +// * https://docs.datastax.com/en/developer/python-dse-driver/latest/CHANGELOG/#id24[Beginning `2.11.0` & `3.20.0` of {company} Python Driver]. // // Based on this, follow the instructions in the relevant section below. @@ -73,7 +76,8 @@ Recalling the xref:connect-clients-to-proxy.adoc#_connecting_company_drivers_to_ [source] ---- // Create an object to represent a Cassandra cluster -// Note: there is no need to specify contact points when connecting to Astra DB. All connection information is implicitly passed in the SCB +// Note: there is no need to specify contact points when connecting to Astra DB. +// All connection information is implicitly passed in the SCB Cluster my_cluster = Cluster.build_new_cluster(username="my_AstraDB_client_ID", password="my_AstraDB_client_secret", secure_connect_bundle="/path/to/scb.zip") // Connect our Cluster object to our Cassandra cluster, returning a Session @@ -93,37 +97,43 @@ my_cluster.close() print(release_version) ---- -As noted before, this pseudocode is just a guideline to illustrate the changes that are needed. For the specific syntax that applies to your driver, please refer to the documentation for your driver language and version: +As noted before, this pseudocode is just a guideline to illustrate the changes that are needed. +For the specific syntax that applies to your driver, please refer to the documentation for your driver language and version: -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-cplusplus.html[C++ driver^]. +* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-cplusplus.html[C++ driver]. -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-csharp.html[C# driver^]. +* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-csharp.html[C# driver]. -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-java.html[Java driver^]. 
+* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-java.html[Java driver]. -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-nodejs.html[Node.js driver^]. +* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-nodejs.html[Node.js driver]. -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-python.html[Python driver^]. +* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-python.html[Python driver]. -That's it! Your client application is now able to connect directly to your Astra DB cluster. +That's it! +Your client application is now able to connect directly to your Astra DB cluster. === Drivers without support for the Secure Connect Bundle It is possible to configure older or community-contributed drivers to connect to Astra DB even if they lack built-in SCB support. -To do so, you will need to extract the files from the SCB and use them to enable Mutual TLS in the configuration of your driver. Please see https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/legacy-drivers.html[here^] for detailed instructions for each driver. +To do so, you will need to extract the files from the SCB and use them to enable Mutual TLS in the configuration of your driver. +Please see https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/legacy-drivers.html[here] for detailed instructions for each driver. -Alternatively, you could also consider using https://docs.datastax.com/en/astra-serverless/docs/connect/connecting-to-astra-databases-using-datastax-drivers.html#_cql_proxy[CQL Proxy^], which is an open-source lightweight proxy that abstracts away all Astra-specific connection configuration from your client application. +Alternatively, you could also consider using https://www.datastax.com/blog/easily-connect-apache-cassandra-workloads-to-datastaxs-serverless-dbaas-with-our-cql-proxy[CQL Proxy], which is an open-source lightweight proxy that abstracts away all Astra-specific connection configuration from your client application. === A word on the cloud-native drivers -Now that your client application is running on Astra DB, you can take advantage of many additional features and APIs that Astra DB offers such as gRPC, GraphQL, Document REST APIs and many more. To access these features, you may wish to consider moving to a cloud-native driver. This can be done at any time, as part of the future development and evolution of your client application. +Now that your client application is running on Astra DB, you can take advantage of many additional features and APIs that Astra DB offers such as gRPC, GraphQL, Document REST APIs and many more. +To access these features, you may wish to consider moving to a cloud-native driver. +This can be done at any time, as part of the future development and evolution of your client application. Here are the cloud-native drivers currently available: -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-java.html#_connecting_with_java_cloud_native_driver[Java cloud-native driver^]. -* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-nodejs.html#_connecting_with_node_js_cloud_native_driver[Node.js cloud-native driver^]. +* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-java.html#_connecting_with_java_cloud_native_driver[Java cloud-native driver]. 
+* https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/connect-nodejs.html#_connecting_with_node_js_cloud_native_driver[Node.js cloud-native driver]. == Phase 5 of migration completed -Until this point, in case of any issues, you could have abandoned the migration and rolled back to connect directly to Origin at any time. From this point onward, the clusters will diverge, and Target is the source of truth for your client applications and data. +Until this point, in case of any issues, you could have abandoned the migration and rolled back to connect directly to Origin at any time. +From this point onward, the clusters will diverge, and Target is the source of truth for your client applications and data. \ No newline at end of file diff --git a/modules/ROOT/pages/contributions.adoc b/modules/ROOT/pages/contributions.adoc index 401c4334..256685c9 100644 --- a/modules/ROOT/pages/contributions.adoc +++ b/modules/ROOT/pages/contributions.adoc @@ -7,7 +7,8 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] The {zdm-proxy} is open source software (OSS). We welcome contributions from the developer community via Pull Requests on a fork, for evaluation by the ZDM team. -The code sources for additional {zdm-product} components -- including {zdm-utility}, {zdm-automation}, {cstar-data-migrator}, and {dsbulk-migrator} -- are available in public GitHub repos, where you may submit feedback and ideas via GitHub Issues. Code contributions for those additional components are not open for PRs at this time. +The code sources for additional {zdm-product} components -- including {zdm-utility}, {zdm-automation}, {cstar-data-migrator}, and {dsbulk-migrator} -- are available in public GitHub repos, where you may submit feedback and ideas via GitHub Issues. +Code contributions for those additional components are not open for PRs at this time. == {zdm-proxy} License @@ -15,10 +16,11 @@ The code sources for additional {zdm-product} components -- including {zdm-utili == Contributor License Agreement -Acceptance of the {company} https://cla.datastax.com/[Contributor License Agreement, window="_blank"] (CLA) is required before we can consider accepting your {zdm-proxy} code contribution. Refer to the https://cla.datastax.com/[CLA terms, window="_blank"] and, if you agree, indicate your acceptance on each Pull Request (PR) that you submit while using the https://github.com/datastax/zdm-proxy[{zdm-proxy} GitHub repository, window="_blank"]. +Acceptance of the {company} https://cla.datastax.com/[Contributor License Agreement, window="_blank"] (CLA) is required before we can consider accepting your {zdm-proxy} code contribution. +Refer to the https://cla.datastax.com/[CLA terms, window="_blank"] and, if you agree, indicate your acceptance on each Pull Request (PR) that you submit while using the https://github.com/datastax/zdm-proxy[{zdm-proxy} GitHub repository, window="_blank"]. // You will see the CLA listed on the standard pull request checklist (TBS) -// for the https://github.com/datastax/zdm-proxy[{zdm-proxy}^] repository. +// for the https://github.com/datastax/zdm-proxy[{zdm-proxy}] repository. == {zdm-proxy} code contributions @@ -28,22 +30,27 @@ The overall procedure: . Fork the repo by clicking the Fork button in the GitHub UI. . Make your changes locally on your fork. Git commit and push only to your fork. . Wait for CI to run successfully in GitHub Actions before submitting a PR. -. Submit a Pull Request (PR) with your forked updates. 
As noted above, be sure to indicate in the PR's Comments your acceptance (if you agree) with the {company} https://cla.datastax.com/[Contributor License Agreement] (CLA). +. Submit a Pull Request (PR) with your forked updates. +As noted above, be sure to indicate in the PR's Comments your acceptance (if you agree) with the {company} https://cla.datastax.com/[Contributor License Agreement] (CLA). . If you're not yet ready for a review, add "WIP" to the PR name to indicate it's a work in progress. -. Wait for the automated PR workflow to do some checks. Members of the {zdm-proxy} community will review your PR and decide whether to approve and merge it. +. Wait for the automated PR workflow to do some checks. +Members of the {zdm-proxy} community will review your PR and decide whether to approve and merge it. -In addition to potential {zdm-proxy} OSS code contribution, we encourage you to submit feedback and ideas via GitHub Issues in the repo, starting from https://github.com/datastax/zdm-proxy/issues. Add a label to help categorize the issue, such as the complexity level, component name, and other labels you'll find in the repo's Issues display. +In addition to potential {zdm-proxy} OSS code contribution, we encourage you to submit feedback and ideas via GitHub Issues in the repo, starting from https://github.com/datastax/zdm-proxy/issues. +Add a label to help categorize the issue, such as the complexity level, component name, and other labels you'll find in the repo's Issues display. == Submitting GitHub Issues in related public repos -The following {company} {zdm-product} GitHub repos are public. You are welcome to read the source and submit feedback and ideas via GitHub Issues per repo. In addition to the https://github.com/datastax/zdm-proxy[{zdm-proxy}^] open-source repo, refer to: +The following {company} {zdm-product} GitHub repos are public. +You are welcome to read the source and submit feedback and ideas via GitHub Issues per repo. +In addition to the https://github.com/datastax/zdm-proxy[{zdm-proxy}] open-source repo, refer to: -* https://github.com/datastax/zdm-proxy-automation/issues[{zdm-automation}^] repo for Ansible-based {zdm-automation} and {zdm-utility}. +* https://github.com/datastax/zdm-proxy-automation/issues[{zdm-automation}] repo for Ansible-based {zdm-automation} and {zdm-utility}. -* https://github.com/datastax/cassandra-data-migrator/issues[Cassandra Data Migrator^] repo. +* https://github.com/datastax/cassandra-data-migrator/issues[Cassandra Data Migrator] repo. -* https://github.com/datastax/dsbulk-migrator/issues[DSBulk Migrator^] repo. +* https://github.com/datastax/dsbulk-migrator/issues[DSBulk Migrator] repo. -// * https://github.com/datastax/migration-docs/issues[Migration documentation^] repo. +// * https://github.com/datastax/migration-docs/issues[Migration documentation] repo. Again, add a label to help categorize each issue, such as the complexity level, component name, and other labels you'll find in the repo's Issues display. diff --git a/modules/ROOT/pages/create-target.adoc b/modules/ROOT/pages/create-target.adoc index 79a8718c..2f321d03 100644 --- a/modules/ROOT/pages/create-target.adoc +++ b/modules/ROOT/pages/create-target.adoc @@ -6,23 +6,23 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] In this topic, we'll see how to create and prepare a new cluster to be used as Target. 
-This section covers in detail the steps to prepare a {company} Astra DB serverless database, and also outlines how to create and prepare a different cluster, which could be for example Cassandra 4.0.x or DSE 6.8.x. +This section covers in detail the steps to prepare a {company} {astra-db-serverless} database, and also outlines how to create and prepare a different cluster, which could be for example Cassandra 4.0.x or DSE 6.8.x. == Overview If you intend to use Astra DB as Target for the migration, you will need to: -* Create an Astra DB serverless cluster -* Retrieve its Secure Connect Bundle (SCB) and upload it to the application instances -* Create Astra DB access credentials for the cluster -* Create the client application schema +* Create an {astra-db-serverless} cluster. +* Retrieve its Secure Connect Bundle (SCB) and upload it to the application instances. +* Create Astra DB access credentials for the cluster. +* Create the client application schema. To use a generic Cassandra or DSE cluster, you will have to: -* Provision the infrastructure for your new cluster -* Create the cluster with the desired version of Cassandra or DSE -* Configure the cluster according to your requirements -* Create the client application schema +* Provision the infrastructure for your new cluster. +* Create the cluster with the desired version of Cassandra or DSE. +* Configure the cluster according to your requirements. +* Create the client application schema. == Using an Astra DB database as Target @@ -30,9 +30,10 @@ To use a generic Cassandra or DSE cluster, you will have to: * Access to https://astra.datastax.com[Astra Portal, window="_blank"] on astra.datastax.com. -=== Create an Astra DB serverless cluster +=== Create an {astra-db-serverless} cluster -Log into the Astra Portal and create a serverless Astra DB database. You can start with a Free plan, but consider upgrading during your migration project to an Astra DB Pay As You Go (PAYG) or Enterprise plan, to take advantage of additional functionality -- such as Exporting Metrics to external third-party applications, Bring Your Own Keys, and other features. +Log into the Astra Portal and create a serverless Astra DB database. +You can start with a Free plan, but consider upgrading during your migration project to an Astra DB Pay As You Go (PAYG) or Enterprise plan, to take advantage of additional functionality -- such as Exporting Metrics to external third-party applications, Bring Your Own Keys, and other features. The PAYG and Enterprise plans have many benefits over the Free plan, such as the ability to lift rate limiting, and avoiding hibernation timeouts. @@ -43,23 +44,29 @@ Assign your preferred values for the serverless database: * **Cloud provider**: You can choose your preferred cloud provider among AWS, GCP and Azure (only GCP is available to Free Tier accounts). * **Region**: choose your geographically preferred region - you can subsequently add more regions. -When the Astra DB reaches **Active** status, perform the following steps in an Astra DB user account. Create an IAM token with the "Read/Write User" role. This role will be used by the client application, the {zdm-proxy}, and the {zdm-automation}. +When the Astra DB reaches **Active** status, perform the following steps in an Astra DB user account. +Create an IAM token with the "Read/Write User" role. +This role will be used by the client application, the {zdm-proxy}, and the {zdm-automation}. 
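If you prefer a scripted alternative to the Astra Portal steps, the {astra-cli} can create the database for you.
The following is only a sketch: it assumes the CLI is installed and authenticated, the database name, keyspace, and region are placeholders, and flag names may vary between CLI versions (check `astra db create --help`).

[source,bash]
----
# Hypothetical Astra CLI sketch (all values are placeholders; flags may vary by version):
# create a serverless database with an initial keyspace in your preferred region.
astra db create my_target_db --keyspace my_keyspace --region us-east1
# Confirm that the new database reaches ACTIVE status before continuing.
astra db list
----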
-In Astra Portal, choose **Organization Settings** (upper left) from the drop-down menu, and then **Token Management**. Select the **Read/Write User** role: +In Astra Portal, choose **Organization Settings** (upper left) from the drop-down menu, and then **Token Management**. +Select the **Read/Write User** role: image::{imagesprefix}zdm-token-management1.png[] -Then click **Generate Token**. Astra console displays the generated values. Example: +Then, click **Generate Token**. +Astra console displays the generated values. +Example: image::{imagesprefix}zdm-tokens-generated.png[] -Save all credentials (Client ID, Client Secret, and Token) in a clearly named file. For example, you can save all three parts of the new credentials to a file called `my_app_readwrite_user` and store it safely. +Save all credentials (Client ID, Client Secret, and Token) in a clearly named file. +For example, you can save all three parts of the new credentials to a file called `my_app_readwrite_user` and store it safely. For more information about role permissions, see link:https://docs.datastax.com/en/astra/docs/manage/org/user-permissions.html[User permissions] in the Astra DB documentation. === Get the Secure Connect Bundle and upload to client instances -Your cluster's xref:glossary.adoc#_secure_connect_bundle_scb[Secure Connect Bundle] (SCB) is a zip file that contains the TLS encryption certificates and other metadata to connect to your database. +Your cluster's https://docs.datastax.com/en/astra/astra-db-vector/drivers/secure-connect-bundle.html#download-the-secure-connect-bundle[Secure Connect Bundle] (SCB) is a zip file that contains the TLS encryption certificates and other metadata to connect to your database. It will be needed by: * Your client application, to connect directly to Astra DB near the end of the migration; @@ -82,21 +89,24 @@ scp -i secure-connect-.zip @ zdm-ansible-container:/home/ubuntu` ... Specify its path in `*_astra_secure_connect_bundle_path`. .. Otherwise, if you wish the automation to download the cluster's Secure Connect Bundle for you, just specify the two following variables: -... `*_astra_db_id`: the cluster's https://docs.datastax.com/en/astra-serverless/docs/astra-faq.html#_where_do_i_find_the_database_id_and_organization_id[database id^]. +... `*_astra_db_id`: the cluster's https://docs.datastax.com/en/astra/astra-db-vector/faqs.html#where-do-i-find-the-organization-id-database-id-or-region-id[database id]. ... `*_astra_token`: the token field from a valid set of credentials for a `R/W User` Astra role (this is the long string that starts with `AstraCS:`). Save the file and exit the editor. @@ -107,39 +116,54 @@ target_astra_token: "AstraCS:dUTGnRs...jeiKoIqyw:01...29dfb7" ---- -The other file you need to be aware of is `zdm_proxy_core_config.yml`. This file contains some global variables that will be used in subsequent steps during the migration. It is good to familiarize yourself with this file, although these configuration variables do not need changing at this time: +The other file you need to be aware of is `zdm_proxy_core_config.yml`. +This file contains some global variables that will be used in subsequent steps during the migration. +It is good to familiarize yourself with this file, although these configuration variables do not need changing at this time: -. `primary_cluster`: which cluster is going to be the primary source of truth. 
This should be left set to its default value of `ORIGIN` at the start of the migration, and will be changed to `TARGET` after migrating all existing data. -. `read_mode`: leave to its default value of `PRIMARY_ONLY`. See xref:enable-async-dual-reads.adoc[] for more information on this variable. +. `primary_cluster`: which cluster is going to be the primary source of truth. +This should be left set to its default value of `ORIGIN` at the start of the migration, and will be changed to `TARGET` after migrating all existing data. +. `read_mode`: leave to its default value of `PRIMARY_ONLY`. +See xref:enable-async-dual-reads.adoc[] for more information on this variable. . `log_level`: leave to its default of `INFO`. Leave all these variables to their defaults for now. === Enable TLS encryption (optional) -If you wish to enable TLS encryption between the client application and the {zdm-proxy}, or between the {zdm-proxy} and one (or both) self-managed clusters, you will need to specify some additional configuration. To do so, please follow the steps on xref:tls.adoc[this page]. +If you wish to enable TLS encryption between the client application and the {zdm-proxy}, or between the {zdm-proxy} and one (or both) self-managed clusters, you will need to specify some additional configuration. +To do so, please follow the steps on xref:tls.adoc[this page]. +[[_advanced_configuration_optional]] === Advanced configuration (optional) -Here are some additional configuration variables that you may wish to review and change *at deployment time* in specific cases. All these variables are located in `vars/zdm_proxy_advanced_config.yml`. +Here are some additional configuration variables that you may wish to review and change *at deployment time* in specific cases. +All these variables are located in `vars/zdm_proxy_advanced_config.yml`. All advanced configuration variables not listed here are considered mutable and can be changed later if needed (changes can be easily applied to existing deployments in a rolling fashion using the relevant Ansible playbook, as explained later, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]). ==== *Multi-datacenter clusters* -If Origin is a multi-datacenter cluster, you will need to specify the name of the datacenter that the {zdm-proxy} should consider local. To do this, set the property `origin_local_datacenter` to the datacenter name. Likewise, for multi-datacenter Target clusters you will need to set `target_local_datacenter` appropriately. +If Origin is a multi-datacenter cluster, you will need to specify the name of the datacenter that the {zdm-proxy} should consider local. To do this, set the property `origin_local_datacenter` to the datacenter name. +Likewise, for multi-datacenter Target clusters you will need to set `target_local_datacenter` appropriately. -These two variables are located in `vars/zdm_proxy_advanced_config.yml`. Note that this is not relevant for multi-region Astra DB clusters, where this is handled through region-specific Secure Connect Bundles. +These two variables are located in `vars/zdm_proxy_advanced_config.yml`. +Note that this is not relevant for multi-region Astra DB clusters, where this is handled through region-specific Secure Connect Bundles. +[[_ports]] ==== *Ports* -Each {zdm-proxy} instance listens on port 9042 by default, like a regular Cassandra cluster. This can be overridden by setting `zdm_proxy_listen_port` to a different value. 
This can be useful if the Origin nodes listen on a port that is not 9042 and you want to configure the {zdm-proxy} to listen on that same port to avoid changing the port in your client application configuration. +Each {zdm-proxy} instance listens on port 9042 by default, like a regular Cassandra cluster. +This can be overridden by setting `zdm_proxy_listen_port` to a different value. +This can be useful if the Origin nodes listen on a port that is not 9042 and you want to configure the {zdm-proxy} to listen on that same port to avoid changing the port in your client application configuration. -The {zdm-proxy} exposes metrics on port 14001 by default. This port is used by Prometheus to scrape the application-level proxy metrics. This can be changed by setting `metrics_port` to a different value if desired. +The {zdm-proxy} exposes metrics on port 14001 by default. +This port is used by Prometheus to scrape the application-level proxy metrics. +This can be changed by setting `metrics_port` to a different value if desired. == Use Ansible to deploy the {zdm-proxy} -Now you can run the playbook that you've configured above. From the shell connected to the container, ensure that you are in `/home/ubuntu/zdm-proxy-automation/ansible` and run: +Now you can run the playbook that you've configured above. +From the shell connected to the container, ensure that you are in `/home/ubuntu/zdm-proxy-automation/ansible` and run: [source,bash] ---- @@ -148,18 +172,21 @@ ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory That's it! A {zdm-proxy} container has been created on each proxy host. +[[_indications_of_success_on_origin_and_target_clusters]] == Indications of success on Origin and Target clusters -The playbook will create one {zdm-proxy} instance for each proxy host listed in the inventory file. It will indicate the operations that it is performing and print out any errors, or a success confirmation message at the end. +The playbook will create one {zdm-proxy} instance for each proxy host listed in the inventory file. +It will indicate the operations that it is performing and print out any errors, or a success confirmation message at the end. Confirm that the ZDM proxies are up and running by using one of the following options: -* Call the `liveness` and `readiness` HTTP endpoints for {zdm-proxy} instances -* Check {zdm-proxy} instances via docker logs +* Call the `liveness` and `readiness` HTTP endpoints for {zdm-proxy} instances. +* Check {zdm-proxy} instances via docker logs. === Call the `liveness` and `readiness` HTTP endpoints -ZDM metrics provide `/health/liveness` and `/health/readiness` HTTP endpoints, which you can call to determine the state of {zdm-proxy} instances. It's often fine to simply submit the `readiness` check to return the proxy's state. +ZDM metrics provide `/health/liveness` and `/health/readiness` HTTP endpoints, which you can call to determine the state of {zdm-proxy} instances. +It's often fine to simply submit the `readiness` check to return the proxy's state. The format: @@ -216,10 +243,10 @@ Result:: -- ==== - === Check {zdm-proxy} instances via docker logs -After running the playbook, you can `ssh` into one of the servers where one of the deployed {zdm-proxy} instances is running. You can do so from within the Ansible container, or directly from the jumphost machine: +After running the playbook, you can `ssh` into one of the servers where one of the deployed {zdm-proxy} instances is running. 
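Before inspecting the logs, a quick probe of the health endpoints from the jumphost can confirm that an instance is up.
This is a hedged example: the IP address is a placeholder and the default metrics port 14001 is assumed.

[source,bash]
----
# Hypothetical readiness/liveness check of one ZDM Proxy instance
# (placeholder IP; 14001 is the default metrics/health port).
curl -s http://172.18.10.34:14001/health/readiness
curl -s http://172.18.10.34:14001/health/liveness
----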
+You can do so from within the Ansible container, or directly from the jumphost machine: [source,bash] ---- @@ -254,7 +281,8 @@ time="2023-01-13T22:21:42Z" level=info msg="Proxy connected and ready to accept time="2023-01-13T22:21:42Z" level=info msg="Proxy started. Waiting for SIGINT/SIGTERM to shutdown." ---- -Also, you can check the status of the running Docker image. Here's an example with {zdm-proxy} 2.1.0: +Also, you can check the status of the running Docker image. +Here's an example with {zdm-proxy} 2.1.0: [source,bash] ---- @@ -269,33 +297,36 @@ If the {zdm-proxy} instances fail to start up due to mistakes in the configurati ==== With the exception of the Origin and Target credentials and the `primary_cluster` variable, which can all be changed for existing deployments in a rolling fashion, all cluster connection configuration variables are considered immutable and can only be changed by recreating the deployment. -If you wish to change any of the cluster connection configuration variables (other than credentials and `primary_cluster`) on an existing deployment, you will need to re-run the `deploy_zdm_proxy.yml` playbook. This playbook can be run as many times as necessary. +If you wish to change any of the cluster connection configuration variables (other than credentials and `primary_cluster`) on an existing deployment, you will need to re-run the `deploy_zdm_proxy.yml` playbook. +This playbook can be run as many times as necessary. Please note that running the `deploy_zdm_proxy.yml` playbook will result in a brief window of unavailability of the whole {zdm-proxy} deployment while all the {zdm-proxy} instances are torn down and recreated. ==== - +[[_setting_up_the_monitoring_stack]] == Setting up the Monitoring stack The {zdm-automation} enables you to easily set up a self-contained monitoring stack that is preconfigured to collect metrics from your {zdm-proxy} instances and display them in ready-to-use Grafana dashboards. +The monitoring stack is deployed entirely on Docker. +It includes the following components, all deployed as Docker containers: - -The monitoring stack is deployed entirely on Docker. It includes the following components, all deployed as Docker containers: - -* Prometheus node exporter, which runs on each {zdm-proxy} host and makes OS- and host-level metrics available to Prometheus -* Prometheus server, to collect metrics from the {zdm-proxy} process, its Golang runtime and the Prometheus node exporter -* Grafana, to visualize all these metrics in three preconfigured dashboards (see xref:troubleshooting-tips.adoc#how-to-leverage-metrics[this section] of the troubleshooting tips for details) +* Prometheus node exporter, which runs on each {zdm-proxy} host and makes OS- and host-level metrics available to Prometheus. +* Prometheus server, to collect metrics from the {zdm-proxy} process, its Golang runtime and the Prometheus node exporter. +* Grafana, to visualize all these metrics in three preconfigured dashboards (see xref:troubleshooting-tips.adoc#how-to-leverage-metrics[this section] of the troubleshooting tips for details). After running the playbook described here, you will have a fully configured monitoring stack connected to your {zdm-proxy} deployment. [NOTE] ==== -There are no additional prerequisites or dependencies for this playbook to execute. If it is not already present, Docker will automatically be installed by the playbook on your chosen monitoring server. +There are no additional prerequisites or dependencies for this playbook to execute. 
+If it is not already present, Docker will automatically be installed by the playbook on your chosen monitoring server. ==== === Connect to the Ansible Control Host -Make sure you are connected to the Ansible Control Host docker container. As above, you can do so from the jumphost machine by running: + +Make sure you are connected to the Ansible Control Host docker container. +As above, you can do so from the jumphost machine by running: [source,bash] ---- @@ -327,12 +358,12 @@ ansible-playbook deploy_zdm_monitoring.yml -i zdm_ansible_inventory === Check the Grafana dashboard -In a browser, open http://:3000. +In a browser, open \http://:3000 Login with: -* **username**: admin -* **password**: the password you configured +* *username*: admin +* *password*: the password you configured [TIP] ==== diff --git a/modules/ROOT/pages/deployment-infrastructure.adoc b/modules/ROOT/pages/deployment-infrastructure.adoc index 2375836e..1fe66ffc 100644 --- a/modules/ROOT/pages/deployment-infrastructure.adoc +++ b/modules/ROOT/pages/deployment-infrastructure.adoc @@ -4,11 +4,15 @@ ifdef::env-github,env-browser,env-vscode[:imagesprefix: ../images/] ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] == Choosing where to deploy the proxy -A typical {zdm-proxy} deployment is made up of multiple proxy instances. A minimum of three proxy instances is recommended for any deployment apart from those for demo or local testing purposes. -All {zdm-proxy} instances must be reachable by the client application and must be able to connect to your Origin and Target clusters. The {zdm-proxy} process is lightweight, requiring only a small amount of resources and no storage to persist state (apart from logs). +A typical {zdm-proxy} deployment is made up of multiple proxy instances. +A minimum of three proxy instances is recommended for any deployment apart from those for demo or local testing purposes. -The {zdm-proxy} should be deployed close to your client application instances. This can be on any cloud provider as well as on-premise, depending on your existing infrastructure. +All {zdm-proxy} instances must be reachable by the client application and must be able to connect to your Origin and Target clusters. +The {zdm-proxy} process is lightweight, requiring only a small amount of resources and no storage to persist state (apart from logs). + +The {zdm-proxy} should be deployed close to your client application instances. +This can be on any cloud provider as well as on-premise, depending on your existing infrastructure. If you have a multi-DC cluster with multiple set of client application instances deployed to geographically distributed data centers, you should plan for a separate {zdm-proxy} deployment for each data center. @@ -20,6 +24,7 @@ image::{imagesprefix}zdm-during-migration3.png[Connectivity between client appli To deploy the {zdm-proxy} and its companion monitoring stack, you will have to provision infrastructure that meets the following requirements. +[[_machines]] === Machines We will use the term "machine" to indicate a cloud instance (on any cloud provider), a VM, or a physical server. @@ -51,9 +56,11 @@ We will use the term "machine" to indicate a cloud instance (on any cloud provid [NOTE] ==== -* Scenario: If you have 20 TBs of existing data to be migrated and want to speed up the migration, you could use multiple VMs. For example, you can use four VMs that are the equivalent of an AWS m5.4xlarge, a GCP e2-standard-16 or an Azure D16v5. 
+* Scenario: If you have 20 TBs of existing data to be migrated and want to speed up the migration, you could use multiple VMs. +For example, you can use four VMs that are the equivalent of an AWS m5.4xlarge, a GCP e2-standard-16 or an Azure D16v5. + -Next, run DSBulk Migrator or Cassandra-Data-Migrator in parallel on each VM with each one responsible for migrating around 5TB of data. If there is one super large table (e.g. 15 TB of 20 TB is in one table), you can choose to migrate this table in three parts on three separate VMs in parallel by splitting the full token range into three parts and migrating the rest of the tables on the fourth VM. +Next, run DSBulk Migrator or Cassandra-Data-Migrator in parallel on each VM with each one responsible for migrating around 5TB of data. +If there is one super large table (e.g. 15 TB of 20 TB is in one table), you can choose to migrate this table in three parts on three separate VMs in parallel by splitting the full token range into three parts and migrating the rest of the tables on the fourth VM. * Ensure that your Origin and Target clusters can handle high traffic from Cassandra Data Migrator or DSBulk in addition to the live traffic from your application. @@ -70,30 +77,43 @@ The {zdm-proxy} machines must be reachable by: * The client application instances, on port 9042 * The monitoring machine on port 14001 * The jumphost on port 22 -* Important: the {zdm-proxy} machines should not be directly accessible by external machines. The only direct access to these machines should be from the jumphost + +[IMPORTANT] +==== +The {zdm-proxy} machines should not be directly accessible by external machines. +The only direct access to these machines should be from the jumphost. +==== The {zdm-proxy} machines must be able to connect to the Origin and Target cluster nodes: -* For self-managed (non-Astra DB) clusters, connectivity is needed to the Cassandra native protocol port (typically 9042) -* For Astra DB clusters, you will need to ensure outbound connectivity to the Astra endpoint indicated in the Secure Connect Bundle. Connectivity over Private Link is also supported. +* For self-managed (non-Astra DB) clusters, connectivity is needed to the Cassandra native protocol port (typically 9042). +* For Astra DB clusters, you will need to ensure outbound connectivity to the Astra endpoint indicated in the Secure Connect Bundle. +Connectivity over Private Link is also supported. The connectivity requirements for the jumphost / monitoring machine are: -* Connecting to the {zdm-proxy} instances: on port 14001 for metrics collection, and on port 22 to run the Ansible automation and for log inspection or troubleshooting -* Allowing incoming ssh connections from outside, potentially from allowed IP ranges only -* Exposing the Grafana UI on port 3000 -* Important: it is strongly recommended **to restrict external access** to this machine to specific IP ranges (for example, the IP range of your corporate networks or trusted VPNs) +* Connecting to the {zdm-proxy} instances: on port 14001 for metrics collection, and on port 22 to run the Ansible automation and for log inspection or troubleshooting. +* Allowing incoming ssh connections from outside, potentially from allowed IP ranges only. +* Exposing the Grafana UI on port 3000. + +[IMPORTANT] +==== +It is strongly recommended **to restrict external access** to this machine to specific IP ranges (for example, the IP range of your corporate networks or trusted VPNs). 
+==== The {zdm-proxy} and monitoring machines must be able to connect externally, as the automation will download: -* Various software packages (Docker, Prometheus, Grafana); +* Various software packages (Docker, Prometheus, Grafana). * {zdm-proxy} image from DockerHub repo. === Connecting to the ZDM infrastructure from an external machine -To connect to the jumphost from an external machine, ensure that its IP address belongs to a permitted IP range. If you are connecting through a VPN that only intercepts connections to selected destinations, you may have to add a route from your VPN IP gateway to the public IP of the jumphost. +To connect to the jumphost from an external machine, ensure that its IP address belongs to a permitted IP range. +If you are connecting through a VPN that only intercepts connections to selected destinations, you may have to add a route from your VPN IP gateway to the public IP of the jumphost. -To simplify connecting to the jumphost and, through it, to the {zdm-proxy} instances, you can create a custom SSH config file. You can use this template and replace all the placeholders in angle brackets with the appropriate values for your deployment, adding more entries if you have more than three proxy instances. Save this file, for example calling it `zdm_ssh_config`. +To simplify connecting to the jumphost and, through it, to the {zdm-proxy} instances, you can create a custom SSH config file. +You can use this template and replace all the placeholders in angle brackets with the appropriate values for your deployment, adding more entries if you have more than three proxy instances. +Save this file, for example calling it `zdm_ssh_config`. [source,bash] ---- diff --git a/modules/ROOT/pages/dsbulk-migrator.adoc b/modules/ROOT/pages/dsbulk-migrator.adoc index b1ce1e09..b3406605 100644 --- a/modules/ROOT/pages/dsbulk-migrator.adoc +++ b/modules/ROOT/pages/dsbulk-migrator.adoc @@ -6,14 +6,15 @@ Use {dsbulk-migrator} to perform simple migration of smaller data quantities, wh == {dsbulk-migrator} prerequisites * Install or switch to Java 11. -* Install https://maven.apache.org/download.cgi[Maven^] 3.9.x. +* Install https://maven.apache.org/download.cgi[Maven] 3.9.x. * Optionally install https://docs.datastax.com/en/dsbulk/docs/installing/install.html[DSBulk Loader, window="_blank"], if you elect to reference your own external installation of DSBulk, instead of the embedded DSBulk that's in {dsbulk-migrator}. -* Install https://github.com/datastax/simulacron#prerequisites[Simulacron^] 0.12.x and its prerequisites, for integration tests. +* Install https://github.com/datastax/simulacron#prerequisites[Simulacron] 0.12.x and its prerequisites, for integration tests. [[building-dsbulk-migrator]] == Building {dsbulk-migrator} -Building {dsbulk-migrator} is accomplished with Maven. First, clone the git repo to your local machine. Example: +Building {dsbulk-migrator} is accomplished with Maven. First, clone the git repo to your local machine. +Example: [source,bash] ---- @@ -31,30 +32,26 @@ mvn clean package The build produces two distributable fat jars: -* `dsbulk-migrator--embedded-driver.jar` : contains an embedded Java driver; suitable for - live migrations using an external DSBulk, or for script generation. This jar is NOT suitable for - live migrations using an embedded DSBulk, since no DSBulk classes are present. - -* `dsbulk-migrator--embedded-dsbulk.jar`: contains an embedded DSBulk and an embedded Java - driver; suitable for all operations. 
Note that this jar is much bigger than the previous one, due - to the presence of DSBulk classes. - +* `dsbulk-migrator--embedded-driver.jar` : contains an embedded Java driver; suitable for live migrations using an external DSBulk, or for script generation. +This jar is NOT suitable for live migrations using an embedded DSBulk, since no DSBulk classes are present. +* `dsbulk-migrator--embedded-dsbulk.jar`: contains an embedded DSBulk and an embedded Java driver; suitable for all operations. +Note that this jar is much bigger than the previous one, due to the presence of DSBulk classes. [[testing-dsbulk-migrator]] == Testing {dsbulk-migrator} -The project contains a few integration tests. Run them with: +The project contains a few integration tests. +Run them with: [source,bash] ---- mvn clean verify ---- -The integration tests require https://github.com/datastax/simulacron[Simulacron^]. Be sure to meet -all the https://github.com/datastax/simulacron#prerequisites[Simulacron prerequisites^] before running the +The integration tests require https://github.com/datastax/simulacron[Simulacron]. +Be sure to meet all the https://github.com/datastax/simulacron#prerequisites[Simulacron prerequisites] before running the tests. - [[running-dsbulk-migrator]] == Running {dsbulk-migrator} @@ -72,11 +69,9 @@ When generating a migration script, most options serve as default values in the Note however that, even when generating scripts, this tool still needs to access the Origin cluster in order to gather metadata about the tables to migrate. -When generating a DDL file, only a few options are meaningful. Because standard DSBulk is not used, and the -import cluster is never contacted, import options and DSBulk-related options are ignored. The tool -still needs to access the Origin cluster in order to gather metadata about the keyspaces and tables -for which to generate DDL statements. - +When generating a DDL file, only a few options are meaningful. +Because standard DSBulk is not used, and the import cluster is never contacted, import options and DSBulk-related options are ignored. +The tool still needs to access the Origin cluster in order to gather metadata about the keyspaces and tables for which to generate DDL statements. [[dsbulk-migrator-reference]] == {dsbulk-migrator} reference @@ -84,97 +79,97 @@ for which to generate DDL statements. * xref:#dsbulk-live[Live migration command-line options] * xref:#dsbulk-script[Script generation command-line options] * xref:#dsbulk-ddl[DDL generation command-line options] -* xref:#dsbulk-help[Getting {dsbulk-migrator} help] +* xref:#getting-help-with-dsbulk-migrator[Getting {dsbulk-migrator} help] * xref:#dsbulk-examples[{dsbulk-migrator} examples] [[dsbulk-live]] === Live migration command-line options -The following options are available for the `migrate-live` command. Most options have sensible default values and do not -need to be specified, unless you want to override the default value. +The following options are available for the `migrate-live` command. +Most options have sensible default values and do not need to be specified, unless you want to override the default value. [cols="2,8,14"] |=== | `-c` | `--dsbulk-cmd=CMD` -| The external DSBulk command to use. -Ignored if the embedded DSBulk is being used. -The default is simply 'dsbulk', assuming that the command is available through the `PATH` variable contents. +| The external DSBulk command to use. +Ignored if the embedded DSBulk is being used. 
+The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. | `-d` | `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a 'data' subdirectory in the current working directory. -The data directory will be created if it does not exist. -Tables will be exported and imported in subdirectories of the data directory specified here. +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. +Tables will be exported and imported in subdirectories of the data directory specified here. There will be one subdirectory per keyspace in the data directory, then one subdirectory per table in each keyspace directory. | `-e` | `--dsbulk-use-embedded` -| Use the embedded DSBulk version instead of an external one. +| Use the embedded DSBulk version instead of an external one. The default is to use an external DSBulk command. | | `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. +| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. Options `--export-host` and `--export-bundle` are mutually exclusive. | | `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. +| The consistency level to use when exporting data. The default is `LOCAL_QUORUM`. | | `--export-dsbulk-option=OPT=VALUE` -| An extra DSBulk option to use when exporting. -Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. -DSBulk options, including driver options, must be passed as `--long.option.name=`. +| An extra DSBulk option to use when exporting. +Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. +DSBulk options, including driver options, must be passed as `--long.option.name=`. Short options are not supported. | | `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the Origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. +| The host name or IP and, optionally, the port of a node from the Origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. Options `--export-host` and `--export-bundle` are mutually exclusive. | | `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. | | `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. | | `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. +| The maximum number of records to export for each table. +Must be a positive number or `-1`. The default is `-1` (export the entire table). | | `--export-password` -| The password to use to authenticate against the Origin cluster. 
-Options `--export-username` and `--export-password` must be provided together, or not at all. +| The password to use to authenticate against the Origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. | | `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. This is an advanced setting; you should rarely need to modify the default value. | | `--export-username=STRING` -| The username to use to authenticate against the Origin cluster. +| The username to use to authenticate against the Origin cluster. Options `--export-username` and `--export-password` must be provided together, or not at all. | `-h` @@ -183,56 +178,56 @@ Options `--export-username` and `--export-password` must be provided together, o | | `--import-bundle=PATH` -| The path to a secure connect bundle to connect to the Target cluster, if it's a {company} {astra_db} cluster. +| The path to a secure connect bundle to connect to the Target cluster, if it's a {company} {astra_db} cluster. Options `--import-host` and `--import-bundle` are mutually exclusive. | | `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. +| The consistency level to use when importing data. The default is `LOCAL_QUORUM`. | | `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. The default is `1970-01-01T00:00:00Z`. | | `--import-dsbulk-option=OPT=VALUE` -| An extra DSBulk option to use when importing. -Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. -DSBulk options, including driver options, must be passed as `--long.option.name=`. +| An extra DSBulk option to use when importing. +Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. +DSBulk options, including driver options, must be passed as `--long.option.name=`. Short options are not supported. | | `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the Target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. +| The host name or IP and, optionally, the port of a node from the Target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. | -| `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. +| `--import-max-concurrent-files=NUM\|AUTO` +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. | | `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. 
-Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. | | `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. -The default is `1000`. +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. Failed records will appear in a `load.bad` file in the DSBulk operation directory. | | `--import-password` -| The password to use to authenticate against the Target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. +| The password to use to authenticate against the Target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. | @@ -241,50 +236,50 @@ Omit the parameter value to be prompted for the password interactively. | `-k` | `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. | `-l` | `--dsbulk-log-dir=PATH` -| The directory where DSBulk should store its logs. -The default is a 'logs' subdirectory in the current working directory. -This subdirectory will be created if it does not exist. +| The directory where DSBulk should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. Each DSBulk operation will create a subdirectory in the log directory specified here. | | `--max-concurrent-ops=NUM` -| The maximum number of concurrent operations (exports and imports) to carry. -The default is `1`. -Set this to higher values to allow exports and imports to occur concurrently. +| The maximum number of concurrent operations (exports and imports) to carry. +The default is `1`. +Set this to higher values to allow exports and imports to occur concurrently. For example, with a value of `2`, each table will be imported as soon as it is exported, while the next table is being exported. | | `--skip-truncate-confirmation` -| Skip truncate confirmation before actually truncating tables. +| Skip truncate confirmation before actually truncating tables. Only applicable when migrating counter tables, ignored otherwise. | `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. Case-sensitive table names must be entered in their exact case. | | `--table-types=regular\|counter\|all` -| The table types to migrate. +| The table types to migrate. The default is `all`. | | `--truncate-before-export` -| Truncate tables before the export instead of after. -The default is to truncate after the export. +| Truncate tables before the export instead of after. +The default is to truncate after the export. Only applicable when migrating counter tables, ignored otherwise. 
| `-w` | `--dsbulk-working-dir=PATH` -| The directory where DSBulk should be executed. -Ignored if the embedded DSBulk is being used. +| The directory where DSBulk should be executed. +Ignored if the embedded DSBulk is being used. If unspecified, it defaults to the current working directory. |=== @@ -293,166 +288,166 @@ If unspecified, it defaults to the current working directory. [[dsbulk-script]] === Script generation command-line options -The following options are available for the `generate-script` command. +The following options are available for the `generate-script` command. Most options have sensible default values and do not need to be specified, unless you want to override the default value. [cols="2,8,14"] |=== -| `-c` +| `-c` | `--dsbulk-cmd=CMD` -| The DSBulk command to use. -The default is simply 'dsbulk', assuming that the command is available through the `PATH` variable contents. +| The DSBulk command to use. +The default is simply `dsbulk`, assuming that the command is available through the `PATH` variable contents. | `-d` | `--data-dir=PATH` | The directory where data will be exported to and imported from. -The default is a 'data' subdirectory in the current working directory. +The default is a `data` subdirectory in the current working directory. The data directory will be created if it does not exist. -| +| | `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. +| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. Options `--export-host` and `--export-bundle` are mutually exclusive. -| +| | `--export-consistency=CONSISTENCY` -| The consistency level to use when exporting data. +| The consistency level to use when exporting data. The default is `LOCAL_QUORUM`. -| +| | `--export-dsbulk-option=OPT=VALUE` -| An extra DSBulk option to use when exporting. -Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. -DSBulk options, including driver options, must be passed as `--long.option.name=`. +| An extra DSBulk option to use when exporting. +Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. +DSBulk options, including driver options, must be passed as `--long.option.name=`. Short options are not supported. -| +| | `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the Origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. +| The host name or IP and, optionally, the port of a node from the Origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. Options `--export-host` and `--export-bundle` are mutually exclusive. -| +| | `--export-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to write to. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent files to write to. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. -| +| | `--export-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. 
-| +| | `--export-max-records=NUM` -| The maximum number of records to export for each table. -Must be a positive number or `-1`. +| The maximum number of records to export for each table. +Must be a positive number or `-1`. The default is `-1` (export the entire table). -| +| | `--export-password` -| The password to use to authenticate against the Origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. +| The password to use to authenticate against the Origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. -| +| | `--export-splits=NUM\|NC` -| The maximum number of token range queries to generate. -Use the `NC` syntax to specify a multiple of the number of available cores. -For example, `8C` = 8 times the number of available cores. -The default is `8C`. -This is an advanced setting. You should rarely need to modify the default value. - -| +| The maximum number of token range queries to generate. +Use the `NC` syntax to specify a multiple of the number of available cores. +For example, `8C` = 8 times the number of available cores. +The default is `8C`. +This is an advanced setting. +You should rarely need to modify the default value. + +| | `--export-username=STRING` -| The username to use to authenticate against the Origin cluster. +| The username to use to authenticate against the Origin cluster. Options `--export-username` and `--export-password` must be provided together, or not at all. -| `-h` +| `-h` | `--help` | Displays this help text. -| +| | `--import-bundle=PATH` -| The path to a secure connect bundle to connect to the Target cluster, if it's a {company} {astra_db} cluster. +| The path to a secure connect bundle to connect to the Target cluster, if it's a {company} {astra_db} cluster. Options `--import-host` and `--import-bundle` are mutually exclusive. -| +| | `--import-consistency=CONSISTENCY` -| The consistency level to use when importing data. +| The consistency level to use when importing data. The default is `LOCAL_QUORUM`. -| +| | `--import-default-timestamp=` -| The default timestamp to use when importing data. -Must be a valid instant in ISO-8601 syntax. +| The default timestamp to use when importing data. +Must be a valid instant in ISO-8601 syntax. The default is `1970-01-01T00:00:00Z`. -| +| | `--import-dsbulk-option=OPT=VALUE` -| An extra DSBulk option to use when importing. -Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. -DSBulk options, including driver options, must be passed as `--long.option.name=`. +| An extra DSBulk option to use when importing. +Any valid DSBulk option can be specified here, and it will passed as is to the DSBulk process. +DSBulk options, including driver options, must be passed as `--long.option.name=`. Short options are not supported. -| +| | `--import-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the Target cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. -Options `--import-host` and `--import-bundle` are mutually exclusive. +| The host name or IP and, optionally, the port of a node from the Target cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. +Options `--import-host` and `--import-bundle` are mutually exclusive. 
-| +| | `--import-max-concurrent-files=NUM\|AUTO` -| The maximum number of concurrent files to read from. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent files to read from. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. -| +| | `--import-max-concurrent-queries=NUM\|AUTO` -| The maximum number of concurrent queries to execute. -Must be a positive number or the special value `AUTO`. +| The maximum number of concurrent queries to execute. +Must be a positive number or the special value `AUTO`. The default is `AUTO`. -| +| | `--import-max-errors=NUM` -| The maximum number of failed records to tolerate when importing data. -The default is `1000`. +| The maximum number of failed records to tolerate when importing data. +The default is `1000`. Failed records will appear in a `load.bad` file in the DSBulk operation directory. -| +| | `--import-password` -| The password to use to authenticate against the Target cluster. -Options `--import-username` and `--import-password` must be provided together, or not at all. +| The password to use to authenticate against the Target cluster. +Options `--import-username` and `--import-password` must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. -| +| | `--import-username=STRING` | The username to use to authenticate against the Target cluster. Options `--import-username` and `--import-password` must be provided together, or not at all. | `-k` | `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. | `-l` | `--dsbulk-log-dir=PATH` -| The directory where DSBulk should store its logs. -The default is a 'logs' subdirectory in the current working directory. -This subdirectory will be created if it does not exist. +| The directory where DSBulk should store its logs. +The default is a `logs` subdirectory in the current working directory. +This subdirectory will be created if it does not exist. Each DSBulk operation will create a subdirectory in the log directory specified here. - | `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. Case-sensitive table names must be entered in their exact case. | @@ -474,58 +469,58 @@ Most options have sensible default values and do not need to be specified, unles | `-a` | `--optimize-for-astra` -| Produce CQL scripts optimized for {company} {astra_db}. -{astra_db} does not allow some options in DDL statements. +| Produce CQL scripts optimized for {company} {astra_db}. +{astra_db} does not allow some options in DDL statements. Using this {dsbulk-migrator} command option, forbidden {astra_db} options will be omitted from the generated CQL files. | `-d` | `--data-dir=PATH` -| The directory where data will be exported to and imported from. -The default is a 'data' subdirectory in the current working directory. 
-The data directory will be created if it does not exist. +| The directory where data will be exported to and imported from. +The default is a `data` subdirectory in the current working directory. +The data directory will be created if it does not exist. -| +| | `--export-bundle=PATH` -| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. +| The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a {company} {astra_db} cluster. Options `--export-host` and `--export-bundle` are mutually exclusive. -| +| | `--export-host=HOST[:PORT]` -| The host name or IP and, optionally, the port of a node from the Origin cluster. -If the port is not specified, it will default to `9042`. -This option can be specified multiple times. +| The host name or IP and, optionally, the port of a node from the Origin cluster. +If the port is not specified, it will default to `9042`. +This option can be specified multiple times. Options `--export-host` and `--export-bundle` are mutually exclusive. -| +| | `--export-password` -| The password to use to authenticate against the Origin cluster. -Options `--export-username` and `--export-password` must be provided together, or not at all. +| The password to use to authenticate against the Origin cluster. +Options `--export-username` and `--export-password` must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. -| +| | `--export-username=STRING` -| The username to use to authenticate against the Origin cluster. +| The username to use to authenticate against the Origin cluster. Options `--export-username` and `--export-password` must be provided together, or not at all. -| `-h` +| `-h` | `--help` | Displays this help text. | `-k` | `--keyspaces=REGEX` -| A regular expression to select keyspaces to migrate. -The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. +| A regular expression to select keyspaces to migrate. +The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. | `-t` -| `--tables=REGEX` -| A regular expression to select tables to migrate. -The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. +| `--tables=REGEX` +| A regular expression to select tables to migrate. +The default is to migrate all tables in the keyspaces that were selected for migration with `--keyspaces`. Case-sensitive table names must be entered in their exact case. -| +| | `--table-types=regular\|counter\|all` -| The table types to migrate. +| The table types to migrate. The default is `all`. |=== @@ -548,8 +543,6 @@ For individual command help and each one's options: java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help ---- - - [[dsbulk-examples]] == {dsbulk-migrator} examples diff --git a/modules/ROOT/pages/enable-async-dual-reads.adoc b/modules/ROOT/pages/enable-async-dual-reads.adoc index b29f8cc0..e023de8c 100644 --- a/modules/ROOT/pages/enable-async-dual-reads.adoc +++ b/modules/ROOT/pages/enable-async-dual-reads.adoc @@ -3,17 +3,19 @@ ifdef::env-github,env-browser,env-vscode[:imagesprefix: ../images/] ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] -In this phase, you can optionally enable asynchronous dual reads. 
The idea is to test performance and verify that Target can handle your application's live request load before cutting over from Origin to Target. +In this phase, you can optionally enable asynchronous dual reads. +The idea is to test performance and verify that Target can handle your application's live request load before cutting over from Origin to Target. -include::partial$lightbox-tip.adoc[] +//include::partial$lightbox-tip.adoc[] image::{imagesprefix}migration-phase3ra.png[Phase 3 diagram shows optional step enabling async dual reads to test performance of Target.] -For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. +//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. [TIP] ==== -As you test the performance on Target, be sure to examine the async read metrics. As noted in the xref:#_validating_performance_and_error_rate[section] below, you can learn more in xref:metrics.adoc#_asynchronous_read_requests_metrics[Asynchronous read requests metrics]. +As you test the performance on Target, be sure to examine the async read metrics. +As noted in the xref:#_validating_performance_and_error_rate[section] below, you can learn more in xref:metrics.adoc#_asynchronous_read_requests_metrics[Asynchronous read requests metrics]. ==== == Steps @@ -38,24 +40,32 @@ To apply this change, run the `rolling_update_zdm_proxy.yml` playbook as explain [NOTE] ==== -This optional phase introduces an additional check to make sure that Target can handle the load without timeouts or unacceptable latencies. You would typically perform this step once you have migrated all the existing data from Origin and completed all validation checks and reconciliation, if necessary. +This optional phase introduces an additional check to make sure that Target can handle the load without timeouts or unacceptable latencies. +You would typically perform this step once you have migrated all the existing data from Origin and completed all validation checks and reconciliation, if necessary. ==== == Asynchronous Dual Reads mode -When using the {zdm-proxy}, all writes are synchronously sent to both Origin and Target. Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). +When using the {zdm-proxy}, all writes are synchronously sent to both Origin and Target. +Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). Before changing the read routing so that reads are routed to Target (phase 4), you may want to temporarily send the reads to both clusters, to make sure that Target can handle the full workload of reads and writes. -If you set the proxy's read mode configuration variable (`read_mode`) to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads will be enabled. That change will result in reads being additionally sent to the secondary cluster. The proxy will return the read response to the client application as soon as the primary cluster's response arrives. +If you set the proxy's read mode configuration variable (`read_mode`) to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads will be enabled. +That change will result in reads being additionally sent to the secondary cluster. +The proxy will return the read response to the client application as soon as the primary cluster's response arrives. -The secondary cluster's response will only be used to track metrics. 
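For reference, a minimal sketch of applying this change with the {zdm-automation} is shown below. It assumes you run it from the Ansible playbook directory on the Ansible Control Host and that your inventory file is named `zdm_ansible_inventory`, as in the earlier examples; adjust the `sed` pattern if `read_mode` is formatted differently in your configuration file.

[source,bash]
----
# Sketch only: switch the proxy read mode to asynchronous dual reads,
# then roll the change out to all proxy instances.
# Adjust the pattern if read_mode is written differently in your file.
sed -i 's/^read_mode:.*/read_mode: DUAL_ASYNC_ON_SECONDARY/' vars/zdm_proxy_core_config.yml

ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory
----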
There will be no impact to the client application if the read fails on the secondary cluster, or if the read performance on the secondary cluster is degraded. Therefore, this feature can be used as a safer way to test the full workload on Target before making the switch to set Target as the primary cluster (phase 4). +The secondary cluster's response will only be used to track metrics. +There will be no impact to the client application if the read fails on the secondary cluster, or if the read performance on the secondary cluster is degraded. +Therefore, this feature can be used as a safer way to test the full workload on Target before making the switch to set Target as the primary cluster (phase 4). [NOTE] ==== -In some cases the additional read requests can cause the write requests to fail or timeout on that cluster. This means that, while this feature provides a way to route read requests to Target with a lower chance of having impact on the client application, it doesn't completely eliminate that chance. +In some cases the additional read requests can cause the write requests to fail or timeout on that cluster. +This means that, while this feature provides a way to route read requests to Target with a lower chance of having impact on the client application, it doesn't completely eliminate that chance. ==== +[[_validating_performance_and_error_rate]] == Validating performance and error rate Because the client application is not impacted by these asynchronous reads, the only way to measure the performance and error rate of these asynchronous reads are: @@ -71,5 +81,7 @@ For more, see xref:metrics.adoc#_asynchronous_read_requests_metrics[Asynchronous [TIP] ==== -Once you are satisfied that your Target cluster is ready and tuned appropriately to handle the production read load, you can decide to switch your sync reads to Target. At this point, be sure to also disable async dual reads by reverting `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY`. This step is explained in more detail in the xref:change-read-routing.adoc[next topic]. +Once you are satisfied that your Target cluster is ready and tuned appropriately to handle the production read load, you can decide to switch your sync reads to Target. +At this point, be sure to also disable async dual reads by reverting `read_mode` in `vars/zdm_proxy_core_config.yml` to `PRIMARY_ONLY`. +This step is explained in more detail in the xref:change-read-routing.adoc[next topic]. ==== diff --git a/modules/ROOT/pages/faqs.adoc b/modules/ROOT/pages/faqs.adoc index 834d76f8..be38b733 100644 --- a/modules/ROOT/pages/faqs.adoc +++ b/modules/ROOT/pages/faqs.adoc @@ -40,12 +40,15 @@ The interactive lab spans the pre-migration prerequisites and each of the five k {company} {zdm-product} includes the following: * xref:glossary.adoc#zdm-proxy[**{zdm-proxy}**] is a service that operates between xref:glossary.adoc#origin[**Origin**], which is your existing cluster, and xref:glossary.adoc#target[**Target**], which is the cluster to which you are migrating. -* **{zdm-automation}** is an Ansible-based tool that allows you to deploy and manage the {zdm-proxy} instances and associated monitoring stack. To simplify its setup, the suite includes the {zdm-utility}. This interactive utility creates a Docker container acting as the Ansible Control Host. The Ansible playbooks constitute the {zdm-automation}. +* **{zdm-automation}** is an Ansible-based tool that allows you to deploy and manage the {zdm-proxy} instances and associated monitoring stack. 
+To simplify its setup, the suite includes the {zdm-utility}. +This interactive utility creates a Docker container acting as the Ansible Control Host. +The Ansible playbooks constitute the {zdm-automation}. * **{cstar-data-migrator}** is designed to: -** Connect to your clusters and compare the data between Origin and Target -** Report differences in a detailed log file -** Reconcile any missing records and fix any data inconsistencies between Origin and Target, if you enable `autocorrect` in a configuration file -* **{dsbulk-migrator}** is provided to migrate smaller amounts of data from Origin to Target +** Connect to your clusters and compare the data between Origin and Target. +** Report differences in a detailed log file. +** Reconcile any missing records and fix any data inconsistencies between Origin and Target by enabling `autocorrect` in a configuration file. +* **{dsbulk-migrator}** is provided to migrate smaller amounts of data from Origin to Target. * Well-defined steps in this migration documentation, organized as a sequence of phases. == What exactly is {zdm-proxy}? @@ -71,7 +74,9 @@ include::partial$migration-scenarios.adoc[] == Does {zdm-shortproduct} migrate clusters? -{zdm-shortproduct} does not migrate clusters. With {zdm-shortproduct}, we are migrating data and applications *between clusters*. At the end of the migration, your application will be running on your new cluster, which will have been populated with all the relevant data. +{zdm-shortproduct} does not migrate clusters. +With {zdm-shortproduct}, we are migrating data and applications *between clusters*. +At the end of the migration, your application will be running on your new cluster, which will have been populated with all the relevant data. == What challenges does {zdm-shortproduct} solve? @@ -85,42 +90,52 @@ The suite of {zdm-product} tools from {company} is free and open-sourced. == Is there support available if I have questions or issues during our migration? -{zdm-proxy} and related software tools in the migration suite include technical assistance by https://support.datastax.com/s/[{company} Support^] for DSE and Luna subscribers, and Astra DB users who are on an Enterprise plan. +{zdm-proxy} and related software tools in the migration suite include technical assistance by https://support.datastax.com/s/[{company} Support] for DSE and Luna subscribers, and Astra DB users who are on an Enterprise plan. Free and Pay As You Go plan users do not have support access and must raise questions in the {astra_ui} chat. https://www.datastax.com/products/luna[Luna] is a subscription to the Apache Cassandra support and expertise at DataStax. -For any observed problems with the {zdm-proxy}, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue^] in the {zdm-proxy} GitHub repo. +For any observed problems with the {zdm-proxy}, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue] in the {zdm-proxy} GitHub repo. -Additional examples serve as templates, from which you can learn about migrations. {company} does not assume responsibility for making the templates work for specific use cases. +Additional examples serve as templates, from which you can learn about migrations. +{company} does not assume responsibility for making the templates work for specific use cases. == Where are the public GitHub repos? -All the {company} {zdm-product} GitHub repos are public and open source. You are welcome to read the code and submit feedback via GitHub Issues per repo. 
In addition to sending feedback, you may submit Pull Requests (PRs) for potential inclusion. To submit PRs, you must for first agree to the https://cla.datastax.com/[DataStax Contribution License Agreement (CLA)]. +All the {company} {zdm-product} GitHub repos are public and open source. +You are welcome to read the code and submit feedback via GitHub Issues per repo. +In addition to sending feedback, you may submit Pull Requests (PRs) for potential inclusion. -* https://github.com/datastax/zdm-proxy[{zdm-proxy}^] repo for ZDM Proxy. +To submit PRs, you must for first agree to the https://cla.datastax.com/[DataStax Contribution License Agreement (CLA)]. -* https://github.com/datastax/zdm-proxy-automation[{zdm-automation}^] repo for the Ansible-based {zdm-proxy} Automation, which includes the ZDM Utility. +* https://github.com/datastax/zdm-proxy[{zdm-proxy}] repo for ZDM Proxy. -* https://github.com/datastax/cassandra-data-migrator[cassandra-data-migrator^] repo for the tool that supports migrating larger data quantities as well as detailed verifications and reconciliation options. +* https://github.com/datastax/zdm-proxy-automation[{zdm-automation}] repo for the Ansible-based {zdm-proxy} Automation, which includes the ZDM Utility. -* https://github.com/datastax/dsbulk-migrator[dsbulk-migrator^] repo for the tool that allows simple data migrations without validation and reconciliation capabilities. +* https://github.com/datastax/cassandra-data-migrator[cassandra-data-migrator] repo for the tool that supports migrating larger data quantities as well as detailed verifications and reconciliation options. -// * https://github.com/datastax/migration-docs[Migration documentation^] +* https://github.com/datastax/dsbulk-migrator[dsbulk-migrator] repo for the tool that allows simple data migrations without validation and reconciliation capabilities. + +// * https://github.com/datastax/migration-docs[Migration documentation] == Does {zdm-proxy} support Transport Layer Security (TLS)? Yes, and here's a summary: -* For application-to-proxy TLS, the application is the TLS client and the {zdm-proxy} is the TLS server. One-way TLS and Mutual TLS are both supported. -* For proxy-to-cluster TLS, the {zdm-proxy} acts as the TLS client and the cluster as the TLS server. One-way TLS and Mutual TLS are both supported. -* When the {zdm-proxy} connects to {astra_db} clusters, it always implicitly uses Mutual TLS. This is done through the Secure Connect Bundle (SCB) and does not require any extra configuration. +* For application-to-proxy TLS, the application is the TLS client and the {zdm-proxy} is the TLS server. +One-way TLS and Mutual TLS are both supported. +* For proxy-to-cluster TLS, the {zdm-proxy} acts as the TLS client and the cluster as the TLS server. +One-way TLS and Mutual TLS are both supported. +* When the {zdm-proxy} connects to {astra_db} clusters, it always implicitly uses Mutual TLS. +This is done through the Secure Connect Bundle (SCB) and does not require any extra configuration. For TLS details, see xref:tls.adoc[]. == How does {zdm-proxy} handle Lightweight Transactions (LWTs)? -{zdm-proxy} handles LWTs as write operations. The proxy sends the LWT to Origin and Target clusters concurrently, and waits for a response from both. {zdm-proxy} will return a `success` status to the client if both Origin and Target send successful acknowledgements, or otherwise will return a `failure` status if one or both do not return an acknowledgement. +{zdm-proxy} handles LWTs as write operations. 
+The proxy sends the LWT to Origin and Target clusters concurrently, and waits for a response from both. +{zdm-proxy} will return a `success` status to the client if both Origin and Target send successful acknowledgements, or otherwise will return a `failure` status if one or both do not return an acknowledgement. What sets LWTs apart from regular writes is that they are conditional. For important details, including the client context for a returned `applied` flag, see xref:feasibility-checklists.adoc#_lightweight_transactions_and_the_applied_flag[Lightweight Transactions and the `applied` flag]. @@ -128,20 +143,28 @@ What sets LWTs apart from regular writes is that they are conditional. For impor {zdm-proxy} should not be deployed as a sidecar. -{zdm-proxy} was designed to mimic a Cassandra cluster. For this reason, we recommend deploying multiple {zdm-proxy} instances, each running on a dedicated machine, instance, or VM. +{zdm-proxy} was designed to mimic a Cassandra cluster. +For this reason, we recommend deploying multiple {zdm-proxy} instances, each running on a dedicated machine, instance, or VM. + +For best performance, this deployment should be close to the client applications (ideally on the same local network) but not co-deployed on the same machines as the client applications. -For best performance, this deployment should be close to the client applications (ideally on the same local network) but not co-deployed on the same machines as the client applications. This way, each client application instance can connect to all {zdm-proxy} instances, just as it would connect to all nodes in a Cassandra cluster (or datacenter). +This way, each client application instance can connect to all {zdm-proxy} instances, just as it would connect to all nodes in a Cassandra cluster (or datacenter). This deployment model gives maximum resilience and failure tolerance guarantees and allows the client application driver to continue using the same load balancing and retry mechanisms that it would normally use. -Conversely, deploying a single {zdm-proxy} instance would undermine this resilience mechanism and create a single point of failure, which could affect the client applications if one or more nodes of the underlying clusters (Origin or Target) go offline. In a sidecar deployment, each client application instance would be connecting to a single {zdm-proxy} instance, and would therefore be exposed to this risk. +Conversely, deploying a single {zdm-proxy} instance would undermine this resilience mechanism and create a single point of failure, which could affect the client applications if one or more nodes of the underlying clusters (Origin or Target) go offline. +In a sidecar deployment, each client application instance would be connecting to a single {zdm-proxy} instance, and would therefore be exposed to this risk. For more information, see xref:deployment-infrastructure.adoc#_choosing_where_to_deploy_the_proxy[Choosing where to deploy the proxy]. == What are the benefits of using a cloud-native database? -When moving your client applications and data from on-premise Cassandra Query Language (CQL) based data stores (Cassandra or DSE) to a cloud-native database (CNDB) like {astra_db}, it's important to acknowledge the fundamental differences ahead. With on-premise infrastructure, of course, you have total control of the datacenter's physical infrastructure, software configurations, and your custom procedures. 
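Returning to the deployment model discussed above, where every client application instance connects to all {zdm-proxy} instances, the hedged sketch below is one way to confirm that a client host can reach each proxy on the CQL port. The IP addresses are placeholders for your own proxy instances, and you may need to add credentials or TLS options.

[source,bash]
----
# Placeholder IPs: replace with the addresses of your ZDM Proxy instances.
# Add -u/-p or TLS options if your proxy requires authentication or TLS.
for proxy_ip in 10.0.0.11 10.0.0.12 10.0.0.13; do
  if cqlsh "$proxy_ip" 9042 -e "SELECT cluster_name, release_version FROM system.local;" > /dev/null; then
    echo "OK: reached ZDM Proxy at $proxy_ip"
  else
    echo "FAILED: could not reach ZDM Proxy at $proxy_ip"
  fi
done
----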
At the same time, with on-premise clusters you take on the cost of infrastructure resources, maintenance, operations, and personnel. +When moving your client applications and data from on-premise Cassandra Query Language (CQL) based data stores (Cassandra or DSE) to a cloud-native database (CNDB) like {astra_db}, it's important to acknowledge the fundamental differences ahead. + +With on-premise infrastructure, you have total control of the datacenter's physical infrastructure, software configurations, and your custom procedures. +At the same time, with on-premise clusters you take on the cost of infrastructure resources, maintenance, operations, and personnel. Ranging from large enterprises to small teams, IT managers, operators, and developers are realizing that the Total Cost of Ownership (TCO) with cloud solutions is much lower than continuing to run on-prem physical data centers. -A CNDB like {astra_db} is a different environment. Running on proven cloud providers like AWS, Google Cloud, and Azure, {astra_db} greatly reduces complexity and increases convenience by surfacing a subset of configurable settings, providing a well-designed UI known as {astra_ui}, plus a set of APIs and commands to interact with your {astra_db} organizations and databases. +A CNDB like {astra_db} is a different environment. +Running on proven cloud providers like AWS, Google Cloud, and Azure, {astra_db} greatly reduces complexity and increases convenience by surfacing a subset of configurable settings, providing a well-designed UI known as {astra_ui}, plus a set of APIs and commands to interact with your {astra_db} organizations and databases. diff --git a/modules/ROOT/pages/feasibility-checklists.adoc b/modules/ROOT/pages/feasibility-checklists.adoc index 228d73f4..808f4bb6 100644 --- a/modules/ROOT/pages/feasibility-checklists.adoc +++ b/modules/ROOT/pages/feasibility-checklists.adoc @@ -14,13 +14,15 @@ Before starting your migration, refer to the following considerations to ensure include::partial$supported-releases.adoc[] ==== -{zdm-proxy} technically doesn't support `v5`. If `v5` is requested, the proxy handles protocol negotiation so that the client application properly downgrades the protocol version to `v4`. This means that any client application using a recent driver that supports protocol version `v5` can be migrated using the {zdm-proxy} (as long as it does not use v5-specific functionality). +{zdm-proxy} technically doesn't support `v5`. +If `v5` is requested, the proxy handles protocol negotiation so that the client application properly downgrades the protocol version to `v4`. +This means that any client application using a recent driver that supports protocol version `v5` can be migrated using the {zdm-proxy} (as long as it does not use v5-specific functionality). [IMPORTANT] ==== *Thrift is not supported by {zdm-proxy}.* -If you are using a very old driver or cluster version that only supports Thrift then you need to change your client application to use CQL and potentially upgrade your cluster before starting the migration process. +If you are using a very old driver or cluster version that only supports Thrift, you need to change your client application to use CQL and potentially upgrade your cluster before starting the migration process. 
==== This means that {zdm-proxy} supports migrations of the following cluster versions (Origin/Target): @@ -29,7 +31,7 @@ This means that {zdm-proxy} supports migrations of the following cluster version Apache Cassandra 2.0 migration support may be introduced when protocol version v2 is supported. * DataStax Enterprise 4.7.1+ and higher versions. DataStax Enterprise 4.6 migration support may be introduced when protocol version v2 is supported. -* {company} {astra_db} (Serverless and Classic) +* {company} {astra_db} (Serverless and Classic). [TIP] ==== @@ -38,22 +40,29 @@ Ensure that you test your client application with Target (connected directly wit == Schema/keyspace compatibility -{zdm-proxy} does not modify or transform CQL statements besides the optional feature that replaces `now()` functions with timestamp literals. See <> for more information about this feature. +{zdm-proxy} does not modify or transform CQL statements besides the optional feature that replaces `now()` functions with timestamp literals. +See <> for more information about this feature. -A CQL statement that your client application sends to {zdm-proxy} must be able to succeed on both clusters. This means that any keyspace that your client application uses must exist on both Origin and Target with the same name (although they can have different replication strategies and durable writes settings). Table names must also match. +A CQL statement that your client application sends to {zdm-proxy} must be able to succeed on both clusters. +This means that any keyspace that your client application uses must exist on both Origin and Target with the same name (although they can have different replication strategies and durable writes settings). +Table names must also match. -The schema doesn't have to be an exact match as long as the CQL statements can be executed successfully on both clusters. For example, if a table has 10 columns but your client application only uses 5 of those columns then you could create that table on Target with just those 5 columns. +The schema doesn't have to be an exact match as long as the CQL statements can be executed successfully on both clusters. +For example, if a table has 10 columns but your client application only uses 5 of those columns then you could create that table on Target with just those 5 columns. -You can also change the primary key in some cases. For example, if your compound primary key is `PRIMARY KEY (A, B)` and you always provide parameters for the `A` and `B` columns in your CQL statements then you could change the key to `PRIMARY KEY (B, A)` when creating the schema on Target because your CQL statements will still run successfully. +You can also change the primary key in some cases. +For example, if your compound primary key is `PRIMARY KEY (A, B)` and you always provide parameters for the `A` and `B` columns in your CQL statements then you could change the key to `PRIMARY KEY (B, A)` when creating the schema on Target because your CQL statements will still run successfully. == Considerations for Astra DB migrations -{company} Astra DB implements guardrails and sets limits to ensure good practices, foster availability, and promote optimal configurations for your databases. Please check the list of https://docs.datastax.com/en/astra-serverless/docs/plan/planning.html#_astra_db_database_guardrails_and_limits[guardrails and limits^] and make sure your application workload can be successful within these limits. 
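As a hedged example of the schema compatibility rules above (keyspace and table names are hypothetical), suppose the Origin table has ten columns and `PRIMARY KEY (a, b)`, while the application only ever uses columns `a`, `b`, and `c1` and always binds both `a` and `b`. A compatible Target table could then be created with just those columns and the key reordered:

[source,bash]
----
# Hypothetical names, for illustration only; assumes the keyspace already
# exists on Target. Reordering the key to (b, a) is valid here because the
# application always supplies values for both a and b.
cqlsh TARGET_CONTACT_POINT 9042 <<'EOF'
CREATE TABLE IF NOT EXISTS app_ks.user_events (
  b text,
  a text,
  c1 int,
  PRIMARY KEY (b, a)
);
EOF
----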
+{company} Astra DB implements guardrails and sets limits to ensure good practices, foster availability, and promote optimal configurations for your databases. +Please check the list of https://docs.datastax.com/en/astra-serverless/docs/plan/planning.html#_astra_db_database_guardrails_and_limits[guardrails and limits] and make sure your application workload can be successful within these limits. If you need to make changes to the application or data model to ensure that your workload can run successfully in {company} Astra DB, then you need to do these changes before you start the migration process. It is also highly recommended to perform tests and benchmarks when connected directly to Astra DB prior to the migration, so that you don't find unexpected issues during the migration process. +[[_read_only_applications]] === Read-only applications Read-only applications require special handling only if you are using {zdm-proxy} versions older than 2.1.0. @@ -67,16 +76,18 @@ If you have an existing {zdm-proxy} deployment, you can check which version you If a client application only sends `SELECT` statements to a database connection then you may find that {zdm-proxy} terminates these read-only connections periodically, which may result in request errors if the driver is not configured to retry these requests in these conditions. -This happens because {company} Astra DB terminates idle connections after some inactivity period (usually around 10 minutes). If Astra DB is your Target and a client connection is only sending read requests to the {zdm-proxy}, then the Astra DB connection that is paired to that client connection will remain idle and will be eventually terminated. +This happens because {company} Astra DB terminates idle connections after some inactivity period (usually around 10 minutes). +If Astra DB is your Target and a client connection is only sending read requests to the {zdm-proxy}, then the Astra DB connection that is paired to that client connection will remain idle and will be eventually terminated. A potential workaround is to not connect these read-only client applications to {zdm-proxy}, but you need to ensure that these client applications switch reads to Target at any point after all the data has been migrated and all validation and reconciliation has completed. -Another work around is to implement a mechanism in your client application that creates a new `Session` periodically to avoid the {company} Astra DB inactivity timeout. You can also implement some kind of meaningless write request that the application sends periodically to make sure the {company} Astra DB connection doesn't idle. +Another work around is to implement a mechanism in your client application that creates a new `Session` periodically to avoid the {company} Astra DB inactivity timeout. +You can also implement some kind of meaningless write request that the application sends periodically to make sure the {company} Astra DB connection doesn't idle. ==== *Version 2.1.0 and newer* -This issue is solved in version 2.1.0 of the {zdm-proxy}, which introduces periodic heartbeats to keep alive idle cluster connections. We strongly recommend using version 2.1.0 (or newer) to benefit from this improvement, especially if you have a read-only workload. - +This issue is solved in version 2.1.0 of the {zdm-proxy}, which introduces periodic heartbeats to keep alive idle cluster connections. 
+We strongly recommend using version 2.1.0 (or newer) to benefit from this improvement, especially if you have a read-only workload. [[non-idempotent-operations]] == Lightweight Transactions and other non-idempotent operations @@ -86,11 +97,13 @@ Examples of non-idempotent operations in CQL are: * Lightweight Transactions (LWTs) * Counter updates * Collection updates with `+=` and `-=` operators -* Non-deterministic functions like `now()` and `uuid()` as mentioned in the prior section +* Non-deterministic functions like `now()` and `uuid()` For more information on how to handle non-deterministic functions please refer to <>. -Given that there are two separate clusters involved, the state of each cluster may be different. For conditional writes, this may create a divergent state for a time. It may not make a difference in many cases, but if non-idempotent operations are used, we recommend a reconciliation phase in the migration before and after switching reads to rely on Target (setting Target as the primary cluster). +Given that there are two separate clusters involved, the state of each cluster may be different. +For conditional writes, this may create a divergent state for a time. +It may not make a difference in many cases, but if non-idempotent operations are used, we recommend a reconciliation phase in the migration before and after switching reads to rely on Target (setting Target as the primary cluster). For details about using the {cstar-data-migrator}, see xref:migrate-and-validate-data.adoc[]. @@ -99,17 +112,32 @@ For details about using the {cstar-data-migrator}, see xref:migrate-and-validate Some application workloads can tolerate inconsistent data in some cases (especially for counter values) in which case you may not need to do anything special to handle those non-idempotent operations. ==== +[[_lightweight_transactions_and_the_applied_flag]] === Lightweight Transactions and the `applied` flag -{zdm-proxy} handles LWTs as write operations. The proxy sends the LWT to Origin and Target clusters concurrently, and waits for a response from both. {zdm-proxy} will return a `success` status to the client if both Origin and Target send successful acknowledgements, or otherwise will return a `failure` status if one or both do not return an acknowledgement. +{zdm-proxy} handles LWTs as write operations. +The proxy sends the LWT to Origin and Target clusters concurrently, and waits for a response from both. +{zdm-proxy} will return a `success` status to the client if both Origin and Target send successful acknowledgements, or otherwise will return a `failure` status if one or both do not return an acknowledgement. -What sets LWTs apart from regular writes is that they are conditional. In other words, a LWT can appear to have been successful (its execution worked as expected). However, the change will be applied only if the LWT's condition was met. Whether the condition was met depends on the state of the data on the cluster. In a migration, the clusters will not be in sync until all existing data has been imported into Target. Up to that point, an LWT's condition can be evaluated differently on each side, leading to a different outcome even though the LWT was technically successful on both sides. +What sets LWTs apart from regular writes is that they are conditional. +In other words, a LWT can appear to have been successful (its execution worked as expected). +However, the change will be applied only if the LWT's condition was met. 
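For reference, a conditional write and the kind of response it produces look roughly like the following sketch. The keyspace, table, columns, and values are hypothetical, and the exact result columns depend on the statement and the existing data.

[source,cql]
----
-- A Lightweight Transaction: the INSERT only takes effect
-- if no row with this primary key exists yet.
INSERT INTO shop.users (id, email)
VALUES (42, 'alice@example.com')
IF NOT EXISTS;

-- The response contains an [applied] column. If the row already
-- existed, [applied] is false and the existing row is returned,
-- for example (as displayed by cqlsh):
--
--  [applied] | id | email
-- -----------+----+------------------
--      False | 42 | bob@example.com
----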
+Whether the condition was met depends on the state of the data on the cluster. +In a migration, the clusters will not be in sync until all existing data has been imported into Target. +Up to that point, an LWT's condition can be evaluated differently on each side, leading to a different outcome even though the LWT was technically successful on both sides. -The response that a cluster sends after executing a LWT includes a flag called `applied`. This flag tells the client whether the LWT update was actually applied. The status depends on the condition, which in turn depends on the state of the data. When {zdm-proxy} receives a response from both Origin and Target, each response would have its own `applied` flag. +The response that a cluster sends after executing a LWT includes a flag called `applied`. +This flag tells the client whether the LWT update was actually applied. +The status depends on the condition, which in turn depends on the state of the data. +When {zdm-proxy} receives a response from both Origin and Target, each response would have its own `applied` flag. -However, {zdm-proxy} can only return a *single response* to the client. Recall that the client has no knowledge that there are two clusters behind the proxy. Therefore, {zdm-proxy} returns the `applied` flag from the cluster that is *currently used as primary*. If your client has logic that depends on the `applied` flag, be aware that during the migration, you will only have visibility of the flag coming from the primary cluster; that is, the cluster to which synchronous reads are routed. +However, {zdm-proxy} can only return a *single response* to the client. +Recall that the client has no knowledge that there are two clusters behind the proxy. +Therefore, {zdm-proxy} returns the `applied` flag from the cluster that is *currently used as primary*. +If your client has logic that depends on the `applied` flag, be aware that during the migration, you will only have visibility of the flag coming from the primary cluster; that is, the cluster to which synchronous reads are routed. -To reiterate, {zdm-proxy} only returns the `applied` value from the primary cluster, which is the cluster from where read results are returned to the client application (by default, Origin). This means that when you set Target as your primary cluster, the `applied` value returned to the client application will come from Target. +To reiterate, {zdm-proxy} only returns the `applied` value from the primary cluster, which is the cluster from where read results are returned to the client application (by default, Origin). +This means that when you set Target as your primary cluster, the `applied` value returned to the client application will come from Target. == Advanced workloads (DataStax Enterprise) @@ -118,11 +146,14 @@ To reiterate, {zdm-proxy} only returns the `applied` value from the primary clus {zdm-proxy} handles all {company} Graph requests as write requests even if the traversals are read-only. There is no special handling for these requests, so you need to take a look at the traversals that your client application sends and determine whether the traversals are idempotent. If the traversals are non-idempotent then the reconciliation step is needed. 
-Keep in mind that our recommended tools for data migration and reconciliation are CQL-based, so they can be used for migrations where Origin is a database that uses the new {company} Graph engine released with DSE 6.8, but *cannot be used for the old Graph engine* that older DSE versions relied on. See <> for more information about non-idempotent operations. +Keep in mind that our recommended tools for data migration and reconciliation are CQL-based, so they can be used for migrations where Origin is a database that uses the new {company} Graph engine released with DSE 6.8, but *cannot be used for the old Graph engine* that older DSE versions relied on. +See <> for more information about non-idempotent operations. === Search -Read-only Search workloads can be moved directly from Origin to Target without {zdm-proxy} being involved. If your client application uses Search and also issues writes, or if you need the read routing capabilities from {zdm-proxy}, then you can connect your search workloads to it as long as you are using the {company} drivers to submit these queries. This approach means the queries are regular CQL `SELECT` statements, so {zdm-proxy} handles them as regular read requests. +Read-only Search workloads can be moved directly from Origin to Target without {zdm-proxy} being involved. +If your client application uses Search and also issues writes, or if you need the read routing capabilities from {zdm-proxy}, then you can connect your search workloads to it as long as you are using the {company} drivers to submit these queries. +This approach means the queries are regular CQL `SELECT` statements, so {zdm-proxy} handles them as regular read requests. If you use the HTTP API then you can either modify your applications to use the CQL API instead or you will have to move those applications directly from Origin to Target when the migration is complete if that is acceptable. @@ -131,9 +162,11 @@ If you use the HTTP API then you can either modify your applications to use the The binary protocol used by Cassandra, DSE, and {astra_db} supports optional compression of transport-level requests and responses that reduces network traffic at the cost of CPU overhead. -{zdm-proxy} doesn't support protocol compression at this time. This kind of compression is disabled by default on all of our {company} drivers so if you enabled it on your client application then you will need to disable it before starting the migration process. +{zdm-proxy} doesn't support protocol compression at this time. +This kind of compression is disabled by default on all of our {company} drivers so if you enabled it on your client application then you will need to disable it before starting the migration process. -This is *NOT* related to storage compression which you can configure on a table by table basis with the `compression` table property. Storage/table compression does not affect the client application or {zdm-proxy} in any way. +This is *NOT* related to storage compression which you can configure on a table by table basis with the `compression` table property. +Storage/table compression does not affect the client application or {zdm-proxy} in any way. == Authenticator and Authorizer configuration @@ -155,41 +188,52 @@ The authentication configuration on each cluster can be different between Origin Statements with functions like `now()` and `uuid()` will result in data inconsistency between Origin and Target because the values are computed at cluster level. 
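As a rough illustration of why this happens, consider the following sketch. The keyspace, table, and column names are hypothetical, and the timeuuid literal is an arbitrary example value.

[source,cql]
----
-- Each cluster evaluates now() independently, so Origin and Target
-- end up storing two different timeuuid values for the same write:
INSERT INTO shop.events (id, created_at, payload)
VALUES (42, now(), 'example');

-- Computing the value on the client side (or letting {zdm-proxy}
-- replace now() when that optional feature is enabled) keeps both
-- clusters consistent, because the same concrete value is sent to each:
INSERT INTO shop.events (id, created_at, payload)
VALUES (42, 50554d6e-29bb-11e5-b345-feff819cdc9f, 'example');
----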
-If these functions are used for columns that are not part of the primary key, you may find it acceptable to have different values in the two clusters depending on your application business logic. However, if these columns are part of the primary key, the data migration phase will not be successful as there will be data inconsistencies between the two clusters and they will never be in sync. +If these functions are used for columns that are not part of the primary key, you may find it acceptable to have different values in the two clusters depending on your application business logic. +However, if these columns are part of the primary key, the data migration phase will not be successful as there will be data inconsistencies between the two clusters and they will never be in sync. [NOTE] ==== {zdm-shortproduct} does not support the `uuid()` function currently. ==== -{zdm-proxy} is able to compute timestamps and replace `now()` function references with such timestamps in CQL statements at proxy level to ensure that these parameters will have the same value when these statements are sent to both clusters. However, this feature is disabled by default because it might result in performance degradation. We highly recommend that you test this properly before using it in production. Also keep in mind that this feature is only supported for `now()` functions at the moment. To enable this feature, set the configuration variable `replace_cql_function` to `true`. For more, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. +{zdm-proxy} is able to compute timestamps and replace `now()` function references with such timestamps in CQL statements at proxy level to ensure that these parameters will have the same value when these statements are sent to both clusters. +However, this feature is disabled by default because it might result in performance degradation. +We highly recommend that you test this properly before using it in production. +Also keep in mind that this feature is only supported for `now()` functions at the moment. +To enable this feature, set the configuration variable `replace_cql_function` to `true`. +For more, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable]. -If you find that the performance is not acceptable when this feature is enabled, or the feature doesn't cover a particular function that your client application is using, then you will have to make a change to your client application so that the value is computed locally (at client application level) before the statement is sent to the database. Most drivers have utility methods that help you compute these values locally, please refer to the documentation of the driver you are using. +If you find that the performance is not acceptable when this feature is enabled, or the feature doesn't cover a particular function that your client application is using, then you will have to make a change to your client application so that the value is computed locally (at client application level) before the statement is sent to the database. +Most drivers have utility methods that help you compute these values locally, please refer to the documentation of the driver you are using. == Driver retry policy and query idempotence -As part of the normal migration process, the {zdm-proxy} instances will have to be restarted in between phases to apply configuration changes. 
From the point of view of the client application, this is a similar behavior to a DSE or Cassandra cluster going through a rolling restart in a non-migration scenario. +As part of the normal migration process, the {zdm-proxy} instances will have to be restarted in between phases to apply configuration changes. +From the point of view of the client application, this is a similar behavior to a DSE or Cassandra cluster going through a rolling restart in a non-migration scenario. If your application already tolerates rolling restarts of your current cluster then you should see no issues when there is a rolling restart of {zdm-proxy} instances. To ensure that your client application retries requests when a database connection is closed you should check the section of your driver's documentation related to retry policies. -Most {company} drivers require a statement to be marked as `idempotent` in order to retry it in case of a connection error (such as the termination of a database connection). This means that these drivers treat statements as *non-idempotent* by default and will *not* retry them in the case of a connection error unless action is taken. Whether you need to take action or not depends on what driver you are using. In this section we outline the default behavior of some of these drivers and provide links to the relevant documentation sections. +Most {company} drivers require a statement to be marked as `idempotent` in order to retry it in case of a connection error (such as the termination of a database connection). +This means that these drivers treat statements as *non-idempotent* by default and will *not* retry them in the case of a connection error unless action is taken. +Whether you need to take action or not depends on what driver you are using. +In this section we outline the default behavior of some of these drivers and provide links to the relevant documentation sections. === {company} Java Driver 4.x -The default retry policy takes idempotence in consideration and the query builder tries to infer idempotence automatically. See this Java 4.x https://docs.datastax.com/en/developer/java-driver/latest/manual/core/idempotence/[query idempotence documentation section^]. +The default retry policy takes idempotence in consideration and the query builder tries to infer idempotence automatically. See this Java 4.x https://docs.datastax.com/en/developer/java-driver/latest/manual/core/idempotence/[query idempotence documentation section]. === {company} Java Driver 3.x -The default retry policy takes idempotence in consideration and the query builder tries to infer idempotence automatically. See this Java 3.x https://docs.datastax.com/en/developer/java-driver/3.11/manual/idempotence/[query idempotence documentation section^]. +The default retry policy takes idempotence in consideration and the query builder tries to infer idempotence automatically. See this Java 3.x https://docs.datastax.com/en/developer/java-driver/3.11/manual/idempotence/[query idempotence documentation section]. This behavior was introduced in version 3.1.0 so prior to this version the default retry policy retried all requests regardless of idempotence. === {company} Nodejs Driver 4.x -The default retry policy takes idempotence in consideration. See this Nodejs 4.x https://docs.datastax.com/en/developer/nodejs-driver/latest/features/speculative-executions/#query-idempotence[query idempotence documentation section^]. +The default retry policy takes idempotence in consideration. 
See this Nodejs 4.x https://docs.datastax.com/en/developer/nodejs-driver/latest/features/speculative-executions/#query-idempotence[query idempotence documentation section]. === {company} C# Driver 3.x and {company} Python Driver 3.x diff --git a/modules/ROOT/pages/glossary.adoc b/modules/ROOT/pages/glossary.adoc index f43fe52e..02913f42 100644 --- a/modules/ROOT/pages/glossary.adoc +++ b/modules/ROOT/pages/glossary.adoc @@ -5,35 +5,45 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] Here are a few terms used throughout the {company} {zdm-product} documentation and code. +[[_ansible_playbooks]] == Ansible playbooks -A repeatable, re-usable, simple configuration management and multi-machine deployment system, one that is well suited to deploying complex applications. For details about the playbooks available in {zdm-automation}, see: +A repeatable, re-usable, simple configuration management and multi-machine deployment system, one that is well suited to deploying complex applications. +For details about the playbooks available in {zdm-automation}, see: * xref:setup-ansible-playbooks.adoc[]. * xref:deploy-proxy-monitoring.adoc[]. +[[_asynchronous_dual_reads]] == Asynchronous dual reads -An optional testing phase in which reads are sent to both Origin and Target, enabling you to check that the intended Target of your migration can handle the full workload of reads and writes before finalizing the migration and moving off the {zdm-proxy} instances. For details, see xref:enable-async-dual-reads.adoc[]. +An optional testing phase in which reads are sent to both Origin and Target, enabling you to check that the intended Target of your migration can handle the full workload of reads and writes before finalizing the migration and moving off the {zdm-proxy} instances. +For details, see xref:enable-async-dual-reads.adoc[]. == CQL -Cassandra Query Language (CQL) is a query language for the Cassandra database. It includes DDL and DML statements. For details, see the https://docs.datastax.com/en/dse/6.8/cql/cql/cqlQuickReference.html[CQL quick reference^]. - +Cassandra Query Language (CQL) is a query language for the Cassandra database. +It includes DDL and DML statements. +For details, see https://docs.datastax.com/en/astra/astra-db-vector/cql/develop-with-cql.html[Develop with the Cassandra Query Language]. == Dual-write logic -{zdm-proxy} handles your client application's real-time write requests and forwards them to two Cassandra-based clusters (Origin and Target) simultaneously. The dual-write logic in {zdm-proxy} means that you do not need to modify your client application to perform dual writes manually during a migration: {zdm-proxy} takes care of it for you. See the diagram in the xref:introduction.adoc#migration-workflow[workflow introduction]. +{zdm-proxy} handles your client application's real-time write requests and forwards them to two Cassandra-based clusters (Origin and Target) simultaneously. +The dual-write logic in {zdm-proxy} means that you do not need to modify your client application to perform dual writes manually during a migration: {zdm-proxy} takes care of it for you. +See the diagram in the xref:introduction.adoc#migration-workflow[workflow introduction]. [[origin]] == Origin Your existing Cassandra-based cluster, whether it's open-source Apache Cassandra®, DataStax Enterprise (DSE), or Astra DB. -[[primary-cluster]] +[[_primary_cluster]] == Primary cluster -The cluster that is currently considered the "primary" source of truth. 
While writes are always sent to both clusters, the primary cluster is the one to which all synchronous reads are always sent, and their results are returned to the client application. During a migration, Origin is typically the primary cluster. Near the end of the migration, you'll shift the primary cluster to be Target. +The cluster that is currently considered the "primary" source of truth. +While writes are always sent to both clusters, the primary cluster is the one to which all synchronous reads are always sent, and their results are returned to the client application. +During a migration, Origin is typically the primary cluster. +Near the end of the migration, you'll shift the primary cluster to be Target. For more, see <>. @@ -43,26 +53,35 @@ See xref:glossary.adoc#_ansible_playbooks[Ansible playbooks]. == Proxy -Generally speaking, a proxy is a software class functioning as an interface to something else. The proxy could interface to anything: a network connection, a large object in memory, a file, or some other resource. A proxy is a wrapper or agent object that is being called by the client to access the real serving object behind the scenes. In our context here, see <>. +Generally speaking, a proxy is a software class functioning as an interface to something else. +The proxy could interface to anything: a network connection, a large object in memory, a file, or some other resource. +A proxy is a wrapper or agent object that is being called by the client to access the real serving object behind the scenes. +In our context here, see <>. == Read mirroring -See xref:glossary.adoc#_asynchronous_dual_reads[async dual reads]. +See xref:glossary.adoc#_asynchronous_dual_reads[Asynchronous dual reads]. [[secondary-cluster]] == Secondary cluster During a migration, the secondary cluster is the one that is currently **not** the source of truth. -When using the {zdm-proxy}, all writes are synchronously sent to both Origin and Target. Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). In Phase 3 of a migration, you may (optionally) want to temporarily send the reads to both clusters, to make sure that Target can handle the full workload of reads and writes. +When using the {zdm-proxy}, all writes are synchronously sent to both Origin and Target. +Reads operate differently: with the default read mode, reads are only sent to the primary cluster (Origin by default). +In Phase 3 of a migration, you may (optionally) want to temporarily send the reads to both clusters, to make sure that Target can handle the full workload of reads and writes. -If you set the proxy's read mode configuration variable (`read_mode`) to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads are enabled. That change results in reads being additionally sent to the secondary cluster. +If you set the proxy's read mode configuration variable (`read_mode`) to `DUAL_ASYNC_ON_SECONDARY`, then asynchronous dual reads are enabled. +That change results in reads being additionally sent to the secondary cluster. -For more, see <>. Also see xref:enable-async-dual-reads.adoc[]. +For more, see xref:glossary.adoc#_primary_cluster[Primary cluster]. +Also see xref:enable-async-dual-reads.adoc[]. +[[_secure_connect_bundle_scb]] == Secure Connect Bundle (SCB) -A ZIP file generated in https://astra.datastax.com[Astra Portal^] that contains connection metadata and TLS encryption certificates (but not the database credentials) for your {astra_db} database. 
For details, see https://docs.datastax.com/en/astra-serverless/docs/connect/secure-connect-bundle.html[Working with the Secure Connect Bundle^]. +A ZIP file generated in https://astra.datastax.com[Astra Portal] that contains connection metadata and TLS encryption certificates (but not the database credentials) for your {astra_db} database. +For details, see https://docs.datastax.com/en/astra-serverless/docs/connect/secure-connect-bundle.html[Working with the Secure Connect Bundle]. [[target]] == Target @@ -71,7 +90,11 @@ The new cluster to which you want to migrate client applications and data with z [[zdm-automation]] == {zdm-automation} -An Ansible-based tool that allows you to deploy and manage the {zdm-proxy} instances and associated monitoring stack. To simplify its setup, the suite includes the {zdm-utility}. This interactive utility creates a Docker container acting as the Ansible Control Host. The Ansible playbooks constitute the {zdm-automation}. + +An Ansible-based tool that allows you to deploy and manage the {zdm-proxy} instances and associated monitoring stack. +To simplify its setup, the suite includes the {zdm-utility}. +This interactive utility creates a Docker container acting as the Ansible Control Host. +The Ansible playbooks constitute the {zdm-automation}. [[zdm-proxy]] == ZDM Proxy diff --git a/modules/ROOT/pages/introduction.adoc b/modules/ROOT/pages/introduction.adoc index bfdae087..1c51f73c 100644 --- a/modules/ROOT/pages/introduction.adoc +++ b/modules/ROOT/pages/introduction.adoc @@ -10,7 +10,7 @@ At {company}, we've developed a set of thoroughly-tested self-service tools, aut We call this product suite {company} {zdm-product} ({zdm-shortproduct}). -{zdm-shortproduct} provides a simple and reliable way for you to migrate applications from any CQL-based cluster (https://cassandra.apache.org/_/index.html[Apache Cassandra®^], https://www.datastax.com/products/datastax-enterprise[DataStax Enterprise (DSE)^], https://www.datastax.com/products/datastax-astra[{astra_db}^], or any type of CQL-based database) to any other CQL-based cluster, without any interruption of service to the client applications and data. +{zdm-shortproduct} provides a simple and reliable way for you to migrate applications from any CQL-based cluster (https://cassandra.apache.org/_/index.html[Apache Cassandra®], https://www.datastax.com/products/datastax-enterprise[DataStax Enterprise (DSE)], https://www.datastax.com/products/datastax-astra[{astra_db}], or any type of CQL-based database) to any other CQL-based cluster, without any interruption of service to the client applications and data. * You can move your application to {astra_db}, DSE, or Cassandra with no downtime and with minimal configuration changes. * Your clusters will be kept in sync at all times by a dual-write logic configuration. @@ -35,7 +35,11 @@ include::partial$migration-scenarios.adoc[] [TIP] ==== -An important migration prerequisite is that you already have the matching schema on Target. A CQL statement that your client application sends to {zdm-proxy} must be able to succeed on both Origin and Target clusters. This means that any keyspace that your client application uses must exist on both Origin and Target with the same name. Table names must also match. For more, see xref:feasibility-checklists.adoc#_schemakeyspace_compatibility[Schema/keyspace compatibility]. +An important migration prerequisite is that you already have the matching schema on Target. 
+A CQL statement that your client application sends to {zdm-proxy} must be able to succeed on both Origin and Target clusters. +This means that any keyspace that your client application uses must exist on both Origin and Target with the same name. +Table names must also match. +For more, see xref:feasibility-checklists.adoc#_schemakeyspace_compatibility[Schema/keyspace compatibility]. ==== == Migration phases @@ -48,6 +52,68 @@ First, a couple of key terms used throughout the ZDM documentation and software For additional terms, see the xref:glossary.adoc[glossary]. +=== Migration diagram + +Discover the migration concepts, software components, and sequence of operations. + +Your migration project occurs through a sequence of phases, which matches the structure of the {product} documentation. + +The highlighted components in each phase emphasize how your client applications perform read and write operations on your Origin and Target clusters. + +==== Pre-migration client application operations + +Let's look at a pre-migration from a high-level view. +At this point, your client applications are performing read/write operations with an existing CQL-compatible database: Apache Cassandra, DSE, or Astra DB. + +image:pre-migration0ra9.png["Pre-migration environment."] + +''' + +==== Phase 1: Deploy ZDM Proxy and connect client applications + +In this first phase, deploy the ZDM Proxy instances and connect client applications to the proxies. +This phase activates the dual-write logic. +Writes are bifurcated (sent to both Origin and Target), while reads are executed on Origin only. + +image:migration-phase1ra9.png["Migration Phase 1."] + +''' + +==== Phase 2: Migrate data + +In this phase, migrate existing data using Cassandra Data Migrator and/or DSBulk Migrator. +Validate that the migrated data is correct, while continuing to perform dual writes. + +image:migration-phase2ra9.png["Migration Phase 2."] + +''' + +==== Phase 3: Enable asynchronous dual reads + +In this phase, you can optionally enable asynchronous dual reads. +The idea is to test performance and verify that Target can handle your application's live request load before cutting over from Origin to Target. + +image:migration-phase3ra9.png["Migration Phase 3."] + +''' + +==== Phase 4: Route reads to Target + +In this phase, read routing on the ZDM Proxy is switched to Target so that all reads are executed on it, while writes are still sent to both clusters. +In other words, Target becomes the primary cluster. + +image:migration-phase4ra9.png["Migration Phase 4."] + +''' + +==== Phase 5: Connect directly to Target + +In this phase, move your client applications off the ZDM Proxy and connect the apps directly to Target. +Once that happens, the migration is complete. + +image:migration-phase5ra9.png["Migration Phase 5."] + +//// === Migration interactive diagram Click the *Start* button on the interactive diagram below to begin a walkthrough of the migration phases. @@ -111,6 +177,7 @@ In this phase, move your client applications off the ZDM Proxy and connect the a image:migration-phase5ra9.png["Illustrates migration Phase 5, as summarized in the text. 
Back and Restart buttons are available for navigation within the graphic."] -- ==== +//// == A fun way to learn: {zdm-product} Interactive Lab diff --git a/modules/ROOT/pages/manage-proxy-instances.adoc b/modules/ROOT/pages/manage-proxy-instances.adoc index 9a41bbae..ddc55978 100644 --- a/modules/ROOT/pages/manage-proxy-instances.adoc +++ b/modules/ROOT/pages/manage-proxy-instances.adoc @@ -42,7 +42,8 @@ This is all that is needed. [NOTE] ==== -This playbook simply restarts the existing {zdm-proxy} containers. It does **not** apply any configuration change or change the version. +This playbook simply restarts the existing {zdm-proxy} containers. +It does **not** apply any configuration change or change the version. If you wish to xref:change-mutable-config-variable[apply configuration changes] or xref:_upgrade_the_proxy_version[perform version upgrades] in a rolling fashion, follow the instructions in the respective sections. ==== @@ -59,16 +60,19 @@ The pause between the restart of each {zdm-proxy} instance defaults to 10 second [TIP] ==== -To check the state of your {zdm-proxy} instances, you have a couple of options. See xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[Indications of success on Origin and Target clusters]. +To check the state of your {zdm-proxy} instances, you have a couple of options. +See xref:deploy-proxy-monitoring.adoc#_indications_of_success_on_origin_and_target_clusters[Indications of success on Origin and Target clusters]. ==== == Access the proxy logs To confirm that the {zdm-proxy} instances are operating normally, or investigate any issue, you can view or collect their logs. +[[_view_the_logs]] === View the logs -The {zdm-proxy} runs as a Docker container on each proxy host. Its logs can be viewed by connecting to a proxy host and running the following command. +The {zdm-proxy} runs as a Docker container on each proxy host. +Its logs can be viewed by connecting to a proxy host and running the following command. [source,bash] ---- @@ -77,9 +81,11 @@ docker container logs zdm-proxy-container To leave the logs open and continuously output the latest log messages, append the `--follow` (or `-f`) option to the command above. +[[_collect_the_logs]] === Collect the logs -You can easily retrieve the logs of all {zdm-proxy} instances using a dedicated playbook (`collect_zdm_proxy_logs.yml`). You can view the playbook's configuration values in `vars/zdm_proxy_log_collection_config.yml`, but no changes to it are required. +You can easily retrieve the logs of all {zdm-proxy} instances using a dedicated playbook (`collect_zdm_proxy_logs.yml`). +You can view the playbook's configuration values in `vars/zdm_proxy_log_collection_config.yml`, but no changes to it are required. Connect to the Ansible Control Host container as explained above and run: @@ -107,14 +113,18 @@ The following configuration variables are considered mutable and can be changed Commonly changed variables, located in `vars/zdm_proxy_core_config.yml`: * `primary_cluster`: -** This variable determines which cluster is currently considered the xref:glossary.adoc#_primary_cluster[primary cluster]. At the start of the migration, the primary cluster is Origin, as it contains all the data. In Phase 4 of the migration, once all the existing data has been transferred and any validation/reconciliation step has been successfully executed, you can switch the primary cluster to be Target. 
+** This variable determines which cluster is currently considered the xref:glossary.adoc#_primary_cluster[primary cluster]. +At the start of the migration, the primary cluster is Origin, as it contains all the data. +In Phase 4 of the migration, once all the existing data has been transferred and any validation/reconciliation step has been successfully executed, you can switch the primary cluster to be Target. ** Valid values: `ORIGIN`, `TARGET`. * `read_mode`: ** This variable determines how reads are handled by the {zdm-proxy}. ** Valid values: *** `PRIMARY_ONLY`: reads are only sent synchronously to the primary cluster. This is the default behavior. -*** `DUAL_ASYNC_ON_SECONDARY`: reads are sent synchronously to the primary cluster and also asynchronously to the secondary cluster. See xref:enable-async-dual-reads.adoc[]. -** Typically, when choosing `DUAL_ASYNC_ON_SECONDARY` you will want to ensure that `primary_cluster` is still set to `ORIGIN`. When you are ready to use Target as the primary cluster, you should revert `read_mode` to `PRIMARY_ONLY`. +*** `DUAL_ASYNC_ON_SECONDARY`: reads are sent synchronously to the primary cluster and also asynchronously to the secondary cluster. +See xref:enable-async-dual-reads.adoc[]. +** Typically, when choosing `DUAL_ASYNC_ON_SECONDARY` you will want to ensure that `primary_cluster` is still set to `ORIGIN`. +When you are ready to use Target as the primary cluster, you should revert `read_mode` to `PRIMARY_ONLY`. * `log_level`: ** Defaults to `INFO`. ** Only set to `DEBUG` if necessary and revert to `INFO` as soon as possible, as the extra logging can have a slight performance impact. @@ -125,39 +135,54 @@ Other, rarely changed variables: * Target username/password, in `vars/zdm_proxy_cluster_config.yml`) * Advanced configuration variables, located in `vars/zdm_proxy_advanced_config.yml`: ** `zdm_proxy_max_clients_connections`: -*** Maximum number of client connections that the {zdm-proxy} should accept. Each client connection results in additional cluster connections and causes the allocation of several in-memory structures, so this variable can be tweaked to cap the total number on each instance. A high number of client connections per proxy instance may cause some performance degradation, especially at high throughput. +*** Maximum number of client connections that the {zdm-proxy} should accept. +Each client connection results in additional cluster connections and causes the allocation of several in-memory structures, so this variable can be tweaked to cap the total number on each instance. +A high number of client connections per proxy instance may cause some performance degradation, especially at high throughput. *** Defaults to `1000`. ** `replace_cql_functions`: *** Whether the {zdm-proxy} should replace standard CQL function calls in write requests with a value computed at proxy level. *** Currently, only the replacement of `now()` is supported. -*** Boolean value. Disabled by default. Enabling this will have a noticeable performance impact. +*** Boolean value. +Disabled by default. +Enabling this will have a noticeable performance impact. ** `zdm_proxy_request_timeout_ms`: *** Global timeout (in ms) of a request at proxy level. -*** This variable determines how long the {zdm-proxy} will wait for one cluster (in case of reads) or both clusters (in case of writes) to reply to a request. 
If this timeout is reached, the {zdm-proxy} will abandon that request and no longer consider it as pending, thus freeing up the corresponding internal resources. Note that, in this case, the {zdm-proxy} will not return any result or error: when the client application's own timeout is reached, the driver will time out the request on its side. -*** Defaults to `10000` ms. If your client application has a higher client-side timeout because it is expected to generate requests that take longer to complete, you need to increase this timeout accordingly. +*** This variable determines how long the {zdm-proxy} will wait for one cluster (in case of reads) or both clusters (in case of writes) to reply to a request. +If this timeout is reached, the {zdm-proxy} will abandon that request and no longer consider it as pending, thus freeing up the corresponding internal resources. +Note that, in this case, the {zdm-proxy} will not return any result or error: when the client application's own timeout is reached, the driver will time out the request on its side. +*** Defaults to `10000` ms. +If your client application has a higher client-side timeout because it is expected to generate requests that take longer to complete, you need to increase this timeout accordingly. ** `origin_connection_timeout_ms` and `target_connection_timeout_ms`: *** Timeout (in ms) when attempting to establish a connection from the proxy to Origin or Target. *** Defaults to `30000` ms. ** `async_handshake_timeout_ms`: *** Timeout (in ms) when performing the initialization (handshake) of a proxy-to-secondary cluster connection that will be used solely for asynchronous dual reads. -*** If this timeout occurs, the asynchronous reads will not be sent. This has no impact on the handling of synchronous requests: the {zdm-proxy} will continue to handle all synchronous reads and writes normally. +*** If this timeout occurs, the asynchronous reads will not be sent. +This has no impact on the handling of synchronous requests: the {zdm-proxy} will continue to handle all synchronous reads and writes normally. *** Defaults to `4000` ms. ** `heartbeat_interval_ms`: -*** Frequency (in ms) with which heartbeats will be sent on cluster connections (i.e. all control and request connections to Origin and Target). Heartbeats keep idle connections alive. +*** Frequency (in ms) with which heartbeats will be sent on cluster connections (i.e. all control and request connections to Origin and Target). +Heartbeats keep idle connections alive. *** Defaults to `30000` ms. ** `metrics_enabled`: *** Whether metrics collection should be enabled. -*** Boolean value. Defaults to `true`, but can be set to `false` to completely disable metrics collection. This is not recommended. +*** Boolean value. +Defaults to `true`, but can be set to `false` to completely disable metrics collection. +This is not recommended. ** [[zdm_proxy_max_stream_ids]]`zdm_proxy_max_stream_ids`: -*** In the CQL protocol every request has a unique id, named stream id. This variable allows you to tune the maximum pool size of the available stream ids managed by the {zdm-proxy} per client connection. In the application client, the stream ids are managed internally by the driver, and in most drivers the max number is 2048 (the same default value used in the proxy). If you have a custom driver configuration with a higher value, you should change this property accordingly. +*** In the CQL protocol every request has a unique id, named stream id. 
+This variable allows you to tune the maximum pool size of the available stream ids managed by the {zdm-proxy} per client connection. +In the application client, the stream ids are managed internally by the driver, and in most drivers the max number is 2048 (the same default value used in the proxy). +If you have a custom driver configuration with a higher value, you should change this property accordingly. *** Defaults to `2048`. Deprecated variables, which will be removed in a future {zdm-proxy} release: * `forward_client_credentials_to_origin`: ** Whether the credentials provided by the client application are for Origin. -** Boolean value. Defaults to `false` (the client application is expected to pass Target credentials), can be set to `true` if the client passes credentials for Origin instead. +** Boolean value. +Defaults to `false` (the client application is expected to pass Target credentials), can be set to `true` if the client passes credentials for Origin instead. To change any of these variables, edit the desired values in `vars/zdm_proxy_core_config.yml`, `vars/zdm_proxy_cluster_config.yml` (credentials only) and/or `vars/zdm_proxy_advanced_config.yml` (mutable variables only, as listed above). @@ -168,7 +193,9 @@ To apply the configuration changes to the {zdm-proxy} instances in a rolling fas ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory ---- -This playbook operates by recreating each proxy container one by one. The {zdm-proxy} deployment remains available at all times and can be safely used throughout this operation. The playbook automates the following steps: +This playbook operates by recreating each proxy container one by one. +The {zdm-proxy} deployment remains available at all times and can be safely used throughout this operation. +The playbook automates the following steps: . It stops one container gracefully, waiting for it to shut down. . It recreates the container and starts it up. @@ -176,26 +203,31 @@ This playbook operates by recreating each proxy container one by one. The {zdm-p [IMPORTANT] ==== A configuration change is a destructive action because containers are considered immutable. -Note that this will remove the previous container and its logs. Make sure you collect the logs prior to this operation if you want to keep them. +Note that this will remove the previous container and its logs. +Make sure you collect the logs prior to this operation if you want to keep them. ==== . It checks that the container has come up successfully by checking the readiness endpoint: .. If unsuccessful, it repeats the check for six times at 5-second intervals and eventually interrupts the whole process if the check still fails. .. If successful, it waits for 10 seconds and then moves on to the next container. -The pause between the restart of each {zdm-proxy} instance defaults to 10 seconds. If you wish to change this value, you can edit `vars/zdm_playbook_internal_config.yml` (located in `zdm-proxy-automation/ansible/vars`) and set it to the desired number of seconds. +The pause between the restart of each {zdm-proxy} instance defaults to 10 seconds. +If you wish to change this value, you can edit `vars/zdm_playbook_internal_config.yml` (located in `zdm-proxy-automation/ansible/vars`) and set it to the desired number of seconds. [NOTE] ==== All configuration variables that are not listed in this section are considered immutable and can only be changed by recreating the deployment. 
-If you wish to change any of the immutable configuration variables on an existing deployment, you will need to re-run the deployment playbook (`deploy_zdm_proxy.yml`, as documented in xref:deploy-proxy-monitoring.adoc[this page]). This playbook can be run as many times as necessary. +If you wish to change any of the immutable configuration variables on an existing deployment, you will need to re-run the deployment playbook (`deploy_zdm_proxy.yml`, as documented in xref:deploy-proxy-monitoring.adoc[this page]). +This playbook can be run as many times as necessary. Please note that running the `deploy_zdm_proxy.yml` playbook will result in a brief window of unavailability of the whole {zdm-proxy} deployment while all the {zdm-proxy} instances are torn down and recreated. ==== +[[_upgrade_the_proxy_version]] == Upgrade the proxy version -The {zdm-proxy} version is displayed at startup, in a message such as `Starting ZDM proxy version ...`. It can also be retrieved at any time by using the `version` option as in the following command. +The {zdm-proxy} version is displayed at startup, in a message such as `Starting ZDM proxy version ...`. +It can also be retrieved at any time by using the `version` option as in the following command. Example: @@ -211,7 +243,9 @@ Here's an example for {zdm-proxy} 2.1.x: docker run --rm datastax/zdm-proxy:2.1.x -version ---- -The playbook for configuration changes can also be used to upgrade the {zdm-proxy} version in a rolling fashion. All containers will be recreated with the image of the specified version. The same behavior and observations as above apply here. +The playbook for configuration changes can also be used to upgrade the {zdm-proxy} version in a rolling fashion. +All containers will be recreated with the image of the specified version. +The same behavior and observations as above apply here. To perform an upgrade, change the version tag number to the desired version in `vars/zdm_proxy_container.yml`: @@ -238,14 +272,24 @@ ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory == Scaling operations -{zdm-automation} doesn't provide a way to perform scaling up/down operations in a rolling fashion out of the box. If you need a larger {zdm-proxy} deployment, you have two options: +{zdm-automation} doesn't provide a way to perform scaling up/down operations in a rolling fashion out of the box. +If you need a larger {zdm-proxy} deployment, you have two options: + +. Creating a new deployment and moving your client applications to it. +This is the recommended approach, which can be done through the automation without any downtime. +. Adding more instances to the existing deployment. +This is slightly more manual and requires a brief downtime window. + +The first option requires that you deploy a new {zdm-proxy} cluster on the side, and move the client applications to this new proxy cluster. +This can be done by creating a new {zdm-proxy} deployment with the desired topology on a new set of machines (following the normal process), and then changing the contact points in the application configuration so that the application instances point to the new {zdm-proxy} deployment. -. Creating a new deployment and moving your client applications to it. This is the recommended approach, which can be done through the automation without any downtime. -. Adding more instances to the existing deployment. This is slightly more manual and requires a brief downtime window. 
+This first option just requires a rolling restart of the application instances (to apply the contact point configuration update) and does not cause any interruption of service, because the application instances can just move seamlessly from the old deployment to the new one, which are able to serve requests straight away. -The first option requires that you deploy a new {zdm-proxy} cluster on the side, and move the client applications to this new proxy cluster. This can be done by creating a new {zdm-proxy} deployment with the desired topology on a new set of machines (following the normal process), and then changing the contact points in the application configuration so that the application instances point to the new {zdm-proxy} deployment. This just requires a rolling restart of the application instances (to apply the contact point configuration update) and does not cause any interruption of service, because the application instances can just move seamlessly from the old deployment to the new one, which are able to serve requests straight away. +The second option consists of changing the topology of an existing ZDM proxy deployment. +For example, let's say that you wish to add three new nodes to an existing six-node deployment. +To do this, you need to amend the inventory file so that it contains one line for each machine where you want a proxy instance to be deployed (in this case, the amended inventory file will contain nine proxy IPs, six of which were already there plus the three new ones) and then run the `deploy_zdm_proxy.yml` playbook again. -The second option consists of changing the topology of an existing ZDM proxy deployment. For example, let's say that you wish to add three new nodes to an existing six-node deployment. To do this, you need to amend the inventory file so that it contains one line for each machine where you want a proxy instance to be deployed (in this case, the amended inventory file will contain nine proxy IPs, six of which were already there plus the three new ones) and then run the `deploy_zdm_proxy.yml` playbook again. This will stop the existing six proxies, destroy them, create a new nine-node deployment from scratch based on the amended inventory and start it up, therefore resulting in a brief interruption of availability of the whole {zdm-proxy} deployment. +This second option will stop the existing six proxies, destroy them, create a new nine-node deployment from scratch based on the amended inventory and start it up, therefore resulting in a brief interruption of availability of the whole {zdm-proxy} deployment. [NOTE] ==== diff --git a/modules/ROOT/pages/metrics.adoc b/modules/ROOT/pages/metrics.adoc index 4d582671..6a06da29 100644 --- a/modules/ROOT/pages/metrics.adoc +++ b/modules/ROOT/pages/metrics.adoc @@ -9,15 +9,17 @@ This topic provides detailed information about the metrics captured by the {zdm- The {zdm-proxy} gathers a large number of metrics, which allows you to gain deep insights into how it is operating with regard to its communication with client applications and clusters, as well as its request handling. -Having visibility on all aspects of the {zdm-proxy}'s behavior is extremely important in the context of a migration of critical client applications, and is a great help in building confidence in the process and troubleshooting any issues. 
For this reason, we strongly encourage you to monitor the {zdm-proxy}, either by deploying the self-contained monitoring stack provided by the {zdm-automation} or by importing the pre-built Grafana dashboards in your own monitoring infrastructure. +Having visibility on all aspects of the {zdm-proxy}'s behavior is extremely important in the context of a migration of critical client applications, and is a great help in building confidence in the process and troubleshooting any issues. +For this reason, we strongly encourage you to monitor the {zdm-proxy}, either by deploying the self-contained monitoring stack provided by the {zdm-automation} or by importing the pre-built Grafana dashboards in your own monitoring infrastructure. == Retrieving the {zdm-proxy} metrics {zdm-proxy} exposes an HTTP endpoint that returns metrics in the Prometheus format. -{zdm-automation} can deploy Prometheus and Grafana, configuring them automatically, as explained xref:deploy-proxy-monitoring.adoc#_setting_up_the_monitoring_stack[here]. The Grafana dashboards are ready to go with metrics that are being scraped from the {zdm-proxy} instances. +{zdm-automation} can deploy Prometheus and Grafana, configuring them automatically, as explained xref:deploy-proxy-monitoring.adoc#_setting_up_the_monitoring_stack[here]. +The Grafana dashboards are ready to go with metrics that are being scraped from the {zdm-proxy} instances. -If you already have a Grafana deployment then you can import the dashboards from the two ZDM dashboard files from this https://github.com/datastax/zdm-proxy-automation/tree/main/grafana-dashboards[{zdm-automation} GitHub location^]. +If you already have a Grafana deployment then you can import the dashboards from the two ZDM dashboard files from this https://github.com/datastax/zdm-proxy-automation/tree/main/grafana-dashboards[{zdm-automation} GitHub location]. == Grafana dashboard for {zdm-proxy} metrics @@ -32,7 +34,9 @@ image::{imagesprefix}zdm-grafana-proxy-dashboard1.png[Grafana dashboard shows th === Proxy-level metrics * Latency: -** Read Latency: total latency measured by the {zdm-proxy} (including post-processing like response aggregation) for read requests. This metric has two labels (`reads_origin` and `reads_target`): the label that has data will depend on which cluster is receiving the reads, i.e. which cluster is currently considered the xref:glossary.adoc#_primary_cluster[primary cluster]. This is configured by the {zdm-automation} through the variable `primary_cluster`, or directly through the environment variable `ZDM_PRIMARY_CLUSTER` of the {zdm-proxy}. +** Read Latency: total latency measured by the {zdm-proxy} (including post-processing like response aggregation) for read requests. +This metric has two labels (`reads_origin` and `reads_target`): the label that has data will depend on which cluster is receiving the reads, i.e. which cluster is currently considered the xref:glossary.adoc#_primary_cluster[primary cluster]. +This is configured by the {zdm-automation} through the variable `primary_cluster`, or directly through the environment variable `ZDM_PRIMARY_CLUSTER` of the {zdm-proxy}. ** Write Latency: total latency measured by the {zdm-proxy} (including post-processing like response aggregation) for write requests. 
* Throughput (same structure as the previous latency metrics): @@ -47,8 +51,10 @@ image::{imagesprefix}zdm-grafana-proxy-dashboard1.png[Grafana dashboard shows th ** Cache Misses: meaning, a prepared statement was sent to the {zdm-proxy}, but it wasn't on its cache, so the proxy returned an `UNPREPARED` response to make the driver send the `PREPARE` request again. ** Number of cached prepared statements. -* Request Failure Rates: number of request failures per interval. You can set the interval via the `Error Rate interval` dashboard variable at the top. -** Read Failure Rate: one `cluster` label with two settings: `origin` and `target`. The label that contains data depends on which cluster is currently considered the primary (same as the latency and throughput metrics explained above). +* Request Failure Rates: number of request failures per interval. +You can set the interval via the `Error Rate interval` dashboard variable at the top. +** Read Failure Rate: one `cluster` label with two settings: `origin` and `target`. +The label that contains data depends on which cluster is currently considered the primary (same as the latency and throughput metrics explained above). ** Write Failure Rate: one `failed_on` label with three settings: `origin`, `target` and `both`. *** `failed_on=origin`: the write request failed on Origin ONLY. *** `failed_on=target`: the write request failed on Target ONLY. @@ -60,6 +66,7 @@ image::{imagesprefix}zdm-grafana-proxy-dashboard1.png[Grafana dashboard shows th To see error metrics by error type, see the node-level error metrics on the next section. +[[_node_level_metrics]] === Node-level metrics * Latency: metrics on this bucket are not split by request type like the proxy level latency metrics so writes and reads are mixed together: @@ -73,7 +80,9 @@ To see error metrics by error type, see the node-level error metrics on the next * Number of Used Stream Ids: ** Tracks the total number of used xref:manage-proxy-instances.adoc#zdm_proxy_max_stream_ids[stream ids] ("request ids") per connection type (Origin, Target and Async). -* Number of errors per error type per Origin node and per Target node. Possible values for the `error` type label: +* Number of errors per error type per Origin node and per Target node. +Possible values for the `error` type label: ++ ** `error=client_timeout` ** `error=read_failure` ** `error=read_timeout` @@ -83,15 +92,18 @@ To see error metrics by error type, see the node-level error metrics on the next ** `error=unavailable` ** `error=unprepared` +[[_asynchronous_read_requests_metrics]] === Asynchronous read requests metrics -These metrics are specific to asynchronous reads, so they are only populated if asynchronous dual reads are enabled. This is done by setting the {zdm-automation} variable `read_mode`, or its equivalent environment variable `ZDM_READ_MODE`, to `DUAL_ASYNC_ON_SECONDARY` as explained xref:enable-async-dual-reads.adoc[here]. +These metrics are specific to asynchronous reads, so they are only populated if asynchronous dual reads are enabled. +This is done by setting the {zdm-automation} variable `read_mode`, or its equivalent environment variable `ZDM_READ_MODE`, to `DUAL_ASYNC_ON_SECONDARY` as explained xref:enable-async-dual-reads.adoc[here]. These metrics track: * Latency. * Throughput. -* Number of dedicated connections per node for async reads: whether it's Origin or Target connections depends on the {zdm-proxy} configuration. 
That is, if the primary cluster is Origin, then the asynchronous reads are sent to Target. +* Number of dedicated connections per node for async reads: whether it's Origin or Target connections depends on the {zdm-proxy} configuration. +That is, if the primary cluster is Origin, then the asynchronous reads are sent to Target. * Number of errors per error type per node. === Insights via the {zdm-proxy} metrics @@ -104,7 +116,8 @@ Some examples of problems manifesting on these metrics: == Go runtime metrics dashboard and system dashboard -This dashboard in Grafana is not as important as the {zdm-proxy} dashboard. However, it may be useful to troubleshoot performance issues. Here you can see memory usage, Garbage Collection (GC) duration, open fds (file descriptors - useful to detect leaked connections), and the number of goroutines: +This dashboard in Grafana is not as important as the {zdm-proxy} dashboard. However, it may be useful to troubleshoot performance issues. +Here you can see memory usage, Garbage Collection (GC) duration, open fds (file descriptors - useful to detect leaked connections), and the number of goroutines: image::{imagesprefix}zdm-golang-dashboard.png[Golang metrics dashboard example is shown.] @@ -115,4 +128,6 @@ Some examples of problem areas on these Go runtime metrics: * Always increasing memory usage. * Always increasing number of goroutines. -The ZDM monitoring stack also includes a system-level dashboard collected through the Prometheus Node Exporter. This dashboard contains hardware and OS-level metrics for the host on which the proxy runs. This can be useful to check the available resources and identify low-level bottlenecks or issues. +The ZDM monitoring stack also includes a system-level dashboard collected through the Prometheus Node Exporter. +This dashboard contains hardware and OS-level metrics for the host on which the proxy runs. +This can be useful to check the available resources and identify low-level bottlenecks or issues. diff --git a/modules/ROOT/pages/migrate-and-validate-data.adoc b/modules/ROOT/pages/migrate-and-validate-data.adoc index 99c664ed..558cf889 100644 --- a/modules/ROOT/pages/migrate-and-validate-data.adoc +++ b/modules/ROOT/pages/migrate-and-validate-data.adoc @@ -12,11 +12,11 @@ For full details, see these topics: These tools provide sophisticated features that help you migrate your data from any Cassandra **Origin** (Apache Cassandra®, {company} Enterprise (DSE), {company} {astra_db}) to any Cassandra **Target** (Apache Cassandra, DSE, {company} {astra_db}). -include::partial$lightbox-tip.adoc[] +//include::partial$lightbox-tip.adoc[] image::{imagesprefix}migration-phase2ra.png[Phase 2 diagram shows using tools to migrate data from Origin to Target.] -For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. +//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. == What's the difference between these data migration tools? @@ -30,9 +30,9 @@ In general: Refer to the following GitHub repos: -* https://github.com/datastax/cassandra-data-migrator[Cassandra Data Migrator^] repo. +* https://github.com/datastax/cassandra-data-migrator[Cassandra Data Migrator] repo. -* https://github.com/datastax/dsbulk-migrator[{dsbulk-migrator}^] repo. +* https://github.com/datastax/dsbulk-migrator[{dsbulk-migrator}] repo. A number of helpful assets are provided in each repo. 
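+
+For example, to browse these assets locally, you could clone either repository; the commands below are illustrative only:
+
+[source,bash]
+----
+# Clone the repos to inspect the assets they provide.
+git clone https://github.com/datastax/cassandra-data-migrator.git
+git clone https://github.com/datastax/dsbulk-migrator.git
+----
+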
diff --git a/modules/ROOT/pages/phase1.adoc b/modules/ROOT/pages/phase1.adoc index ce791f60..051e767a 100644 --- a/modules/ROOT/pages/phase1.adoc +++ b/modules/ROOT/pages/phase1.adoc @@ -11,8 +11,8 @@ This section presents the following: * xref:connect-clients-to-proxy.adoc[] * xref:manage-proxy-instances.adoc[] -include::partial$lightbox-tip.adoc[] +//include::partial$lightbox-tip.adoc[] image::{imagesprefix}migration-phase1ra.png[Phase 1 diagram shows deployed ZDM Proxy instances, client app connections to proxies, and Target is setup.] -For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. +//For illustrations of all the migration phases, see the xref:introduction.adoc#_migration_phases[Introduction]. diff --git a/modules/ROOT/pages/release-notes.adoc b/modules/ROOT/pages/release-notes.adoc index 78a74a5e..2e5c9a83 100644 --- a/modules/ROOT/pages/release-notes.adoc +++ b/modules/ROOT/pages/release-notes.adoc @@ -8,7 +8,8 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] **03 February 2023** -Released {zdm-automation} 2.3.0, which enables ansible scripts and terraform to work with both Ubuntu and RedHat-family Linux distributions. Documentation updates included the following in the xref:deployment-infrastructure.adoc#_machines[Machines] section of the Deployment and infrastructure considerations topic: +Released {zdm-automation} 2.3.0, which enables ansible scripts and terraform to work with both Ubuntu and RedHat-family Linux distributions. +Documentation updates included the following in the xref:deployment-infrastructure.adoc#_machines[Machines] section of the Deployment and infrastructure considerations topic: "Ubuntu Linux 20.04 or newer, RedHat Family Linux 7 or newer" @@ -16,20 +17,25 @@ Released {zdm-automation} 2.3.0, which enables ansible scripts and terraform to **31 January 2023** -Starting in version 2.2.0 of the {zdm-automation}, we added the `zdm_proxy_cluster_config.yml` file to contain all the configuration variables for Origin and Target. Prior to version 2.2.0, the variables were in the `zdm_proxy_core_config.yml` file. +Starting in version 2.2.0 of the {zdm-automation}, we added the `zdm_proxy_cluster_config.yml` file to contain all the configuration variables for Origin and Target. +Prior to version 2.2.0, the variables were in the `zdm_proxy_core_config.yml` file. [TIP] ==== -This change is backward compatible. If you previously populated the variables in `zdm_proxy_core_config.yml`, these variables will be honored and take precedence over any variables in `zdm_proxy_cluster_config.yml`, if both files are present. +This change is backward compatible. +If you previously populated the variables in `zdm_proxy_core_config.yml`, these variables will be honored and take precedence over any variables in `zdm_proxy_cluster_config.yml`, if both files are present. ==== -We encourage existing 2.x ZDM users to upgrade to the 2.3.0 version of {zdm-automation}. To do so, simply `git pull` the `main` branch of https://github.com/datastax/zdm-proxy-automation from within the Ansible Control Host container. You can also check out a https://github.com/datastax/zdm-proxy-automation/releases/tag/v2.3.0[specific tag^], such as 2.3.0. +We encourage existing 2.x ZDM users to upgrade to the 2.3.0 version of {zdm-automation}. +To do so, simply `git pull` the `main` branch of https://github.com/datastax/zdm-proxy-automation from within the Ansible Control Host container. 
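+
+For example, a minimal upgrade from inside the container could look like the following; the directory name assumes the default clone location and may differ in your environment:
+
+[source,bash]
+----
+# Switch to the automation repo and pull the latest changes from the main branch.
+cd zdm-proxy-automation
+git checkout main
+git pull
+----
+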
+You can also check out a https://github.com/datastax/zdm-proxy-automation/releases/tag/v2.3.0[specific tag], such as 2.3.0. For more about the YML files used to configure access to your clusters, see xref:deploy-proxy-monitoring.adoc#_configure_the_zdm_proxy[this topic]. [NOTE] ==== -The latest {zdm-proxy} version is still 2.1.0. The latest {zdm-automation} version is 2.3.0. +The latest {zdm-proxy} version is 2.1.0. +The latest {zdm-automation} version is 2.3.1. ==== If you are using a {zdm-automation} version up to and including 2.1.0, please use `zdm_proxy_core_config.yml` to configure access to your clusters. @@ -42,7 +48,8 @@ The ZDM 2.1.0 release adds {zdm-proxy} heartbeat functionality and provides seve The periodic heartbeat feature in 2.1.0 has been implemented to keep alive idle cluster connections. -By default, {zdm-proxy} now sends heartbeats after 30 seconds of inactivity on a cluster connection. You can tune the heartbeat interval with the Ansible configuration variable `heartbeat_insterval_ms`, or by directly setting the `ZDM_HEARTBEAT_INTERVAL_MS` environment variable if you do not use the {zdm-automation}. +By default, {zdm-proxy} now sends heartbeats after 30 seconds of inactivity on a cluster connection. +You can tune the heartbeat interval with the Ansible configuration variable `heartbeat_insterval_ms`, or by directly setting the `ZDM_HEARTBEAT_INTERVAL_MS` environment variable if you do not use the {zdm-automation}. DataStax strongly recommends that you use version 2.1.0 (or newer) to benefit from this improvement, especially if you have a read-only workload. @@ -54,21 +61,25 @@ To find out how to upgrade an existing {zdm-proxy} deployment, see xref:manage-p For the latest information about {zdm-proxy} new features and other changes, please refer to these GitHub-hosted documents in the open-source {zdm-proxy} repo: -* https://github.com/datastax/zdm-proxy/blob/main/RELEASE_NOTES.md[RELEASE_NOTES^] +* https://github.com/datastax/zdm-proxy/blob/main/RELEASE_NOTES.md[RELEASE_NOTES] -* https://github.com/datastax/zdm-proxy/blob/main/CHANGELOG/CHANGELOG-2.1.md[CHANGELOG 2.1^] +* https://github.com/datastax/zdm-proxy/blob/main/CHANGELOG/CHANGELOG-2.1.md[CHANGELOG 2.1] === ZDM 2.1.0 documentation updates The following topics have been updated for the 2.1.0 release: -* xref:feasibility-checklists.adoc#_read_only_applications[Feasibility checks for read-only applications, window="_blank"]. See the notes indicating that this issue is solved by the {zdm-proxy} 2.1.0 release. +* xref:feasibility-checklists.adoc#_read_only_applications[Feasibility checks for read-only applications, window="_blank"]. +See the notes indicating that this issue is solved by the {zdm-proxy} 2.1.0 release. -* xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable, window="_blank"]. See the `heartbeat_interval_ms` and `zdm-proxy_max_stream_ids` information. +* xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable, window="_blank"]. +See the `heartbeat_interval_ms` and `zdm-proxy_max_stream_ids` information. -* xref:troubleshooting-scenarios.adoc#_async_read_timeouts_stream_id_map_exhausted[Async read timeouts, window="_blank"]. See the clarification in the *Workaround* section indicating that this issue is solved by the {zdm-proxy} 2.1.0 release. +* xref:troubleshooting-scenarios.adoc#_async_read_timeouts_stream_id_map_exhausted[Async read timeouts, window="_blank"]. 
+See the clarification in the *Workaround* section indicating that this issue is solved by the {zdm-proxy} 2.1.0 release. -* xref:troubleshooting-tips.adoc#_node_level_metrics[Node-level metrics, window="_blank"]. See the "Number of Used Stream Ids" section. +* xref:metrics.adoc#_node_level_metrics[Node-level metrics, window="_blank"]. +See the "Number of Used Stream Ids" section. == ZDM 2.0.0 release @@ -79,29 +90,33 @@ The following topics have been updated for the 2.1.0 release: This 2.0.0 version marks the public release of the self-service {company} {zdm-product} product suite. -The following GitHub repos are public. You are welcome to read the source and submit feedback via GitHub Issues per repo. +The following GitHub repos are public. +You are welcome to read the source and submit feedback via GitHub Issues per repo. -* https://github.com/datastax/zdm-proxy[{zdm-proxy}^] open-source repo: in addition to sending feedback, you may submit Pull Requests (PRs) for potential inclusion, provided you accept the https://cla.datastax.com/[{company} Contributor License Agreement (CLA)^]. For more information, see xref:contributions.adoc[]. +* https://github.com/datastax/zdm-proxy[{zdm-proxy}] open-source repo: in addition to sending feedback, you may submit Pull Requests (PRs) for potential inclusion, provided you accept the https://cla.datastax.com/[{company} Contributor License Agreement (CLA)]. +For more information, see xref:contributions.adoc[]. -* https://github.com/datastax/zdm-proxy-automation[{zdm-automation}^] repo for Ansible-based {zdm-proxy} automation. +* https://github.com/datastax/zdm-proxy-automation[{zdm-automation}] repo for Ansible-based {zdm-proxy} automation. -* https://github.com/datastax/dsbulk-migrator[DSBulk Migrator^] repo for migration of smaller data quantities. +* https://github.com/datastax/dsbulk-migrator[DSBulk Migrator] repo for migration of smaller data quantities. -* https://github.com/datastax/cassandra-data-migrator[Cassandra Data Migrator^] repo for migration of larger data quantities and where detailed verifications and reconciliation options are needed. +* https://github.com/datastax/cassandra-data-migrator[Cassandra Data Migrator] repo for migration of larger data quantities and where detailed verifications and reconciliation options are needed. include::partial$note-downtime.adoc[] -For the latest information about {zdm-proxy} new features and other changes, please refer to the GitHub-hosted https://github.com/datastax/zdm-proxy/blob/main/RELEASE_NOTES.md[RELEASE_NOTES^] in the open-source {zdm-proxy} repo. The document includes CHANGELOG links for each {zdm-proxy} `N.n` release. +For the latest information about {zdm-proxy} new features and other changes, please refer to the GitHub-hosted https://github.com/datastax/zdm-proxy/blob/main/RELEASE_NOTES.md[RELEASE_NOTES] in the open-source {zdm-proxy} repo. +The document includes CHANGELOG links for each {zdm-proxy} `N.n` release. -==== [TIP] -The {zdm-product} process requires you to be able to perform rolling restarts of your client applications during the migration. This is standard practice for client applications that are deployed over multiple instances and is a widely used approach to roll out releases and configuration changes. +==== +The {zdm-product} process requires you to be able to perform rolling restarts of your client applications during the migration. 
+This is standard practice for client applications that are deployed over multiple instances and is a widely used approach to roll out releases and configuration changes. ==== === ZDM 2.0.0 documentation updates -Starting with the 2.0.0 version on 18-Oct-2022, the {zdm-product} documentation set is available online, starting https://docs.datastax.com/en/astra-serverless/docs/migrate/introduction.html[here]. +Starting with the 2.0.0 version on 18-Oct-2022, the {zdm-product} documentation set is available online, starting xref:introduction.adoc[here]. == Supported releases diff --git a/modules/ROOT/pages/rollback.adoc b/modules/ROOT/pages/rollback.adoc index e5b7f082..e07dd831 100644 --- a/modules/ROOT/pages/rollback.adoc +++ b/modules/ROOT/pages/rollback.adoc @@ -8,10 +8,13 @@ At any point during the migration process until the very last phase, if you hit The migration can be started from scratch once the issue has been addressed. -include::partial$lightbox-tip-all-phases.adoc[] +//include::partial$lightbox-tip-all-phases.adoc[] image::{imagesprefix}migration-all-phases.png[Migration phases from start to finish.] -After moving your client applications off the {zdm-proxy} instances (Phase 5), writes are no longer sent to both Origin and Target clusters: the data on Origin is no longer kept up-to-date, and you lose this seamless rollback option. This is the point at which you commit to using Target permanently. The {zdm-proxy} deployment can be destroyed, and Origin is no longer needed by the client applications that have been migrated. +After moving your client applications off the {zdm-proxy} instances (Phase 5), writes are no longer sent to both Origin and Target clusters: the data on Origin is no longer kept up-to-date, and you lose this seamless rollback option. +This is the point at which you commit to using Target permanently. +The {zdm-proxy} deployment can be destroyed, and Origin is no longer needed by the client applications that have been migrated. -However, should you decide to move back to Origin at a later point, or move to a new cluster entirely, you can simply execute the same migration process. In this case, the new Origin will now be the former Target, and the new Target will be whatever cluster you wish to migrate to (which could even be the former Origin). +However, should you decide to move back to Origin at a later point, or move to a new cluster entirely, you can simply execute the same migration process. +In this case, the new Origin will now be the former Target, and the new Target will be whatever cluster you wish to migrate to (which could even be the former Origin). diff --git a/modules/ROOT/pages/setup-ansible-playbooks.adoc b/modules/ROOT/pages/setup-ansible-playbooks.adoc index 488a9b0c..dc9c7f16 100644 --- a/modules/ROOT/pages/setup-ansible-playbooks.adoc +++ b/modules/ROOT/pages/setup-ansible-playbooks.adoc @@ -11,17 +11,25 @@ Once completed, you will have a working and fully monitored {zdm-proxy} deployme == Introduction -The {zdm-automation} uses **Ansible**, which deploys and configures the {zdm-proxy} instances and monitoring stack via playbooks. This step expects that the infrastructure has been already provisioned. See xref:deployment-infrastructure.adoc[Deployment and infrastructure considerations], which include the infrastructure requirements. +The {zdm-automation} uses **Ansible**, which deploys and configures the {zdm-proxy} instances and monitoring stack via playbooks. +This step expects that the infrastructure has been already provisioned. 
+See xref:deployment-infrastructure.adoc[Deployment and infrastructure considerations], which include the infrastructure requirements. -Configuring a machine to serve as the Ansible Control Host is very easy using the {zdm-utility}. This is a Golang (Go) executable program that runs anywhere. This utility prompts you for a few configuration values, with helpful embedded explanations and error handling, then automatically creates the Ansible Control Host container ready for you to use. From this container, you will be able to easily configure and run the {zdm-automation} Ansible playbooks. +Configuring a machine to serve as the Ansible Control Host is very easy using the {zdm-utility}. +This is a Golang (Go) executable program that runs anywhere. +This utility prompts you for a few configuration values, with helpful embedded explanations and error handling, then automatically creates the Ansible Control Host container ready for you to use. +From this container, you will be able to easily configure and run the {zdm-automation} Ansible playbooks. image::{imagesprefix}docker-container-and-zdm-utility.png[ZDM Proxy connections from Docker container created by ZDM Utility] == Prerequisites -. You must have already provisioned the ZDM infrastructure, which means you must have the server machines ready, and know their IP addresses. These can be in the cloud provider of your choice or on-premise. -. Docker needs to be installed on the machine that will be running the Ansible Control Host container. For comprehensive installation instructions, please refer to the https://docs.docker.com/engine/install/#server[official Docker documentation]. -. The `docker` command must not require superuser privileges. The instructions to do this can be found https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user[here]. +. You must have already provisioned the ZDM infrastructure, which means you must have the server machines ready, and know their IP addresses. +These can be in the cloud provider of your choice or on-premise. +. Docker needs to be installed on the machine that will be running the Ansible Control Host container. +For comprehensive installation instructions, please refer to the https://docs.docker.com/engine/install/#server[official Docker documentation]. +. The `docker` command must not require superuser privileges. +The instructions to do this can be found https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user[here]. [NOTE] ==== @@ -55,28 +63,32 @@ datastax/zdm-proxy:2.x In this guide, we'll use a jumphost to run the Ansible Control Host container. -A jumphost is a server on a network used to access and manage devices in a separate security zone, providing a controlled means of access between them. The jumphost can be, for example, a Linux server machine that is able to access the server machines that you wish to use for your {zdm-proxy} deployment. +A jumphost is a server on a network used to access and manage devices in a separate security zone, providing a controlled means of access between them. +The jumphost can be, for example, a Linux server machine that is able to access the server machines that you wish to use for your {zdm-proxy} deployment. The jumphost will serve three purposes: -* Accessing the {zdm-proxy} machines -* Running the Ansible Control Host container, from which the {zdm-automation} can be run +* Accessing the {zdm-proxy} machines. 
+* Running the Ansible Control Host container, from which the {zdm-automation} can be run. * Running the {zdm-shortproduct} monitoring stack, which uses Prometheus and Grafana to expose the metrics of all the {zdm-proxy} instances in a preconfigured dashboard. [TIP] ==== -To simplify accessing the jumphost and {zdm-proxy} instances from your machine, create a xref:deployment-infrastructure.adoc#_connecting_to_the_zdm_infrastructure_from_an_external_machine[custom SSH configuration file]. The following steps will assume that this file exists. +To simplify accessing the jumphost and {zdm-proxy} instances from your machine, create a xref:deployment-infrastructure.adoc#_connecting_to_the_zdm_infrastructure_from_an_external_machine[custom SSH configuration file]. +The following steps will assume that this file exists. ==== Let's get started. == Proxy deployment setup on the jumphost -To run the {zdm-automation}, the Ansible Control Host needs to be able to connect to all other instances of the {zdm-proxy} deployment. For this reason, it needs to have the SSH key required by those instances. +To run the {zdm-automation}, the Ansible Control Host needs to be able to connect to all other instances of the {zdm-proxy} deployment. +For this reason, it needs to have the SSH key required by those instances. === Add SSH keys to the jumphost -From your local machine, transfer (`scp`) the SSH private key for the {zdm-shortproduct} deployment to the jumphost. Example: +From your local machine, transfer (`scp`) the SSH private key for the {zdm-shortproduct} deployment to the jumphost. +Example: [source,bash] ---- @@ -92,9 +104,8 @@ ssh -F jumphost == Running the {zdm-utility} -From the jumphost, download the {zdm-utility}'s executable. Releases are available here: - -link:https://github.com/datastax/zdm-proxy-automation/releases[https://github.com/datastax/zdm-proxy-automation/releases^] +From the jumphost, download the {zdm-utility}'s executable. +Releases are available https://github.com/datastax/zdm-proxy-automation/releases[here]. The downloadable archive name format is `zdm-util--`. @@ -114,14 +125,16 @@ Here's an example to wget {zdm-utility} 2.3.0: wget https://github.com/datastax/zdm-proxy-automation/releases/download/v2.3.0/zdm-util-linux-amd64-v2.3.0.tgz ---- -Once downloaded, unzip it. Here's an example with {zdm-utility} 2.3.0: +Once downloaded, unzip it. +Here's an example with {zdm-utility} 2.3.0: [source,bash] ---- tar -xvf zdm-util-linux-amd64-v2.3.0.tgz ---- -Run the {zdm-utility}. Here's an example with {zdm-utility} 2.3.0: +Run the {zdm-utility}. +Here's an example with {zdm-utility} 2.3.0: [source,bash] ---- @@ -132,9 +145,12 @@ The utility prompts you for a few configuration values, then creates and initial [TIP] ==== -The {zdm-utility} will store the configuration that you provide into a file named `ansible_container_init_config` in the current directory. If you run the utility again, it will detect the file and ask you if you wish to use that configuration or discard it. If the configuration is not fully valid, you will be prompted for the missing or invalid parameters only. +The {zdm-utility} will store the configuration that you provide into a file named `ansible_container_init_config` in the current directory. +If you run the utility again, it will detect the file and ask you if you wish to use that configuration or discard it. +If the configuration is not fully valid, you will be prompted for the missing or invalid parameters only. 
-You can also pass a custom configuration file to the {zdm-utility} with the optional command-line parameter `-utilConfigFile`. Example: +You can also pass a custom configuration file to the {zdm-utility} with the optional command-line parameter `-utilConfigFile`. +Example: Here's an example with {zdm-utility} 2.3.0: @@ -146,30 +162,41 @@ Here's an example with {zdm-utility} 2.3.0: [NOTE] ==== -The {zdm-utility} will validate each variable that you enter. In case of invalid variables, it will display specific messages to help you fix the problem. +The {zdm-utility} will validate each variable that you enter. +In case of invalid variables, it will display specific messages to help you fix the problem. -You have five attempts to enter valid variables. You can always run the {zdm-utility} again, if necessary. +You have five attempts to enter valid variables. +You can always run the {zdm-utility} again, if necessary. ==== -. Enter the path to, and name of, the SSH private key to access the proxy hosts. Example: +. Enter the path to, and name of, the SSH private key to access the proxy hosts. +Example: + [source,bash] ---- ~/my-zdm-key ---- -. Enter the common prefix of the private IP addresses of the proxy hosts. Example: +. Enter the common prefix of the private IP addresses of the proxy hosts. +Example: + [source,bash] ---- 172.18.* ---- -. You're asked if you have an existing Ansible inventory file. If you do, and you transferred it to the jumphost, you can just specify it. If you do not, the {zdm-utility} will create one based on your answers to prompts and save it. Here we'll assume that you do not have one. Enter `n`. + +. You're asked if you have an existing Ansible inventory file. +If you do, and you transferred it to the jumphost, you can just specify it. +If you do not, the {zdm-utility} will create one based on your answers to prompts and save it. +Here we'll assume that you do not have one. Enter `n`. ++ The created file will be named `zdm_ansible_inventory` in your working directory. -. Next, indicate if this deployment is for local testing and evaluation (such as when you're creating a demo or just experimenting with the {zdm-proxy}). In this example, we'll enter `n` because this scenario is for a production deployment. -. Now enter at least three proxy private IP addresses for the machines that will run the {zdm-proxy} instances, for a production deployment. (If we had indicated above that we're doing local testing in dev, only one proxy would have been required.) Example values entered at the {zdm-utility}'s prompt, for production: +. Next, indicate if this deployment is for local testing and evaluation (such as when you're creating a demo or just experimenting with the {zdm-proxy}). +In this example, we'll enter `n` because this scenario is for a production deployment. +. Now enter at least three proxy private IP addresses for the machines that will run the {zdm-proxy} instances, for a production deployment. +(If we had indicated above that we're doing local testing in dev, only one proxy would have been required.) +Example values entered at the {zdm-utility}'s prompt, for production: + [source,bash] ---- @@ -180,20 +207,27 @@ The created file will be named `zdm_ansible_inventory` in your working directory + To finish entering private IP addresses, simply press ENTER at the prompt. -. Optionally, when prompted, you can enter the private IP address of your Monitoring instance, which will use Prometheus to store data and Grafana to visualize it into a preconfigured dashboard. 
It is strongly recommended exposing the {zdm-proxy} metrics in the preconfigured dashboard that ships with the {zdm-automation} for easy monitoring. You can skip this step if you haven't decided which machine to use for monitoring, or if you wish to use your own monitoring stack. +. Optionally, when prompted, you can enter the private IP address of your Monitoring instance, which will use Prometheus to store data and Grafana to visualize it into a preconfigured dashboard. +It is strongly recommended exposing the {zdm-proxy} metrics in the preconfigured dashboard that ships with the {zdm-automation} for easy monitoring. +You can skip this step if you haven't decided which machine to use for monitoring, or if you wish to use your own monitoring stack. + [NOTE] ==== -We highly recommend that you configure a monitoring instance, unless you intend to use a monitoring stack that you already have. For migrations that may run for multiple days, it is essential that you use metrics to understand the performance and health of the {zdm-proxy} instances. +We highly recommend that you configure a monitoring instance, unless you intend to use a monitoring stack that you already have. +For migrations that may run for multiple days, it is essential that you use metrics to understand the performance and health of the {zdm-proxy} instances. -You cannot rely solely on information in the logs. They report connection or protocol errors, but do not give you enough information on how the {zdm-proxy} is working and how each cluster is responding. Metrics, however, provide especially helpful data and the graphs show you how they vary over time. The monitoring stack ships with preconfigured Grafana dashboards that are automatically set up as part of the monitoring deployment. +You cannot rely solely on information in the logs. +They report connection or protocol errors, but do not give you enough information on how the {zdm-proxy} is working and how each cluster is responding. +Metrics, however, provide especially helpful data and the graphs show you how they vary over time. +The monitoring stack ships with preconfigured Grafana dashboards that are automatically set up as part of the monitoring deployment. For details about the metrics you can observe in these preconfigured Grafana dashboards, see xref:troubleshooting-tips.adoc#how-to-leverage-metrics[this section] of the troubleshooting tips. ==== + You can choose to deploy the monitoring stack on the jumphost or on a different machine, as long as it can connect to the {zdm-proxy} instances over TCP on ports 9100 (to collect host-level metrics) and on the port on which the {zdm-proxy} exposes its own metrics, typically 14001. -In this example, we'll enter the same IP of the Ansible control host (the jumphost machine on which we're running the {zdm-utility}). Example: +In this example, we'll enter the same IP of the Ansible control host (the jumphost machine on which we're running the {zdm-utility}). +Example: [source,bash] ---- @@ -204,7 +238,8 @@ At this point, the {zdm-utility}: * Has created the Ansible Inventory to the default file, `zdm_ansible_inventory`. * Has written the {zdm-utility} configuration to the default file, `ansible_container_init_config`. -* Presents a summary of the configuration thus far, and prompts you to Continue. Example: +* Presents a summary of the configuration thus far, and prompts you to Continue. 
+Example: image::{imagesprefix}zdm-go-utility-results3.png[A summary of the configuration provided is displayed in the terminal] @@ -220,7 +255,8 @@ image::{imagesprefix}zdm-go-utility-success3.png[Ansible Docker container succes [NOTE] ==== -Depending on your circumstances, you can make different choices in the ZDM Utility, which will result in a path that is slightly different to the one explained here. The utility will guide you through the process with meaningful, self-explanatory messages and help you rectify any issue that you may encounter. +Depending on your circumstances, you can make different choices in the ZDM Utility, which will result in a path that is slightly different to the one explained here. +The utility will guide you through the process with meaningful, self-explanatory messages and help you rectify any issue that you may encounter. The successful outcome will always be a configured Ansible Control Host container ready to run the {zdm-automation}. ==== \ No newline at end of file diff --git a/modules/ROOT/pages/tls.adoc b/modules/ROOT/pages/tls.adoc index 8e17ab49..62d169de 100644 --- a/modules/ROOT/pages/tls.adoc +++ b/modules/ROOT/pages/tls.adoc @@ -6,7 +6,8 @@ ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] {zdm-proxy} supports proxy-to-cluster and application-to-proxy TLS encryption. -The TLS configuration is an optional part of the initial {zdm-proxy} configuration. See the information here in this topic, and then refer to the {zdm-automation} topics that cover: +The TLS configuration is an optional part of the initial {zdm-proxy} configuration. +See the information here in this topic, and then refer to the {zdm-automation} topics that cover: * xref:setup-ansible-playbooks.adoc[] * xref:deploy-proxy-monitoring.adoc[] @@ -15,16 +16,21 @@ The TLS configuration is an optional part of the initial {zdm-proxy} configurati * All TLS configuration is optional. Enable TLS between the {zdm-proxy} and any cluster that requires it, and/or between your client application and the {zdm-proxy} if required. -* Proxy-to-cluster TLS can be configured between the {zdm-proxy} and Origin/Target (either or both) as desired. Each set of configurations is independent of the other. When using proxy-to-cluster TLS, the {zdm-proxy} acts as the TLS client and the cluster as the TLS server. One-way TLS and Mutual TLS are both supported and can be enabled depending on each cluster's requirements. +* Proxy-to-cluster TLS can be configured between the {zdm-proxy} and Origin/Target (either or both) as desired. +Each set of configurations is independent of the other. When using proxy-to-cluster TLS, the {zdm-proxy} acts as the TLS client and the cluster as the TLS server. +One-way TLS and Mutual TLS are both supported and can be enabled depending on each cluster's requirements. -* When using application-to-proxy TLS, your client application is the TLS client and the {zdm-proxy} is the TLS server. One-way TLS and Mutual TLS are both supported. +* When using application-to-proxy TLS, your client application is the TLS client and the {zdm-proxy} is the TLS server. +One-way TLS and Mutual TLS are both supported. * When the {zdm-proxy} connects to Astra DB clusters, it always implicitly uses Mutual TLS. This is done through the Secure Connect Bundle (SCB) and does not require any extra configuration. 
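+
+If you are unsure whether a self-managed Origin or Target cluster already requires TLS on its CQL port, one quick check, shown here purely for illustration, is to probe the port with `openssl`; a completed handshake and a printed certificate chain indicate that client encryption is enabled:
+
+[source,bash]
+----
+# Replace the address with one of your cluster's contact points; 9042 is the default CQL port.
+openssl s_client -connect 10.0.1.10:9042 </dev/null
+----
+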
+[[_retrieving_files_from_a_jks_keystore]] == Retrieving files from a JKS keystore -If you are already using TLS between your client application and Origin, the files needed to configure TLS will already be used in the client application's configuration (TLS client files) and Origin's configuration (TLS Server files). In some cases, these files may be contained in a JKS keystore. +If you are already using TLS between your client application and Origin, the files needed to configure TLS will already be used in the client application's configuration (TLS client files) and Origin's configuration (TLS Server files). +In some cases, these files may be contained in a JKS keystore. The {zdm-proxy} does not accept a JKS keystore, requiring the raw files instead. @@ -43,11 +49,12 @@ keytool -exportcert -keystore -alias -file zdm-ansible-container:/home/ubuntu/zdm_proxy_tls_files`. -* Ensure that you have a shell open to the container. If you do not, you can open it with `docker exec -it zdm-ansible-container bash`. -* From this shell, edit the file `zdm-proxy-automation/ansible/vars/zdm_proxy_custom_tls_config.yml`, uncommenting and populating the relevant configuration variables. These are in the bottom section of `vars/proxy_custom_tls_config_input.yml` and are all prefixed with `zdm_proxy`: +* Ensure that you have a shell open to the container. +If you do not, you can open it with `docker exec -it zdm-ansible-container bash`. +* From this shell, edit the file `zdm-proxy-automation/ansible/vars/zdm_proxy_custom_tls_config.yml`, uncommenting and populating the relevant configuration variables. +These are in the bottom section of `vars/proxy_custom_tls_config_input.yml` and are all prefixed with `zdm_proxy`: ** `zdm_proxy_tls_user_dir_path_name`: uncomment and leave to its preset value of `/home/ubuntu/zdm_proxy_tls_files`. -** `zdm_proxy_tls_server_ca_filename`: filename (without path) of the server CA that the proxy must use. Always required. -** `zdm_proxy_tls_server_cert_filename` and `zdm_proxy_tls_server_key_filename` : filenames (without path) of the server certificate and server key that the proxy must use. Both always required. -** `zdm_proxy_tls_require_client_auth`: whether you want to enable Mutual TLS between the application and the proxy. Optional: defaults to `false` ( = one-way TLS ), can be set to `true` to enable Mutual TLS. +** `zdm_proxy_tls_server_ca_filename`: filename (without path) of the server CA that the proxy must use. +Always required. +** `zdm_proxy_tls_server_cert_filename` and `zdm_proxy_tls_server_key_filename` : filenames (without path) of the server certificate and server key that the proxy must use. +Both always required. +** `zdm_proxy_tls_require_client_auth`: whether you want to enable Mutual TLS between the application and the proxy. +Optional: defaults to `false` ( = one-way TLS ), can be set to `true` to enable Mutual TLS. [TIP] ==== @@ -133,6 +148,7 @@ Remember that in this case, the {zdm-proxy} is the TLS server; thus the word `se == Apply the configuration -This is all that is needed at this point. As part of its normal execution, the proxy deployment playbook will automatically distribute all TLS files and apply the TLS configuration to all {zdm-proxy} instances. +This is all that is needed at this point. +As part of its normal execution, the proxy deployment playbook will automatically distribute all TLS files and apply the TLS configuration to all {zdm-proxy} instances. 
Just go back to xref:deploy-proxy-monitoring.adoc#_advanced_configuration_optional[Optional advanced configuration] to finalize the {zdm-proxy} configuration and then execute the deployment playbook. \ No newline at end of file diff --git a/modules/ROOT/pages/troubleshooting-scenarios.adoc b/modules/ROOT/pages/troubleshooting-scenarios.adoc index 91d6b471..85af535f 100644 --- a/modules/ROOT/pages/troubleshooting-scenarios.adoc +++ b/modules/ROOT/pages/troubleshooting-scenarios.adoc @@ -3,7 +3,8 @@ ifdef::env-github,env-browser,env-vscode[:imagesprefix: ../images/] ifndef::env-github,env-browser,env-vscode[:imagesprefix: ] -Refer the following troubleshooting scenarios for information about resolving common migration issues. Each section presents: +Refer the following troubleshooting scenarios for information about resolving common migration issues. +Each section presents: * Symptoms * Cause @@ -17,13 +18,17 @@ You changed the values of some configuration variables in the automation and the === Cause -The {zdm-proxy} configuration comprises a number of variables, but only a subset of these can be changed on an existing deployment in a rolling fashion. The variables that can be changed with a rolling update are listed xref:manage-proxy-instances.adoc#change-mutable-config-variable[here]. +The {zdm-proxy} configuration comprises a number of variables, but only a subset of these can be changed on an existing deployment in a rolling fashion. +The variables that can be changed with a rolling update are listed xref:manage-proxy-instances.adoc#change-mutable-config-variable[here]. -All other configuration variables excluded from the list above are considered immutable and can only be changed by a redeployment. This is by design: immutable configuration variables should not be changed after finalizing the deployment prior to starting the migration, so allowing them to be changed through a rolling update would risk accidentally propagating some misconfiguration that could compromise the deployment's integrity. +All other configuration variables excluded from the list above are considered immutable and can only be changed by a redeployment. +This is by design: immutable configuration variables should not be changed after finalizing the deployment prior to starting the migration, so allowing them to be changed through a rolling update would risk accidentally propagating some misconfiguration that could compromise the deployment's integrity. === Solution or Workaround -To change the value of configuration variables that are considered immutable, simply run the `deploy_zdm_proxy.yml` playbook again. This playbook can be run as many times as necessary and will just recreate the entire {zdm-proxy} deployment from scratch with the provided configuration. Please note that this does not happen in a rolling fashion: the existing {zdm-proxy} instances will be torn down all at the same time prior to being recreated, resulting in a brief window in which the whole {zdm-proxy} deployment will become unavailable. +To change the value of configuration variables that are considered immutable, simply run the `deploy_zdm_proxy.yml` playbook again. +This playbook can be run as many times as necessary and will just recreate the entire {zdm-proxy} deployment from scratch with the provided configuration. 
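+
+As an illustration, re-running the playbook from inside the Ansible Control Host container typically looks like the following; the inventory file name assumes the default generated by the {zdm-utility}:
+
+[source,bash]
+----
+# Re-run the deployment playbook to recreate all ZDM Proxy instances with the current configuration.
+cd zdm-proxy-automation/ansible
+ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory
+----
+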
+Please note that this does not happen in a rolling fashion: the existing {zdm-proxy} instances will be torn down all at the same time prior to being recreated, resulting in a brief window in which the whole {zdm-proxy} deployment will become unavailable. == Unsupported protocol version error on the client application @@ -47,17 +52,17 @@ In the Java 4.x driver logs, the following issues can manifest during session in === Cause -https://datastax-oss.atlassian.net/browse/JAVA-2905[JAVA-2905^] is a driver bug that manifests itself in this way. It affects Java driver 4.x, and was fixed on the 4.10.0 release. +https://datastax-oss.atlassian.net/browse/JAVA-2905[JAVA-2905] is a driver bug that manifests itself in this way. It affects Java driver 4.x, and was fixed on the 4.10.0 release. === Solution or Workaround If you are using spring boot and/or spring-data-cassandra then an upgrade of these dependencies will be necessary to a version that has the java driver fix. -Alternatively, you can force the protocol version on the driver to the max supported version by both clusters. V4 is a good recommendation that usually fits all but if the user is migrating from DSE to DSE then DSE_V1 should be used for DSE 5.x and DSE_V2 should be used for DSE 6.x. - -To force the protocol version on the Java driver, check this section of the https://docs.datastax.com/en/developer/java-driver/3.11/manual/native_protocol/#controlling-the-protocol-version[driver manual, window="_blank"]. We don't believe this issue affects Java driver 3.x, but here are the https://docs.datastax.com/en/developer/java-driver/3.11/manual/native_protocol/#controlling-the-protocol-version[instructions, window="_blank"] on how to force the version on 3.x, if necessary. - +Alternatively, you can force the protocol version on the driver to the max supported version by both clusters. +V4 is a good recommendation that usually fits all but if the user is migrating from DSE to DSE then DSE_V1 should be used for DSE 5.x and DSE_V2 should be used for DSE 6.x. +To force the protocol version on the Java driver, check this section of the https://docs.datastax.com/en/developer/java-driver/3.11/manual/native_protocol/#controlling-the-protocol-version[driver manual, window="_blank"]. +We don't believe this issue affects Java driver 3.x, but here are the https://docs.datastax.com/en/developer/java-driver/3.11/manual/native_protocol/#controlling-the-protocol-version[instructions, window="_blank"] on how to force the version on 3.x, if necessary. == Protocol errors in the proxy logs but clients can connect successfully @@ -74,9 +79,11 @@ msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time": === Cause -Protocol errors like these are a normal part of the handshake process where the protocol version is being negotiated. These protocol version downgrades happen when either the {zdm-proxy} or at least one of the clusters doesn't support the version requested by the client. +Protocol errors like these are a normal part of the handshake process where the protocol version is being negotiated. +These protocol version downgrades happen when either the {zdm-proxy} or at least one of the clusters doesn't support the version requested by the client. -V5 downgrades are enforced by the {zdm-proxy} but any other downgrade is requested by one of the clusters when they don't support the version that the client requested. The proxy supports V3, V4, DSE_V1 and DSE_V2. 
+V5 downgrades are enforced by the {zdm-proxy} but any other downgrade is requested by one of the clusters when they don't support the version that the client requested. +The proxy supports V3, V4, DSE_V1 and DSE_V2. //// ZDM-71 tracks a request to support v2. @@ -86,9 +93,9 @@ ZDM-71 tracks a request to support v2. These log messages are informative only (log level `DEBUG`). -If you find one of these messages with a higher log level (especially `level=error`) then there might be a bug. At that point the issue will need to be investigated by the ZDM team. This log message with a log level of `ERROR` means that the protocol error occurred after the handshake, and this is a fatal unexpected error that results in a disconnect for that particular connection. - - +If you find one of these messages with a higher log level (especially `level=error`) then there might be a bug. +At that point the issue will need to be investigated by the ZDM team. +This log message with a log level of `ERROR` means that the protocol error occurred after the handshake, and this is a fatal unexpected error that results in a disconnect for that particular connection. == Error during proxy startup: `Invalid or unsupported protocol version: 3` @@ -126,7 +133,9 @@ time="2022-10-01T19:58:15+01:00" level=error msg="Couldn't start proxy, retrying === Cause -The control connections of the {zdm-proxy} don't perform protocol version negotiation, they only attempt to use protocol version 3. If one of the origin clusters doesn't support at least V3 (e.g. Cassandra 2.0, DSE 4.6), then ZDM cannot be used for that migration at the moment. We plan to introduce support for Cassandra 2.0 and DSE 4.6 very soon. +The control connections of the {zdm-proxy} don't perform protocol version negotiation, they only attempt to use protocol version 3. +If one of the origin clusters doesn't support at least V3 (e.g. Cassandra 2.0, DSE 4.6), then ZDM cannot be used for that migration at the moment. +We plan to introduce support for Cassandra 2.0 and DSE 4.6 very soon. === Solution or Workaround @@ -157,15 +166,18 @@ This error means that at least one of these three sets of credentials is incorre === Solution or Workaround -If the authentication error is preventing the proxy from starting then it's either the Origin or Target credentials that are incorrect or have insufficient permissions. The log message shows whether it is the Target or Origin handshake that is failing. +If the authentication error is preventing the proxy from starting then it's either the Origin or Target credentials that are incorrect or have insufficient permissions. +The log message shows whether it is the Target or Origin handshake that is failing. If the proxy is able to start up -- that is, this message can be seen in the logs: `Proxy started. Waiting for SIGINT/SIGTERM to shutdown.` -then the authentication error is happening when a client application tries to open a connection to the proxy. In this case, the issue is with the Client credentials so the application itself is using invalid credentials (incorrect username/password or insufficient permissions). +then the authentication error is happening when a client application tries to open a connection to the proxy. +In this case, the issue is with the Client credentials so the application itself is using invalid credentials (incorrect username/password or insufficient permissions). 
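+
+As a quick, illustrative check, you can search the proxy logs for the startup message and for handshake errors; the container name below is an assumption and depends on how your instances were deployed:
+
+[source,bash]
+----
+# Look for the startup confirmation and any handshake-related errors in one proxy instance's logs.
+docker logs zdm-proxy 2>&1 | grep -Ei "proxy started|handshake"
+----
+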
-Note that the proxy startup message has log level `INFO` so if the configured log level on the proxy is `warning` or `error`, you will have to rely on other ways to know whether the {zdm-proxy} started correctly. You can check if the docker container is running (or process if docker isn't being used) or if there is a log message similar to `Error launching proxy`. +Note that the proxy startup message has log level `INFO` so if the configured log level on the proxy is `warning` or `error`, you will have to rely on other ways to know whether the {zdm-proxy} started correctly. +You can check if the docker container is running (or process if docker isn't being used) or if there is a log message similar to `Error launching proxy`. == The {zdm-proxy} listens on a custom port, and all applications are able to connect to one proxy instance only @@ -186,7 +198,8 @@ For example, using the Java driver, if the {zdm-proxy} instances were listening `.addContactPoints("172.18.10.36:14035", "172.18.11.48:14035", "172.18.12.61:14035")` -The contact point is used as the first point of contact to the cluster, but the driver discovers the rest of the nodes via CQL queries. However, this discovery process doesn't discover the ports, just the addresses so the driver uses the addresses it discovers with the port that is configured at startup. +The contact point is used as the first point of contact to the cluster, but the driver discovers the rest of the nodes via CQL queries. +However, this discovery process doesn't discover the ports, just the addresses so the driver uses the addresses it discovers with the port that is configured at startup. As a result, port 14035 will only be used for the contact point initially discovered, while for all other nodes the driver will attempt to use the default 9042 port. @@ -201,7 +214,7 @@ In the application, ensure that the custom port is explicitly indicated using th ---- -== Syntax error " no viable alternative at input 'CALL' " in proxy logs +== Syntax error "no viable alternative at input 'CALL'" in proxy logs === Symptoms @@ -216,13 +229,16 @@ at input 'CALL' ([CALL]...))\"\n","stream":"stderr","time":"2022-07-20T13:10:47. === Cause -The log message indicates that the server doesn't recognize the word “CALL” in the query string which most likely means that it is a RPC (remote procedure call). From the proxy logs alone, it is not possible to see what method is being called by the query but it's very likely the RPC that the drivers use to send DSE Insights data to the server. +The log message indicates that the server doesn't recognize the word “CALL” in the query string which most likely means that it is an RPC (remote procedure call). +From the proxy logs alone, it is not possible to see what method is being called by the query but it's very likely the RPC that the drivers use to send DSE Insights data to the server. -Most {company} drivers have DSE Insights reporting enabled by default when they detect a server version that supports it (regardless of whether the feature is enabled on the server side or not). The driver might also have it enabled for Astra DB depending on what server version Astra DB is returning for queries involving the `system.local` and `system.peers` tables. +Most {company} drivers have DSE Insights reporting enabled by default when they detect a server version that supports it (regardless of whether the feature is enabled on the server side or not). 
+The driver might also have it enabled for Astra DB depending on what server version Astra DB is returning for queries involving the `system.local` and `system.peers` tables. === Solution or Workaround -These log messages are harmless but if your need to get rid of them, you can disable the DSE Insights driver feature through the driver configuration. Refer to https://github.com/datastax/java-driver/blob/65d2c19c401175dcc6c370560dd5f783d05b05b9/core/src/main/resources/reference.conf#L1328[this property, window="_blank"] for Java driver 4.x. +These log messages are harmless but if your need to get rid of them, you can disable the DSE Insights driver feature through the driver configuration. +Refer to https://github.com/datastax/java-driver/blob/65d2c19c401175dcc6c370560dd5f783d05b05b9/core/src/main/resources/reference.conf#L1328[this property, window="_blank"] for Java driver 4.x. @@ -238,9 +254,8 @@ The {zdm-automation} specifies a custom set of credentials instead of relying on === Solution or Workaround -Check the credentials that are being used by looking up the `vars/zdm_monitoring_config.yml` file on the {zdm-automation} directory. These credentials can also be modified before deploying the metrics stack. - - +Check the credentials that are being used by looking up the `vars/zdm_monitoring_config.yml` file on the {zdm-automation} directory. +These credentials can also be modified before deploying the metrics stack. == Proxy starts but client cannot connect (connection timeout/closed) @@ -282,8 +297,7 @@ ERRO[0076] Client Handler could not be created: ORIGIN-CONNECTOR context timed o The control connection (during {zdm-proxy} startup) cycles through the nodes until it finds one that can be connected to. For client connections, each proxy instance cycles through its "assigned nodes" only. -_(The "assigned nodes" are a different subset of the cluster nodes for each proxy instance, -generally non-overlapping between proxy instances so as to avoid any interference with the load balancing already in place at client-side driver level. +_(The "assigned nodes" are a different subset of the cluster nodes for each proxy instance, generally non-overlapping between proxy instances so as to avoid any interference with the load balancing already in place at client-side driver level. The assigned nodes are not necessarily contact points: even discovered nodes undergo assignment to proxy instances.)_ In the example above, the {zdm-proxy} doesn't have connectivity to 10.0.63.20, which was chosen as the origin node for the incoming client connection, but it was able to connect to 10.0.63.163 during startup. @@ -292,8 +306,6 @@ In the example above, the {zdm-proxy} doesn't have connectivity to 10.0.63.20, w Ensure that network connectivity exists and is stable between the {zdm-proxy} instances and all Cassandra / DSE nodes of the local datacenter. - - == Client application driver takes too long to reconnect to a proxy instance === Symptoms @@ -312,8 +324,6 @@ Restart the client application to force an immediate reconnect. If you expect {zdm-proxy} instances to go down frequently, change the reconnection policy on the driver so that the interval between reconnection attempts has a shorter limit. - - == Error with Astra DevOps API when using the {zdm-automation} === Symptoms @@ -333,8 +343,8 @@ The Astra DevOps API is likely temporarily unavailable. 
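+
+To confirm whether the DevOps API is reachable from the jumphost, you can issue a simple request against it; the endpoint and token variable below are shown for illustration only:
+
+[source,bash]
+----
+# Prints the HTTP status code; a 2xx response means the DevOps API is reachable and the token is accepted.
+curl -s -o /dev/null -w "%{http_code}\n" \
+  -H "Authorization: Bearer $ASTRA_DB_APPLICATION_TOKEN" \
+  https://api.astra.datastax.com/v2/databases
+----
+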
=== Solution or Workaround -Download the Astra DB Secure Connect Bundle (SCB) manually and provide its path to the {zdm-automation} as explained xref:deploy-proxy-monitoring.adoc#_core_configuration[here]. For information about the SCB, see https://docs.datastax.com/en/astra-serverless/docs/connect/secure-connect-bundle.html[working with Secure Connect Bundle, window="_blank"]. - +Download the Astra DB Secure Connect Bundle (SCB) manually and provide its path to the {zdm-automation} as explained xref:deploy-proxy-monitoring.adoc#_core_configuration[here]. +For information about the SCB, see https://docs.datastax.com/en/astra-serverless/docs/connect/secure-connect-bundle.html[working with Secure Connect Bundle, window="_blank"]. == Metadata service (Astra) returned not successful status code 4xx or 5xx @@ -361,10 +371,10 @@ Start by opening Astra Portal and checking the `Status` of your database. If it is `Hibernated`, click the “Resume” button and wait for it to become `Active`. If it is `Active` already, then it is likely an issue with permissions. -We recommend starting with a token that has the Database Administrator role in Astra DB to confirm that it is a permissions issue. Refer to https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-permissions.html[Manage user permissions, window="_blank"]. - - +We recommend starting with a token that has the Database Administrator role in Astra DB to confirm that it is a permissions issue. +Refer to https://docs.datastax.com/en/astra/astra-db-vector/administration/manage-database-access.html[Manage user permissions, window="_blank"]. +[[_async_read_timeouts_stream_id_map_exhausted]] == Async read timeouts / stream id map exhausted === Symptoms @@ -382,19 +392,22 @@ Dual reads are enabled and the following messages are found in the {zdm-proxy} l === Cause -The last log message is logged when the async connection runs out of stream ids. The async connection is a connection dedicated to the async reads (asynchronous dual reads feature). This can be caused by timeouts (first log message) or the connection not being able to keep up with the load. +The last log message is logged when the async connection runs out of stream ids. +The async connection is a connection dedicated to the async reads (asynchronous dual reads feature). +This can be caused by timeouts (first log message) or the connection not being able to keep up with the load. -If the log files are being spammed with these messages then it is likely that an outage occurred which caused all responses to arrive after requests timed out (second log message). In this case the async connection might not be able to recover. +If the log files are being spammed with these messages then it is likely that an outage occurred which caused all responses to arrive after requests timed out (second log message). +In this case the async connection might not be able to recover. === Solution or Workaround Keep in mind that any errors in the async request path (dual reads) will not affect the client application so these log messages might be useful to predict what may happen when the reads are switched over to the TARGET cluster but async read errors/warnings by themselves do not cause any impact to the client. -Starting in version 2.1.0, you can now tune the maximum number of stream ids available per connection, which by default is 2048. You can increase it to match your driver configuration through the xref:manage-proxy-instances.adoc#zdm_proxy_max_stream_ids[zdm_proxy_max_stream_ids] property. 
-
-If these errors are being constantly written to the log files (for minutes or even hours) then it is likely that only an application OR {zdm-proxy} restart will fix it. If you find an issue like this please submit an https://github.com/datastax/zdm-proxy/issues[Issue, window="_blank"] in our GitHub repo.
-
+Starting in version 2.1.0, you can now tune the maximum number of stream ids available per connection, which by default is 2048.
+You can increase it to match your driver configuration through the xref:manage-proxy-instances.adoc#zdm_proxy_max_stream_ids[zdm_proxy_max_stream_ids] property.
+
+If these errors are being constantly written to the log files (for minutes or even hours), it is likely that only a restart of the client application or of the {zdm-proxy} will fix it.
+If you find an issue like this, please submit an https://github.com/datastax/zdm-proxy/issues[Issue, window="_blank"] in our GitHub repo.

== Client application closed connection errors every 10 minutes when migrating to Astra DB

@@ -403,7 +416,6 @@ This issue is fixed in {zdm-proxy} 2.1.0. See the Fix section below.
====

-
=== Symptoms

Every 10 minutes a message is logged in the {zdm-proxy} logs showing a disconnect that was caused by Astra DB.

@@ -415,7 +427,8 @@ Every 10 minutes a message is logged in the {zdm-proxy} logs showing a disconnec

=== Cause

-Astra DB terminates idle connections after 10 minutes of inactivity. If a client application is only sending reads through a connection then the Target (i.e. Astra in this case) connection will not get any traffic because ZDM forwards all reads to the Origin connection.
+Astra DB terminates idle connections after 10 minutes of inactivity.
+If a client application is only sending reads through a connection, the Target connection (Astra DB in this case) will not get any traffic, because ZDM forwards all reads to the Origin connection.

=== Solution or Workaround

@@ -438,23 +451,26 @@ The results of these tests show latency/throughput values are worse with ZDM tha

=== Cause

-ZDM will always add additional latency which, depending on the nature of the test, will also result in a lower throughput. Whether this performance hit is expected or not depends on the difference between the ZDM test results and the test results with the cluster that performed the worst.
+ZDM will always add additional latency, which, depending on the nature of the test, will also result in lower throughput.
+Whether this performance hit is expected or not depends on the difference between the ZDM test results and the test results with the cluster that performed the worst.

-Writes in ZDM require an `ACK` from both clusters while reads only require the result from the Origin cluster (or target if the proxy is set up to route reads to the target cluster). This means that if Origin has better performance than Target then ZDM will inevitably have a worse performance for writes.
+Writes in ZDM require an `ACK` from both clusters, while reads only require the result from the Origin cluster (or from Target if the proxy is set up to route reads to the target cluster).
+This means that if Origin has better performance than Target, ZDM will inevitably have worse performance for writes.
From our testing benchmarks, a performance degradation of up to 2x latency is not unheard of even without external factors adding more latency, but it is still worth checking some things that might add additional latency like whether the proxy is deployed on the same Availability Zone (AZ) as the Origin cluster or application instances.

-Simple statements and batch statements are things that will make the proxy add additional latency compared to normal prepared statements. Simple statements should be discouraged especially with the zdm-proxy because currently the proxy takes a considerable amount of time just parsing the queries and with prepared statements the proxy only has to parse them once.
+Simple statements and batch statements are things that will make the proxy add additional latency compared to normal prepared statements.
+Simple statements should be discouraged, especially with the {zdm-proxy}, because the proxy currently spends a considerable amount of time parsing each query, whereas prepared statements only have to be parsed once.

=== Solution or Workaround

If you are using simple statements, consider using prepared statements as the best first step.

-Increasing the number of proxies might help, but only if the VMs resources (CPU, RAM or network IO) are near capacity. The {zdm-proxy} doesn't use a lot of RAM, but it uses a lot of CPU and network IO.
+Increasing the number of proxies might help, but only if the VMs' resources (CPU, RAM, or network IO) are near capacity.
+The {zdm-proxy} doesn't use a lot of RAM, but it uses a lot of CPU and network IO.

Deploying the proxy instances on VMs with faster CPUs and faster network IO might help, but only your own tests will reveal whether it helps, because it depends on the workload type and details about your environment such as network/VPC configurations, hardware, and so on.

-
== `InsightsRpc` related permissions errors

=== Symptoms

@@ -469,7 +485,8 @@ time="2023-05-05T19:14:31Z" level=debug msg="Recording TARGET-CONNECTOR other er

=== Cause

-This could be the case if the origin (DSE) cluster has Metrics Collector enabled to report metrics for {company} drivers and `my_user` does not have the required permissions. {zdm-proxy} simply passes through these.
+This could be the case if the origin (DSE) cluster has Metrics Collector enabled to report metrics for {company} drivers and `my_user` does not have the required permissions.
+{zdm-proxy} simply passes these requests through.

=== Solution or Workaround

There are two options to get this fixed.

@@ -482,4 +499,4 @@

==== Option 2: Use this option if disabling metrics collector is not an option

-* Using a superuser role, grant the appropriate permissions to `my_user` role by running `GRANT EXECUTE ON REMOTE OBJECT InsightsRpc TO my_user;`
+* Using a superuser role, grant the appropriate permissions to the `my_user` role by running `GRANT EXECUTE ON REMOTE OBJECT InsightsRpc TO my_user;`
\ No newline at end of file
diff --git a/modules/ROOT/pages/troubleshooting-tips.adoc b/modules/ROOT/pages/troubleshooting-tips.adoc
index 91062c3e..e85793df 100644
--- a/modules/ROOT/pages/troubleshooting-tips.adoc
+++ b/modules/ROOT/pages/troubleshooting-tips.adoc
@@ -11,8 +11,7 @@ Depending on how you deployed {zdm-proxy}, there may be different ways to access the logs.
If you used the {zdm-automation}, see xref:manage-proxy-instances.adoc#_view_the_logs[View the logs] for a quick way to view the logs of a single proxy instance.
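+
+If you deployed the proxy instances with Docker rather than with the {zdm-automation}, a quick way to follow the logs of a single instance might look like the sketch below; the container name `zdm-proxy-1` is an assumption, so substitute the name shown by `docker ps` in your deployment.
+
+[source,bash]
+----
+# Tail the most recent log lines of one proxy container and keep following them.
+# "zdm-proxy-1" is an assumed container name; adjust it to your deployment.
+docker logs --tail 100 --follow zdm-proxy-1
+----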
-Follow the instructions on xref:manage-proxy-instances.adoc#_collect_the_logs[Collect the logs],
-instead, for a playbook that systematically retrieves all logs by all instances and packages them in a zip archive for later inspection.
+Follow the instructions on xref:manage-proxy-instances.adoc#_collect_the_logs[Collect the logs] for a playbook that systematically retrieves all logs from all instances and packages them in a zip archive for later inspection.

If you did not use the {zdm-automation}, you might have to access the logs differently.
If Docker is used, enter the following command to export the logs of a container to a file:

@@ -31,7 +30,9 @@ Keep in mind that docker logs are deleted if the container is recreated.

Make sure that the log level of the {zdm-proxy} is set to the appropriate value:

-* If you deployed the {zdm-proxy} through the {zdm-automation}, the log level is determined by the variable `log_level` in `vars/zdm_proxy_core_config.yml`. This value can be changed in a rolling fashion by editing this variable and running the playbook `rolling_update_zdm_proxy.yml`. For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable].
+* If you deployed the {zdm-proxy} through the {zdm-automation}, the log level is determined by the variable `log_level` in `vars/zdm_proxy_core_config.yml`.
+This value can be changed in a rolling fashion by editing this variable and running the playbook `rolling_update_zdm_proxy.yml`.
+For more information, see xref:manage-proxy-instances.adoc#change-mutable-config-variable[Change a mutable configuration variable].

* If you did not use the {zdm-automation} to deploy the {zdm-proxy}, change the environment variable `ZDM_LOG_LEVEL` on each proxy instance and restart it.

@@ -39,7 +40,8 @@ Here are the most common messages you'll find in the proxy logs:

=== {zdm-proxy} startup message

-Assuming the Log Level is not filtering out `INFO` entries, you can look for the following type of log message in order to verify that the {zdm-proxy} is starting up correctly. Example:
+Assuming the log level is not filtering out `INFO` entries, you can look for the following type of log message to verify that the {zdm-proxy} is starting up correctly.
+Example:

[source,json]
----
@@ -50,7 +52,10 @@
msg=\"Proxy started. Waiting for SIGINT/SIGTERM to shutdown.

=== {zdm-proxy} configuration

-The first few lines of the {zdm-proxy} log file contains all the configuration variables and values. They are printed in a long JSON string format. You can copy/paste the string into a JSON formatter/viewer to make it easier to read. Example log message:
+The first few lines of the {zdm-proxy} log file contain all the configuration variables and values.
+They are printed in a long JSON string format.
+You can copy/paste the string into a JSON formatter/viewer to make it easier to read.
+Example log message:

[source,json]
----
@@ -60,7 +65,9 @@
msg=\"Parsed configuration: {\\\"ProxyIndex\\\":1,\\\"ProxyAddresses\\\":"...",
","stream":"stderr","time":"2023-01-13T11:50:48.339225051Z"}
----

-Seeing the configuration settings is useful while troubleshooting issues. However, remember to check the log level variable to ensure you're viewing the intended types of messages. Setting the log level setting to `DEBUG` might cause a slight performance degradation.
+Seeing the configuration settings is useful while troubleshooting issues.
+However, remember to check the log level variable to ensure you're viewing the intended types of messages.
+Setting the log level to `DEBUG` might cause a slight performance degradation.

=== Be aware of current log level

@@ -72,7 +79,8 @@ When you find a log message that looks like an error, the most important thing i

* Log messages with `level=warn` are usually related to events that are not fatal to the overall running workload, but may cause issues with individual requests or connections.

-* In general, log messages with `level=error` or `level=warn` should be brought to the attention of DataStax, if the meaning is not clear. In the {zdm-proxy} GitHub repo, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue^] to ask questions about log messages of type `error` or `warn` that are unclear.
+* In general, log messages with `level=error` or `level=warn` should be brought to the attention of DataStax if the meaning is not clear.
+In the {zdm-proxy} GitHub repo, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue] to ask questions about log messages of type `error` or `warn` that are unclear.

=== Protocol log messages

@@ -86,8 +94,11 @@ to the client to force a downgrade:

PROTOCOL (code=Code Protocol [0x0000000A], msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time":"2023-01-13T12:02:12.379287735Z"}
----

-There are cases where protocol errors are fatal so they will kill an active connection that was being used to serve requests. However, if you find a log message similar to the example above with log level `debug`, then it's likely not an issue. Instead, it's more likely an expected part of the handshake process during the connection initialization; that is, the normal protocol version negotiation.
+There are cases where protocol errors are fatal, so they kill an active connection that was being used to serve requests.
+However, if you find a log message similar to the example above with log level `debug`, then it's likely not an issue.
+Instead, it's more likely an expected part of the handshake process during the connection initialization; that is, the normal protocol version negotiation.

+[[_how_to_identify_the_zdm_proxy_version]]
== How to identify the {zdm-proxy} version

In the {zdm-proxy} logs, the first message contains the version string (just before the message that shows the configuration):

@@ -99,7 +110,8 @@
time="2023-01-13T13:37:28+01:00" level=info msg="Starting ZDM proxy version 2.1.
time="2023-01-13T13:37:28+01:00" level=info msg="Parsed configuration: {removed for simplicity}"
----

-You can also provide a `-version` command line parameter to the {zdm-proxy} and it will only print the version. Example:
+You can also provide a `-version` command-line parameter to the {zdm-proxy}, and it will print only the version.
+Example:

[source,bash]
----
@@ -119,22 +131,31 @@
See xref:metrics.adoc[].

== Reporting an issue

-If you encounter a problem during your migration, please contact us. In the {zdm-proxy} GitHub repo, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue^]. Only to the extent that the issue's description does not contain **your proprietary or private** information, please include the following:
+If you encounter a problem during your migration, please contact us.
+In the {zdm-proxy} GitHub repo, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue].
+Only to the extent that the issue's description does not contain **your proprietary or private** information, please include the following:

* {zdm-proxy} version (see xref:_how_to_identify_the_zdm_proxy_version[here]).
* {zdm-proxy} logs: ideally at `debug` level if you can reproduce the issue easily and can tolerate a restart of the proxy instances to apply the configuration change.
* Version of database software on the Origin and Target clusters (relevant for DSE and Apache Cassandra deployments only).
* If Astra DB is being used, please let us know in the issue description.
-* Screenshots of the {zdm-proxy} metrics dashboards from Grafana or whatever visualization tool you use. If you can provide a way for us to access those metrics directly that would be even better.
+* Screenshots of the {zdm-proxy} metrics dashboards from Grafana or whatever visualization tool you use.
+If you can provide a way for us to access those metrics directly, that would be even better.
* Application/Driver logs.
* Driver and version that the client application is using.

=== Reporting a performance issue

-If the issue is related to performance, troubleshooting can be more complicated and dynamic. Because of this we request additional information to be provided which usually comes down to the answers to a few questions (in addition to the information from the prior section):
+If the issue is related to performance, troubleshooting can be more complicated and dynamic.
+Because of this, we request additional information, which usually comes down to the answers to a few questions (in addition to the information from the prior section):

* Which statement types are being used: simple, prepared, batch?
-* If batch statements are being used, which driver API is being used to create these batches? Are you passing a `BEGIN BATCH` cql query string to a simple/prepared statement? Or are you using the actual batch statement objects that drivers allow you to create?
+* If batch statements are being used, which driver API is being used to create these batches?
+Are you passing a `BEGIN BATCH` CQL query string to a simple/prepared statement?
+Or are you using the actual batch statement objects that drivers allow you to create?
* How many parameters does each statement have?
-* Is CQL function replacement enabled? You can see if this feature is enabled by looking at the value of the Ansible advanced configuration variable `replace_cql_functions` if using the automation, or the environment variable `ZDM_REPLACE_CQL_FUNCTIONS` otherwise. CQL function replacement is disabled by default.
-* If permissible within your security rules, please provide us access to the {zdm-proxy} metrics dashboard. Screenshots are fine but for performance issues it is more helpful to have access to the actual dashboard so the team can use all the data from these metrics in the troubleshooting process.
\ No newline at end of file
+* Is CQL function replacement enabled?
+You can see if this feature is enabled by looking at the value of the Ansible advanced configuration variable `replace_cql_functions` if using the automation, or the environment variable `ZDM_REPLACE_CQL_FUNCTIONS` otherwise (see the sketch after this list).
+CQL function replacement is disabled by default.
+* If permissible within your security rules, please provide us access to the {zdm-proxy} metrics dashboard.
+Screenshots are fine, but for performance issues it is more helpful to have access to the actual dashboard, so that the team can use all the data from these metrics in the troubleshooting process.
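+
+A minimal sketch of how you might check this setting before filing the issue is shown below; the container name and the location of the Ansible variables file are assumptions to adapt to your deployment.
+
+[source,bash]
+----
+# For a Docker-based deployment, inspect the environment of a proxy container.
+# "zdm-proxy-1" is an assumed container name; adjust it to your deployment.
+docker exec zdm-proxy-1 env | grep ZDM_REPLACE_CQL_FUNCTIONS
+
+# With the Ansible automation, search the variables files for the setting.
+# The "vars/" path is an assumption; adjust it to your checkout.
+grep -R "replace_cql_functions" vars/
+----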
\ No newline at end of file
diff --git a/modules/ROOT/pages/troubleshooting.adoc b/modules/ROOT/pages/troubleshooting.adoc
index 539f7371..2619e225 100644
--- a/modules/ROOT/pages/troubleshooting.adoc
+++ b/modules/ROOT/pages/troubleshooting.adoc
@@ -12,10 +12,11 @@ The troubleshooting information for {zdm-product} is organized as follows:
====
If you still have questions, please submit a GitHub Issue in the relevant public repo:

-* https://github.com/datastax/zdm-proxy/issues[{zdm-proxy}^].
-* https://github.com/datastax/zdm-proxy-automation/issues[{zdm-automation}^], which includes the {zdm-utility}.
-* https://github.com/datastax/cassandra-data-migrator/issues[{cstar-data-migrator}^].
-* https://github.com/datastax/dsbulk-migrator/issues[{dsbulk-migrator}^].
+* https://github.com/datastax/zdm-proxy/issues[{zdm-proxy}].
+* https://github.com/datastax/zdm-proxy-automation/issues[{zdm-automation}], which includes the {zdm-utility}.
+* https://github.com/datastax/cassandra-data-migrator/issues[{cstar-data-migrator}].
+* https://github.com/datastax/dsbulk-migrator/issues[{dsbulk-migrator}].

-You may also contact your {company} account representative, or our https://support.datastax.com/s/[Support team^] if you have a DataStax Luna service contract. https://www.datastax.com/products/luna[Luna] is a subscription to the Apache Cassandra support and expertise at DataStax.
+You may also contact your {company} account representative, or our https://support.datastax.com/s/[Support team] if you have a DataStax Luna service contract.
+https://www.datastax.com/products/luna[Luna] is a subscription to Apache Cassandra support and expertise from DataStax.
====
diff --git a/modules/ROOT/partials/interactive-lab.adoc b/modules/ROOT/partials/interactive-lab.adoc
index 9d56d869..b6905dea 100644
--- a/modules/ROOT/partials/interactive-lab.adoc
+++ b/modules/ROOT/partials/interactive-lab.adoc
@@ -1,8 +1,6 @@
Now that you've seen a conceptual overview of the process, let's put what you learned into practice.

-We've built a complementary learning resource that is a companion to this comprehensive ZDM documentation. It's the {zdm-product} Interactive Lab, available for you here:
-
-https://www.datastax.com/dev/zdm[https://www.datastax.com/dev/zdm,window="_blank"]
+We've built a complementary learning resource that is a companion to this comprehensive ZDM documentation. It's the https://www.datastax.com/dev/zdm[{zdm-product} Interactive Lab].

* All you need is a browser and a GitHub account.
* There's nothing to install for the lab, which opens in a pre-configured GitPod environment.
diff --git a/modules/ROOT/partials/migration-scenarios.adoc b/modules/ROOT/partials/migration-scenarios.adoc
index 7d096651..6c20837e 100644
--- a/modules/ROOT/partials/migration-scenarios.adoc
+++ b/modules/ROOT/partials/migration-scenarios.adoc
@@ -12,20 +12,20 @@ Here are just a few examples of migration scenarios that are supported when movi
* From an existing self-managed Cassandra or DSE cluster to cloud-native {astra_db}. For example:

-** Cassandra 2.1.6+, 3.11.x, 4.0.x, or 4.1.x to {astra_db}
+** Cassandra 2.1.6+, 3.11.x, 4.0.x, or 4.1.x to {astra_db}.

-** DSE 4.7.1+, 4.8.x, 5.1.x, 6.7.x or 6.8.x to {astra_db}
+** DSE 4.7.1+, 4.8.x, 5.1.x, 6.7.x or 6.8.x to {astra_db}.

* From an existing Cassandra or DSE cluster to another Cassandra or DSE cluster. For example:

-** Cassandra 2.1.6+ or 3.11.x to Cassandra 4.0.x or 4.1.x
+** Cassandra 2.1.6+ or 3.11.x to Cassandra 4.0.x or 4.1.x.
-** DSE 4.7.1+, 4.8.x, 5.1.x or 6.7.x to DSE 6.8.x
+** DSE 4.7.1+, 4.8.x, 5.1.x or 6.7.x to DSE 6.8.x.

-** Cassandra 2.1.6+, 3.11.x, 4.0.x, or 4.1.x to DSE 6.8.x
+** Cassandra 2.1.6+, 3.11.x, 4.0.x, or 4.1.x to DSE 6.8.x.

-** DSE 4.7.1+ or 4.8.x to Cassandra 4.0.x or 4.1.x
+** DSE 4.7.1+ or 4.8.x to Cassandra 4.0.x or 4.1.x.

-* From https://docs.datastax.com/en/astra-classic/docs[{astra_db} Classic] to https://docs.datastax.com/en/astra-serverless/docs[{astra_db} Serverless]
+* From https://docs.datastax.com/en/astra-classic/docs[{astra_db} Classic] to https://docs.datastax.com/en/astra/astra-db-vector/[{astra-db-serverless}].

-* From any CQL-based database type/version to the equivalent CQL-based database type/version.
+* From any CQL-based database type/version to the equivalent CQL-based database type/version.
\ No newline at end of file
diff --git a/modules/ROOT/partials/supported-releases.adoc b/modules/ROOT/partials/supported-releases.adoc
index 91317572..50bffd71 100644
--- a/modules/ROOT/partials/supported-releases.adoc
+++ b/modules/ROOT/partials/supported-releases.adoc
@@ -1,4 +1,4 @@
Overall, you can use {zdm-proxy} to migrate:

-* **From:** Any Cassandra 2.1.6 or higher release, or from any DSE 4.7.1 or higher release
-* **To:** Any equivalent or higher release of Cassandra, or to any equivalent or higher release of DSE, or to {astra_db}
+* **From:** Any Cassandra 2.1.6 or higher release, or from any DSE 4.7.1 or higher release.
+* **To:** Any equivalent or higher release of Cassandra, or to any equivalent or higher release of DSE, or to {astra_db}.
diff --git a/modules/ROOT/partials/tip-scb.adoc b/modules/ROOT/partials/tip-scb.adoc
index 817d5bc9..45803109 100644
--- a/modules/ROOT/partials/tip-scb.adoc
+++ b/modules/ROOT/partials/tip-scb.adoc
@@ -7,5 +7,5 @@ The SCB can be downloaded from the Astra Portal as follows:
. Select the **Java driver**, choosing the driver based on the CQL APIs.
. Click on **Download bundle** (choosing a region if prompted to do so).

-For more information on the SCB and how to retrieve it, see https://docs.datastax.com/en/astra-serverless/docs/connect/drivers/legacy-drivers.html#_working_with_secure_connect_bundle[the Astra documentation^].
+For more information on the SCB and how to retrieve it, see https://docs.datastax.com/en/astra/astra-db-vector/drivers/secure-connect-bundle.html[the {astra-db-serverless} documentation].
--
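+
+If you prefer not to download the SCB through the Astra Portal UI, the {astra-cli} can also retrieve it; the command below is a sketch whose exact syntax and flags are assumptions to verify against the current {astra-cli} documentation, and `my_database` is a placeholder database name.
+
+[source,bash]
+----
+# Assumed Astra CLI syntax; verify against the current Astra CLI documentation.
+# Downloads the Secure Connect Bundle for "my_database" to the given path,
+# which you can then provide to the ZDM automation or your client application.
+astra db download-scb my_database -f /path/to/secure-connect-my_database.zip
+----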