diff --git a/docs/modules/ROOT/pages/capacity-planning.adoc b/docs/modules/ROOT/pages/capacity-planning.adoc
index fdd47de23..2640b5c54 100644
--- a/docs/modules/ROOT/pages/capacity-planning.adoc
+++ b/docs/modules/ROOT/pages/capacity-planning.adoc
@@ -13,30 +13,30 @@ The cluster performance depends on multiple factors, including data size, number
 of backups, queries, and which features are used. Therefore, planning the cluster remains
 a complex task that requires knowledge of Hazelcast's architecture and concepts.
 Here, we introduce some basic guidelines
-that help to properly size a cluster.
+that help to size a cluster properly.
 
 We recommend always benchmarking your setup before deploying it to
-production. We also recommend that bechmarking systems resemble the
+production. We also recommend that benchmarking systems resemble the
 production system as much as possible to avoid unexpected results.
 
-We provide a <>
+We provide a <>
 that you can use as a starting point.
 
 Hazelcast clusters will run both data processing and data storage
 workloads, so planning for both types of workload is important.
 
-In order to correctly size the cluster for your use case, answers to as many of the
-following questions as possible are necessary:
+To correctly size the cluster for your use case, answer as many of the
+following questions as possible:
 
-* How much data you want to keep in the in-memory store at any given time?
+* How much data do you want to keep in the in-memory store at any given time?
 * How many copies (backups) of that data do you require?
 * Are you going to use synchronous or asynchronous xref:data-structures:backing-up-maps.adoc[backups]?
 * When running queries how many indexes or index fields for each object will you have?
 * What is your read/write ratio? (Example: 70% of time is spent reading data, 30% is spent writing)
 ** Based on the read/write ratio and Transactions Per Second (TPS), you can learn about the amount of memory
-required to accommodate the data, both existing and new. Usually an eviction mechanism keeps
-the map/cache size in check, but the eviction itself does not always clear the memory almost
-immediately. Therefore, the answers to this question gives a good insight.
+required to accommodate the data, both existing and new. Usually, an eviction mechanism keeps
+the map/cache size in check, but the eviction itself does not always clear the memory
+immediately. Therefore, the answer to this question gives good insight.
 * Are you using multiple clusters (which may involve xref:wan:wan.adoc[WAN Replication])?
 * What are the throughput and latency requirements?
 ** The answer should be about Hazelcast access, not the application throughput.
@@ -45,7 +45,7 @@ transaction may need to use Hazelcast 3 times during the execution. So the
 actual Hazelcast throughput would need to be 30,000 TPS. Similarly for latency, the answer
 should not be about end-to-end latency but the application-to-Hazelcast latency.
 * How many concurrent Hazelcast xref:configuration:jet-configuration.adoc[jobs] will the cluster run?
-* What is the approximation duration of a job?
-* When you use stream processing, what is the average approximation latency for processing of a single event?
+* What is the approximate duration of a job?
+* When you use stream processing, what is the approximate average latency for processing a single event?
 * What is the intended xref:pipelines:sources-sinks.adoc[sink] for your jobs (database, dashboard, file system, etc.)?
 ** If the sink is a Hazelcast map, then the standard caching questions apply, i.e.,
@@ -106,7 +106,7 @@ the data previously owned by the newly offline member will be redistributed acro
 the remaining members.
 
 For this reason, we recommend that you plan to use only 60% of available memory, with 40% headroom to handle member failure or shutdown.
-If you use High-Density Memory Store, Hazelcast automatically
+If you use the High-Density Memory Store, Hazelcast automatically
 assigns a percentage of available off-heap memory to the internal
 memory manager. Since allocation happens lazily, if you want to be
 informed about how much off-heap memory is being used by the
@@ -131,7 +131,7 @@ instead.
 Memory consumption is affected by:
 
 * **Resources deployed with your job:** Attaching big
-files such as models for ML inference pipelines can consume significant resources.
+files, such as models for ML inference pipelines, can consume significant resources.
 * **State of the running jobs:** This varies, as it's affected by the
 shape of your pipeline and by the data being processed. Most of the
 memory is consumed by operations that aggregate and buffer data. Typically the
@@ -189,7 +189,7 @@ NOTE: If you are an Enterprise customer using the High-Density Memory Store with
 large data sizes, we recommend a large increase in partition count, starting with 5009 or higher.
 
 NOTE: The partition count cannot be easily changed after a cluster is created, so if you have a large cluster
-be sure to test and set an optimum partition count prior to deployment. If you need to change th partition
+be sure to test and set an optimum partition count prior to deployment. If you need to change the partition
 count after a cluster is already running, you will need to schedule a maintenance window to entirely bring
 the cluster down. If your cluster uses the xref:storage:persistence.adoc[Persistence] or xref:cp-subsystem:persistence.adoc[CP Persistence] features,
 those persistent files will need to be removed after the cluster is shut down, as they contain
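
The backup question in the first hunk (synchronous versus asynchronous backups) translates directly into per-map configuration, and the choice feeds the memory estimate because each synchronous backup keeps one extra full copy of the data in the cluster. The following is a minimal sketch, not part of the patch above, assuming the Hazelcast Java member API; the map name `orders` and the counts are placeholders rather than recommendations:

[source,java]
----
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class BackupSizingSketch {
    public static void main(String[] args) {
        Config config = new Config();

        // Hypothetical map: one synchronous backup doubles the memory this map needs;
        // asynchronous backups trade stronger guarantees for lower write latency.
        MapConfig ordersMap = new MapConfig("orders")
                .setBackupCount(1)        // synchronous backup copies
                .setAsyncBackupCount(0);  // asynchronous backup copies
        config.addMapConfig(ordersMap);

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}
----

Whichever combination you choose, the total number of copies (primary plus all backups) is what enters the sizing arithmetic sketched next.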
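
The 60% usage / 40% headroom guideline from the later hunks reduces to simple arithmetic once the data volume and backup count are known. A back-of-the-envelope sketch with hypothetical figures (60 GB of primary data and one synchronous backup), again not part of the patch itself:

[source,java]
----
public class HeadroomEstimateSketch {
    public static void main(String[] args) {
        // Hypothetical inputs; substitute the results of your own benchmark.
        double primaryDataGb = 60.0;                               // live data kept in memory
        int syncBackupCopies = 1;                                  // copies besides the primary
        double hotDataGb = primaryDataGb * (1 + syncBackupCopies); // 120 GB that must fit in memory
        double maxUtilisation = 0.60;                              // keep 40% headroom for failover

        double clusterMemoryGb = hotDataGb / maxUtilisation;       // 120 / 0.6 = 200 GB in total
        System.out.printf("Provision roughly %.0f GB of memory across the cluster.%n", clusterMemoryGb);
    }
}
----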
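
The partition-count notes in the last hunk matter at design time because the count is fixed once the cluster has formed. Below is a sketch of setting it programmatically before the first member starts, again assuming the Java member API; the same `hazelcast.partition.count` property can also be set declaratively:

[source,java]
----
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class PartitionCountSketch {
    public static void main(String[] args) {
        Config config = new Config();

        // Must be identical on every member and cannot be changed after the cluster
        // is created; 5009 follows the High-Density Memory Store guidance above.
        config.setProperty("hazelcast.partition.count", "5009");

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}
----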