diff --git a/docs/source/install/min-prod-hw.rst b/docs/source/install/min-prod-hw.rst
index 8737a3a8c..6fda62cc1 100644
--- a/docs/source/install/min-prod-hw.rst
+++ b/docs/source/install/min-prod-hw.rst
@@ -1,7 +1,7 @@
 Minimal Production System Recommendations
 -----------------------------------------
 
-* **CPU** - at least 2 physical cores/ 4vCPUs
+* **CPU** - For clusters with up to 100 cores, use 2 vCPUs; for larger clusters, use 4 vCPUs
 * **Memory** - 15GB+ DRAM and proportional to the number of cores.
 * **Disk** - persistent disk storage is proportional to the number of cores and Prometheus retention period (see the following section)
 * **Network** - 1GbE/10GbE preferred
diff --git a/docs/source/procedures/datadog/cloud-integration.rst b/docs/source/procedures/datadog/cloud-integration.rst
new file mode 100644
index 000000000..553a11dcb
--- /dev/null
+++ b/docs/source/procedures/datadog/cloud-integration.rst
@@ -0,0 +1,60 @@
+=============================================
+ScyllaDB Cloud Monitoring Datadog Integration
+=============================================
+
+For security reasons, ScyllaDB Cloud does not allow direct access to the Prometheus server.
+To allow scraping by an external server, you need to enable the Prometheus proxy.
+The Datadog agent reads from the proxy, which reads from the Prometheus server.
+
+1. Install and configure the Datadog agent.
+2. Add Datadog recording rules.
+3. Load the Scylla dashboard to Datadog.
+4. Optionally, load the Monitor (alerts).
+
+Scylla Monitoring Datadog Integration Overview
+==============================================
+A typical ScyllaDB cluster generates thousands of metrics, sometimes even tens of thousands.
+That is too many metrics for Datadog to handle.
+
+Instead of letting the Datadog agent scrape all metrics, the monitoring stack marks a small subset of metrics with a label and lets the Datadog agent scrape only those.
+
+Install and Configure the Datadog Agent
+=======================================
+
+Start by following the `Installation `_ guide. The Datadog agent should run on a machine that can reach the Prometheus proxy server.
+
+Once the Datadog agent is working, download the configuration file and place it under /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
+
+Download the configuration file :download:`conf.yaml ` and move it to /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
+
+
+Edit the file and replace the cluster ID (CLUSTER_ID) and the token (TOKEN) with your own values.
+
+Post configuration
+^^^^^^^^^^^^^^^^^^
+Restart the agent using the method that matches your installation. Scylla metrics should then be visible in Datadog.
+
+
+.. note:: By default, Datadog does not scrape per-shard metrics. To enable per-shard metrics, edit the conf.yaml file and replace ``dd=~"1"`` with ``dd=~"1|2"``.
+
+Upload the Dashboard
+====================
+Download the dashboard file :download:`dashboard.json `.
+Create a new dashboard in Datadog and import the JSON file you downloaded.
+
+Using the Dashboard
+===================
+We created a Datadog dashboard that resembles the Grafana dashboards.
+
+.. image:: datadog.png
+
+The dashboard offers specific filters and perspectives.
+First, you can choose between shard, instance, DC, or cluster view.
+The graphs aggregate the metrics according to the selected view.
+Second, you can filter to see specific shards, nodes, or DCs.
+
+.. note:: Some combinations conflict. For example, you cannot filter by DC when looking at a cluster view. If no data is displayed, remove the filters first.
+
+Adding Monitor
+==============
+Alerts in Datadog are called Monitors. Download the monitor file :download:`monitor.json `. Go to the Monitors section in Datadog and import the JSON file.
diff --git a/docs/source/procedures/datadog/index.rst b/docs/source/procedures/datadog/index.rst
index 7cc9f37b9..7bb6f4d66 100644
--- a/docs/source/procedures/datadog/index.rst
+++ b/docs/source/procedures/datadog/index.rst
@@ -10,7 +10,7 @@ The integration consists of:
 3. Loading Scylla dashboard to Datadog.
 4. Optionally load Monitor (Alerts).
 
-.. note:: Scylla Cloud users, use and update the proper configuration file.
+.. note:: Scylla Cloud users, check the cloud users `specific guide `_.
 
 Scylla Monitoring Datadog Integration Overview
 ==============================================
@@ -31,17 +31,7 @@ Install And configure the Datadog Agent
 Start by following `Installation `_ guide. The datadog agent should run on a machine that can reach the Prometheus server.
 
 Once the Datadog agent is working, download the configuration file and place it under /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
-
-Scylla Cloud Users
-^^^^^^^^^^^^^^^^^^
-Scylla Cloud users, download the configuration file :download:`conf.yaml ` move it to: /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
-
-
-Edit the file. You must replace the cluster id (CLUSTER_ID) and the token (TOKEN).
-
-Other Scylla Users
-^^^^^^^^^^^^^^^^^^
-Other Scylla users, download the configuration file :download:`conf.yaml ` and replace the ip address of the Prometheus server.
+Download the configuration file :download:`conf.yaml ` and replace the IP address of the Prometheus server.
 
 
 Post configuration
@@ -53,11 +43,9 @@ Restart the agent based on your installation. Scylla metrics should be visible i
 
 Add datadog recording rules
 ===========================
-Non Scylla Cloud users, download the rules configuration file :download:`datadog.rules.yml ` if you need per-shard metrics, download :download:`datadog.rules-with-shards.yml ` and place it under prometheus/prom_rules/.
+Download the rules configuration file :download:`datadog.rules.yml ` (or, if you need per-shard metrics, :download:`datadog.rules-with-shards.yml `) and place it under prometheus/prom_rules/.
 Per-shards metrics adds load and cost to both the Prometheus server and Datadog agent and server, so only use it if needed.
 
-Cloud users, skip this step, it's been take care for by the cloud.
-
 Upload the Dashboard
 ====================
 Download the dashboard file :download:`dashboard.json `.
diff --git a/docs/source/procedures/index.rst b/docs/source/procedures/index.rst
index 76a24002a..c55898e8b 100644
--- a/docs/source/procedures/index.rst
+++ b/docs/source/procedures/index.rst
@@ -7,6 +7,7 @@ ScyllaDB Monitoring Stack Procedures
    :hidden:
 
 
+   Cloud Users Datadog Integration
    Datadog Integration
    Alert Manager
    Adding and Modifying Dashboards
@@ -14,6 +15,8 @@ ScyllaDB Monitoring Stack Procedures
 
 There are several reference guides available which give additional information. Choose a topic to begin:
 
+* :doc:`Cloud Users Datadog Integration `
+* :doc:`Datadog Integration `
 * :doc:`Alert Manager `
 * :doc:`Adding and Modifying Dashboards `
 * :doc:`Upgrade Guides `