Merge pull request #2213 from amnonh/update_docs
Update docs
amnonh authored Mar 10, 2024
2 parents 382da47 + 79695bc commit 02c54be
Showing 4 changed files with 67 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/source/install/min-prod-hw.rst
@@ -1,7 +1,7 @@
Minimal Production System Recommendations
-----------------------------------------

* **CPU** - at least 2 physical cores/ 4vCPUs
* **CPU** - For clusters with up to 100 cores, use 2 vCPUs; for larger clusters, use 4 vCPUs
* **Memory** - 15GB+ DRAM and proportional to the number of cores.
* **Disk** - persistent disk storage is proportional to the number of cores and Prometheus retention period (see the following section)
* **Network** - 1GbE/10GbE preferred
60 changes: 60 additions & 0 deletions docs/source/procedures/datadog/cloud-integration.rst
@@ -0,0 +1,60 @@
=============================================
ScyllaDB Cloud Monitoring Datadog Integration
=============================================

For security reasons, ScyllaDB Cloud does not expose the Prometheus server directly.
To allow an external server to scrape metrics, you need to enable the Prometheus proxy.
The Datadog agent reads from the proxy, which in turn reads from the Prometheus server.

The integration consists of:

1. Installing and configuring the Datadog agent.
2. Adding Datadog recording rules.
3. Loading the Scylla dashboard to Datadog.
4. Optionally loading a Monitor (alerts).

Scylla Monitoring Datadog Integration Overview
==============================================
A typical ScyllaDB cluster generates thousands of metrics, sometimes even tens of thousands.
The sheer number of metrics is too much for Datadog.

Instead of letting the Datadog agent scrape all metrics, the monitoring stack marks a small subset of metrics with a label and lets the Datadog agent scrape only those.

Install And configure the Datadog Agent
=======================================

Start by following the Datadog `Installation <https://docs.datadoghq.com/agent/>`_ guide. The Datadog agent should run on a machine that can reach the Prometheus proxy server.

Once the Datadog agent is working, download the configuration file :download:`conf.yaml <cloud-conf.yaml>` and move it to /etc/datadog-agent/conf.d/prometheus.d/conf.yaml


Edit the file and replace the cluster ID placeholder (CLUSTER_ID) and the token placeholder (TOKEN) with your own values.
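
The edit can also be scripted. The following is a minimal Python sketch, assuming the downloaded file contains the literal placeholder strings CLUSTER_ID and TOKEN as described above. The function name, the sample text, and the values are hypothetical; the sample is not the actual content of conf.yaml.

```python
import re


def fill_placeholders(text: str, cluster_id: str, token: str) -> str:
    """Return text with the CLUSTER_ID and TOKEN placeholders replaced."""
    values = {"CLUSTER_ID": cluster_id, "TOKEN": token}
    # Fail early if a placeholder is missing, e.g. the file was already edited.
    for name in values:
        if name not in text:
            raise ValueError(f"placeholder {name} not found")
    # Replace both placeholders in a single pass so that one substituted
    # value can never be mistaken for the other placeholder.
    return re.sub(r"CLUSTER_ID|TOKEN", lambda m: values[m.group(0)], text)


# Hypothetical stand-in for the configuration text.
sample = "url: https://proxy.example/CLUSTER_ID\nauth: Bearer TOKEN\n"
filled = fill_placeholders(sample, cluster_id="1234", token="s3cret")
print(filled)
```

To apply this to the real file, read /etc/datadog-agent/conf.d/prometheus.d/conf.yaml, pass its text through the function, and write the result back; this typically requires write permission on that directory.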

Post configuration
^^^^^^^^^^^^^^^^^^
Restart the agent using the method appropriate for your installation. Scylla metrics should then be visible in Datadog.


.. note:: By default, Datadog does not scrape per-shard metrics. To enable per-shard metrics, edit the conf.yaml file and replace dd=~"1" with dd=~"1|2"
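
To illustrate what changing the matcher does: Prometheus regular-expression label matchers (=~) are fully anchored, so dd=~"1" selects only series whose dd label is exactly 1, while dd=~"1|2" selects series whose dd label is 1 or 2. The Python sketch below mimics this with re.fullmatch. The series names, and the assumption that per-shard series carry dd="2", are illustrative and inferred from the note above, not taken from the configuration file.

```python
import re


def selected(series, pattern):
    """Keep series whose dd label fully matches pattern, like dd=~"pattern"."""
    return [name for name, dd in series if re.fullmatch(pattern, dd)]


# Hypothetical series: (name, value of the dd label).
series = [
    ("node_level_metric", "1"),
    ("per_shard_metric", "2"),
    ("unmarked_metric", ""),
]

print(selected(series, "1"))    # ['node_level_metric']
print(selected(series, "1|2"))  # ['node_level_metric', 'per_shard_metric']
```

Series without a matching label value are never scraped by the agent in either case, which is how the number of metrics sent to Datadog stays small.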

Upload the Dashboard
====================
Download the dashboard file :download:`dashboard.json <dashboard.json>`.
Create a new dashboard in Datadog and import the JSON file you downloaded.

Using the Dashboard
===================
We created a Datadog dashboard that resembles the Grafana dashboards.

.. image:: datadog.png

The dashboard contains some specific filtering and perspectives:
First, you can choose between shard, instance, dc, or cluster view.
This will aggregate the metrics in the graphs accordingly.
Second, you can filter to see specific shards, nodes, or DCs.

.. note:: Some combinations conflict. For example, you cannot filter by DC when looking at a cluster view. If no data is displayed, remove the filters first.

Adding Monitor
==============
Alerts in Datadog are called Monitors. Download the monitor file :download:`monitor.json <monitor.json>`. Go to the Monitors section in Datadog and import the JSON file.
18 changes: 3 additions & 15 deletions docs/source/procedures/datadog/index.rst
@@ -10,7 +10,7 @@ The integration consists of:
3. Loading Scylla dashboard to Datadog.
4. Optionally load Monitor (Alerts).

.. note:: Scylla Cloud users, use and update the proper configuration file.
.. note:: Scylla Cloud users, see the cloud users `specific guide <cloud-integration>`_.

Scylla Monitoring Datadog Integration Overview
==============================================
@@ -31,17 +31,7 @@ Install And configure the Datadog Agent
Start by following the Datadog `Installation <https://docs.datadoghq.com/agent/>`_ guide. The Datadog agent should run on a machine that can reach the Prometheus server.

Once the Datadog agent is working, download the configuration file and place it under /etc/datadog-agent/conf.d/prometheus.d/conf.yaml

Scylla Cloud Users
^^^^^^^^^^^^^^^^^^
Scylla Cloud users, download the configuration file :download:`conf.yaml <cloud-conf.yaml>` move it to: /etc/datadog-agent/conf.d/prometheus.d/conf.yaml


Edit the file. You must replace the cluster id (CLUSTER_ID) and the token (TOKEN).

Other Scylla Users
^^^^^^^^^^^^^^^^^^
Other Scylla users, download the configuration file :download:`conf.yaml <conf.yaml>` and replace the ip address of the Prometheus server.
Download the configuration file :download:`conf.yaml <conf.yaml>` and replace the IP address of the Prometheus server.


Post configuration
@@ -53,11 +43,9 @@ Restart the agent based on your installation. Scylla metrics should be visible i

Add datadog recording rules
===========================
Non Scylla Cloud users, download the rules configuration file :download:`datadog.rules.yml <datadog.rules.yml>` if you need per-shard metrics, download :download:`datadog.rules-with-shards.yml <datadog.rules-with-shards.yml>` and place it under prometheus/prom_rules/.
Download the rules configuration file :download:`datadog.rules.yml <datadog.rules.yml>` or, if you need per-shard metrics, :download:`datadog.rules-with-shards.yml <datadog.rules-with-shards.yml>`, and place it under prometheus/prom_rules/.
Per-shard metrics add load and cost to the Prometheus server, the Datadog agent, and the Datadog server, so use them only if needed.

Cloud users, skip this step, it's been take care for by the cloud.

Upload the Dashboard
====================
Download the dashboard file :download:`dashboard.json <dashboard.json>`.
3 changes: 3 additions & 0 deletions docs/source/procedures/index.rst
@@ -7,13 +7,16 @@ ScyllaDB Monitoring Stack Procedures
:hidden:


Cloud Users Datadog integration <datadog/cloud-integration>
Datadog Integration <datadog/index>
Alert Manager <alerts/index>
Adding and Modifying Dashboards <updating-dashboard>
Upgrade Guides </upgrade/index>

There are several reference guides available which give additional information. Choose a topic to begin:

* :doc:`Cloud Users Datadog integration <datadog/cloud-integration>`
* :doc:`Datadog Integration <datadog/index>`
* :doc:`Alert Manager <alerts/index>`
* :doc:`Adding and Modifying Dashboards <updating-dashboard>`
* :doc:`Upgrade Guides </upgrade/index>`
