0721 public docs release (#1039)
* [DOCS] - 910 add pipReport and pipeline_report structure in docs (#945)

* adding pipReport in data dictionary

* change in pipReport location

* change in pipeline report data dictionary

* [Doc] - ETLDataPrefix - Migration / Merge Process (#977)

* Added change for Snapshot and Migration

* Added the doc for Migration, Restore and Snapshot Process

* Changed the images for job running

* Changed the images for job running

* Changed the images for job running

* Changed the doc as per comments

* Changed the doc as per comments

---------

Co-authored-by: Sourav Banerjee <[email protected]>

* [DOCS] adding gcp mws cluster logs setup (#972)

* adding gcp mws cluster logs setup

* removed customer reference from GCP Storage access requirements

* 964 dbsql warehouse docs (#995)

* adding dbsql warehouse

* adding docs for warehouse

* change in warehouse data dictionary

* [Doc] -  Verbose Audit Logging (#973)

* Added Change for Verbose Audit Logging

* Added the latest image for erd diagram

* Changed the doc as per comments

* Update notebookCommands_gold doc to add the notebookcommands scope

---------

Co-authored-by: Sourav Banerjee <[email protected]>

* [DOCS] General doc updates (#1037)

* made some updates to Modules page

* Added info about accountMod table, filled in todos, fixed typos

* Added note about NotebookCommands being only available for notebooks run on clusters

* Separated GCP and AWS content

* Updated UC FAQ

* Fixed issues that Holly found in #996

* Added changes to the changelogs plus added Dashboards page

* updated milestone link

* update static files

---------

Co-authored-by: Aman <[email protected]>
Co-authored-by: Sourav Banerjee <[email protected]>
Co-authored-by: Sourav Banerjee <[email protected]>
4 people authored Sep 11, 2023
1 parent 340b19c commit 4a7a68f
Showing 99 changed files with 16,216 additions and 1,508 deletions.
96 changes: 73 additions & 23 deletions docs-compose/overwatch_docs/content/ChangeLog/_index.md


82 changes: 82 additions & 0 deletions docs-compose/overwatch_docs/content/Dashboards/_index.md
@@ -0,0 +1,82 @@
---
title: "Dashboards"
date: 2022-12-13T13:49:40-05:00
---

We have created a set of dashboards containing essential, pre-defined metrics to help you get started on your Overwatch journey.
They are meant both as a learning resource for understanding the data model and as a practical resource for getting value out of Overwatch right away.
As a first iteration, these are notebook-based dashboards; in the future, we'll make them available in DBSQL as well.

## Available Dashboards

### Workspace
Start here. This dashboard gives you an initial, overall view of the state of the workspaces you monitor with Overwatch.

Metrics available in this dashboard:

| | |
| ----------- | ----------- |
| Daily cluster spend chart | Compute Time of scheduled jobs on each workspace |
| Cluster spend on each workspace | Node type count by Azure workspace |
| DBU Cost vs Compute Cost | Node type count by AWS workspace |
| Cluster spend by type on each workspace | Node type cost by Azure workspace |
| Cluster count by type on each workspace | Node type cost by AWS workspace |
| Count of scheduled jobs on each workspace | Workspace Tags count by workspace |

### Clusters
This dashboard deep-dives into cluster-specific metrics.

Metrics available in this dashboard:

| | |
| ----------- | ----------- |
| DBU Spend by cluster category | Total DBU Incurred by top spending clusters per category |
| DBU spend by the most expensive cluster per day per workspace | Percentage of Autoscaling clusters per category |
| Top spending clusters per day | Scale up time of clusters (with & without pools) by cluster category |
| DBU Spend by the top 3 expensive Interactive clusters (without auto-termination) per day | Cluster Failure States and count of failures |
| Cluster count in each category | Cost of cluster failures per Failure States per workspace |
| Cluster node type breakdown | Cluster Failure States and failure count distribution |
| Cluster node type breakdown by potential | Interactive cluster restarts per day per cluster |
| Cluster node potential breakdown by cluster category | |

### Job
This dashboard deep-dives into job/workload-specific metrics.

Metrics available in this dashboard:

| | |
| ----------- | ----------- |
| Daily Cost on Jobs | Daily Job status distribution |
| Job Count by workspace | Number of job Runs (Succeeded vs Failed) |
| Jobs running in Interactive Clusters (Top 20 workspaces) | Compute Time of Run Failures By Workspace |

### Notebook
This dashboard covers metrics that can help you tune workloads by analyzing the code run in notebooks.

Metrics available in this dashboard:

| | |
| ----------- | ----------- |
| Data Throughput Per Notebook Path | Largest shuffle explosions |
| Longest Running Notebooks | Total Spills per notebook run |
| Top notebooks returning a lot of data to the UI | Processing Speed (MB/sec) |
| Spark Actions (count) | Longest Running Failed Spark Jobs |
| Notebooks with the largest records | Serialization/deserialization time (ExecutorDeserializeTime + ResultSerializationTime) |
| Task count by task type | Notebook compute hours |
| Large Tasks Count (> 400MB) | Most popular (distinct users) notebooks per path depth |
| Jobs Executing on Notebooks (count) | |

### DBSQL
A general view of the performance of your DBSQL-specific queries.

Metrics available in this dashboard:

| | |
| ----------- | ----------- |
| Global query duration and Global Query Core Hours | Core hours by users (Top 20) |
| Query count through time | Distinct user count by warehouse |
| Query count by warehouse | Core hours by date |
| Core hours by warehouse | Core hours by Is Serverless |

## Dashboard files
Download the DBC file below and import it into your workspace, then read through the included readme; you should be
able to get the dashboards running right away.

- **Version 1** - [DBC](/assets/Dashboards/Dashboards_v1.0.dbc) Released September 11, 2023
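
If you prefer to script the import rather than use the workspace UI, the sketch below pushes the archive through the Databricks Workspace API (`POST /api/2.0/workspace/import`). This is a minimal sketch, not part of the release: the host, token source, and target path are placeholders to adjust for your environment.

```scala
// Minimal sketch: import the dashboards DBC archive via the Databricks Workspace API.
// Host, token source, and target path are placeholders -- adjust for your workspace.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Files, Paths}
import java.util.Base64

object ImportDashboards {
  def main(args: Array[String]): Unit = {
    val host  = "https://<your-workspace>.cloud.databricks.com" // placeholder
    val token = sys.env("DATABRICKS_TOKEN")                     // personal access token

    // The Workspace API expects the archive contents base64-encoded.
    val dbcBytes = Files.readAllBytes(Paths.get("Dashboards_v1.0.dbc"))
    val payload =
      s"""{
         |  "path": "/Shared/Overwatch/Dashboards",
         |  "format": "DBC",
         |  "content": "${Base64.getEncoder.encodeToString(dbcBytes)}"
         |}""".stripMargin

    val request = HttpRequest.newBuilder()
      .uri(URI.create(s"$host/api/2.0/workspace/import"))
      .header("Authorization", s"Bearer $token")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()}: ${response.body()}")
  }
}
```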
@@ -60,13 +60,6 @@ need be carried out on a single workspace per Overwatch output target (i.e. stor
* This is critical as there are some regressions in 10.4LTS that have been handled in Overwatch 0.6.1 but not
in previous versions
* (Azure) Follow recommendations for Event Hub. If using fewer than 32 partitions, increase the partition count immediately.
* If the workspace is VERY large or has 50+ concurrent users, increase `minEventsPerTrigger` in the audit
log config from its default of 10 to 100+. In short, this is the backlog threshold below which Overwatch considers
the audit event stream caught up and progresses past the audit log events module. If the workspace generates more than 10
events faster than Overwatch can complete the streaming batch, the module may never complete or may get stuck
here for some time.
* Consider increasing `maxEventsPerTrigger` from its default of 10000 to 50000 to load more audit logs per batch. This
will only help if there are 10K+ audit events per day on the workspace. A configuration sketch showing both settings follows this list.
* Ensure direct networking routes are used between storage and compute and that they are in the same region
* Overwatch target can be in separate region for multi-region architectures but sources should be co-located with
compute.
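
For reference, a minimal sketch of where these two settings live, assuming the `AzureAuditLogEventhubConfig` shape from Overwatch 0.6.x. Only `minEventsPerTrigger` and `maxEventsPerTrigger` (and their defaults) come from the text above; the import path, the other field names, and all values are assumptions to verify against your Overwatch version, not a definitive configuration.

```scala
// A minimal sketch, assuming Overwatch 0.6.x's Azure Event Hub audit log config.
// Only minEventsPerTrigger/maxEventsPerTrigger are taken from the guidance above;
// the other field names and values are illustrative -- verify against your release.
import com.databricks.labs.overwatch.utils.AzureAuditLogEventhubConfig

val storagePrefix = "abfss://overwatch@mystorageacct.dfs.core.windows.net" // hypothetical target
val workspaceId   = "1234567890123456"                                     // hypothetical workspace ID

val ehAuditConfig = AzureAuditLogEventhubConfig(
  connectionString     = sys.env("EH_CONNECTION_STRING"), // use a secret scope in practice
  eventHubName         = "overwatch-audit-logs",
  auditRawEventsPrefix = s"$storagePrefix/$workspaceId/rawEvents",
  minEventsPerTrigger  = 100,  // default 10; raise to 100+ for very large / 50+ concurrent-user workspaces
  maxEventsPerTrigger  = 50000 // default 10000; only helps with 10K+ audit events per day
)
```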
