GitBook: [#63] Non Rollup Docs
Sachin Bansal authored and gitbook-bot committed Dec 29, 2021
1 parent ba28e1e commit 87f223c
Showing 39 changed files with 144 additions and 106 deletions.
Binary file added .gitbook/assets/AddConnection (1).png
Binary file added .gitbook/assets/AddConnection.png
Binary file added .gitbook/assets/Anomalies.png
Binary file added .gitbook/assets/AnomalyCard_Daily.png
Binary file added .gitbook/assets/AnomalyCard_Daily_cropped.png
Binary file added .gitbook/assets/AnomalyCard_Hourly.png
Binary file added .gitbook/assets/AnomalyCard_Hourly_cropped.png
Binary file added .gitbook/assets/AnomalyDefinition_CueL.gif
Binary file added .gitbook/assets/AnomalyDefinitions (1).png
Binary file added .gitbook/assets/AnomalyDefinitions.png
Binary file added .gitbook/assets/AnomalyDeviation.png
Binary file added .gitbook/assets/Dataset_Mapping_cropped (1).png
Binary file added .gitbook/assets/Dataset_Mapping_cropped.png
Binary file added .gitbook/assets/Dataset_SQL.png
Binary file added .gitbook/assets/Dataset_SQL_cropped (1).png
Binary file added .gitbook/assets/Dataset_SQL_cropped.png
Binary file added .gitbook/assets/MinAvgValue.png
Binary file added .gitbook/assets/MinContribution.png
Binary file added .gitbook/assets/Overview.gif
Binary file added .gitbook/assets/Overview_Anomaly (1).png
Binary file added .gitbook/assets/Overview_Anomaly.png
Binary file added .gitbook/assets/Overview_RCA (1).png
Binary file added .gitbook/assets/Overview_RCA.png
Binary file added .gitbook/assets/RCA_Analyze.png
Binary file added .gitbook/assets/RCA_Logs.png
Binary file added .gitbook/assets/RCA_Result.png
Binary file added .gitbook/assets/TopN.png
Binary file added .gitbook/assets/cueObserve.png
Binary file added .gitbook/assets/new.png
82 changes: 39 additions & 43 deletions README.md
@@ -1,78 +1,74 @@
# Overview

[![CueObserve Logo](.gitbook/assets/cueObserve.png)](https://cueobserve.cuebook.ai)

[![](https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/maintainability)](https://codeclimate.com/github/cuebook/CueObserve/maintainability) [![](https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/test\_coverage)](https://codeclimate.com/github/cuebook/CueObserve/test\_coverage) [![Test Coverage](https://github.com/cuebook/cueobserve/actions/workflows/pr\_checks.yml/badge.svg) ](https://github.com/cuebook/cueobserve/actions/workflows/pr\_checks.yml)[![License](https://img.shields.io/github/license/cuebook/cueobserve)](https://github.com/cuebook/cueobserve/blob/main/LICENSE.md)

CueObserve helps you monitor your metrics. Know when, where, and why a metric isn't right.

CueObserve uses **timeseries Anomaly detection** to find **where** and **when** a metric isn't right. It then offers **one-click Root Cause analysis** so that you know **why** a metric isn't right.

CueObserve works with data in your SQL data warehouses and databases. It currently supports Snowflake, BigQuery, Redshift, Druid, Postgres, MySQL, SQL Server and ClickHouse.

![CueObserve Anomaly](<.gitbook/assets/Overview\_Anomaly (1).png>) ![CueObserve RCA](<.gitbook/assets/Overview\_RCA (1).png>)

### Getting Started

Install via Docker

```
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
docker-compose -f cueobserve-docker-compose.yml up -d
```
Now visit [http://localhost:3000](http://localhost:3000) in your browser.


### Demo Video

[![Watch CueObserve video](http://img.youtube.com/vi/VZvgNa65GQU/hqdefault.jpg)](http://www.youtube.com/watch?feature=player\_embedded\&v=VZvgNa65GQU)

### How it works

You write a SQL GROUP BY query, map its columns as dimensions and measures, and save it as a virtual Dataset.

![Dataset SQL](<.gitbook/assets/Dataset\_SQL\_cropped (1).png>)

![Dataset Schema Map](<.gitbook/assets/Dataset\_Mapping\_cropped (1).png>)

You then define one or more anomaly detection jobs on the dataset.

![Anomaly Definition](<.gitbook/assets/AnomalyDefinitions (1).png>)

When an anomaly detection job runs, CueObserve does the following:

1. Executes the SQL GROUP BY query on your data warehouse and stores the result as a Pandas dataframe.
2. Generates one or more timeseries from the dataframe, as defined in your anomaly detection job.
3. Generates a forecast for each timeseries using [Prophet](https://github.com/facebook/prophet).
4. Creates a visual card for each timeseries. Marks the card as an anomaly if the last data point is anomalous.
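The four steps above can be sketched roughly in Python with Pandas. The column names and data below are hypothetical, and the Prophet forecasting call itself is omitted — this only shows how a GROUP BY result becomes per-dimension timeseries and the `ds`/`y` input shape Prophet expects:

```python
import pandas as pd

# Hypothetical result of the SQL GROUP BY query (step 1): one row per
# timestamp bucket per dimension value, with an aggregated measure.
df = pd.DataFrame({
    "CreatedTS": pd.to_datetime(
        ["2021-12-01", "2021-12-01", "2021-12-02", "2021-12-02"]
    ),
    "State": ["CA", "NY", "CA", "NY"],
    "Orders": [120, 80, 135, 75],
})

# Step 2: generate one timeseries per dimension value.
timeseries = {
    state: group.set_index("CreatedTS")["Orders"]
    for state, group in df.groupby("State")
}

# Prophet (step 3) expects a dataframe with columns `ds` and `y`,
# so each series would be reshaped before forecasting:
prophet_input = (
    timeseries["CA"].rename("y").rename_axis("ds").reset_index()
)
print(list(prophet_input.columns))  # ['ds', 'y']
```

Step 4 (the visual card and the anomaly flag on the last data point) would then compare the actual series against the forecast produced for each `prophet_input`.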

### Features

* Automated SQL to timeseries transformation.
* Run anomaly detection on the aggregate metric or split it by any dimension. Limit the split to significant dimension values.
* Use Prophet or simple mathematical rules to detect anomalies.
* In-built Scheduler. CueObserve uses Celery as the executor and celery-beat as the scheduler.
* Slack alerts when anomalies are detected.
* Monitoring. Slack alert when a job fails. CueObserve maintains detailed logs.

#### Limitations

* Currently supports Prophet for timeseries forecasting.
* Not being built for real-time anomaly detection on streaming data.

### Support

For general help using CueObserve, read the [documentation](https://cueobserve.cuebook.ai), or go to [Github Discussions](https://github.com/cuebook/cueobserve/discussions).

To report a bug or request a feature, open an [issue](https://github.com/cuebook/cueobserve/issues).

### Contributing

We'd love contributions to CueObserve. Before you contribute, please first discuss the change you wish to make via an [issue](https://github.com/cuebook/cueobserve/issues) or a [discussion](https://github.com/cuebook/cueobserve/discussions). Contributors are expected to adhere to our [code of conduct](https://github.com/cuebook/cueobserve/blob/main/CODE\_OF\_CONDUCT.md).
6 changes: 3 additions & 3 deletions anomalies.md
@@ -2,14 +2,14 @@

Anomalies screen lists all published anomalies. Click on a row to view its anomaly card.

![](.gitbook/assets/Anomalies.png)

Daily anomalies automatically unpublish if there's no anomaly for the next 5 days. Hourly anomalies unpublish after 1 day.

## Anomaly Cards

Anomaly cards follow a template. If you want, you can modify the templates.

![Hourly Anomaly card](.gitbook/assets/AnomalyCard\_Hourly\_cropped.png)

![Daily Anomaly card](.gitbook/assets/AnomalyCard\_Daily\_cropped.png)
14 changes: 7 additions & 7 deletions anomaly-definitions.md
@@ -2,14 +2,14 @@

You can define one or more anomaly detection jobs on a dataset. The anomaly detection job can monitor a measure at an aggregate level or split the measure by a dimension.

To define an anomaly job, you

1. Select a dataset
2. Select a measure from the dataset
3. Select a dimension to split the measure _(optional)_
4. Select an anomaly rule

![](.gitbook/assets/AnomalyDefinitions.png)

## Split Measure by Dimension

@@ -19,7 +19,7 @@ To split a measure by a dimension, select the dimension and then limit the numbe

Choose the optional **High/Low** to detect only one type of anomalies. Choose **High** for an increase in measure or **Low** for a drop in measure.

![](.gitbook/assets/AnomalyDefinition\_CueL.gif)

### Limit Dimension Values

@@ -31,21 +31,21 @@ Top N limits the number of dimension values based on the dimension value's contr

Say you want to monitor the Orders measure, but only for your top 10 states. You would then define the anomaly something like below:

![](.gitbook/assets/TopN.png)

#### Min % Contribution

Minimum % Contribution limits the number of dimension values based on the dimension value's contribution to the measure.

Say you want to monitor the Orders measure for every state that contributes at least 2% of total Orders. Your anomaly definition would look something like below:

![](.gitbook/assets/MinContribution.png)

#### Min Avg Value

Minimum Average Value limits the number of dimension values based on the measure's average value.

![](.gitbook/assets/MinAvgValue.png)

In the example above, only states where _average(Orders) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
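As a rough illustration (the exact mechanics are internal to CueObserve), a Min Avg Value limit could be evaluated like this, with hypothetical data and column names:

```python
import pandas as pd

# Hypothetical daily Orders per state.
df = pd.DataFrame({
    "State": ["CA", "CA", "ND", "ND"],
    "Orders": [30, 50, 4, 6],
})

# Keep only dimension values whose average measure meets the threshold.
MIN_AVG_VALUE = 10
avg = df.groupby("State")["Orders"].mean()
selected = avg[avg >= MIN_AVG_VALUE].index.tolist()
print(selected)  # ['CA']
```

With daily granularity `avg` is the daily average per state; with hourly granularity the same computation runs over hourly buckets.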

@@ -64,7 +64,7 @@ This algorithm uses the open-source [Prophet](https://github.com/facebook/prophe

The metric's percentage deviation (_45% in the image below_) is calculated with respect to the threshold of the forecast's confidence range.

![](.gitbook/assets/AnomalyDeviation.png)
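A minimal sketch of how such a deviation percentage could be computed, assuming the deviation is measured against the nearer threshold of the forecast's confidence range (the exact formula is internal to CueObserve, so treat this as illustrative):

```python
def percentage_deviation(actual: float, band_low: float, band_high: float) -> float:
    """Deviation of `actual` relative to the nearer confidence-band threshold."""
    if band_low <= actual <= band_high:
        return 0.0  # inside the forecast's confidence range: no anomaly
    threshold = band_high if actual > band_high else band_low
    return round((actual - threshold) / threshold * 100, 1)

# A value of 145 against an upper threshold of 100 deviates by 45%.
print(percentage_deviation(145.0, 60.0, 100.0))  # 45.0
```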

### Percentage Change

15 changes: 13 additions & 2 deletions datasets.md
@@ -2,15 +2,26 @@

Datasets are similar to aggregated SQL VIEWS of your data. When you run an anomaly detection job, the associated dataset's SQL query is run and the results are stored as a Pandas dataframe in memory.

![](.gitbook/assets/Dataset\_SQL.png)

You write a SQL GROUP BY query with aggregate functions to roll-up your data. You then map the columns as dimensions or measures.

![](.gitbook/assets/Dataset\_Mapping\_cropped.png)

1. Dataset must have only one timestamp column. This timestamp column is used to generate timeseries data for anomaly detection.
2. Dataset must have at least one aggregate column. CueObserve currently supports only COUNT or SUM as aggregate functions. Aggregate columns must be mapped as measures.
3. Dataset can have one or more dimension columns (optional).
4. Dataset can be classified as a non-rollup dataset; details are provided below.
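The rules above can be illustrated with a hypothetical dataset definition. The query, column names, and mapping format below are illustrative only — they are not a CueObserve API:

```python
# Illustrative SQL GROUP BY query: one timestamp column, one dimension,
# one SUM aggregate mapped as a measure.
dataset_sql = """
SELECT DATE_TRUNC('day', created_at) AS CreatedDate,
       state                         AS State,
       SUM(amount)                   AS Orders
FROM orders
GROUP BY 1, 2
"""

column_mapping = {
    "CreatedDate": "timestamp",  # rule 1: exactly one timestamp column
    "State": "dimension",        # rule 3: zero or more dimensions
    "Orders": "measure",         # rule 2: at least one COUNT/SUM measure
}

# Rule checks on the mapping.
assert list(column_mapping.values()).count("timestamp") == 1
assert "measure" in column_mapping.values()
```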

### **Non-Rollup Datasets**

A dataset can be created as a non-rollup dataset using a switch that tells the system not to roll up (aggregate) the data during pre-processing.

![Non Roll-up switch](.gitbook/assets/new.png)

By default, all datasets are "rolled up", i.e. metric data points are aggregated (summed up) on timestamp buckets for a specific dimension value.

But for metrics like percentages, such aggregation might not be relevant, so you can tell the system that it is a non-rollup dataset. Currently, we support only a single dimension on non-rollup datasets, to avoid duplicate timestamp values after pre-processing.
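A rough Pandas sketch of the default roll-up, with hypothetical data, shows what the non-rollup switch skips:

```python
import pandas as pd

# Hypothetical pre-processed rows: two data points share the same
# timestamp bucket and dimension value.
df = pd.DataFrame({
    "ts": pd.to_datetime(["2021-12-01", "2021-12-01", "2021-12-02"]),
    "state": ["CA", "CA", "CA"],
    "orders": [10, 5, 7],
})

# Default behaviour: roll up (sum) the measure per timestamp bucket.
rolled_up = df.groupby(["ts", "state"], as_index=False)["orders"].sum()
print(rolled_up["orders"].tolist())  # [15, 7]

# For a percentage-style metric, summing e.g. 60% and 40% into "100%"
# would be meaningless. A non-rollup dataset skips this step and uses the
# rows as-is, which is why duplicate timestamps per dimension value (and
# hence multiple dimensions) are not allowed.
```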

## SQL GROUP BY Query

16 changes: 8 additions & 8 deletions development.md
@@ -12,7 +12,7 @@ description: >-
CueObserve has a multi-service architecture, with the following services:

1. `Frontend` is a single-page application written in [ReactJS](https://reactjs.org). Its code can be found in the `ui` folder, and it runs on [http://localhost:3000/](http://localhost:3000/).
2. `API` is based on [Django](https://www.djangoproject.com) (a Python framework) and exposes a REST API. It is the main service, responsible for connections, authentication and anomaly detection.
3. `Alerts` is a micro-service, currently responsible for sending alerts/notifications to Slack only. Its code is in the `alerts-api` folder, and it runs on [localhost:8100](http://localhost:8100).
4. [Celery](https://docs.celeryproject.org) to execute the tasks asynchronously. Tasks like anomaly detection are handled by Celery.
5. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.
@@ -25,14 +25,14 @@ Get the code by cloning our open source [github repo](https://github.com/cuebook
```
git clone https://github.com/cuebook/CueObserve.git
cd CueObserve
docker-compose -f docker-compose-dev.yml --env-file .env.dev up --build
```

`docker-compose`'s build command pulls several components and installs them locally, so it will take a few minutes to complete.

### Backend Development

The code for the backend is in the `/api` directory. As mentioned in the overview, it is based on the Django framework.

#### Configure environment variables

@@ -57,17 +57,17 @@ export DJANGO_SUPERUSER_EMAIL="[email protected]"
export `=False
```

Change the values based on your running PostgreSQL instance. If you do not wish to use PostgreSQL as your database for development, comment out lines 4-8 and CueObserve will create a SQLite database file at `api/db/db.sqlite3`.

The backend server can be accessed at [http://localhost:8000/](http://localhost:8000/).

#### Celery Development

CueObserve uses Celery to execute asynchronous tasks like anomaly detection. Three components are needed to run an asynchronous task: Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so the Redis server should be running before the Celery services start. Celery Beat is the scheduler and is responsible for triggering scheduled tasks. Celery workers execute the tasks.

### Testing

At the moment, we have test cases only for the backend service; test cases for the UI are on our roadmap.

The backend API and services are tested using [PyTest](https://docs.pytest.org/en/6.2.x/). To run the test cases, `exec` into the cueo-backend container and run the command

28 changes: 8 additions & 20 deletions getting-started.md
@@ -3,35 +3,23 @@
## Install via Docker-Compose

```
mkdir -p ~/cuebook
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose-prod.yml -q -O ~/cuebook/docker-compose-prod.yml
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/.env -q -O ~/cuebook/.env
cd ~/cuebook
docker-compose -f docker-compose-prod.yml --env-file .env up -d
```

Now visit [localhost:3000](http://localhost:3000) in your browser.

## Add Connection

Go to the Connections screen to create a connection.

![](<.gitbook/assets/AddConnection (1).png>)

## Add Dataset

@@ -43,6 +31,6 @@ Create an anomaly detection job on your dataset. See [Anomaly Definitions](anoma

Once you have created an anomaly job, click on the `Run` icon button to trigger the anomaly job. It might take a few seconds for the job to execute.

![](.gitbook/assets/AnomalyDefinitions.png)

Once the job is successful, go to the Anomalies screen to view your anomalies.
