Event Volume Anomaly Detection Solution Accelerator #1151

Open
wants to merge 7 commits into
base: main
44 changes: 44 additions & 0 deletions tutorials/event-volume-anomaly-detection/introduction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
---
position: 1
title: Event Volume Anomaly Detection
---

Welcome to the **Event Volume Anomaly Detection** solution accelerator.

This accelerator introduces anomaly detection for Snowplow event volumes stored in **BigQuery** on **Google Cloud Platform (GCP)**. Monitoring event volume trends can help you detect potential tracking issues or sudden increases in failed events.

This guide walks you through the steps required to:
- Load Snowplow event data into **BigQuery**
- Train an **ARIMA+** model for time series anomaly detection
- Identify statistically significant drops or spikes in event volumes
- Visualize anomalies using **matplotlib** and **seaborn**
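
As a rough sketch of the training and detection steps listed above, the helpers below only build the BigQuery ML statements as strings, so they run without GCP credentials. `ARIMA_PLUS` is the BigQuery ML model type behind ARIMA+, and `ML.DETECT_ANOMALIES` with `anomaly_prob_threshold` is the BigQuery ML detection function. The hourly aggregation, the `event_hour` and `event_count` aliases, and all project, dataset, table, and model names are assumptions made for illustration; the notebook may structure its queries differently.

```python
def build_training_sql(project: str, dataset: str, events_table: str, model: str) -> str:
    """Build a CREATE MODEL statement that trains ARIMA_PLUS on hourly event counts.

    Assumption: events are aggregated per hour on `collector_tstamp`.
    """
    return f"""
CREATE OR REPLACE MODEL `{project}.{dataset}.{model}`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'event_hour',
  time_series_data_col = 'event_count'
) AS
SELECT
  TIMESTAMP_TRUNC(collector_tstamp, HOUR) AS event_hour,
  COUNT(*) AS event_count
FROM `{project}.{dataset}.{events_table}`
GROUP BY event_hour
""".strip()


def build_detection_sql(project: str, dataset: str, model: str, threshold: float = 0.95) -> str:
    """Build a query that flags anomalous hours in the training data."""
    # BigQuery ML accepts anomaly_prob_threshold values in the range [0, 1).
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    return f"""
SELECT event_hour, event_count, is_anomaly, lower_bound, upper_bound, anomaly_probability
FROM ML.DETECT_ANOMALIES(
  MODEL `{project}.{dataset}.{model}`,
  STRUCT({threshold} AS anomaly_prob_threshold)
)
ORDER BY event_hour
""".strip()


if __name__ == "__main__":
    # Hypothetical identifiers; replace them with your own project and dataset.
    print(build_training_sql("my-project", "snowplow", "events", "event_volume_model"))
    print(build_detection_sql("my-project", "snowplow", "event_volume_model"))
```

The generated statements could then be run in the BigQuery console or passed to a BigQuery client from the notebook.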

![Application Output](images/anomaly-detection.png)

## Requirements

To use this accelerator, you need:
- **Access to a GCP project** (including the project ID)
- **BigQuery permissions** (read, write, and query access)

*A Snowplow pipeline is not required*, as sample data is provided.

This accelerator typically takes around **30 minutes** to complete. The notebook requires minimal computational resources with the provided sample dataset.

## Get started

The accelerator notebook is available on **GitHub**:
- [Anomaly detection notebook](https://github.com/snowplow-industry-solutions/event-volume-anomaly-detection/blob/main/notebooks/bigquery/anomaly_detection.ipynb)

You can clone the repository and run the notebook locally, or import it into a [Google Colab notebook](https://colab.research.google.com/).

## Next steps

Once you've completed this accelerator, you can:
- Adapt the model to detect other anomalies in your own Snowplow event data
- Fine-tune the confidence threshold to reduce false positives
- Expand into detecting other types of anomalies such as missing properties or bot traffic
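
To illustrate how the confidence threshold trades sensitivity against false positives, here is a minimal sketch in plain Python. It assumes each data point already has an anomaly probability (for example, the `anomaly_probability` column that BigQuery ML returns from `ML.DETECT_ANOMALIES`) and flags a point when its probability exceeds the threshold. The probabilities are made-up illustrative values, not results from the sample dataset, and `flag_anomalies` is a hypothetical helper.

```python
from typing import List


def flag_anomalies(probabilities: List[float], threshold: float) -> List[bool]:
    """Flag each point whose anomaly probability exceeds the threshold."""
    return [p > threshold for p in probabilities]


# Made-up anomaly probabilities for five hourly data points (illustrative only).
probabilities = [0.10, 0.97, 0.40, 0.999, 0.96]

# Raising the threshold flags fewer points, which reduces false positives
# at the cost of potentially missing subtler anomalies.
flagged_default = flag_anomalies(probabilities, threshold=0.95)
flagged_strict = flag_anomalies(probabilities, threshold=0.99)

print(sum(flagged_default))  # 3 points flagged at 0.95
print(sum(flagged_strict))   # 1 point flagged at 0.99
```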

By leveraging Snowplow’s granular event data, you can proactively monitor data quality.

Ready to get started? [Jump into the notebook](https://github.com/snowplow-industry-solutions/event-volume-anomaly-detection/blob/main/notebooks/bigquery/anomaly_detection.ipynb) and start detecting anomalies!

5 changes: 5 additions & 0 deletions tutorials/event-volume-anomaly-detection/meta.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
"title": "Event Volume Anomaly Detection",
"label": "Solution accelerator",
"description": "How to detect event volume anomalies using Snowplow and BigQuery"
}