Add Prometheus support (#9)
* Docs updates

* Basic support for Prometheus

* Improved registering prometheus gauges

* Add Prometheus metrics support and update API routes

* Merge branch 'add-prometheus' of https://github.com/benc-uk/nanomon into add-prometheus

* Linting and refactor/renaming

* status label clashing

* Readme update

* Naming

* Diagram

* Diagram

* Diagram

* Complete refactor of Prometheus to the runner and not the API
So much cleaner is not even funny

* Prometheus all the things (again)

* Linty McLint Face

* Pipeline bumps

* tests update

* Tweak

* Couple of missed mongo v8 bumps
benc-uk authored Oct 31, 2024
1 parent f66b48c commit 1a59dd5
Showing 17 changed files with 215 additions and 44 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ci-build.yml
@@ -24,10 +24,10 @@ jobs:
name: Testing & Linting
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Set up Go
uses: actions/setup-go@v4
uses: actions/setup-go@v5
with:
go-version-file: go.mod

@@ -54,7 +54,7 @@ jobs:
# Run API integration tests with output in JUnit format
- name: Run API integration tests
run: |
make run-all &
make run-all PROMETHEUS_ENABLED=1 &
sleep 20
make test-api TEST_REPORT=1
@@ -79,7 +79,7 @@ jobs:
runs-on: ubuntu-latest
needs: lint-test
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Build images
run: make images
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -24,7 +24,7 @@ jobs:
name: Build and push images
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Set BUILD_INFO with date
run: |
13 changes: 10 additions & 3 deletions api/api-reference.http
@@ -46,10 +46,14 @@ DELETE {{endpoint}}/monitors
DELETE {{endpoint}}/results


### Get results
### Get results for a single monitor
GET {{endpoint}}/monitors/{{ createMon.id }}/results?max=20


### Get results for ALL monitors
GET {{endpoint}}/results?max=50


### Update a monitor
PUT {{endpoint}}/monitors/{{ createMon.id }}
Content-Type: application/json
@@ -62,7 +66,6 @@ Content-Type: application/json
"enabled": true
}


### Get results for ALL monitors
GET {{endpoint}}/results?max=50

@@ -86,4 +89,8 @@ Content-Type: application/json
"target": "http://localhost:8000/status",
"enabled": true
}
]
]


### Get Prometheus metrics from the runner, note this doesn't call the API server
GET http://localhost:8080/metrics
12 changes: 12 additions & 0 deletions api/readme.md
@@ -0,0 +1,12 @@
# API Specs & Schema

The API specs for the Nanomon API reside here

- OpenAPI v3 Specs: [openapi.yaml](./openapi.yaml)
- JSON Schema: [nanomon-schema.json](./nanomon-schema.json)

## Spec Sources

The API is described using [TypeSpec](https://typespec.io/), see [typespec/main.tsp](typespec/main.tsp)

To build and generate the OpenAPI output run `make generate-specs`
2 changes: 1 addition & 1 deletion build/Dockerfile.standalone
@@ -48,7 +48,7 @@ RUN GO111MODULE=on CGO_ENABLED=$CGO_ENABLED GOOS=linux \
# ================================================================================================
# === Stage 2: Munge everything into a single image
# ================================================================================================
FROM bitnami/mongodb:6.0
FROM bitnami/mongodb:8.0
WORKDIR /app

# Root certs
2 changes: 1 addition & 1 deletion deploy/azure/main.bicep
@@ -236,7 +236,7 @@ module mongodb './bicep-iac/modules/containers/app.bicep' = if (externalMongoDbU
params: {
location: location
name: 'mongo'
image: 'bitnami/mongodb:6.0'
image: 'bitnami/mongodb:8.0'
environmentId: containerAppEnv.outputs.id

ingressPort: 27017
Binary file modified etc/architecture.drawio.png
2 changes: 1 addition & 1 deletion makefile
@@ -102,7 +102,7 @@ run-db: ## 🍃 Run MongoDB in container (needs Docker)
-e MONGODB_REPLICA_SET_MODE=primary \
-e MONGODB_ADVERTISED_HOSTNAME=localhost \
-e ALLOW_EMPTY_PASSWORD=yes \
--name mongo bitnami/mongodb:6.0
--name mongo bitnami/mongodb:8.0

run-all: ## 🚀 Run all everything locally, including DB with hot-reload
@figlet $@ || true
66 changes: 45 additions & 21 deletions readme.md
@@ -1,29 +1,30 @@
# NanoMon - Monitoring Tool

NanoMon is a lightweight network and HTTP monitoring system, designed to be self hosted any container based system e.g. Kubernetes, or just run locally. It is written in Go and based on the now ubiquitous microservices pattern, so decomposed into several discreet but interlinked components. The features of Nanomon include:
NanoMon is a lightweight network and HTTP monitoring system, designed to be self-hosted on any container-based system e.g. Kubernetes, or just run locally. It is written in Go and based on the now ubiquitous microservices pattern, so decomposed into several discrete but interlinked components. The features of NanoMon include:

- A range of configurable monitor types
- Web frontend for viewing results & editing/creating monitors
- Email alerting
- Range of deployment options
- Rules for setting monitor status and evaluating results
- OAuth2 based user sign-in and authentication
- Exporting of metrics & data to Prometheus

It also serves as a reference & learning app for microservices and is used by my Kubernetes workshop as the workload & application deployed in order to demonstrate Kubernetes concepts.

In a hurry? - Jump to the sections [running locally quick start](#local-dev-quick-start) or [deploying with Helm](#deploy-to-kubernetes-using-helm)

## Architecture

The architecture is fairly simple consisting of four application components and a database.
The architecture is a fairly standard design, consisting of four application components and a database.

![architecture diagram](./etc/architecture.drawio.png)

- **API** - API provides the main interface for the frontend and any custom clients. It is RESTful and runs over HTTP(S). It connects directly to the MongoDB database.
- **Runner** - Monitor runs are executed from here (see [concepts](#concepts) below). It connects directly to the MongoDB database, and reads monitor configuration data, and saves back & stores result data.
- **Frontend** - The web interface is a SPA (single page application), consisting of a static set of HTML, JS etc which executes from the user's browser. It connects directly to the API, and is [developed using Alpine.js](https://alpinejs.dev/)
- **Frontend** - The web interface is a SPA (single page application), consisting of a static set of HTML, JS etc which executes from the user's browser. It connects directly to the API, and is [developed using Alpine.js](https://alpinejs.dev/)
- **Frontend Host** - The static content host for the frontend app, which contains no business logic. This simply serves frontend application files HTML, JS and CSS files over HTTP. In addition it exposes a small configuration endpoint.
- **MongoDB** - Backend data store, this is a vanilla instance of MongoDB v6. External services which provide MongoDB compatibility (e.g. Azure Cosmos DB) will also work
- **MongoDB** - Backend data store, this is a vanilla instance of MongoDB. Cloud and hosted services which provide MongoDB compatibility (e.g. Azure Cosmos DB) also work

## Concepts

@@ -117,10 +118,10 @@ See [Azure & Bicep docs](./deploy/azure/)
- Written in Go, [source code - /services/runner](./services/runner/)
- The runner requires a connection to MongoDB in order to start, it will exit if the connection fails.
- It keeps in sync with the `monitors` collection in the database, it does this one of two ways:
- Watching the collection using MongoDB change stream. This mode is prefered as it results in instant updates to changes made in the frontend & UI
- If change stream isn't supported, then the runner will poll the database and look for changes.
- Watching the collection using MongoDB change stream. This mode is preferred as it results in instant updates to changes made in the frontend & UI
- If change stream isn't supported, then the runner falls back to polling the database for changes.
- If configured the runner will send email alerts, see [alerting section below](#alerting-configuration)
- The runner doesn't listen to inbound network connections or bind to any ports.
- By default the runner doesn't listen for inbound network connections or bind to any ports; the exception is when [Prometheus support is enabled](#appendix-prometheus)

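The sync behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the runner's actual code: `chooseSyncMode`, `syncMode` and the `openStream` callback are hypothetical names, with `openStream` standing in for whatever call opens a MongoDB change stream, and `forcePolling` mirroring the `USE_POLLING` setting.

```go
package main

import (
	"errors"
	"fmt"
)

// syncMode names the two ways the runner can track the monitors collection.
type syncMode string

const (
	modeChangeStream syncMode = "change-stream"
	modePolling      syncMode = "polling"
)

// errNoStream is a stand-in error for a database without change stream support.
var errNoStream = errors.New("change streams not supported")

// chooseSyncMode picks change streams unless polling is forced or the
// (hypothetical) openStream call fails, in which case it falls back to polling.
func chooseSyncMode(forcePolling bool, openStream func() error) syncMode {
	if forcePolling {
		return modePolling
	}
	if err := openStream(); err != nil {
		return modePolling
	}
	return modeChangeStream
}

func main() {
	unsupported := func() error { return errNoStream }
	fmt.Println(chooseSyncMode(false, unsupported))
}
```

Trying the change stream first and only then falling back keeps the instant-update path as the default, while still working against databases that lack the feature.
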
### API

@@ -212,12 +213,14 @@ All three components (API, runner and frontend host) expect their configuration
| ALERT_LINK_BASEURL | When hosting NanoMon and you want the link in alert emails to point to the correct URL | http://localhost:3000 |
| POLLING_INTERVAL | Only used when in polling mode, when change stream isn't available | 10s |
| USE_POLLING | Force polling mode, by default MongoDB change streams will be tried, and polling mode used if that fails. | false |
| PROMETHEUS_ENABLE | Enable exporting metrics in Prometheus format (see below) | false |
| PROMETHEUS_PORT | HTTP port used to serve the Prometheus metrics | 8080 |

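As an illustration of how the two Prometheus settings might be consumed, here is a minimal sketch. It assumes the values are parsed with `strconv.ParseBool` and `strconv.Atoi`, and that unset or invalid values keep the documented defaults; the real runner may parse them differently. `promConfig` and `loadPromConfig` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// promConfig holds the two Prometheus settings from the table above.
type promConfig struct {
	Enabled bool
	Port    int
}

// loadPromConfig reads PROMETHEUS_ENABLE and PROMETHEUS_PORT through getenv
// (pass os.Getenv in a real program). Unset or unparsable values keep the
// documented defaults; that fallback is an assumption of this sketch.
func loadPromConfig(getenv func(string) string) promConfig {
	cfg := promConfig{Enabled: false, Port: 8080}
	if v, err := strconv.ParseBool(getenv("PROMETHEUS_ENABLE")); err == nil {
		cfg.Enabled = v
	}
	if p, err := strconv.Atoi(getenv("PROMETHEUS_PORT")); err == nil && p > 0 && p < 65536 {
		cfg.Port = p
	}
	return cfg
}

func main() {
	env := map[string]string{"PROMETHEUS_ENABLE": "true", "PROMETHEUS_PORT": "9100"}
	cfg := loadPromConfig(func(k string) string { return env[k] })
	fmt.Printf("enabled=%v port=%d\n", cfg.Enabled, cfg.Port)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the sketch easy to exercise without touching the process environment.
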
## Monitor Reference

Nanomon currently supports four types of monitor, which can be configured various ways, this is a reference for each monitor type, the runtime behaviour, properties that can be set, and the resulting outputs.
NanoMon currently supports four types of monitor, which can be configured in various ways. This is a reference for each monitor type: its runtime behaviour, the properties that can be set, and the resulting outputs.

### Type: HTTP
### HTTP Monitor

This makes a single HTTP request to the target URL each time it is run, it will return failed status in the event of network failure e.g. no network connection, unable to resolve name with DNS, invalid URL etc. Otherwise any sort of HTTP response will return an OK status. If you want to check the HTTP response code, use a rule as described above e.g. `status == 200` or `status >= 200 && status < 300`.

@@ -238,7 +241,7 @@ This makes a single HTTP request to the target URL each time it is run, it will
- _certExpiryDays_ - Number of days before the TLS cert of the site expires (number)
- _regexMatch_ - Match of the bodyRegex if any (number or string)

### Type: TCP
### TCP Monitor

Each time a TCP monitor runs it attempts to open a TCP connection to the given host on the given port. It will return failed status in the event of network/connection failure, DNS resolution failure, or if the port is closed or blocked. Otherwise it will return OK.

@@ -250,7 +253,7 @@ Each time a TCP monitor runs it attempts to open a TCP connection to given host
- _respTime_ - Same as monitor value (number)
- _ipAddress_ - Resolved IP address of the target (string)

### Type: Ping
### Ping Monitor

This monitor will send one or more ICMP ping packets to the given host or IP address. It will return failed status in the event of network/connection failure or if the name cannot be resolved with DNS. Otherwise it will return OK.

@@ -270,9 +273,9 @@ Note. As this monitor needs to send ICMP packets, the runner process needs certa
- _packetLoss_ - Percentage of packets that were lost (number)
- _ipAddress_ - Resolved IP address of the target (string)

### Type: DNS
### DNS Monitor

The DNS monitor looks up DNS records and returns the results, if the name fails to resolve it will return failed status, otherwise it will return OK.
The DNS monitor looks up DNS records and returns the results as outputs. If the name fails to resolve it will return failed status, otherwise it will return OK.

- **Target:** The domain or hostname you want to lookup in DNS
- **Value:** Time for lookup to complete
@@ -295,19 +298,20 @@ The rule expression should always return a boolean, a false value will set the r
Some rule examples:

```bash
status >= 200 && status < 300 # Check for OK range of status codes
status == 200 && respTime < 5000 # Check status code and response time
body =~ 'some words' # Look for a string in the HTTP body
regexMatch == 'a value' # Check the of the RegEx
status >= 200 && status < 300 # Check for OK range of HTTP status codes
status == 200 && respTime < 5000 # Check status code and response time
'93.184.215.14' IN (result1, result2) # Check IP in multiple DNS results
body =~ 'some words' # Look for a string in the HTTP body
regexMatch == 'a value' # Check the value of the RegEx match
```

## Authentication & Security

By default there is no authentication, security or user sign-in. This is by design to make the app easy to deploy, and for use in learning scenarios and workshops.

Security is enabled using the Microsoft Identity Platform (now called Microsoft Entra ID) and OAuth2 + OIDC. With an app registered in Entra ID, then passing the app's client id as `AUTH_CLIENT_ID` to the NanoMon containers, this changes the behaviour of the application as follows:
Security is enabled using the Microsoft Identity Platform (now called Microsoft Entra ID) and OAuth2 + OIDC. Register an app in Entra ID, then pass the app's client id as `AUTH_CLIENT_ID` to the NanoMon services. Setting this changes the behaviour of the application as follows:

- **API container** - Will enforce validation on certain API routes, like POST, PUT and DELETE, using OAuth 2.0 JWT bearer tokens. The token is checked for validity as follows; contains a scope matching `system.admin` and has an audience matching the client id.
- **API** - Will enforce validation on certain API routes, like POST, PUT and DELETE, using OAuth 2.0 JWT bearer tokens. The token is checked for validity as follows; contains a scope matching `system.admin` and has an audience matching the client id.
- **Frontend host** - The UI will show a sign-in button and only allow signed-in users to create, edit or delete monitors. Access tokens are fetched from Entra ID for the signed-in user with the `system.admin` scope, and then passed when calling the API as bearer tokens.

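The token check described for the API can be sketched like this. It is an illustration only, covering just the scope and audience comparison: it assumes the token's signature and expiry have already been verified elsewhere, and that scopes arrive as a space-separated string in the `scp` claim. The `claims` type and `isAuthorized` function are hypothetical names, not NanoMon's own.

```go
package main

import (
	"fmt"
	"strings"
)

// claims holds the two token fields checked here. They are assumed to come
// from a JWT whose signature and expiry were already validated elsewhere.
type claims struct {
	Audience string // the "aud" claim
	Scopes   string // the "scp" claim, assumed to be space-separated
}

// isAuthorized reports whether the token audience matches the configured
// client id and the scopes include system.admin.
func isAuthorized(c claims, clientID string) bool {
	// Refuse to match when no client id is configured.
	if clientID == "" || c.Audience != clientID {
		return false
	}
	for _, s := range strings.Fields(c.Scopes) {
		if s == "system.admin" {
			return true
		}
	}
	return false
}

func main() {
	c := claims{Audience: "my-client-id", Scopes: "openid system.admin"}
	fmt.Println(isAuthorized(c, "my-client-id"))
}
```
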
A basic guide to set this up:
@@ -325,9 +329,9 @@ NanoMon provides basic alerting support, which sends emails when monitors return

To enable alerting all of the env vars starting `ALERT_` will need to be set, there are six of these as described above. However as three of these variables have defaults, you only need to set the remaining three `ALERT_SMTP_PASSWORD`, `ALERT_SMTP_FROM` and `ALERT_SMTP_TO` to switch the feature on, this will be using GMail to send emails. For the password you will need to [set up a Google app password](https://support.google.com/accounts/answer/185833?hl=en). This will use your personal Google account to send the emails, so this probably isn't a good option for production (putting it mildly).

Known limitations:
Limitations:

- Only been tested with the GMail SMTP server, I have no idea if it'll work with others!
- Only been tested with the GMail SMTP server, I have no idea if it'll work with others! ¯\\\_(ツ)\_
- The from address is also used as the login user to the SMTP server.
- Only a single email address can be set to send emails to.
- Restarting the runner will resend alerts for failing monitors.
@@ -345,3 +349,23 @@ Azure Cosmos DB can be used as a database for NanoMon, however there are two thi
- An index must be added for the `date` field to the results collection, this can be done in the Azure Portal or with a single command:
`az cosmosdb mongodb collection update -a $COSMOS_ACCOUNT -g $COSMOS_RG -d nanomon -n results --idx '[{"key":{"keys":["_id"]}},{"key":{"keys":["date"]}}]'`
- Cosmos DB for MongoDB does have support for change streams, however it comes with [several limitations, most notably the lack of support for delete events](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/change-streams?tabs=javascript#current-limitations). Given these limitations NanoMon will fall back to polling when using Cosmos DB

## Appendix: Prometheus

NanoMon has support for Prometheus metrics, which are exposed from the runner service via HTTP in the standard text-based exposition format. When configuring NanoMon as a scraping target use the url `http://<runner-host>:8080/metrics` (the port can be changed with `PROMETHEUS_PORT`)

This feature is disabled by default and is enabled by setting the `PROMETHEUS_ENABLE` env var, when enabled the metrics can be fetched/scraped from the `/metrics` endpoint. The active monitors will be provided as labelled Prometheus gauges (one gauge per monitor). The `result` label identifies each series: `_status` holds the monitor status (0 = OK, 1 = Error, 2 = Failed), `_value` holds the monitor value, and each numeric monitor output gets a series of its own (string outputs are not applicable to Prometheus)

Using Prometheus means you may not need to run the NanoMon frontend, as you can visualize the data through other tools, and optionally enable things like the Prometheus alerts.

Example of metrics

```
# HELP nanomon_example_monitor Example Monitor (http)
# TYPE nanomon_example_monitor gauge
nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="_status",type="http"} 0
nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="_value",type="http"} 178
nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="bodyLen",type="http"} 15256
nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="respTime",type="http"} 178
nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="status",type="http"} 404
```
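
To show how a consumer might read output like the example above, here is a small sketch that parses one labelled sample line and translates the `_status` value into a name. It is not a full parser for the exposition format (no escaped quotes, timestamps or unlabelled samples), and `sample`, `parseSample` and `statusName` are hypothetical names used only for illustration. In practice Prometheus itself does the scraping and parsing.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sample is one parsed line of the text exposition format.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

var (
	// name{labels} value
	sampleRe = regexp.MustCompile(`^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+(\S+)$`)
	// key="value" pairs; escaped quotes are not handled in this sketch
	labelRe = regexp.MustCompile(`([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"`)
)

// parseSample parses a single labelled sample line, rejecting anything else
// (including # HELP and # TYPE comment lines).
func parseSample(line string) (sample, error) {
	m := sampleRe.FindStringSubmatch(line)
	if m == nil {
		return sample{}, fmt.Errorf("not a labelled sample: %q", line)
	}
	v, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return sample{}, err
	}
	s := sample{Name: m[1], Labels: map[string]string{}, Value: v}
	for _, kv := range labelRe.FindAllStringSubmatch(m[2], -1) {
		s.Labels[kv[1]] = kv[2]
	}
	return s, nil
}

// statusName maps the _status value to the names documented above.
func statusName(v float64) string {
	switch v {
	case 0:
		return "OK"
	case 1:
		return "Error"
	case 2:
		return "Failed"
	}
	return "Unknown"
}

func main() {
	line := `nanomon_example_monitor{id="6722474d0c73d60184f14c73",result="_status",type="http"} 0`
	s, err := parseSample(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Labels["result"], statusName(s.Value))
}
```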
2 changes: 1 addition & 1 deletion scripts/run-all.sh
@@ -31,7 +31,7 @@ if ! docker ps | grep -q mongo; then
-e MONGODB_REPLICA_SET_MODE=primary \
-e MONGODB_ADVERTISED_HOSTNAME=localhost \
-e ALLOW_EMPTY_PASSWORD=yes \
--name mongo bitnami/mongodb:6.0
--name mongo bitnami/mongodb:8.0
else
echo "### 🚀 MongoDB is already running"
fi
2 changes: 1 addition & 1 deletion services/api/api.go
@@ -40,7 +40,7 @@ func (api API) addProtectedRoutes(r chi.Router) {
r.Put("/api/monitors/{id}", api.updateMonitor)
}

// Simply create an API with the given database context
// Create an API with the given database context
func NewAPI(db *database.DB) API {
return API{
api.NewBase(serviceName, version, buildInfo, true),
1 change: 1 addition & 0 deletions services/api/routes.go
@@ -368,6 +368,7 @@ func (api API) deleteMonitors(resp http.ResponseWriter, req *http.Request) {
resp.WriteHeader(204)
}

// Reset and remove all results
func (api API) deleteResults(resp http.ResponseWriter, req *http.Request) {
log.Printf("### Resetting and deleting all results")

9 changes: 5 additions & 4 deletions services/api/server.go
@@ -44,12 +44,14 @@ func main() {

// Note this will exit the process if the DB connection fails, so no need to check for errors
db := database.ConnectToDB()

// Core API wrapping base go-rest-api/pkg/api
api := NewAPI(db)

// Some basic middleware, change as you see fit, see: https://github.com/go-chi/chi#core-middlewares
router.Use(middleware.RealIP)
// Filtered request logger, exclude /metrics & /health endpoints
router.Use(logging.NewFilteredRequestLogger(regexp.MustCompile(`(^/api/metrics)|(^/api/health)`)))
router.Use(logging.NewFilteredRequestLogger(regexp.MustCompile(`(^/metrics)|(^/api/health)`)))
router.Use(middleware.Recoverer)

// Some custom middleware for very permissive CORS policy
@@ -80,9 +82,6 @@ func main() {

// Anonymous routes
router.Group(func(publicRouter chi.Router) {
// Add Prometheus metrics endpoint, must be before the other routes
api.AddMetricsEndpoint(publicRouter, "api/metrics")

// Add optional root, health & status endpoints
api.AddHealthEndpoint(publicRouter, "api/health")
api.AddStatusEndpoint(publicRouter, "api/status")
@@ -92,6 +91,8 @@ func main() {
api.addAnonymousRoutes(publicRouter)
})

log.Printf("### ⚓ API routes configured")

// Start ticker to check the DB connection, and set the health status
go func() {
ticker := time.Tick(5 * time.Second)