Improve documentation (#59)
* Fix typo in change_tracker example

* Improve installation and getting started docs in README

* Write section about remote storage

* Write section about using in CD

* Remove old formulations

* Update README.md

Co-authored-by: Tobias Kongsvik <[email protected]>

* Update README.md

Co-authored-by: Tobias Kongsvik <[email protected]>

* Update README.md

Co-authored-by: Tobias Kongsvik <[email protected]>

Co-authored-by: Tobias Kongsvik <[email protected]>
mikberg and tokongs authored Mar 9, 2022
1 parent 86091ee commit 8138e85
Showing 2 changed files with 179 additions and 76 deletions.
245 changes: 174 additions & 71 deletions README.md
@@ -1,117 +1,220 @@
# Snapshots
# Bazel Snapshots

Snapshots is a mechanism for doing _incremental deploys_ with Bazel. It can be used to find out which targets have changed between two versions of a Bazel workspace.
For instance, a continuous deployment (CD) mechanism can make snapshots of what has been deployed, thus only deploy what's necessary.
Bazel Snapshots is a tool for finding the changed targets between two versions in a Bazel project.
It can be used to implement _incremental deployment_ – only re-deploying things that have changed – or to implement any other side effect of making a change which affects an output, such as sending notifications or interacting with pull requests.

More generally, it can be used to implement side effects for changes to targets in Bazel workspaces.
Bazel Snapshots works by creating digests of outputs and recording them to files, which can be compared later.
By comparing two snapshots, we get a JSON structure containing the changed outputs, together with their metadata.
Implementing specific side effects, such as deploying, is left to other tools.

## Overview
Bazel Snapshots also has built-in support for storing snapshots and references to them remotely, so that they can be easily accessed and interacted with.

Snapshots consists of the following parts:
The way Bazel Snapshots works is in contrast to other approaches with similar goals, such as [bazel-diff](https://github.com/Tinder/bazel-diff), which analyses Bazel's graphs.
In short, Bazel Snapshots discovers which outputs have actually changed, whereas Bazel graph analysis methods discover which outputs could be affected by some change.
The main advantages of this approach are less over-reporting and more explicit control.

* A rule `change_tracker`: Used to create an arbitrary _change tracker_ for some Bazel target.
* A skylark function `create_tracker_file`: Used to integrate with existing rules, so that they output change trackers in addition to their existing output.
* A tool `snapshots`: A CLI which is used to create, push, tag and create diffs between different snapshots.
## Installation

### Change Trackers
### Use Pre-Built Binaries (recommended)

Add Bazel Snapshots to your `WORKSPACE` file.
See [Releases](https://github.com/cognitedata/bazel-snapshots/releases) for the specific snippet for the latest release.

```skylark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "com_cognitedata_bazel_snapshots",
    sha256 = "...",
    url = "https://github.com/cognitedata/bazel-snapshots/releases/download/<VERSION>/snapshots-<VERSION>.tar",
)
load("@com_cognitedata_bazel_snapshots//:repo.bzl", "snapshots_repos")
snapshots_repos()
```

_NOTE:_ If you're using [rules_docker](https://github.com/bazelbuild/rules_docker), put `snapshots_repos()` later in the `WORKSPACE` file to avoid overriding.

Add the following to your _root_ `BUILD` file:

```skylark
load("@com_cognitedata_bazel_snapshots//snapshots:snapshots.bzl", "snapshots")
snapshots(name = "snapshots")
```

You should now be able to run the Snapshots tool via Bazel:

```sh
$ bazel run snapshots
usage: snapshots <command> [args...]
# ...
```

### Build Binaries From Source

Requires rules_go and gazelle.
See [example](/examples/build-from-source).

## Getting Started

Change trackers can be created by any rule by using the `create_change_tracker` Skylark function, or using the `change_tracker` rule to create arbitrary change trackers based on other rules.
Change trackers consist of a set of output targets to track, as well as a list called `run` of executables to run when the output targets change, and a list of `tags` which can be used to separate change trackers into categories.
In order to use Bazel Snapshots, we first have to define trackers for the things we are interested in detecting changes on.

```py
load("//build/rules/snapshots:snapshots.bzl", "change_tracker")
### Using The change_tracker Rule

Example: [change-tracker](/examples/change-tracker).

The `change_tracker` rule is a stand-alone rule defining a tracker.
You can use it to create trackers for existing targets.

```skylark
load("@com_cognitedata_bazel_snapshots//snapshots:snapshots.bzl", "snapshots", "change_tracker")
# A change_tracker
change_tracker(
    name = "my-tracker",
    name = "my-change-tracker",
    deps = [
        ":my-target", # some output target (or source file)
        # list of outputs and source files to track (required)
        "my-file.txt",
    ],
    run = [
        # list of executable targets to run when the tracked files have
        # changed (optional).
        # bazel-snapshots will not run these automatically; this only provides
        # hints to other tooling.
        "//:notify-slack",
    ],
    tracker_tags = [
        # list of "tags" for the tracker, useful for other tooling.
        "textfiles",
    ],
    run = [":deploy-my-target"], # executable to run when my-target changes
    tracker_tags = ["notify-slack"],
)
```

In the above example, `:deploy-my-target` can be set up to be executed whenever `my-tracker` has changed.
`my-tracker` will be considered changed whenever the SHA256 of `my-target` changes.
The tracker's tag `notify-slack` is metadata which can be used e.g. to perform other actions, such as sending Slack notifications.
The `run` and `tracker_tags` fields are optional.
### Integrating With Other Rules

Change Trackers can be built individually:
Example: [integrate-with-other-rules](/examples/integrate-with-other-rules/)

```sh
$ bazel build //path/to:my-tracker --output_groups=change_track_files
Target //path/to:my-tracker up-to-date:
bazel-bin/path/to/my-tracker.json
The `create_tracker_file()` Skylark function can be used to create an `OutputGroupInfo` which can be returned from any Bazel rule.
This technique can be used to create "transparent" support for Bazel Snapshots without using macros.
The tracker files can still be built separately using `bazel build //some:label --output_groups=change_track_files`.


### Remote Storage

So far, only Google Cloud Storage is supported for remote storage.
To start using a remote storage backend, add a `bucket` attribute to `snapshots` in your root BUILD file:

```skylark
snapshots(
    name = "snapshots",
    bucket = "name-of-cloud-storage-bucket",
)
```

The contents of a Change Tracker will look something like this:
Bazel Snapshots will create the following structure in the remote storage:

```json
{
"digest": "deadbeef", # sha256 of the tracked files
"run": ["//path/to:deploy-my-target"],
"tags": ["notify-slack"],
}
```
/
└── <workspace name>
├── snapshots
│ ├── b1d4a4f.json # snapshot files go here
│ ├── abcd123.json # (typically named by git commit)
│ └── ...
└── tags
└── deployed # a tag called "deployed"
```

### Collecting Snapshots
_Snapshot files_ are JSON files containing the digests for all trackers in the Bazel project.
_Tag files_ emulate git tags, and can be referred to by name.
A tag file only contains the name of some snapshot file.
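
The tag-to-snapshot indirection can be illustrated against a local directory that mirrors the layout above. This is a hypothetical sketch: it does not talk to Google Cloud Storage, `tag_snapshot` and `resolve_tag` are illustrative names rather than commands of the `snapshots` CLI, and whether a tag file stores the snapshot name with or without the `.json` suffix is an assumption (both are accepted here).

```python
from pathlib import Path


def tag_snapshot(root, workspace, snapshot_name, tag):
    """Hypothetical helper: point `tag` at a snapshot by writing the snapshot's name into <workspace>/tags/<tag>."""
    tags_dir = Path(root, workspace, "tags")
    tags_dir.mkdir(parents=True, exist_ok=True)
    # Re-tagging overwrites the file, similar to force-moving a git tag.
    (tags_dir / tag).write_text(snapshot_name)


def resolve_tag(root, workspace, tag):
    """Hypothetical helper: return the path of the snapshot file that `tag` refers to."""
    name = Path(root, workspace, "tags", tag).read_text().strip()
    if not name.endswith(".json"):  # assumption: the suffix may be omitted in the tag file
        name += ".json"
    return Path(root, workspace, "snapshots", name)
```

For example, after `tag_snapshot(root, "my-workspace", "b1d4a4f", "deployed")`, calling `resolve_tag(root, "my-workspace", "deployed")` returns `<root>/my-workspace/snapshots/b1d4a4f.json` (`my-workspace` is a placeholder workspace name).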

The `snapshots` CLI has a special command `collect`, which is used to collect all the individual Change Trackers.
This is done effectively by building the whole workspace using the `change_track_files` output groups, while also recording the actions performed.
The command will then collect all the individual Change Trackers and create a JSON file called a _Snapshot_.
With remote storage, you can use these commands of the `snapshots` tool:

The `snapshots` CLI is typically "installed" in a Bazel workspace so that it can be invoked with `bazel run snapshots -- <arguments>`.
* `get`: get a snapshot from remote storage
* `push`: push a snapshot to remote storage
* `tag`: tag a remote snapshot

A snapshot of the current state of the workspace can be collected:
Usage example:

```sh
$ bazel run snapshots -- collect > my-snapshot.json
$ SNAPSHOT_NAME="$(git rev-parse --short HEAD)"

# Create a snapshot
$ bazel run snapshots -- collect --out "$SNAPSHOT_NAME.json"
snapshots: wrote file to /some-path/bcb0283.json

# Push the snapshot
$ bazel run snapshots -- push --name="$SNAPSHOT_NAME" --snapshot-path="$SNAPSHOT_NAME.json"

# Tag the snapshot
$ bazel run snapshots -- tag --name "$SNAPSHOT_NAME" latest
snapshots: tagged snapshot bcb0283 as latest: infrastructure/tags/latest

# Get or diff against the snapshot by name
$ bazel run snapshots -- get latest
$ bazel run snapshots -- diff latest
```

The snapshots are the basis for comparing the state of the workspace at different points in time (that is, different git commits).
A snapshot can easily be pushed to a Storage Bucket, using snapshot's `push` command.
The snapshots tool also allows _tagging_ a Snapshot, so that it can be fetched by that tag later on.
### Using in Continuous Deployment Jobs

A minimal setup would have a deployment process (CD) which collects a snapshot and compares it with some already-known snapshot in order to find out which targets need to be re-deployed.
Re-deploying is often done by `bazel run`-ing some target, but the CD process could also determine this by itself.
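
When re-deploying means `bazel run`-ing targets, a CD job could turn the `run` labels found in a `--format=json` diff into commands. A minimal sketch, assuming the diff is a JSON list whose entries carry a `run` list (as in the tracker format shown in this README); `run_commands` and `redeploy` are hypothetical helpers, and removed trackers get no special treatment here.

```python
import json
import subprocess


def run_commands(diff):
    """Build one `bazel run` command per distinct run label, in first-seen order."""
    labels = []
    for change in diff:
        for label in change.get("run", []):
            if label not in labels:  # several trackers may share one deploy target
                labels.append(label)
    return [["bazel", "run", label] for label in labels]


def redeploy(diff_json, dry_run=True):
    """Run (or, by default, only print) the commands for a JSON diff string."""
    for cmd in run_commands(json.loads(diff_json)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise on the first failing command
```

Deduplicating the labels avoids deploying the same target twice when several trackers point at it; `dry_run=True` makes it easy to inspect the plan before executing anything.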

### Performing a diff
Assuming there already exists some _tag_ called `deployed`, referring to some _snapshot_ representing the last set of deployed targets, we can use the `diff` command to both collect a snapshot and diff against the tag:

In order to get the Change Trackers which have changed between two snapshots, the snapshots tool offers a command `diff`.
If given only one snapshot, the `diff` command will internally run `collect` and diff the given snapshot against the current state of the workspace.
The diff is effectively a list of Change Trackers, augmented with the _type_ of change they have seen: _added_, _removed_ or _changed_.
```sh
# Collect all trackers and diff against the snapshot tagged 'deployed'.
# Also output the collected snapshot to a file 'snapshot.json' and
# pretty-print a table of the detected changes to stderr.
$ bazel run snapshots -- diff --out snapshot.json --format=json --stderr-pretty deployed
```

The diff can be provided as a "pretty" human-readable table, as a plain list of labels or in JSON format.
Using the JSON format allows the greatest flexibility in how the result is interpreted.
The above command prints a JSON structure showing which targets have changed, along with their "run" labels and tags.
It's up to the CD process to interpret these results and run the necessary commands.
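
As one way of interpreting the results, a job might route changes by tracker tag, similar to the Groovy `notify-slack` filter in this README. This Python sketch relies only on the `tags` field of each diff entry; `group_by_tag` and `notifiable` are hypothetical helper names.

```python
import json
from collections import defaultdict


def group_by_tag(diff):
    """Map each tracker tag to the diff entries that carry it."""
    groups = defaultdict(list)
    for change in diff:
        for tag in change.get("tags", []):
            groups[tag].append(change)
    return dict(groups)


def notifiable(diff_json, tag="notify-slack"):
    """Return the changes carrying `tag`, e.g. those that should trigger a Slack message."""
    return group_by_tag(json.loads(diff_json)).get(tag, [])
```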

```groovy
def diffStr = sh('bazel run snapshots -- diff --format=json deployed', returnStdout: true)
def diff = readJSON text: diffStr
At the end of the CD process, we can push the snapshot we collected earlier and tag it as `deployed`, so that it will be used to diff against in the next CD process.

// Get the changes with 'notify-slack' tag
def notifiable = diff.findAll { change -> change.tags.any { tag -> tag == 'notify-slack' } }
```sh
# Push the snapshot to remote storage
$ bazel run snapshots -- push --snapshot-path=snapshot.json

# Tag it as 'deployed'
$ bazel run snapshots -- tag deployed
```

The `run` field in the change tracker does _not_ cause actions to automatically be executed – it's up to some outside system to actually invoke the commands.
## How It Works

## Installation
Bazel Snapshots tracks Bazel targets (build artifacts, outputs) by creating a _digest_ of the output files.
This digest, together with some metadata such as a _label_ and _tags_, represents a _tracker_.
The data for all trackers in the Bazel project is collected together in a file called a _snapshot_, typically named after a code revision (e.g. a git revision).
Two snapshots can be _diff_-ed to find out which trackers have changed between the two snapshots.
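
The digest, tracker, snapshot and diff flow can be modelled in a few lines of Python. This is a minimal sketch, not the tool's implementation: the digest scheme (SHA-256 over sorted paths and file contents), the snapshot shape (a dict keyed by label) and the `label`/`type` field names in the diff output are assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def digest_files(paths):
    """Hypothetical digest: SHA-256 over each file's path and contents, in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(str(p) for p in paths):
        h.update(path.encode())
        h.update(b"\0")  # separate the path from the file contents
        h.update(Path(path).read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def make_tracker(paths, run=(), tags=()):
    """A tracker: the digest of its tracked files plus its run labels and tags."""
    return {"digest": digest_files(paths), "run": list(run), "tags": list(tags)}


def diff_snapshots(old, new):
    """Compare two snapshots (label -> tracker) and report added, removed and changed trackers."""
    changes = []
    for label in sorted(old.keys() | new.keys()):
        before, after = old.get(label), new.get(label)
        if before is None:
            kind = "added"
        elif after is None:
            kind = "removed"
        elif before["digest"] != after["digest"]:
            kind = "changed"
        else:
            continue  # same digest: nothing to report
        tracker = after if after is not None else before
        changes.append({"label": label, "type": kind, **tracker})
    return changes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tracked = Path(tmp, "my-file.txt")
        tracked.write_text("v1")
        old = {"//:my-change-tracker": make_tracker([tracked], run=["//:notify-slack"], tags=["textfiles"])}
        tracked.write_text("v2")  # the tracked output changes...
        new = {"//:my-change-tracker": make_tracker([tracked], run=["//:notify-slack"], tags=["textfiles"])}
        # ...so the tracker is reported as "changed"
        print(json.dumps(diff_snapshots(old, new), indent=2))
```

Because only digests are compared, a tracker is reported when its tracked outputs actually change, not merely when something it depends on could have changed.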

Bazel Snapshots consists of the following parts:

Snapshots can be installed either using pre-built binaries (recommended) or by building them from source.
For installation examples, see:
* A rule `change_tracker`: Used to create an arbitrary _change tracker_ for some Bazel target. This is a thin wrapper around the `create_tracker_file` function.
* A skylark function `create_tracker_file`: Used to integrate with other rules, so that they output change trackers in addition to their primary output.
* A tool `snapshots`: A CLI which is used to create, store, tag and create diffs between different snapshots.

* [examples/use-binaries](Use pre-built binaries)
* [examples/build-from-source](Build from source)
### Change Trackers

## Useful Commands
You can specifically build the change trackers and see their contents using Bazel's `--output_groups` option:

```sh
# Create a Snapshot
# --bazel_stderr routes Bazel's stderr to stderr, for debugging.
$ bazel run snapshots -- collect [--bazel_stderr]
$ bazel build //path/to:my-tracker --output_groups=change_track_files
Target //path/to:my-tracker up-to-date:
bazel-bin/path/to/my-tracker.json
```

# Get some Snapshot
# 'deployed' in the example is a specific tag.
$ bazel run snapshots -- get deployed
This can be useful for debugging, e.g. when a digest does not change as expected.
A tracker will typically look something like this:

# Diff two snapshots
# Assumes there exists a snapshot named after the master branch HEAD
$ bazel run snapshots -- diff deployed $(git rev-parse master)
```json
{
  "digest": "deadbeef", // sha256 of the tracked files
  "run": ["//path/to:deploy-my-target"],
  "tags": ["notify-slack"]
}
```


10 changes: 5 additions & 5 deletions examples/change-tracker/README.md
@@ -6,10 +6,10 @@ The rule will generate a JSON file containing and output like this:
```

The digest is a hash of the `deps` attribute, and `run` is an array containing the targets
specified in the `run` attribute.

The tags come from the `tracker_tags` attribute of the rule and can contain an arbitrary list of tags. These tags can be used for querying with jq or some
other JSON tool after running the diff command `bazel run snapshots -- diff --format=json --stderr-pretty $(pwd)/old_tracker.json`.
This command will output a list of changes in the format below.
```json
[
@@ -27,8 +27,8 @@ This command will output a list of changes in the format below.
]
```

When running `bazel run snapshots -- collect`, the snapshots tool runs a query against the output group named
`change_track_files`, which is the output group created by `change_tracker`. It then aggregates all
the files into a single JSON file containing a list of digests and run targets. The output will look like the JSON object below.

```json