Improve documentation (#59)

* Fix typo in change_tracker example
* Improve installation and getting started docs in README
* Write section about remote storage
* Write section about using in CD
* Remove old formulations
* Update README.md

  Co-authored-by: Tobias Kongsvik <[email protected]>
* Update README.md

  Co-authored-by: Tobias Kongsvik <[email protected]>
* Update README.md

  Co-authored-by: Tobias Kongsvik <[email protected]>

Co-authored-by: Tobias Kongsvik <[email protected]>
cognitedata · Mar 9, 2022 · 8138e85
1 parent 86091ee
commit 8138e85
Showing 2 changed files with 179 additions and 76 deletions.
diff --git a/README.md b/README.md
@@ -1,117 +1,220 @@
-# Snapshots
+# Bazel Snapshots
 
-Snapshots is a mechanism for doing _incremental deploys_ with Bazel. It can be used to find out which targets have changed between two versions of a Bazel workspace.
-For instance, a continuous deployment (CD) mechanism can make snapshots of what has been deployed, thus only deploy what's necessary.
+Bazel Snapshots is a tool for finding the changed targets between two versions in a Bazel project.
+It can be used to implement _incremental deployment_ – only re-deploying things that have changed – or to implement any other side effect of making a change which affects an output, such as sending notifications or interacting with pull requests.
 
-More generally, it can be used to implement side effects for changes to targets in Bazel workspaces.
+Bazel Snapshots works by creating digests of outputs and recording them to files, which can be compared later.
+By comparing two snapshots, we get a JSON structure containing the changed outputs, together with the metadata.
+Implementing specific side-effects, such as deploying, is left for other tools.
 
-## Overview
+Bazel Snapshots also has built-in support for storing snapshots and references to them remotely, so that they can be easily accessed and interacted with.
 
-Snapshots consists of the following parts:
+Bazel Snapshots takes a different approach from other tools with similar goals, such as [bazel-diff](https://github.com/Tinder/bazel-diff), which analyses Bazel's graphs.
+In short, Bazel Snapshots discovers which outputs have actually changed, whereas Bazel graph analysis methods discover which outputs could be affected by some change.
+The main advantages of our approach are less over-reporting and more explicit control.
 
- * A rule `change_tracker`: Used to create an arbitrary _change tracker_ for some Bazel target.
- * A skylark function `create_tracker_file`: Used to integrate with existing rules, so that they output change trackers in addition to their existing output.
- * A tool `snapshots`: A CLI which is used to create, push, tag and create diffs between different snapshots.
+## Installation
 
-### Change Trackers
+### Use Pre-Built Binaries (recommended)
+
+Add Bazel Snapshots to your `WORKSPACE` file.
+See [Releases](https://github.com/cognitedata/bazel-snapshots/releases) for the specific snippet for the latest release.
+
+```skylark
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+http_archive(
+    name = "com_cognitedata_bazel_snapshots",
+    sha256 = "...",
+    url = "https://github.com/cognitedata/bazel-snapshots/releases/download/<VERSION>/snapshots-<VERSION>.tar",
+)
+
+load("@com_cognitedata_bazel_snapshots//:repo.bzl", "snapshots_repos")
+snapshots_repos()
+```
+
+_NOTE:_ If you're using [rules_docker](https://github.com/bazelbuild/rules_docker), put `snapshots_repos()` later in the `WORKSPACE` file to avoid overriding.
+
+Add the following to your _root_ `BUILD` file:
+
+```skylark
+load("@com_cognitedata_bazel_snapshots//snapshots:snapshots.bzl", "snapshots")
+
+snapshots(name = "snapshots")
+```
+
+You should now be able to run the Snapshots tool via Bazel:
+
+```sh
+$ bazel run snapshots
+usage: snapshots <command> [args...]
+# ...
+```
+
+### Build Binaries From Source
+
+Requires rules_go and gazelle.
+See [example](/examples/build-from-source).
+
+## Getting Started
 
-Change trackers can be created by any rule by using the `create_change_tracker` Skylark function, or using the `change_tracker` rule to create arbitrary change trackers based on other rules.
-Change trackers consist of a set of output targets to track, as well as a list called `run` of executables to run when the output targets change, and a list of `tags` which can be used to separate change trackers into categories.
+To use Bazel Snapshots, we first have to define trackers for the things we want to detect changes in.
 
-```py
-load("//build/rules/snapshots:snapshots.bzl", "change_tracker")
+### Using The change_tracker Rule
 
+Example: [change-tracker](/examples/change-tracker).
+
+The `change_tracker` rule is a stand-alone rule defining a tracker.
+You can use it to create trackers for existing targets.
+
+```skylark
+load("@com_cognitedata_bazel_snapshots//snapshots:snapshots.bzl", "snapshots", "change_tracker")
+
+# A change_tracker
 change_tracker(
-    name = "my-tracker",
+    name = "my-change-tracker",
     deps = [
-        ":my-target",  # some output target (or source file)
+        # list of outputs and source files to track (required)
+        "my-file.txt",
+    ],
+    run = [
+        # list of executable targets to run when the tracked files have
+        # changed (optional).
+        # bazel-snapshots will not run these automatically; this only provides
+        # hints to other tooling.
+        "//:notify-slack",
+    ],
+    tracker_tags = [
+        # list of "tags" for the tracker, useful for other tooling.
+        "textfiles",
     ],
-    run = [":deploy-my-target"],  # executable to run when my-target changes
-    tracker_tags = ["notify-slack"],
 )
+
 ```
 
-In the above example, `:deploy-my-target` can be set up to be executed whenever `my-tracker` has changed.
-`my-tracker` will be considered changed whenever the SHA256 of `my-target` changes.
-The tracker's tag `notify-slack` is metadata which can be used e.g. to perform other actions, such as sending Slack notifications.
-The `run` and `tracker_tags` fields are optional.
+### Integrating With Other Rules
 
-Change Trackers can be built individually:
+Example: [integrate-with-other-rules](/examples/integrate-with-other-rules/)
 
-```sh
-$ bazel build //path/to:my-tracker --output_groups=change_track_files
-Target //path/to:my-tracker up-to-date:
-  bazel-bin/path/to/my-tracker.json
+The `create_tracker_file()` Skylark function can be used to create an `OutputGroupInfo` which can be returned from any Bazel rule.
+This technique can be used to create "transparent" support for Bazel Snapshots without using macros.
+The tracker files can still be built separately using `bazel build //some:label --output_groups=change_track_files`.
+
+
+### Remote Storage
+
+So far, only Google Cloud Storage is supported for remote storage.
+To start using a remote storage backend, add a `bucket` attribute to `snapshots` in your root BUILD file:
+
+```skylark
+snapshots(
+    name = "snapshots",
+    bucket = "name-of-cloud-storage-bucket",
+)
 ```
 
-The contents of a Change Tracker will look something like this:
+Bazel Snapshots will create the following structure in the remote storage:
 
-```json
-{
-    "digest": "deadbeef",  # sha256 of the tracked files
-    "run": ["//path/to:deploy-my-target"],
-    "tags": ["notify-slack"],
-}
+```
+/
+└── <workspace name>
+    ├── snapshots
+    │   ├── b1d4a4f.json  # snapshot files go here
+    │   ├── abcd123.json  # (typically named by git commit)
+    │   └── ...
+    └── tags
+        └── deployed      # a tag called "deployed"
 ```
 
-### Collecting Snapshots
+_Snapshot files_ are JSON files containing the digests for all trackers in the Bazel project.
+_Tag files_ emulate git tags, and can be referred to by name.
+A tag file only contains the name of some snapshot file.
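The tag indirection described above can be sketched locally. This is a minimal illustration only, not the tool's implementation: it assumes the bucket layout is mirrored in a local directory, and the helper `resolve_tag`, the workspace name `my-workspace`, and the choice to store the snapshot name without its `.json` extension are all hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_tag(root: Path, workspace: str, tag: str) -> Path:
    """Follow a tag file to the snapshot file it names (hypothetical helper)."""
    # A tag file contains only the name of a snapshot.
    name = (root / workspace / "tags" / tag).read_text().strip()
    # Assumption: the name may be stored with or without the .json extension.
    if not name.endswith(".json"):
        name += ".json"
    return root / workspace / "snapshots" / name


# Build a throwaway directory that mirrors the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my-workspace" / "snapshots").mkdir(parents=True)
    (root / "my-workspace" / "tags").mkdir()
    (root / "my-workspace" / "snapshots" / "b1d4a4f.json").write_text("{}")
    (root / "my-workspace" / "tags" / "deployed").write_text("b1d4a4f\n")

    resolved = resolve_tag(root, "my-workspace", "deployed")
    print(resolved.name, resolved.exists())
```

Because a tag is only a pointer, moving `deployed` to a newer snapshot means rewriting one small file; the snapshot files themselves are never modified.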
 
-The `snapshots` CLI has a special command `collect`, which is used to collect all the individual Change Trackers.
-This is done effectively by building the whole workspace using the `change_track_files` output groups, while also recording the actions performed.
-The command will then collect all the individual Change Trackers and create a JSON file called a _Snapshot_.
+With remote storage, you can use these commands of the Snapshots tool:
 
-The `snapshots` CLI is typically "installed" in a Bazel workspace so that it can be invoked with `bazel run snapshots -- <arguments>`.
+ * `get`: get a snapshot from remote storage
+ * `push`: push a snapshot to remote storage
+ * `tag`: tag a remote snapshot
 
-A snapshot of the current state of the workspace can be collected:
+Usage example:
 
 ```sh
-$ bazel run snapshots -- collect > my-snapshot.json
+$ SNAPSHOT_NAME="$(git rev-parse --short HEAD)"
+
+# Create a snapshot
+$ bazel run snapshots -- collect --out "$SNAPSHOT_NAME.json"
+snapshots: wrote file to /some-path/bcb0283.json
+
+# Push the snapshot
+$ bazel run snapshots -- push --name="$SNAPSHOT_NAME" --snapshot-path="$SNAPSHOT_NAME.json"
+
+# Tag the snapshot
+$ bazel run snapshots -- tag --name "$SNAPSHOT_NAME" latest
+snapshots: tagged snapshot bcb0283 as latest: infrastructure/tags/latest
+
+# Get or diff against the snapshot by name
+$ bazel run snapshots -- get latest
+$ bazel run snapshots -- diff latest
 ```
 
-The snapshots are the basis for comparing the state of the workspace at different points in time (that is, different git commits).
-A snapshot can easily be pushed to a Storage Bucket, using snapshot's `push` command.
-The snapshots tool also allows _tagging_ a Snapshot, so that it can be fetched by that tag later on.
+### Using in Continuous Deployment Jobs
+
+A minimal setup would have a deployment process (CD) which collects a snapshot and compares it with some already-known snapshot in order to find out which targets need to be re-deployed.
+Re-deploying is often done by `bazel run`-ing some target, but the CD process could also determine this by itself.
 
-### Performing a diff
+Assuming there already exists some _tag_ called `deployed`, referring to some _snapshot_ representing the last set of deployed targets, we can use the `diff` command to both collect a snapshot and diff against the tag:
 
-In order to get the Change Trackers which have changed between two snapshots, the snapshots tool offers a command `diff`.
-If given only one snapshot, the `diff` command will internally run `collect` and diff the given snapshot against the current state of the workspace.
-The diff is effectively a list of Change Trackers, augmented with the _type_ of change they have seen: _added_, _removed_ or _changed_.
+```sh
+# Collect all trackers and diff against the snapshot tagged 'deployed'.
+# Also output the collected snapshot to a file 'snapshot.json' and
+# pretty-print a table of the detected changes to stderr.
+$ bazel run snapshots -- diff --out snapshot.json --format=json --stderr-pretty deployed
+```
 
-The diff can be provided as a "pretty" human-readable table, as a plain list of labels or in JSON format.
-Using the JSON format allows the greatest flexibility in how the result is interpreted.
+The above command prints a JSON structure showing which targets have changed, along with their "run" labels and tags.
+It's up to the CD process to interpret these results and run the necessary commands.
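As an illustration of what "interpreting the results" might look like, here is a minimal sketch of a CD-side script that turns the diff JSON into a list of targets to `bazel run`. The exact JSON schema is not shown in this diff, so the field names `label` and `change` (with the values `added`, `removed`, `changed`) are assumptions; only `run` and `tags` are named in the README text. The function `targets_to_run` is hypothetical.

```python
import json
from typing import List


def targets_to_run(diff_json: str, required_tag: str = "") -> List[str]:
    """Collect the "run" labels of added or changed trackers, without duplicates."""
    changes = json.loads(diff_json)
    targets: List[str] = []
    for change in changes:
        # Assumption: removed trackers have nothing left to deploy.
        if change.get("change") == "removed":
            continue
        # Optionally restrict to trackers carrying a given tag.
        if required_tag and required_tag not in change.get("tags", []):
            continue
        for target in change.get("run", []):
            if target not in targets:
                targets.append(target)
    return targets


# Hand-written sample input; real output would come from
# `bazel run snapshots -- diff --format=json deployed`.
example = json.dumps([
    {"label": "//app:tracker", "change": "changed",
     "run": ["//app:deploy"], "tags": ["service"]},
    {"label": "//old:tracker", "change": "removed",
     "run": ["//old:deploy"], "tags": ["service"]},
    {"label": "//docs:tracker", "change": "added",
     "run": ["//docs:publish", "//app:deploy"], "tags": ["docs"]},
])

for target in targets_to_run(example):
    print("bazel run", target)
```

Keeping this logic outside Bazel Snapshots matches the stated design: the tool reports what changed, and the CD process decides what to do about it.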
 
-```groovy
-def diffStr = sh('bazel run snapshots -- diff --format=json deployed', returnStdout: true)
-def diff = readJSON text: diffStr
+At the end of the CD process, we can push the snapshot we collected earlier and tag it as `deployed`, so that it will be used to diff against in the next CD process.
 
-// Get the changes with 'notify-slack' tag
-def notifiable = diff.findAll { change -> change.tags.any { tag -> tag == 'notify-slack' } }
+```sh
+# Push the snapshot to remote storage
+$ bazel run snapshots -- push --snapshot-path=snapshot.json
+
+# Tag it as 'deployed'
+$ bazel run snapshots -- tag deployed
 ```
 
-The `run` field in the change tracker does _not_ cause actions to automatically be executed – it's up to some outside system to actually invoke the commands.
+## How It Works
 
-## Installation
+Bazel Snapshots tracks Bazel targets (build artifacts, outputs) by creating a _digest_ of the output files.
+This digest, together with some metadata such as a _label_ and _tags_, represents a _tracker_.
+The data for all trackers in the Bazel project is collected together in a file called a _snapshot_, typically named after a code revision (e.g. a git revision).
+Two snapshots can be _diff_-ed to find out which trackers have changed between the two snapshots.
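The comparison described here can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual code: the snapshot is modelled as a mapping from tracker label to a tracker with `digest`, `run` and `tags` fields, and the function `diff_snapshots` and the output field names `label` and `change` are hypothetical.

```python
from typing import Dict, List

# Hypothetical snapshot shape: tracker label -> {"digest", "run", "tags"}.
Snapshot = Dict[str, dict]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[dict]:
    """Return trackers that were added, removed or changed between two snapshots."""
    changes = []
    for label in sorted(set(old) | set(new)):
        before, after = old.get(label), new.get(label)
        if before is None:
            kind = "added"
        elif after is None:
            kind = "removed"
        elif before["digest"] != after["digest"]:
            kind = "changed"
        else:
            continue  # same digest: the tracked output did not change
        # Report metadata from the newer tracker when one exists.
        tracker = after if after is not None else before
        changes.append({
            "label": label,
            "change": kind,
            "run": tracker.get("run", []),
            "tags": tracker.get("tags", []),
        })
    return changes


old = {
    "//app:tracker": {"digest": "aaa", "run": ["//app:deploy"], "tags": ["service"]},
    "//docs:tracker": {"digest": "bbb", "run": [], "tags": []},
}
new = {
    "//app:tracker": {"digest": "ccc", "run": ["//app:deploy"], "tags": ["service"]},
    "//lib:tracker": {"digest": "ddd", "run": [], "tags": []},
}

for change in diff_snapshots(old, new):
    print(change["change"], change["label"])
```

Since only digests are compared, a tracker whose inputs were touched but whose output bytes are identical is not reported.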
+
+Bazel Snapshots consists of the following parts:
 
-Snapshots can be installed either using pre-built binaries (recommended) or by building them from source.
-For installation examples, see:
+ * A rule `change_tracker`: Used to create an arbitrary _change tracker_ for some Bazel target. This is a thin wrapper around the `create_tracker_file` function.
+ * A skylark function `create_tracker_file`: Used to integrate with other rules, so that they output change trackers in addition to their primary output.
+ * A tool `snapshots`: A CLI which is used to create, store, tag and create diffs between different snapshots.
 
- * [examples/use-binaries](Use pre-built binaries)
- * [examples/build-from-source](Build from source)
+### Change Trackers
 
-## Useful Commands
+You can specifically build the change trackers and see their contents using Bazel's `--output_groups` option:
 
 ```sh
-# Create a Snapshot
-# --bazel_stderr routes Bazel's stderr to stderr, for debugging.
-$ bazel run snapshots -- collect [--bazel_stderr]
+$ bazel build //path/to:my-tracker --output_groups=change_track_files
+Target //path/to:my-tracker up-to-date:
+  bazel-bin/path/to/my-tracker.json
+```
 
-# Get some Snapshot
-# 'deployed' in the example is a specific tag.
-$ bazel run snapshots -- get deployed
+This can be useful for debugging, e.g. if the digest isn't changing as expected.
+A tracker will typically look something like this:
 
-# Diff two snapshots
-# Assumes there exists a snapshot named after the master branch HEAD
-$ bazel run snapshots -- diff deployed $(git rev-parse master)
+```json
+{
+    "digest": "deadbeef",  // sha256 of the tracked files
+    "run": ["//path/to:deploy-my-target"],
+    "tags": ["notify-slack"],
+}
 ```
 
+
diff --git a/examples/change-tracker/README.md b/examples/change-tracker/README.md
@@ -6,10 +6,10 @@ The rule will generate a JSON file containing and output like this:
 ```
 
 The digest is a hash of the `deps` attribute and run is an array containing the targets
-specified in the `run` atribute. 
+specified in the `run` attribute.
 
-The tags come from the `tracker_tags` attribute of the rule and can contain an arbitrary list of tags. These tags can be used for querying with jq or some 
-other JSON tool after running the diff command `bazel run snapshots -- diff --format=json --stderr-pretty $(pwd)/old_tracker.json`. 
+The tags come from the `tracker_tags` attribute of the rule and can contain an arbitrary list of tags. These tags can be used for querying with jq or some
+other JSON tool after running the diff command `bazel run snapshots -- diff --format=json --stderr-pretty $(pwd)/old_tracker.json`.
 This command will output a list of changes in the format below.
 ```json
 [
@@ -27,8 +27,8 @@ This command will output a list of changes in the format below.
 ]
 ```
 
-When runnig `bazel run snapshots -- collect` the snapshots tool run a query against the output group with 
-name `change_track_files` which is the output group created by `change_tracker`. It will then aggregate all 
+When running `bazel run snapshots -- collect`, the snapshots tool runs a query against the output group with
+name `change_track_files` which is the output group created by `change_tracker`. It will then aggregate all
 the files into a single JSON file containing a list of digests and run targets. The output will look like the JSON object below.
 
 ```json