[wip] persistent worker integration into 2.x #3

Draft
wants to merge 69 commits into
base: master
69 commits
edf2211
Implement local resources for workers (#1282)
luxe Oct 17, 2023
97e3b90
build: override grpc dependencies with our dependencies
jasonschroeder-sfdc Oct 16, 2023
96f239d
chore(deps): bump protobuf runtime to 3.19.1
jasonschroeder-sfdc Oct 16, 2023
af3f34e
chore(deps) add transitive dependencies
jasonschroeder-sfdc Oct 16, 2023
380f8a1
feat: add Proto reflection service to shard worker
jasonschroeder-sfdc Oct 16, 2023
7e7979d
fixup! build: override grpc dependencies with our dependencies
jasonschroeder-sfdc Oct 16, 2023
1f9d01f
fixup! chore(deps) add transitive dependencies
jasonschroeder-sfdc Oct 16, 2023
578589f
Bug: Fix Blocked thread in WriteStreamObserver Caused by CASFile Writ…
amishra-u Oct 24, 2023
dfa5937
Pin the Java toolchain to `remotejdk_17` (#1509)
stefanobaghino Oct 24, 2023
f6459d1
docs: add markdown language specifiers for code blocks
jasonschroeder-sfdc Oct 18, 2023
018e177
Support OutputPaths in OutputDirectory
werkt Oct 25, 2023
8b37013
Permit Absolute Symlink Targets with configuration
werkt Oct 26, 2023
df9ce1d
chore: update bazel to 6.4.0 (#1513)
jasonschroeder-sfdc Oct 29, 2023
bd740c9
Rename instance types (#1514)
luxe Oct 30, 2023
2a61f77
Create SymlinkNode outputs during upload (#1515)
werkt Oct 30, 2023
76c2657
feat: Implement CAS lease extension (#1455)
amishra-u Oct 31, 2023
cfa2e18
Bump org.json:json from 20230227 to 20231013 in /admin/main (#1516)
dependabot[bot] Oct 31, 2023
ff00c8f
Re-add missing graceful shutdown functionality (#1520)
80degreeswest Nov 1, 2023
afb0603
Technically correct to unwrap EE on lock failure
werkt Jul 14, 2023
9b5ec43
Bump rules_oss_audit and patch for py3.11
werkt Oct 17, 2023
f9ef75a
Prevent healthStatusManager NPE on start failure
werkt Oct 17, 2023
20512f6
Consistent check for publicName presence
werkt Nov 1, 2023
654032e
Read through external with query THROUGH=true
werkt Nov 1, 2023
b4359c5
Add --port option to worker
werkt Nov 1, 2023
5195809
Restore worker --root cmdline specification
werkt Nov 1, 2023
938c789
Make bf-executor small blob names consistent
werkt Nov 1, 2023
87face1
Configured output size operation failure
werkt Nov 2, 2023
58faec9
Restore abbrev port as -p
werkt Nov 2, 2023
cf6fc58
Update zstd-jni for latest version
jerrymarino Oct 31, 2023
6bc70e1
Attempt to resolve windows stamping
werkt Nov 3, 2023
b226725
Bug: Fix workerSet update logic for RemoteCasWriter
amishra-u Nov 2, 2023
751ac90
Detail storage requirements
werkt Nov 4, 2023
2bf3eae
Fix worker execution env title
werkt Nov 4, 2023
b6bddff
Add storage example descriptions
werkt Nov 4, 2023
c490925
Check for context cancelled before responding to error (#1526)
justinwon777 Nov 6, 2023
2a51c31
chore(deps): bump com.google.errorprone:error-prone
jasonschroeder-sfdc Oct 16, 2023
7ea1a9f
Worker name execution properties matching
werkt Nov 8, 2023
a4822c1
updates
luxe Apr 21, 2023
f1ea9b5
updates
luxe Apr 21, 2023
ccf763d
updates
luxe Apr 21, 2023
b7f5661
updates
luxe Apr 21, 2023
72b40ae
updates
luxe Apr 23, 2023
14c759b
Update ShardWorkerContext.java
luxe Nov 8, 2023
ca9bb92
Update ShardWorkerContext.java
luxe Nov 8, 2023
10e68c4
[execution] allow tmpfs and cgroups enforcement
werkt Nov 9, 2023
35883a4
Release resources when not keeping an operation (#1535)
werkt Nov 9, 2023
9d80f4e
Update queues.md
werkt Nov 9, 2023
aac33b6
Implement custom label header support for Grpc metrics interceptor (#…
rastenis Nov 10, 2023
f9882f7
Specify direct guava dependency usage (#1538)
werkt Nov 11, 2023
69e0248
Update lombok dependency for jdk21 (#1540)
werkt Nov 11, 2023
339aa13
Reorganize DequeueMatchEvaluator (#1537)
werkt Nov 11, 2023
025305a
Upgrade com_google_protobuf for jvm compatibility (#1539)
werkt Nov 11, 2023
f720909
Create buildfarm-worker-base-build-and-deploy.yml (#1534)
80degreeswest Nov 11, 2023
b7daba3
Add base image generation scripts (#1532)
80degreeswest Nov 11, 2023
dcff4f0
Fix buildfarm-worker-base-build-and-deploy.yml (#1541)
80degreeswest Nov 11, 2023
52318f8
Add public buildfarm image generation actions (#1542)
80degreeswest Nov 14, 2023
e343393
Update base image building action (#1544)
80degreeswest Nov 16, 2023
dae7f78
Add release image generation action (#1545)
80degreeswest Nov 16, 2023
b01889d
Limit workflow to canonical repository (#1547)
werkt Nov 16, 2023
dcee798
Check for "cores" exec property as min-cores match (#1548)
werkt Nov 16, 2023
221eae9
Consider output_* as relative to WD (#1550)
werkt Nov 19, 2023
021071b
Integrate persistent workers into shard Worker via PersistentExecutor
wiwa Oct 27, 2023
6bd2ab3
slight cleanup
wiwa Feb 1, 2023
4b748d4
Formatting
wiwa Feb 1, 2023
2ac7a3e
Fix static analysis
wiwa Feb 1, 2023
957c4b5
String#strip() -> String#trim()
wiwa Feb 1, 2023
9ff3134
Rename and add tests for util/InputIndexer and init PersistentExecuto…
wiwa Oct 17, 2023
9ca026e
Add test utils and tests for ProtoCoordinator
wiwa Oct 17, 2023
20cd93e
Address Feedback
wiwa Nov 19, 2023
8 changes: 8 additions & 0 deletions .bazelci/presubmit.yml
@@ -41,6 +41,8 @@ tasks:
name: "Unit Tests"
build_targets:
- "..."
build_flags:
- "--build_tag_filters=-container"
test_flags:
- "--test_tag_filters=-integration,-redis"
test_targets:
@@ -49,13 +51,18 @@ tasks:
name: "Unit Tests"
build_targets:
- "..."
build_flags:
- "--build_tag_filters=-container"
test_flags:
- "--test_tag_filters=-integration,-redis"
test_targets:
- "..."
macos:
name: "Unit Tests"
environment:
USE_BAZEL_VERSION: 17be878292730359c9c90efdceabed26126df7ae
build_flags:
- "--cxxopt=-std=c++14"
- "--build_tag_filters=-container"
build_targets:
- "..."
@@ -70,6 +77,7 @@ tasks:
build_targets:
- "..."
test_flags:
- "--@rules_jvm_external//settings:stamp_manifest=False"
- "--test_tag_filters=-integration,-redis"
test_targets:
- "..."
4 changes: 2 additions & 2 deletions .bazelci/run_server_test.sh
@@ -8,11 +8,11 @@ bazel build //src/main/java/build/buildfarm:buildfarm-shard-worker
bazel build //src/main/java/build/buildfarm:buildfarm-server

# Start a single worker
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
echo "Started buildfarm-shard-worker..."

# Start a single server
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > worker.log 2>&1 &
bazel run //src/main/java/build/buildfarm:buildfarm-server $(pwd)/examples/config.minimal.yml > server.log 2>&1 &
echo "Started buildfarm-server..."

echo "Wait for startup to finish..."
6 changes: 6 additions & 0 deletions .bazelrc
@@ -1,3 +1,9 @@
build --java_language_version=17
build --java_runtime_version=remotejdk_17

build --tool_java_language_version=17
build --tool_java_runtime_version=remotejdk_17

common --enable_platform_specific_config

build:fuse --define=fuse=true
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
6.1.2
6.4.0
31 changes: 31 additions & 0 deletions .github/workflows/buildfarm-images-build-and-deploy.yml
@@ -0,0 +1,31 @@
name: Build and Push Latest Buildfarm Images

on:
push:
branches:
- main

jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Buildfarm Images
runs-on: ubuntu-latest
steps:
- uses: bazelbuild/setup-bazelisk@v2

- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Server Image
id: buildAndPushServerImage
run: bazel run public_push_buildfarm-server --define release_version=latest

- name: Build Worker Image
id: buildAndPushWorkerImage
run: bazel run public_push_buildfarm-worker --define release_version=latest
30 changes: 30 additions & 0 deletions .github/workflows/buildfarm-release-build-and-deploy.yml
@@ -0,0 +1,30 @@
name: Build and Push Buildfarm Releases

on:
release:
types: [published]

jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Buildfarm Images
runs-on: ubuntu-latest
steps:
- uses: bazelbuild/setup-bazelisk@v2

- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Server Image
id: buildAndPushServerImage
run: bazel run public_push_buildfarm-server --define release_version=${{ github.event.release.tag_name }}

- name: Build Worker Image
id: buildAndPushWorkerImage
run: bazel run public_push_buildfarm-worker --define release_version=${{ github.event.release.tag_name }}
39 changes: 39 additions & 0 deletions .github/workflows/buildfarm-worker-base-build-and-deploy.yml
@@ -0,0 +1,39 @@
name: Build and Push Base Buildfarm Worker Images

on:
push:
branches:
- main
paths:
- ci/base-worker-image/jammy/Dockerfile
- ci/base-worker-image/mantic/Dockerfile
jobs:
build:
if: github.repository == 'bazelbuild/bazel-buildfarm'
name: Build Base Buildfarm Worker Image
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3

- name: Login to Bazelbuild Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.BAZELBUILD_DOCKERHUB_USERNAME }}
password: ${{ secrets.BAZELBUILD_DOCKERHUB_TOKEN }}

- name: Build Jammy Docker image
uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
with:
context: .
file: ./ci/base-worker-image/jammy/Dockerfile
push: true
tags: bazelbuild/buildfarm-worker-base:jammy

- name: Build Mantic Docker image
uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
with:
context: .
file: ./ci/base-worker-image/mantic/Dockerfile
push: true
tags: bazelbuild/buildfarm-worker-base:mantic
28 changes: 25 additions & 3 deletions BUILD
@@ -2,7 +2,7 @@ load("@com_github_bazelbuild_buildtools//buildifier:def.bzl", "buildifier")
load("@io_bazel_rules_docker//java:image.bzl", "java_image")
load("@io_bazel_rules_docker//docker/package_managers:download_pkgs.bzl", "download_pkgs")
load("@io_bazel_rules_docker//docker/package_managers:install_pkgs.bzl", "install_pkgs")
load("@io_bazel_rules_docker//container:container.bzl", "container_image")
load("@io_bazel_rules_docker//container:container.bzl", "container_image", "container_push")
load("@rules_oss_audit//oss_audit:java/oss_audit.bzl", "oss_audit")
load("//:jvm_flags.bzl", "server_jvm_flags", "worker_jvm_flags")

@@ -148,14 +148,14 @@ oss_audit(
# Download cgroup-tools so that the worker is able to restrict actions via control groups.
download_pkgs(
name = "worker_pkgs",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
packages = ["cgroup-tools"],
tags = ["container"],
)

install_pkgs(
name = "worker_pkgs_image",
image_tar = "@ubuntu-jammy//image",
image_tar = "@ubuntu-mantic//image",
installables_tar = ":worker_pkgs.tar",
installation_cleanup_commands = "rm -rf /var/lib/apt/lists/*",
output_image_name = "worker_pkgs_image",
@@ -195,3 +195,25 @@ oss_audit(
src = "//src/main/java/build/buildfarm:buildfarm-shard-worker",
tags = ["audit"],
)

# Below targets push public docker images to bazelbuild dockerhub.

container_push(
name = "public_push_buildfarm-server",
format = "Docker",
image = ":buildfarm-server",
registry = "index.docker.io",
repository = "bazelbuild/buildfarm-server",
tag = "$(release_version)",
tags = ["container"],
)

container_push(
name = "public_push_buildfarm-worker",
format = "Docker",
image = ":buildfarm-shard-worker",
registry = "index.docker.io",
repository = "bazelbuild/buildfarm-worker",
tag = "$(release_version)",
tags = ["container"],
)
28 changes: 14 additions & 14 deletions README.md
@@ -19,17 +19,17 @@ All commandline options override corresponding config settings.

Run via

```
docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
```shell
$ docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
redis-cli config set stop-writes-on-bgsave-error no
```

### Bazel Buildfarm Server

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```
@@ -40,8 +40,8 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

@@ -53,9 +53,9 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm

To use the example configured buildfarm with bazel (version 1.0 or higher), you can configure your `.bazelrc` as follows:

```
```shell
$ cat .bazelrc
build --remote_executor=grpc://localhost:8980
$ build --remote_executor=grpc://localhost:8980
```

Then run your build as you would normally do.
@@ -67,20 +67,20 @@ Buildfarm uses [Java's Logging framework](https://docs.oracle.com/javase/10/core
You can use typical Java logging configuration to filter these results and observe the flow of executions through your running services.
An example `logging.properties` file has been provided at [examples/logging.properties](examples/logging.properties) for use as follows:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

and

```
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
``` shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

To attach a remote debugger, run the executable with the `--debug=<PORT>` flag. For example:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```


29 changes: 16 additions & 13 deletions _site/docs/architecture/queues.md
@@ -25,32 +25,35 @@ If your configuration file does not specify any provisioned queues, buildfarm wi
This will ensure the expected behavior for the paradigm in which all work is put on the same queue.

### Matching Algorithm
The matching algorithm is performed by the operation queue when the caller is requesting to push or pop elements.
The matching algorithm is performed by the operation queue when the server or worker is requesting to push or pop elements, respectively.
The matching algorithm is designed to find the appropriate queue to perform these actions on.
On the scheduler side, the action's platform properties are used for matching.
On the worker side, the `dequeue_match_settings` are used.
![Operation Queue Matching]({{site.url}}{{site.baseurl}}/assets/images/Operation-Queue-Matching1.png)

This is how the matching algorithm works:
The matching algorithm works as follows:
Each provision queue is checked in the order that it is configured.
The first provision queue that is deemed eligible is chosen and used.
When deciding if an action is eligible for the provision queue, each platform property is checked individually.
By default, there must be a perfect match on each key/value.
Wildcards ("*") can be used to avoid the need of a perfect match.
Additionally, if the action contains any platform property that is not mentioned by the provision queue, it will be deemed ineligible.
setting `allow_unmatched: true` can be used to allow a superset of action properties as long as a subset matches the provision queue.
setting `allowUnmatched: true` can be used to allow a superset of action properties as long as a subset matches the provision queue.
If no provision queues can be matched, the operation queue will provide an analysis on why none of the queues were eligible.
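
As a reading aid, the server-side matching described above can be sketched in Python. This is a minimal sketch and not buildfarm's implementation: `ProvisionQueue`, `is_eligible`, and `choose_queue` are hypothetical names, properties are simplified to one value per key, and the sketch assumes that every property listed by a queue must be present on the action (with an equal value, or any value when the queue's value is `*`).

```python
from dataclasses import dataclass, field
from typing import Optional

WILDCARD = "*"


@dataclass
class ProvisionQueue:
    # Hypothetical stand-in for one configured provision queue.
    name: str
    properties: dict[str, str] = field(default_factory=dict)
    allow_unmatched: bool = False


def is_eligible(queue: ProvisionQueue, action_properties: dict[str, str]) -> bool:
    # Assumption: each queue property must be present on the action,
    # with an equal value unless the queue's value is the wildcard.
    for key, expected in queue.properties.items():
        if key not in action_properties:
            return False
        if expected != WILDCARD and action_properties[key] != expected:
            return False
    # Action properties the queue does not mention make the action
    # ineligible unless the queue allows unmatched properties.
    if not queue.allow_unmatched:
        extras = set(action_properties) - set(queue.properties)
        if extras:
            return False
    return True


def choose_queue(
    queues: list[ProvisionQueue], action_properties: dict[str, str]
) -> Optional[ProvisionQueue]:
    # Queues are checked in configured order; the first eligible one wins.
    for queue in queues:
        if is_eligible(queue, action_properties):
            return queue
    return None
```

With a `gpu` queue declaring `{"gpu": "*"}` followed by a catch-all queue that has no properties and `allow_unmatched=True`, an action carrying `{"gpu": "a100"}` lands on the first queue and any other action falls through to the second.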

When taking elements off of the operation queue, the matching algorithm behaves a similar way.
The worker's `DequeueMatchSettings` also have an `allow_unmatched` property.
Workers also have the ability to reject an operation after matching with a provision queue and dequeuing a value.
To avoid any of these rejections by the worker, you can use `accept_everything: true`.

When configuring your worker, consider the following decisions:
First, if the accept_everything setting is true, the job is accepted.
Otherwise, if any execution property for the queue has a wildcard key, the job is accepted.
Otherwise, if the allow_unmatched setting is true, each key present in the queue's properties must be a wildcard or exist in the execution request's properties with an equal value.
Otherwise, the execution request's properties must have exactly the same set of keys as the queue's execution properties, and the request's value for each property must equal the queue's if the queue's value for this property is not a wildcard.
A worker will dequeue operations from matching queues and determine whether to keep and execute it according to the following procedure:
For each property key-value in the operation's platform, an operation is REJECTED if:
The key is `min-cores` and the integer value is greater than the number of cores on the worker.
Or The key is `min-mem` and the integer value is greater than the number of bytes of RAM on the worker.
Or if the key exists in the `DequeueMatchSettings` platform with neither the value nor a `*` in the corresponding DMS platform key's values,
Or if the `allowUnmatched` setting is `false`.
For each resource requested in the operation's platform with the resource: prefix, the action is rejected if:
The resource amount cannot currently be satisfied with the associated resource capacity count

There are special predefined execution property names which resolve to dynamic configuration for the worker to match against:
`Worker`: The worker's `publicName`
`min-cores`: Less than or equal to the `executeStageWidth`
`process-wrapper`: The set of named `process-wrappers` present in configuration
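
The worker-side procedure added above can be sketched in the same way. This is a minimal sketch under stated assumptions, not the code in this PR: `WorkerSettings` and `should_keep` are hypothetical names, `cores` and `memory_bytes` stand in for whatever the worker actually compares against, and the final rule is read as "the key is absent from the `DequeueMatchSettings` platform and `allowUnmatched` is `false`".

```python
from dataclasses import dataclass, field

WILDCARD = "*"
RESOURCE_PREFIX = "resource:"


@dataclass
class WorkerSettings:
    # Hypothetical view of the worker's dequeue match configuration.
    cores: int
    memory_bytes: int
    platform: dict[str, set[str]] = field(default_factory=dict)
    allow_unmatched: bool = False
    resources: dict[str, int] = field(default_factory=dict)


def should_keep(
    operation_platform: list[tuple[str, str]], worker: WorkerSettings
) -> bool:
    for key, value in operation_platform:
        if key == "min-cores":
            # Reject when more cores are requested than the worker has.
            if int(value) > worker.cores:
                return False
        elif key == "min-mem":
            # Reject when more bytes of RAM are requested than available.
            if int(value) > worker.memory_bytes:
                return False
        elif key.startswith(RESOURCE_PREFIX):
            # Reject when the requested amount exceeds the capacity
            # currently available for that named resource.
            name = key[len(RESOURCE_PREFIX):]
            if int(value) > worker.resources.get(name, 0):
                return False
        elif key in worker.platform:
            # Reject when neither the value nor a wildcard is listed.
            accepted = worker.platform[key]
            if value not in accepted and WILDCARD not in accepted:
                return False
        elif not worker.allow_unmatched:
            # Assumption: unmatched keys are rejected unless allowed.
            return False
    return True
```

The operation platform is modeled as a list of pairs because a platform may repeat a key; each pair is evaluated independently, and the first failing pair rejects the operation.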

### Server Example

4 changes: 2 additions & 2 deletions _site/docs/architecture/worker-execution-environment.md
@@ -1,6 +1,6 @@
---
layout: default
title: Workers
title: Worker Execution Environment
parent: Architecture
nav_order: 3
---
@@ -124,4 +124,4 @@ java_image(

And now that this is in place, we can use the following to build the container and make it available to our local docker daemon:

`bazel run :buildfarm-shard-worker-ubuntu20-java14`
`bazel run :buildfarm-shard-worker-ubuntu20-java14`