[wip] persistent worker integration into 2.x #3

Draft
wants to merge 69 commits into
base: master
Changes from 1 commit
edf2211
Implement local resources for workers (#1282)
luxe Oct 17, 2023
97e3b90
build: override grpc dependencies with our dependencies
jasonschroeder-sfdc Oct 16, 2023
96f239d
chore(deps): bump protobuf runtime to 3.19.1
jasonschroeder-sfdc Oct 16, 2023
af3f34e
chore(deps) add transitive dependencies
jasonschroeder-sfdc Oct 16, 2023
380f8a1
feat: add Proto reflection service to shard worker
jasonschroeder-sfdc Oct 16, 2023
7e7979d
fixup! build: override grpc dependencies with our dependencies
jasonschroeder-sfdc Oct 16, 2023
1f9d01f
fixup! chore(deps) add transitive dependencies
jasonschroeder-sfdc Oct 16, 2023
578589f
Bug: Fix Blocked thread in WriteStreamObserver Caused by CASFile Writ…
amishra-u Oct 24, 2023
dfa5937
Pin the Java toolchain to `remotejdk_17` (#1509)
stefanobaghino Oct 24, 2023
f6459d1
docs: add markdown language specifiers for code blocks
jasonschroeder-sfdc Oct 18, 2023
018e177
Support OutputPaths in OutputDirectory
werkt Oct 25, 2023
8b37013
Permit Absolute Symlink Targets with configuration
werkt Oct 26, 2023
df9ce1d
chore: update bazel to 6.4.0 (#1513)
jasonschroeder-sfdc Oct 29, 2023
bd740c9
Rename instance types (#1514)
luxe Oct 30, 2023
2a61f77
Create SymlinkNode outputs during upload (#1515)
werkt Oct 30, 2023
76c2657
feat: Implement CAS lease extension (#1455)
amishra-u Oct 31, 2023
cfa2e18
Bump org.json:json from 20230227 to 20231013 in /admin/main (#1516)
dependabot[bot] Oct 31, 2023
ff00c8f
Re-add missing graceful shutdown functionality (#1520)
80degreeswest Nov 1, 2023
afb0603
Technically correct to unwrap EE on lock failure
werkt Jul 14, 2023
9b5ec43
Bump rules_oss_audit and patch for py3.11
werkt Oct 17, 2023
f9ef75a
Prevent healthStatusManager NPE on start failure
werkt Oct 17, 2023
20512f6
Consistent check for publicName presence
werkt Nov 1, 2023
654032e
Read through external with query THROUGH=true
werkt Nov 1, 2023
b4359c5
Add --port option to worker
werkt Nov 1, 2023
5195809
Restore worker --root cmdline specification
werkt Nov 1, 2023
938c789
Make bf-executor small blob names consistent
werkt Nov 1, 2023
87face1
Configured output size operation failure
werkt Nov 2, 2023
58faec9
Restore abbrev port as -p
werkt Nov 2, 2023
cf6fc58
Update zstd-jni for latest version
jerrymarino Oct 31, 2023
6bc70e1
Attempt to resolve windows stamping
werkt Nov 3, 2023
b226725
Bug: Fix workerSet update logic for RemoteCasWriter
amishra-u Nov 2, 2023
751ac90
Detail storage requirements
werkt Nov 4, 2023
2bf3eae
Fix worker execution env title
werkt Nov 4, 2023
b6bddff
Add storage example descriptions
werkt Nov 4, 2023
c490925
Check for context cancelled before responding to error (#1526)
justinwon777 Nov 6, 2023
2a51c31
chore(deps): bump com.google.errorprone:error-prone
jasonschroeder-sfdc Oct 16, 2023
7ea1a9f
Worker name execution properties matching
werkt Nov 8, 2023
a4822c1
updates
luxe Apr 21, 2023
f1ea9b5
updates
luxe Apr 21, 2023
ccf763d
updates
luxe Apr 21, 2023
b7f5661
updates
luxe Apr 21, 2023
72b40ae
updates
luxe Apr 23, 2023
14c759b
Update ShardWorkerContext.java
luxe Nov 8, 2023
ca9bb92
Update ShardWorkerContext.java
luxe Nov 8, 2023
10e68c4
[execution] allow tmpfs and cgroups enforcement
werkt Nov 9, 2023
35883a4
Release resources when not keeping an operation (#1535)
werkt Nov 9, 2023
9d80f4e
Update queues.md
werkt Nov 9, 2023
aac33b6
Implement custom label header support for Grpc metrics interceptor (#…
rastenis Nov 10, 2023
f9882f7
Specify direct guava dependency usage (#1538)
werkt Nov 11, 2023
69e0248
Update lombok dependency for jdk21 (#1540)
werkt Nov 11, 2023
339aa13
Reorganize DequeueMatchEvaluator (#1537)
werkt Nov 11, 2023
025305a
Upgrade com_google_protobuf for jvm compatibility (#1539)
werkt Nov 11, 2023
f720909
Create buildfarm-worker-base-build-and-deploy.yml (#1534)
80degreeswest Nov 11, 2023
b7daba3
Add base image generation scripts (#1532)
80degreeswest Nov 11, 2023
dcff4f0
Fix buildfarm-worker-base-build-and-deploy.yml (#1541)
80degreeswest Nov 11, 2023
52318f8
Add public buildfarm image generation actions (#1542)
80degreeswest Nov 14, 2023
e343393
Update base image building action (#1544)
80degreeswest Nov 16, 2023
dae7f78
Add release image generation action (#1545)
80degreeswest Nov 16, 2023
b01889d
Limit workflow to canonical repository (#1547)
werkt Nov 16, 2023
dcee798
Check for "cores" exec property as min-cores match (#1548)
werkt Nov 16, 2023
221eae9
Consider output_* as relative to WD (#1550)
werkt Nov 19, 2023
021071b
Integrate persistent workers into shard Worker via PersistentExecutor
wiwa Oct 27, 2023
6bd2ab3
slight cleanup
wiwa Feb 1, 2023
4b748d4
Formatting
wiwa Feb 1, 2023
2ac7a3e
Fix static analysis
wiwa Feb 1, 2023
957c4b5
String#strip() -> String#trim()
wiwa Feb 1, 2023
9ff3134
Rename and add tests for util/InputIndexer and init PersistentExecuto…
wiwa Oct 17, 2023
9ca026e
Add test utils and tests for ProtoCoordinator
wiwa Oct 17, 2023
20cd93e
Address Feedback
wiwa Nov 19, 2023
docs: add markdown language specifiers for code blocks
jasonschroeder-sfdc authored and werkt committed Oct 24, 2023
commit f6459d199d1a40bdd8b31a75d7c19da7a1af69f4
28 changes: 14 additions & 14 deletions README.md
@@ -19,17 +19,17 @@ All commandline options override corresponding config settings.

Run via

```
docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
```shell
$ docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:5.0.9
redis-cli config set stop-writes-on-bgsave-error no
```

### Bazel Buildfarm Server

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```
@@ -40,8 +40,8 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=

Run via

```
bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>
```shell
$ bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

@@ -53,9 +53,9 @@ Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm

To use the example configured buildfarm with bazel (version 1.0 or higher), you can configure your `.bazelrc` as follows:

```
```shell
$ cat .bazelrc
build --remote_executor=grpc://localhost:8980
$ build --remote_executor=grpc://localhost:8980
```

Then run your build as you would normally do.
@@ -67,20 +67,20 @@ Buildfarm uses [Java's Logging framework](https://docs.oracle.com/javase/10/core
You can use typical Java logging configuration to filter these results and observe the flow of executions through your running services.
An example `logging.properties` file has been provided at [examples/logging.properties](examples/logging.properties) for use as follows:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

and

```
bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
``` shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml
```

To attach a remote debugger, run the executable with the `--debug=<PORT>` flag. For example:

```
bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```shell
$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
```


32 changes: 16 additions & 16 deletions _site/docs/configuration/configuration.md
@@ -7,7 +7,7 @@ has_children: true

Minimal required:

```
```yaml
backplane:
redisUri: "redis://localhost:6379"
queues:
@@ -38,7 +38,7 @@ For an example configuration containing all of the configuration values, see `ex

Example:

```
```yaml
digestFunction: SHA1
defaultActionTimeout: 1800
maximumActionTimeout: 1800
@@ -79,7 +79,7 @@ worker:

Example:

```
```yaml
server:
instanceType: SHARD
name: shard
@@ -96,7 +96,7 @@ server:

Example:

```
```yaml
server:
grpcMetrics:
enabled: false
@@ -114,7 +114,7 @@ server:

Example:

```
```yaml
server:
caches:
directoryCacheMaxEntries: 10000
@@ -132,7 +132,7 @@ server:

Example:

```
```yaml
server:
admin:
deploymentEnvironment: AWS
@@ -151,14 +151,14 @@ server:

Example:

```
```yaml
server:
metrics:
publisher: log
logLevel: INFO
```

```
```yaml
server:
metrics:
publisher: aws
@@ -207,7 +207,7 @@ server:

Example:

```
```yaml
backplane:
type: SHARD
redisUri: "redis://localhost:6379"
@@ -224,7 +224,7 @@ backplane:

Example:

```
```yaml
backplane:
type: SHARD
redisUri: "redis://localhost:6379"
@@ -262,7 +262,7 @@ backplane:
| realInputDirectories | List of Strings, _external_ | | A list of paths that will not be subject to the effects of linkInputDirectories setting, may also be used to provide writable directories as input roots for actions which expect to be able to write to an input location and will fail if they cannot |
| gracefulShutdownSeconds | Integer, 0 | | Time in seconds to allow for operations in flight to finish when shutdown signal is received |

```
```yaml
worker:
port: 8981
publicName: "localhost:8981"
@@ -279,7 +279,7 @@ worker:

Example:

```
```yaml
worker:
capabilities:
cas: true
@@ -296,7 +296,7 @@ worker:

Example:

```
```yaml
worker:
sandboxSettings:
alwaysUse: true
@@ -313,7 +313,7 @@ worker:

Example:

```
```yaml
worker:
dequeueMatchSettings:
acceptEverything: true
@@ -333,7 +333,7 @@ worker:

Example:

```
```yaml
worker:
storages:
- type: FILESYSTEM
@@ -361,7 +361,7 @@ worker:

Example:

```
```yaml
worker:
executionPolicies:
- name: test
26 changes: 17 additions & 9 deletions _site/docs/execution/execution_policies.md
@@ -17,7 +17,7 @@ This policy type specifies that a worker should prepend a single path, and a num

This example will use the buildfarm-provided executable `as-nobody`, which will upon execution demote itself to a `nobody` effective process owner uid, and perform an `execvp(2)` with the remaining provided program arguments, which will subsequently execute as a user that no longer matches the worker process.

```
```yaml
# default wrapper policy application
worker:
executionPolicies:
@@ -50,32 +50,37 @@ These wrappers are used for detecting actions that rely on time. Below is a dem
This addresses two problems in regards to an action's dependence on time. The 1st problem is when an action takes longer than it should because it's sleeping unnecessarily. The 2nd problem is when an action relies on time which causes it to eventually be broken on master despite the code not changing. Both problems are expressed below as unit tests. We demonstrate a time-spoofing mechanism (the re-writing of syscalls) which allows us to detect these problems generically over any action. The objective is to analyze builds for performance inefficiency and discover future instabilities before they occur.

### Issue 1 (slow test)
```

```bash
#!/bin/bash
set -euo pipefail

echo -n "testing... "
sleep 10;
echo "done"
```

The test takes 10 seconds to run on average.
```
bazel test --runs_per_test=10 --config=remote //cloud/buildfarm:sleep_test

```shell
$ bazel test --runs_per_test=10 --config=remote //cloud/buildfarm:sleep_test
//cloud/buildfarm:sleep_test PASSED in 10.2s
Stats over 10 runs: max = 10.2s, min = 10.1s, avg = 10.2s, dev = 0.0s
```

We can check for performance improvements by using the `skip-sleep` option.
```
bazel test --runs_per_test=10 --config=remote --remote_default_exec_properties='skip-sleep=true' //cloud/buildfarm:sleep_test

```shell
$ bazel test --runs_per_test=10 --config=remote --remote_default_exec_properties='skip-sleep=true' //cloud/buildfarm:sleep_test
//cloud/buildfarm:sleep_test PASSED in 1.0s
Stats over 10 runs: max = 1.0s, min = 0.9s, avg = 1.0s, dev = 0.0s
```

Now the test is 10x faster. If skipping sleep makes an action perform significantly faster without affecting its success rate, that would warrant further investigation into the action's implementation.
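
The comparison above can be automated. The following is a minimal sketch, assuming the average durations have already been read off the two `bazel test` summaries. The function name `sleep_suspect` and the 2x threshold are illustrative choices, not buildfarm features.

```shell
# Hypothetical helper (not part of buildfarm): succeed when skipping sleep
# sped a test up by at least the given factor.
# Arguments: baseline average seconds, skip-sleep average seconds, factor.
sleep_suspect() {
  awk -v base="$1" -v skip="$2" -v factor="$3" \
    'BEGIN { exit !(skip > 0 && base / skip >= factor) }'
}

# Averages taken from the two runs shown above.
if sleep_suspect 10.2 1.0 2; then
  echo "sleep-bound: investigate"
else
  echo "no significant speedup"
fi
```

A threshold well above 1 avoids flagging ordinary run-to-run noise; the right value depends on how variable your test durations are.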

### Issue 2 (future failing test)
```

```bash
#!/bin/bash
set -euo pipefail

@@ -89,12 +94,15 @@ echo "Times change."
date
exit -1;
```

The test passes today, but will it pass tomorrow? Will it pass a year from now? We can find out by using the `time-shift` option.
```
bazel test --test_output=streamed --remote_default_exec_properties='time-shift=31556952' --config=remote //cloud/buildfarm:future_fail

```shell
$ bazel test --test_output=streamed --remote_default_exec_properties='time-shift=31556952' --config=remote //cloud/buildfarm:future_fail
INFO: Found 1 test target...
Times change.
Mon Sep 25 18:31:09 UTC 2023
//cloud/buildfarm:future_fail FAILED in 18.0s
```

Time is shifted to the year 2023 and the test now fails. We can fix the problem before others see it.
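
The `time-shift` value used above appears to be an offset in seconds: 31556952 is one mean Gregorian year (365.2425 days). The following is a minimal sketch of that arithmetic, assuming the property is interpreted as seconds. The helper name `shift_for_days` is hypothetical and not part of buildfarm.

```shell
# Hypothetical helper (not part of buildfarm): convert whole days into
# seconds, the unit the time-shift value above appears to use.
shift_for_days() {
  echo $(( $1 * 86400 ))
}

# 31556952 seconds is 365 whole days plus a remainder of 20952 seconds,
# which matches one mean Gregorian year of 365.2425 days.
year_shift=31556952
echo "whole days: $(( year_shift / 86400 ))"
echo "remaining seconds: $(( year_shift % 86400 ))"

# A shorter horizon, for example 30 days ahead.
echo "shift for 30 days: $(shift_for_days 30)"
```

The resulting number would be passed in the same way as above, for example `--remote_default_exec_properties='time-shift=2592000'` for 30 days.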
25 changes: 15 additions & 10 deletions _site/docs/execution/execution_properties.md
@@ -76,37 +76,42 @@ Despite being given 1 core, they see all of the cpus and decide to spawn that ma

**Standard Example:**
This test will succeed when env var TESTVAR is foobar, and fail otherwise.
```

```shell
#!/bin/bash
[ "$TESTVAR" = "foobar" ]
```
```
./bazel test \

```shell
$ ./bazel test \
--remote_executor=grpc://127.0.0.1:8980 --noremote_accept_cached --nocache_test_results \
//env_test:main
FAIL
```

```
./bazel test --remote_default_exec_properties='env-vars={"TESTVAR": "foobar"}' \
```shell
$ ./bazel test --remote_default_exec_properties='env-vars={"TESTVAR": "foobar"}' \
--remote_executor=grpc://127.0.0.1:8980 --noremote_accept_cached --nocache_test_results \
//env_test:main
PASS
```

**Template Example:**
If you give a range of cores, buildfarm has the authority to decide how many your operation actually claims. You can let buildfarm resolve this value for you (via [mustache](https://mustache.github.io/)).
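
To illustrate what resolving the template means, here is a minimal sketch that uses plain `sed` substitution as a stand-in for a real mustache engine. Buildfarm's actual resolution code is not shown in this document, and the claimed core count of 1 is an assumed example value.

```shell
# Illustrative only: emulate resolving the {{limits.cpu.claimed}} placeholder
# with sed. This stands in for a mustache engine; it is not buildfarm's code.
resolve_claimed() {
  # $1: template string, $2: number of cores assumed to have been claimed
  printf '%s\n' "$1" | sed "s/{{limits\.cpu\.claimed}}/$2/g"
}

template='"MKL_NUM_THREADS": "{{limits.cpu.claimed}}"'
resolve_claimed "$template" 1
```

With the placeholder resolved to the claimed core count, the action sees a thread limit that matches what it was actually granted.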
```
```bash
#!/bin/bash
[ "$MKL_NUM_THREADS" = "1" ]
```
```
./bazel test \

```shell
$ ./bazel test \
--remote_executor=grpc://127.0.0.1:8980 --noremote_accept_cached --nocache_test_results \
//env_test:main
FAIL
```
```
./bazel test \

```shell
$ ./bazel test \
--remote_default_exec_properties='env-vars="MKL_NUM_THREADS": "{{limits.cpu.claimed}}"' \
--remote_executor=grpc://127.0.0.1:8980 --noremote_accept_cached --nocache_test_results \
//env_test:main
25 changes: 15 additions & 10 deletions _site/docs/quick_start.md
@@ -25,7 +25,8 @@ Let's start with a bazel workspace with a single file to compile into an executa
Create a new directory for our workspace and add the following files:

`main.cc`:
```

```c
#include <iostream>

int main( int argc, char *argv[] )
@@ -35,7 +36,8 @@ int main( int argc, char *argv[] )
```

`BUILD`:
```

```starlark
cc_binary(
name = "main",
srcs = ["main.cc"],
@@ -118,15 +120,18 @@ That `2 remote` indicates that your compile and link ran remotely. Congratulatio
## Container Quick Start

To bring up a minimal buildfarm cluster, you can run:

```shell
$ ./examples/bf-run start
```
./examples/bf-run start
```

This will start all of the necessary containers at the latest version.
Once the containers are up, you can build with `bazel run --remote_executor=grpc://localhost:8980 :main`.

To stop the containers, run:
```
./examples/bf-run stop

```shell
$ ./examples/bf-run stop
```

## Next Steps
@@ -137,8 +142,8 @@ We've started our worker on the same host as our server, and also the same host

You can now easily launch a new Buildfarm cluster locally or in AWS using an open sourced [Buildfarm Manager](https://github.com/80degreeswest/bfmgr).

```
wget https://github.com/80degreeswest/bfmgr/releases/download/1.0.7/bfmgr-1.0.7.jar
java -jar bfmgr-1.0.7.jar
Navigate to http://localhost
```shell
$ wget https://github.com/80degreeswest/bfmgr/releases/download/1.0.7/bfmgr-1.0.7.jar
$ java -jar bfmgr-1.0.7.jar
$ open http://localhost
```