Merge branch 'master' into filter_parser_loop2

fluent · Dec 15, 2023 · 5fd8a1e
2 parents bc5282d + 2612a1a
commit 5fd8a1e
Showing 2,261 changed files with 91,966 additions and 22,122 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -7,16 +7,18 @@ Enter `[N/A]` in the box, if an item is not applicable to your change.
 
 **Testing**
 Before we can approve your change; please submit the following in a comment:
+
 - [ ] Example configuration file for the change
 - [ ] Debug log output from testing the change
-<!--  
-Please refer to the Developer Guide for instructions on building Fluent Bit with Valgrind support: 
+<!--
+Please refer to the Developer Guide for instructions on building Fluent Bit with Valgrind support:
 https://github.com/fluent/fluent-bit/blob/master/DEVELOPER_GUIDE.md#valgrind
 Invoke Fluent Bit and Valgrind as: $ valgrind --leak-check=full ./bin/fluent-bit <args>
 -->
 - [ ] Attached [Valgrind](https://valgrind.org/docs/manual/quick-start.html) output that shows no leaks or memory corruption was found
 
 If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
+
 - [ ] Run [local packaging test](./packaging/local-build-all.sh) showing all targets (including any new ones) build.
 - [ ] Set `ok-package-test` label to test for all targets (requires maintainer to do).
 

diff --git a/.github/actionlint.yml b/.github/actionlint.yml
@@ -0,0 +1,3 @@
+self-hosted-runner:
+  labels:
+    - actuated-arm64-8cpu-16gb
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
@@ -21,7 +21,7 @@
 | Label name | Description |
 | :----------|-------------|
 | docs-required| default tag used to request documentation, has to be removed before merge |
-| ok-package-test | run all package tests |
+| ok-package-test | Build for all possible targets |
 | ok-to-test | run all integration tests |
 | ok-to-merge | run mergebot and merge (rebase) current PR |
 | ci/integration-docker-ok | integration test is able to build docker image |
@@ -66,14 +66,162 @@ For some reason this is not automatically done via permission inheritance or sim
 
 Each major version (e.g. 1.8 & 1.9) supports different targets to build for, e.g. 1.9 includes a CentOS 8 target and 1.8 has some other legacy targets.
 
-This is all handled by the [build matrix generation composite action](../actions/generate-package-build-matrix/action.yaml) so make sure to update appropriately.
-The build matrix is then fed into the reusable job that builds packages which will then fire for the appropriate targets.
+This is all handled by the [build matrix generation composite action](../actions/generate-package-build-matrix/action.yaml).
+This uses a [JSON file](../../packaging/build-config.json) to specify the targets so ensure this is updated.
+The build matrix is then fed into the [reusable job](./call-build-linux-packages.yaml) that builds packages which will then fire for the appropriate targets.
+The reusable job is used for all package builds including unstable/nightly and the PR `ok-package-test` triggered ones.
 
 ## Releases
 
-Currently the process is as follows:
+The process at a high level is as follows:
 
-1. Tag the source with whatever tag you like on master.
-2. The [`Deploy to staging`](./staging-build.yaml) workflow will then kick in to build everything and upload it either to the S3 staging bucket (packages) or ghcr.io (containers).
-3. Once this completes, the [`Test staging`](./staging-test.yaml) workflow will then run to carry out smoke tests on these packages and containers.
-4. The [`Release from staging`](./staging-release.yaml) workflow can then be manually initiated to promote staging to release.
+1. Tag created with `v` prefix.
+2. [Deploy to staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-build.yaml) workflow runs.
+3. [Test staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-test.yaml) workflow runs.
+4. Manually initiate [release from staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-release.yaml) workflow.
+5. A PR is auto-created to increment the minor version now for Fluent Bit using the [`update_version.sh`](../../update_version.sh) script.
+6. Create PRs for doc updates - Windows & container versions. (WIP to automate).
+
+Breaking the steps down.
+
+### Deploy to staging and test
+
+This should run automatically when a tag is created matching the `v*` glob pattern.
+It currently copes with 1.8+ builds although automation is only exercised for 1.9+ releases.
+
+Once this is completed successfully the staging tests should also run automatically.
+
+![Workflows for staging and test example](./resources/auto-build-test-workflow.png "Example of workflows for build and test")
+
+If both complete successfully then we are good to go.
+
+Occasional failures are seen with package builds not downloading dependencies (CentOS 7 in particular seems bad for this).
+A re-run of failed jobs should resolve this.
+
+The workflow builds all Linux, macOS and Windows targets to a staging S3 bucket plus the container images to ghcr.io.
+
+### Release from staging workflow
+
+This is a manually initiated workflow: the intention is that multiple staging builds can happen but we only release one.
+Note that currently we do not support parallel staging builds of different versions, e.g. master and 1.9 branches.
+**We can only release the previous staging build and there is a check to confirm version.**
+
+Ensure AppVeyor build for the tag has completed successfully as well.
+
+To trigger: <https://github.com/fluent/fluent-bit/actions/workflows/staging-release.yaml>
+
+All this job does is copy the various artefacts from staging locations to release ones; it does not rebuild them.
+
+![Workflow for release example](./resources/release-from-staging-workflow-incorrect-version.png "Example of workflow for release")
+
+In this example we used the wrong `version`: the workflow requires it without the `v` prefix (it is used for the container tag, etc.), so it fails.
+
+![Workflow for release failure example](./resources/release-version-failure.png "Example of failing workflow for release")
+
+Make sure to provide the version without the `v` prefix.
+
+![Workflow for release example](./resources/release-from-staging-workflow.png "Example of successful workflow for release")
+
+Once this workflow is initiated you then also need to have it approved by the designated "release team" otherwise it will not progress.
+
+![Release approval example](./resources/release-approval.png "Release approval example")
+
+They will be notified for approval by GitHub.
+Unfortunately each job in the sequence has to be approved individually rather than with one global approval for the whole workflow, although this does allow checks between jobs.
+
+![Release approval per-job required](./resources/release-approval-per-job.png "Release approval per-job required")
+
+This is quite useful to delay the final smoke test of packages until after the manual steps are done as it will then verify them all for you.
+
+#### Packages server sync
+
+The workflow above ensures all release artefacts are pushed to the appropriate container registry and S3 bucket for official releases.
+The packages server then periodically syncs from this bucket to pull down and serve the new packages so there may be a delay (up to 1 hour) before it serves the new versions.
+The syncs happen hourly.
+See <https://github.com/fluent/fluent-bit-infra/blob/main/terraform/provision/package-server-provision.sh.tftpl> for details of the dedicated packages server.
+
+The main reason for a separate server is to accurately track download statistics.
+Container images are handled by ghcr.io and Docker Hub, not this server.
+
+#### Transient container publishing failures
+
+The parallel publishing of multiple container tags for the same image seems to fail occasionally with network errors, more often for ghcr.io than Docker Hub.
+This can be resolved by just re-running the failed jobs.
+
+#### Windows builds from AppVeyor
+
+This is automated; however, confirm that the actual build is successful for the tag: <https://ci.appveyor.com/project/fluent/fluent-bit-2e87g/history>
+If not then ask a maintainer to retrigger.
+
+It can take a while to find the one for the specific tag...
+
+#### ARM builds
+
+All builds are carried out in containers and intended to be run on a valid Ubuntu host to match a standard GitHub Actions runner.
+This can take some time for ARM as we have to emulate the architecture via QEMU.
+
+<https://github.com/fluent/fluent-bit/pull/7527> introduces support to run ARM builds on a dedicated [actuated.dev](https://docs.actuated.dev/) ephemeral VM runner.
+A self-hosted ARM runner is sponsored by [Equinix Metal](https://deploy.equinix.com/metal/) and provisioned for this per the [documentation](https://docs.actuated.dev/provision-server/).
+For fork workflows, this should all be skipped and run on a normal Ubuntu GitHub-hosted runner, but be aware this may take some time.
+
+### Manual release
+
+As long as it is built to staging we can manually publish packages as well via the script here: <https://github.com/fluent/fluent-bit/blob/master/packaging/update-repos.sh>
+
+Containers can be promoted manually too; make sure to include all architectures and signatures.
+
+### Create PRs
+
+Once releases are published we need to provide PRs for the following documentation updates:
+
+1. Windows checksums: <https://docs.fluentbit.io/manual/installation/windows#installation-packages>
+2. Container versions: <https://docs.fluentbit.io/manual/installation/docker#tags-and-versions>
+
+<https://github.com/fluent/fluent-bit-docs> is the repo for updates to docs.
+
+Take the checksums from the release process above: the AppVeyor stage provides them all and we attempt to auto-create the PR with them.
+
+## Unstable/nightly builds
+
+These happen every 24 hours and [reuse the same workflow](./cron-unstable-build.yaml) as the staging build so are identical except they skip the upload to S3 step.
+This means all targets are built nightly for `master` and `2.1` branches including container images and Linux, macOS and Windows packages.
+
+The container images are available here (the tag refers to the branch):
+
+* [ghcr.io/fluent/fluent-bit/unstable:2.1](ghcr.io/fluent/fluent-bit/unstable:2.1)
+* [ghcr.io/fluent/fluent-bit/unstable:master](ghcr.io/fluent/fluent-bit/unstable:master)
+* [ghcr.io/fluent/fluent-bit/unstable:windows-2019-2.1](ghcr.io/fluent/fluent-bit/unstable:windows-2019-2.1)
+* [ghcr.io/fluent/fluent-bit/unstable:windows-2019-master](ghcr.io/fluent/fluent-bit/unstable:windows-2019-master)
+
+The Linux, macOS and Windows packages are available to download from the specific workflow run.
+
+## Integration tests
+
+On every commit to `master` we rebuild the [packages](./build-master-packages.yaml) and [container images](./master-integration-test.yaml).
+The container images are then used to [run the integration tests](./master-integration-test.yaml) from the <https://github.com/fluent/fluent-bit-ci> repository.
+The container images are available as:
+
+* [ghcr.io/fluent/fluent-bit/master:x86_64](ghcr.io/fluent/fluent-bit/master:x86_64)
+
+## PR checks
+
+Various workflows are run for PRs automatically:
+
+* [Unit tests](./unit-tests.yaml)
+* [Compile checks on CentOS 7 compilers](./pr-compile-check.yaml)
+* [Linting](./pr-lint.yaml)
+* [Windows builds](./pr-windows-build.yaml)
+* [Fuzzing](./pr-fuzz.yaml)
+* [Container image builds](./pr-image-tests.yaml)
+* [Install script checks](./pr-install-script.yaml)
+
+We try to guard these to only trigger when relevant files are changed to reduce any delays or resources used.
+**All should be able to be triggered manually for explicit branches as well.**
+
+The following workflows can be triggered manually for specific PRs too:
+
+* [Integration tests](./pr-integration-test.yaml): Build a container image and run the integration tests as per commits to `master`.
+* [Performance tests](./pr-perf-test.yaml): WIP to trigger a performance test on a dedicated VM and collect the results as a PR comment.
+* [Full package build](./pr-package-tests.yaml): builds all Linux, macOS and Windows packages as well as container images.
+
+To trigger these, apply the relevant label.
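
Reviewer note on the README changes above: the release section says staging is triggered by tags matching `v*`, while the manual release workflow expects its `version` input without the `v` prefix. The relationship between the two can be sketched roughly as below. This is only an illustration and not code from the repository: the function names and the `2.2.0` example value are hypothetical, and the actual workflow's check may differ.

```python
from fnmatch import fnmatchcase


def is_release_tag(tag: str) -> bool:
    """Return True if the tag matches the `v*` filter that triggers staging."""
    # fnmatchcase keeps the match case-sensitive on every platform.
    return fnmatchcase(tag, "v*")


def tag_to_version(tag: str) -> str:
    """Drop the leading `v` to get a value suitable for the `version` input."""
    if not is_release_tag(tag):
        raise ValueError(f"not a release tag: {tag!r}")
    return tag[1:]


def check_version_input(version: str) -> None:
    """Reject a `version` input that still carries the `v` prefix."""
    if version.startswith("v"):
        raise ValueError("provide the version without the `v` prefix")


if __name__ == "__main__":
    # Hypothetical tag used purely as an example.
    version = tag_to_version("v2.2.0")
    check_version_input(version)
    print(version)
```

A guard like `check_version_input` mirrors the failure shown in the README screenshots, where a `version` supplied with the `v` prefix causes the release workflow to fail.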
diff --git a/.github/workflows/build-legacy-branch.yaml b/.github/workflows/build-legacy-branch.yaml
@@ -17,7 +17,7 @@ jobs:
       contents: read
     steps:
       - name: Checkout code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           ref: ${{ inputs.ref }}
 
@@ -53,39 +53,40 @@ jobs:
       packages: write
     steps:
       - name: Checkout the docker build repo for legacy builds
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           repository: fluent/fluent-bit-docker-image
           ref: "1.8" # Fixed to this branch
 
       - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+        uses: docker/setup-qemu-action@v3
 
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@v3
 
       - name: Log in to the Container registry
-        uses: docker/login-action@v2
+        uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
 
       - id: debug-meta
-        uses: docker/metadata-action@v4
+        uses: docker/metadata-action@v5
         with:
           images: ${{ env.IMAGE_NAME }}
           tags: |
             raw,${{ inputs.ref }}-debug
 
       - name: Build the legacy x86_64 debug image
         if: matrix.arch == 'amd64'
-        uses: docker/build-push-action@v4
+        uses: docker/build-push-action@v5
         with:
           file: ./Dockerfile.x86_64.debug
           context: .
           tags: ${{ steps.debug-meta.outputs.tags }}
           labels: ${{ steps.debug-meta.outputs.labels }}
+          provenance: false
           platforms: linux/amd64
           push: true
           load: false
@@ -94,20 +95,21 @@ jobs:
 
       - name: Extract metadata from Github
         id: meta
-        uses: docker/metadata-action@v4
+        uses: docker/metadata-action@v5
         with:
           images: ${{ env.IMAGE_NAME }}
           tags: |
             raw,${{ matrix.suffix }}-${{ inputs.ref }}
 
       - name: Build the legacy ${{ matrix.arch }} image
-        uses: docker/build-push-action@v4
+        uses: docker/build-push-action@v5
         with:
           file: ./Dockerfile.${{ matrix.suffix }}
           context: .
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
           platforms: linux/${{ matrix.arch }}
+          provenance: false
           push: true
           load: false
           build-args: |
@@ -125,10 +127,10 @@ jobs:
       - build-legacy-images-matrix
     steps:
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@v3
 
       - name: Log in to the Container registry
-        uses: docker/login-action@v2
+        uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}