Merge branch 'fluent:master' into ppc64le-ci
sumitd2 authored Jul 10, 2023
2 parents fd58dde + ac3cb7c commit 441e402
Showing 2,550 changed files with 109,687 additions and 27,435 deletions.
6 changes: 5 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -9,7 +9,11 @@ Enter `[N/A]` in the box, if an item is not applicable to your change.
Before we can approve your change; please submit the following in a comment:
- [ ] Example configuration file for the change
- [ ] Debug log output from testing the change
<!-- Invoke Fluent Bit and Valgrind as: $ valgrind ./bin/fluent-bit <args> -->
<!--
Please refer to the Developer Guide for instructions on building Fluent Bit with Valgrind support:
https://github.com/fluent/fluent-bit/blob/master/DEVELOPER_GUIDE.md#valgrind
Invoke Fluent Bit and Valgrind as: $ valgrind --leak-check=full ./bin/fluent-bit <args>
-->
- [ ] Attached [Valgrind](https://valgrind.org/docs/manual/quick-start.html) output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
4 changes: 4 additions & 0 deletions .github/actionlint.yml
@@ -0,0 +1,4 @@
self-hosted-runner:
labels:
- actuated
- actuated-aarch64
164 changes: 156 additions & 8 deletions .github/workflows/README.md
@@ -21,7 +21,7 @@
| Label name | Description |
| :----------|-------------|
| docs-required| default tag used to request documentation, has to be removed before merge |
| ok-package-test | run all package tests |
| ok-package-test | Build for all possible targets |
| ok-to-test | run all integration tests |
| ok-to-merge | run mergebot and merge (rebase) current PR |
| ci/integration-docker-ok | integration test is able to build docker image |
@@ -66,14 +66,162 @@ For some reason this is not automatically done via permission inheritance or sim

Each major version (e.g. 1.8 & 1.9) supports different targets to build for, e.g. 1.9 includes a CentOS 8 target and 1.8 has some other legacy targets.

This is all handled by the [build matrix generation composite action](../actions/generate-package-build-matrix/action.yaml) so make sure to update appropriately.
The build matrix is then fed into the reusable job that builds packages which will then fire for the appropriate targets.
This is all handled by the [build matrix generation composite action](../actions/generate-package-build-matrix/action.yaml).
This uses a [JSON file](../../packaging/build-config.json) to specify the targets so ensure this is updated.
The build matrix is then fed into the [reusable job](./call-build-linux-packages.yaml) that builds packages which will then fire for the appropriate targets.
The reusable job is used for all package builds including unstable/nightly and the PR `ok-package-test` triggered ones.

## Releases

Currently the process is as follows:
The process at a high level is as follows:

1. Tag the source with whatever tag you like on master.
2. The [`Deploy to staging`](./staging-build.yaml) workflow will then kick in to build everything and upload it either to the S3 staging bucket (packages) or ghcr.io (containers).
3. Once this completes, the [`Test staging`](./staging-test.yaml) workflow will then run to carry out smoke tests on these packages and containers.
4. The [`Release from staging`](./staging-release.yaml) workflow can then be manually initiated to promote staging to release.
1. Tag created with `v` prefix.
2. [Deploy to staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-build.yaml) workflow runs.
3. [Test staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-test.yaml) workflow runs.
4. Manually initiate [release from staging](https://github.com/fluent/fluent-bit/actions/workflows/staging-release.yaml) workflow.
5. A PR is auto-created to increment the minor version now for Fluent Bit using the [`update_version.sh`](../../update_version.sh) script.
6. Create PRs for doc updates - Windows & container versions. (WIP to automate).

Breaking the steps down.

### Deploy to staging and test

This should run automatically when a tag is created matching the `v*` regex.
It currently copes with 1.8+ builds although automation is only exercised for 1.9+ releases.
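
As a rough illustration of the tag check, here is a minimal shell sketch. The function name `is_release_tag` and the example tag are hypothetical. Treating `v*` as a shell-style glob (any tag starting with `v`) is an assumption, not the workflow's actual trigger configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given tag starts with "v".
# Assumption: "v*" is matched as a glob, i.e. "v" followed by anything.
is_release_tag() {
    case "$1" in
        v*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example tag, for illustration only.
if is_release_tag "v2.1.7"; then
    echo "tag would trigger the staging build"
fi
```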

Once this is completed successfully the staging tests should also run automatically.

![Workflows for staging and test example](./resources/auto-build-test-workflow.png "Example of workflows for build and test")

If both complete successfully then we are good to go.

Occasional failures are seen with package builds not downloading dependencies (CentOS 7 in particular seems bad for this).
A re-run of failed jobs should resolve this.

The workflow builds all Linux, macOS and Windows targets to a staging S3 bucket plus the container images to ghcr.io.

### Release from staging workflow

This is a manually initiated workflow: the intention is that multiple staging builds can happen but we only release one.
Note that currently we do not support parallel staging builds of different versions, e.g. master and 1.9 branches.
**We can only release the previous staging build and there is a check to confirm version.**

Ensure AppVeyor build for the tag has completed successfully as well.

To trigger: <https://github.com/fluent/fluent-bit/actions/workflows/staging-release.yaml>

All this job does is copy the various artefacts from staging locations to release ones; it does not rebuild them.

![Workflow for release example](./resources/release-from-staging-workflow-incorrect-version.png "Example of workflow for release")

In this example the wrong `version` was supplied: the workflow requires it without the `v` prefix (it is used for the container tag, etc.), so the run fails.

![Workflow for release failure example](./resources/release-version-failure.png "Example of failing workflow for release")

Make sure to provide the version without the `v` prefix.
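
One way to derive the expected `version` input from a Git tag is shell parameter expansion. This is a minimal sketch: the helper name `version_from_tag` and the example tag are hypothetical and not part of the release workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tag with a single leading "v" removed.
# "${1#v}" strips the shortest prefix matching "v"; tags without the
# prefix are printed unchanged.
version_from_tag() {
    printf '%s\n' "${1#v}"
}

version_from_tag "v2.1.7"   # prints 2.1.7
version_from_tag "2.1.7"    # already bare, prints 2.1.7
```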

![Workflow for release example](./resources/release-from-staging-workflow.png "Example of successful workflow for release")

Once this workflow is initiated you then also need to have it approved by the designated "release team" otherwise it will not progress.

![Release approval example](./resources/release-approval.png "Release approval example")

They will be notified for approval by GitHub.
Unfortunately it has to be approved for each job in the sequence rather than with a single approval for the whole workflow, although that can be useful for checking between jobs.

![Release approval per-job required](./resources/release-approval-per-job.png "Release approval per-job required")

This is quite useful to delay the final smoke test of packages until after the manual steps are done as it will then verify them all for you.

#### Packages server sync

The workflow above ensures all release artefacts are pushed to the appropriate container registry and S3 bucket for official releases.
The packages server then periodically syncs from this bucket to pull down and serve the new packages so there may be a delay (up to 1 hour) before it serves the new versions.
The syncs happen hourly.
See <https://github.com/fluent/fluent-bit-infra/blob/main/terraform/provision/package-server-provision.sh.tftpl> for details of the dedicated packages server.

The main reason for a separate server is to accurately track download statistics.
Container images are handled by ghcr.io and Docker Hub, not this server.

#### Transient container publishing failures

The parallel publishing of multiple container tags for the same image seems to fail occasionally with network errors, particularly more for ghcr.io than DockerHub.
This can be resolved by just re-running the failed jobs.

#### Windows builds from AppVeyor

This is automated, however confirm that the actual build is successful for the tag: <https://ci.appveyor.com/project/fluent/fluent-bit-2e87g/history>
If not then ask a maintainer to retrigger.

It can take a while to find the one for the specific tag...

#### ARM builds

All builds are carried out in containers and intended to be run on a valid Ubuntu host to match a standard GitHub Actions runner.
This can take some time for ARM as we have to emulate the architecture via QEMU.

<https://github.com/fluent/fluent-bit/pull/7527> introduces support to run ARM builds on a dedicated [actuated.dev](https://docs.actuated.dev/) ephemeral VM runner.
A self-hosted ARM runner is sponsored by [Equinix Metal](https://deploy.equinix.com/metal/) and provisioned for this per the [documentation](https://docs.actuated.dev/provision-server/).
For fork workflows, this should all be skipped and run on a normal Ubuntu GitHub-hosted runner, but be aware this may take some time.
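
The runner choice described above can be sketched in shell. This mirrors the `runs-on` expression added to `call-build-linux-packages.yaml` in this commit, under stated assumptions: the function name `select_runner` and the example distro and fork names are hypothetical, and the substring match here is case-sensitive, whereas the GitHub Actions `contains()` function is not.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the runs-on expression:
# ARM targets in the upstream repository use the Actuated runner;
# everything else (including any fork) falls back to ubuntu-latest.
select_runner() {
    local distro="$1" repository="$2"
    if [[ "$distro" == *arm* && "$repository" == "fluent/fluent-bit" ]]; then
        echo "actuated-aarch64"
    else
        echo "ubuntu-latest"
    fi
}

# Illustrative inputs only.
select_runner "ubuntu/22.04.arm64v8" "fluent/fluent-bit"
select_runner "ubuntu/22.04.arm64v8" "someone/fluent-bit"
select_runner "ubuntu/22.04" "fluent/fluent-bit"
```

Keeping the repository check in the condition is what lets forks build without access to the self-hosted runner, at the cost of slower QEMU emulation.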

### Manual release

As long as it is built to staging we can manually publish packages as well via the script here: <https://github.com/fluent/fluent-bit/blob/master/packaging/update-repos.sh>

Containers can be promoted manually too; make sure to include all architectures and signatures.

### Create PRs

Once releases are published we need to provide PRs for the following documentation updates:

1. Windows checksums: <https://docs.fluentbit.io/manual/installation/windows#installation-packages>
2. Container versions: <https://docs.fluentbit.io/manual/installation/docker#tags-and-versions>

<https://github.com/fluent/fluent-bit-docs> is the repo for updates to docs.

Take the checksums from the release process above, the AppVeyor stage provides them all and we attempt to auto-create the PR with it.

## Unstable/nightly builds

These happen every 24 hours and [reuse the same workflow](./cron-unstable-build.yaml) as the staging build so are identical except they skip the upload to S3 step.
This means all targets are built nightly for `master` and `2.0` branches including container images and Linux, macOS and Windows packages.

The container images are available here (the tag refers to the branch):

* [ghcr.io/fluent/fluent-bit/unstable:2.0](ghcr.io/fluent/fluent-bit/unstable:2.0)
* [ghcr.io/fluent/fluent-bit/unstable:master](ghcr.io/fluent/fluent-bit/unstable:master)
* [ghcr.io/fluent/fluent-bit/unstable:windows-2019-2.0](ghcr.io/fluent/fluent-bit/unstable:windows-2019-2.0)
* [ghcr.io/fluent/fluent-bit/unstable:windows-2019-master](ghcr.io/fluent/fluent-bit/unstable:windows-2019-master)

The Linux, macOS and Windows packages are available to download from the specific workflow run.

## Integration tests

On every commit to `master` we rebuild the [packages](./build-master-packages.yaml) and [container images](./master-integration-test.yaml).
The container images are then used to [run the integration tests](./master-integration-test.yaml) from the <https://github.com/fluent/fluent-bit-ci> repository.
The container images are available as:

* [ghcr.io/fluent/fluent-bit/master:x86_64](ghcr.io/fluent/fluent-bit/master:x86_64)

## PR checks

Various workflows are run for PRs automatically:

* [Unit tests](./unit-tests.yaml)
* [Compile checks on CentOS 7 compilers](./pr-compile-check.yaml)
* [Linting](./pr-lint.yaml)
* [Windows builds](./pr-windows-build.yaml)
* [Fuzzing](./pr-fuzz.yaml)
* [Container image builds](./pr-image-tests.yaml)
* [Install script checks](./pr-install-script.yaml)

We try to guard these to only trigger when relevant files are changed to reduce any delays or resources used.
**All should be able to be triggered manually for explicit branches as well.**

The following workflows can be triggered manually for specific PRs too:

* [Integration tests](./pr-integration-test.yaml): Build a container image and run the integration tests as per commits to `master`.
* [Performance tests](./pr-perf-test.yaml): WIP to trigger a performance test on a dedicated VM and collect the results as a PR comment.
* [Full package build](./pr-package-tests.yaml): builds all Linux, macOS and Windows packages as well as container images.

To trigger these, apply the relevant label.
11 changes: 8 additions & 3 deletions .github/workflows/call-build-linux-packages.yaml
@@ -105,7 +105,8 @@ jobs:
call-build-linux-packages:
name: ${{ matrix.distro }} package build and stage to S3
environment: ${{ inputs.environment }}
runs-on: ubuntu-latest
# Ensure for OSS Fluent Bit repo we enable usage of Actuated runners for ARM builds, for forks it should keep existing ubuntu-latest usage.
runs-on: ${{ (contains(matrix.distro, 'arm' ) && (github.repository == 'fluent/fluent-bit') && 'actuated-aarch64') || 'ubuntu-latest' }}
permissions:
contents: read
strategy:
@@ -119,6 +120,10 @@ jobs:
with:
ref: ${{ inputs.ref }}

- name: Set up Actuated mirror
if: contains(matrix.distro, 'arm' ) && (github.repository == 'fluent/fluent-bit')
uses: self-actuated/hub-mirror@master

- name: Set up QEMU
uses: docker/setup-qemu-action@v2

@@ -158,7 +163,7 @@ jobs:
# For ubuntu map to codename using the distro-info list (CSV)
run: |
sudo apt-get update
sudo apt-get install -y distro-info
sudo apt-get install -y distro-info awscli
TARGET=${DISTRO%*.arm64v8}
if [[ "$TARGET" == "ubuntu/"* ]]; then
UBUNTU_CODENAME=$(cut -d ',' -f 1,3 < "/usr/share/distro-info/ubuntu.csv"|grep "${TARGET##*/}"|cut -d ',' -f 2)
@@ -216,7 +221,7 @@ jobs:
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install -y createrepo-c aptly
sudo apt-get install -y createrepo-c aptly awscli
- name: Checkout code for repo metadata construction - always latest
uses: actions/checkout@v3
7 changes: 6 additions & 1 deletion .github/workflows/call-build-windows.yaml
@@ -127,13 +127,18 @@ jobs:
C:\vcpkg\vcpkg install --recurse openssl --triplet ${{ matrix.config.vcpkg_triplet }}
shell: cmd

- name: Build libyaml with vcpkg
run: |
C:\vcpkg\vcpkg install --recurse libyaml --triplet ${{ matrix.config.vcpkg_triplet }}
shell: cmd

- name: Build Fluent Bit packages
# If we are using 2.0.* or earlier we need to exclude the ARM64 build as the dependencies fail to compile.
# Trying to do via an exclude for the job triggers linting errors.
# This is only supposed to be a workaround for now so can be easily removed later.
if: ${{ matrix.config.arch != 'amd64_arm64' || needs.call-build-windows-get-meta.outputs.armSupported == 'true' }}
run: |
cmake -G "NMake Makefiles" -DFLB_NIGHTLY_BUILD='${{ inputs.unstable }}' -DOPENSSL_ROOT_DIR='${{ matrix.config.openssl_dir }}' ${{ matrix.config.cmake_additional_opt }} ../
cmake -G "NMake Makefiles" -DFLB_NIGHTLY_BUILD='${{ inputs.unstable }}' -DOPENSSL_ROOT_DIR='${{ matrix.config.openssl_dir }}' ${{ matrix.config.cmake_additional_opt }} -DFLB_LIBYAML_DIR=C:\vcpkg\packages\libyaml_${{ matrix.config.vcpkg_triplet }} ../
cmake --build .
cpack
working-directory: build
32 changes: 20 additions & 12 deletions .github/workflows/call-run-integration-test.yaml
@@ -216,13 +216,19 @@ jobs:
ref: ${{ inputs.ref }}
repository: fluent/fluent-bit-ci

- name: Configure system for Opensearch
run: |
sudo sysctl -w vm.max_map_count=262144
sysctl -p
shell: bash

- name: Setup BATS
uses: mig4/setup-bats@v1
with:
bats-version: 1.7.0
bats-version: 1.9.0

- name: Create k8s Kind Cluster
uses: helm/kind-action@v1.5.0
uses: helm/kind-action@v1.7.0
with:
node_image: kindest/node:${{ matrix.k8s-release }}
cluster_name: kind
@@ -236,6 +242,7 @@ jobs:
uses: azure/[email protected]

- name: Run tests
timeout-minutes: 60
run: |
kind load docker-image ${{ inputs.image_name }}:${{ inputs.image_tag }}
./run-tests.sh
@@ -261,6 +268,8 @@ jobs:
cloud:
- aks
- gke
env:
USE_GKE_GCLOUD_AUTH_PLUGIN: true
steps:
- uses: actions/checkout@v3
with:
@@ -274,11 +283,13 @@ jobs:

- if: matrix.cloud == 'gke'
uses: 'google-github-actions/setup-gcloud@v1'
with:
install_components: 'gke-gcloud-auth-plugin'

- name: Setup BATS
uses: mig4/setup-bats@v1
with:
bats-version: 1.7.0
bats-version: 1.9.0

- name: Set up Helm
uses: azure/[email protected]
@@ -290,13 +301,10 @@ jobs:

- name: Get the GKE Kubeconfig
if: matrix.cloud == 'gke'
run: |
gcloud info
gcloud container clusters get-credentials "$GKE_CLUSTER_NAME" --zone "$GKE_ZONE"
shell: bash
env:
GKE_CLUSTER_NAME: ${{ needs.call-run-terraform-setup.outputs.gke-cluster-name }}
GKE_ZONE: ${{ needs.call-run-terraform-setup.outputs.gke-cluster-zone }}
uses: 'google-github-actions/get-gke-credentials@v1'
with:
cluster_name: ${{ needs.call-run-terraform-setup.outputs.gke-cluster-name }}
location: ${{ needs.call-run-terraform-setup.outputs.gke-cluster-zone }}

- name: Get the AKS Kubeconfig
if: matrix.cloud == 'aks'
@@ -321,8 +329,7 @@ jobs:
shell: bash

- name: Run tests
# https://github.com/fluent/fluent-bit-ci/issues/80
continue-on-error: ${{ matrix.cloud == 'gke' }}
timeout-minutes: 60
run: |
./run-tests.sh
shell: bash
@@ -334,4 +341,5 @@ jobs:
HOSTED_OPENSEARCH_HOST: ${{ needs.call-run-terraform-setup.outputs.aws-opensearch-endpoint }}
HOSTED_OPENSEARCH_PORT: 443
HOSTED_OPENSEARCH_USERNAME: admin
USE_GKE_GCLOUD_AUTH_PLUGIN: true
HOSTED_OPENSEARCH_PASSWORD: ${{ secrets.opensearch_admin_password }}
2 changes: 1 addition & 1 deletion .github/workflows/call-test-images.yaml
@@ -182,7 +182,7 @@ jobs:
ref: ${{ inputs.ref }}

- name: Create k8s Kind Cluster
uses: helm/kind-action@v1.5.0
uses: helm/kind-action@v1.7.0

- name: Set up Helm
uses: azure/[email protected]
15 changes: 8 additions & 7 deletions .github/workflows/cron-scorecards-analysis.yaml
@@ -20,23 +20,24 @@ jobs:
permissions:
# Needed to upload the results to code-scanning dashboard.
security-events: write
actions: read
contents: read

# Needed for GitHub OIDC token if publish_results is true
id-token: write
steps:
- name: "Checkout code"
uses: actions/checkout@v3
with:
persist-credentials: false

- name: "Run analysis"
uses: ossf/scorecard-action@e38b1902ae4f44df626f11ba0734b14fb91f8f86
uses: ossf/scorecard-action@08b4669551908b1024bb425080c797723083c031
with:
results_file: scorecard-results.sarif
results_format: sarif
# Read-only PAT token. To create it,
# follow the steps in https://github.com/ossf/scorecard-action#pat-token-creation.
repo_token: ${{ secrets.SCORECARD_READ_TOKEN || github.token }}
# (Optional) fine-grained personal access token. Uncomment the `repo_token` line below if:
# - you want to enable the Branch-Protection check on a *public* repository, or
# To create the PAT, follow the steps in https://github.com/ossf/scorecard-action#authentication-with-fine-grained-pat-optional.
repo_token: ${{ secrets.SCORECARD_TOKEN }}
#
# Publish the results for public repositories to enable scorecard badges. For more details, see
# https://github.com/ossf/scorecard-action#publishing-results.
# For private repositories, `publish_results` will automatically be set to `false`, regardless