Merge branch 'main' into metrics
Surya361 authored Jun 7, 2024
2 parents 7477817 + 0816302 commit 8e59e06
Showing 115 changed files with 2,683 additions and 2,663 deletions.
50 changes: 45 additions & 5 deletions .github/workflows/cbench.yml
@@ -9,10 +9,15 @@ on:
- "quickwit/**"
- "!quickwit/quickwit-ui/**"
# For security reasons (to make sure the list of allowed users is
# trusted), make sure we run the workflow definition the base of the
# pull request.
# trusted), make sure we run the workflow definition from the base
# commit of the pull request.
pull_request_target:

# This is required for github.rest.issues.createComment.
permissions:
issues: write
pull-requests: write

env:
RUSTFLAGS: --cfg tokio_unstable

@@ -26,7 +31,7 @@ jobs:
# The self-hosted runner must have the system deps installed for QW and
# the benchmark, because we don't have root access.
runs-on: self-hosted
timeout-minutes: 40
timeout-minutes: 60
steps:
- name: Set authorized users
id: authorized-users
@@ -79,19 +84,54 @@ jobs:
- name: Run Benchmark on SSD
if: contains(fromJSON(steps.authorized-users.outputs.users), github.actor)
id: bench-run-ssd
run: python3 ./run.py --search-only --storage pd-ssd --engine quickwit --track generated-logs --tags "${{ github.event_name }}_${{ github.ref_name }}" --manage-engine --source github_workflow --binary-path ../quickwit/quickwit/target/release/quickwit --instance "{autodetect_gcp}" --export-to-endpoint=https://qw-benchmarks.104.155.161.122.nip.io --engine-data-dir "{qwdata_local}" --write-exported-run-url-to-file $GITHUB_OUTPUT
run: python3 ./run.py --search-only --storage pd-ssd --engine quickwit --track generated-logs --tags "${{ github.event_name }}_${{ github.ref_name }}" --manage-engine --source github_workflow --binary-path ../quickwit/quickwit/target/release/quickwit --instance "{autodetect_gcp}" --export-to-endpoint=https://qw-benchmarks.104.155.161.122.nip.io --engine-data-dir "{qwdata_local}" --github-workflow-user "${{ github.actor }}" --github-workflow-run-id "${{ github.run_id }}" --comparison-reference-tag="push_main" --github-pr "${{ github.event_name == 'pull_request_target' && github.event.number || 0 }}" --comparison-reference-commit "${{ github.event_name == 'pull_request_target' && github.sha || github.event.before }}" --write-exported-run-url-to-file $GITHUB_OUTPUT
working-directory: ./benchmarks
- name: Run Benchmark on cloud storage
if: contains(fromJSON(steps.authorized-users.outputs.users), github.actor)
id: bench-run-cloud-storage
run: python3 ./run.py --search-only --storage gcs --engine quickwit --track generated-logs --tags "${{ github.event_name }}_${{ github.ref_name }}" --manage-engine --source github_workflow --binary-path ../quickwit/quickwit/target/release/quickwit --instance "{autodetect_gcp}" --export-to-endpoint=https://qw-benchmarks.104.155.161.122.nip.io --engine-data-dir "{qwdata_gcs}" --write-exported-run-url-to-file $GITHUB_OUTPUT
run: python3 ./run.py --search-only --storage gcs --engine quickwit --track generated-logs --tags "${{ github.event_name }}_${{ github.ref_name }}" --manage-engine --source github_workflow --binary-path ../quickwit/quickwit/target/release/quickwit --instance "{autodetect_gcp}" --export-to-endpoint=https://qw-benchmarks.104.155.161.122.nip.io --engine-data-dir "{qwdata_gcs}" --engine-config-file engines/quickwit/configs/cbench_quickwit_gcs.yaml --github-workflow-user "${{ github.actor }}" --github-workflow-run-id "${{ github.run_id }}" --comparison-reference-tag="push_main" --github-pr "${{ github.event_name == 'pull_request_target' && github.event.number || 0 }}" --comparison-reference-commit "${{ github.event_name == 'pull_request_target' && github.sha || github.event.before }}" --write-exported-run-url-to-file $GITHUB_OUTPUT
working-directory: ./benchmarks
- name: Show results links
if: contains(fromJSON(steps.authorized-users.outputs.users), github.actor)
run: |
echo "::notice title=Benchmark Results on SSD::${{ steps.bench-run-ssd.outputs.url }}"
echo "::notice title=Comparison of results on SSD::${{ steps.bench-run-ssd.outputs.comparison_text }}"
echo "::notice title=Benchmark Results on Cloud Storage::${{ steps.bench-run-cloud-storage.outputs.url }}"
echo "::notice title=Comparison of results on Cloud Storage::${{ steps.bench-run-cloud-storage.outputs.comparison_text }}"
- name: In case of auth error
if: ${{ ! contains(fromJSON(steps.authorized-users.outputs.users), github.actor) }}
run: |
echo "::error title=User not allowed to run the benchmark::User must be in list ${{ steps.authorized-users.outputs.users }}"
- name: Add a PR comment with comparison results
uses: actions/github-script@v7
if: contains(fromJSON(steps.authorized-users.outputs.users), github.actor) && github.event_name == 'pull_request_target'
# Inspired by: https://github.com/actions/github-script/blob/60a0d83039c74a4aee543508d2ffcb1c3799cdea/.github/workflows/pull-request-test.yml
with:
script: |
// Get the existing comments.
const {data: comments} = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.number,
})
// Find any comment already made by the bot to update it.
const botComment = comments.find(comment => comment.user.id === 41898282)
const commentBody = "### On SSD:\n${{ steps.bench-run-ssd.outputs.comparison_text }}\n### On GCS:\n${{ steps.bench-run-cloud-storage.outputs.comparison_text }}\n"
if (botComment) {
// Update existing comment.
await github.rest.issues.updateComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: botComment.id,
body: commentBody
})
} else {
// New comment.
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.number,
body: commentBody
})
}
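
The github-script step above either updates the bot's existing comment or creates a new one. A minimal Python sketch of that decision follows, assuming comments are plain dictionaries shaped like the GitHub REST response. `build_comment_body` and `plan_comment_action` are hypothetical names used only for illustration; they are not part of the workflow.

```python
# Hypothetical sketch of the "update or create" decision made by the workflow script.
# The user id below is the one the workflow script matches on.
BOT_USER_ID = 41898282


def build_comment_body(ssd_text, gcs_text):
    """Assemble the comment body in the same layout as the workflow script."""
    return f"### On SSD:\n{ssd_text}\n### On GCS:\n{gcs_text}\n"


def plan_comment_action(comments, body):
    """Return a description of the API call to make: update the first
    comment authored by the bot if one exists, otherwise create a new one."""
    bot_comment = next(
        (c for c in comments if c["user"]["id"] == BOT_USER_ID), None
    )
    if bot_comment is not None:
        return {"action": "update", "comment_id": bot_comment["id"], "body": body}
    return {"action": "create", "body": body}
```

Updating in place keeps a single benchmark comment per pull request instead of adding one per workflow run.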
6 changes: 3 additions & 3 deletions distribution/ecs/README.md
@@ -33,8 +33,8 @@ Metastore database backups are disabled as restoring one would lead to
inconsistencies with the index store on S3. To ensure high availability, you
should enable `rds_config.multi_az` instead. To use your own Postgres database
instead of creating a new RDS instance, configure the
`external_postgres_uri_ssm_parameter_arn` variable (e.g
`postgres://user:password@domain:port/db`).
`external_postgres_uri_secret_arn` variable (e.g., the ARN of an SSM parameter with
the value `postgres://user:password@domain:port/db`).

Using NAT Gateways for the image registry is quite costly (approx. $0.05/hour/AZ). If
you are not already using NAT Gateways in the AZs where Quickwit will be
@@ -64,7 +64,7 @@ IAM policies to indexers.
We provide an example of a self-contained deployment with an ad-hoc VPC.

> [!IMPORTANT]
> This stack costs ~$150/month to run (Fargate tasks, NAT Gateways
> This stack costs ~$200/month to run (Fargate tasks, NAT Gateways
> and RDS)
### Deploy the Quickwit module and connect through a bastion
2 changes: 1 addition & 1 deletion distribution/ecs/example/terraform.tf
@@ -72,7 +72,7 @@ module "quickwit" {
# multi_az = false
# }

# external_postgres_uri_ssm_parameter_arn = aws_ssm_parameter.postgres_uri.arn
# external_postgres_uri_secret_arn = aws_ssm_parameter.postgres_uri.arn

## Example logging configuration
# sidecar_container_definitions = {
4 changes: 2 additions & 2 deletions distribution/ecs/quickwit/configs.tf
@@ -13,8 +13,8 @@ locals {

quickwit_index_s3_prefix = var.quickwit_index_s3_prefix == "" ? aws_s3_bucket.index[0].id : var.quickwit_index_s3_prefix

use_external_rds = var.external_postgres_uri_ssm_parameter_arn != ""
postgres_uri_parameter_arn = var.external_postgres_uri_ssm_parameter_arn != "" ? var.external_postgres_uri_ssm_parameter_arn : aws_ssm_parameter.postgres_credential[0].arn
use_external_rds = var.external_postgres_uri_secret_arn != ""
postgres_uri_secret_arn = var.external_postgres_uri_secret_arn != "" ? var.external_postgres_uri_secret_arn : aws_ssm_parameter.postgres_credential[0].arn
}

resource "random_id" "module" {
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/iam.tf
@@ -46,7 +46,7 @@ data "aws_iam_policy_document" "quickwit_task_execution_permission" {
statement {
actions = ["ssm:GetParameters"]

resources = [local.postgres_uri_parameter_arn]
resources = [local.postgres_uri_secret_arn]
}

statement {
12 changes: 12 additions & 0 deletions distribution/ecs/quickwit/outputs.tf
@@ -5,3 +5,15 @@ output "indexer_service_name" {
output "searcher_service_name" {
value = "${aws_service_discovery_service.searcher.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}"
}

output "janitor_service_name" {
value = "${aws_service_discovery_service.janitor.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}"
}

output "control_plane_service_name" {
value = "${aws_service_discovery_service.control_plane.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}"
}

output "metastore_service_name" {
value = "${aws_service_discovery_service.metastore.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}"
}
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/quickwit-control-plane.tf
@@ -3,7 +3,7 @@ module "quickwit_control_plane" {
service_name = "control_plane"
service_discovery_registry_arn = aws_service_discovery_service.control_plane.arn
cluster_arn = module.ecs_cluster.arn
postgres_credential_arn = local.postgres_uri_parameter_arn
postgres_uri_secret_arn = local.postgres_uri_secret_arn
quickwit_peer_list = local.quickwit_peer_list
s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn
task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/quickwit-indexer.tf
@@ -3,7 +3,7 @@ module "quickwit_indexer" {
service_name = "indexer"
service_discovery_registry_arn = aws_service_discovery_service.indexer.arn
cluster_arn = module.ecs_cluster.arn
postgres_credential_arn = local.postgres_uri_parameter_arn
postgres_uri_secret_arn = local.postgres_uri_secret_arn
quickwit_peer_list = local.quickwit_peer_list
s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn
task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/quickwit-janitor.tf
@@ -3,7 +3,7 @@ module "quickwit_janitor" {
service_name = "janitor"
service_discovery_registry_arn = aws_service_discovery_service.janitor.arn
cluster_arn = module.ecs_cluster.arn
postgres_credential_arn = local.postgres_uri_parameter_arn
postgres_uri_secret_arn = local.postgres_uri_secret_arn
quickwit_peer_list = local.quickwit_peer_list
s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn
task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/quickwit-metastore.tf
@@ -3,7 +3,7 @@ module "quickwit_metastore" {
service_name = "metastore"
service_discovery_registry_arn = aws_service_discovery_service.metastore.arn
cluster_arn = module.ecs_cluster.arn
postgres_credential_arn = local.postgres_uri_parameter_arn
postgres_uri_secret_arn = local.postgres_uri_secret_arn
quickwit_peer_list = local.quickwit_peer_list
s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn
task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn
2 changes: 1 addition & 1 deletion distribution/ecs/quickwit/quickwit-searcher.tf
@@ -3,7 +3,7 @@ module "quickwit_searcher" {
service_name = "searcher"
service_discovery_registry_arn = aws_service_discovery_service.searcher.arn
cluster_arn = module.ecs_cluster.arn
postgres_credential_arn = local.postgres_uri_parameter_arn
postgres_uri_secret_arn = local.postgres_uri_secret_arn
quickwit_peer_list = local.quickwit_peer_list
s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn
task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn
6 changes: 1 addition & 5 deletions distribution/ecs/quickwit/service/ecs.tf
@@ -27,7 +27,7 @@ module "quickwit_service" {
secrets = [
{
name = "QW_METASTORE_URI"
valueFrom = var.postgres_credential_arn
valueFrom = var.postgres_uri_secret_arn
}
]

@@ -119,10 +119,6 @@ module "quickwit_service" {
}
]

task_exec_ssm_param_arns = [
var.postgres_credential_arn
]

tasks_iam_role_policies = local.tasks_iam_role_policies

task_exec_iam_role_policies = {
4 changes: 3 additions & 1 deletion distribution/ecs/quickwit/service/variables.tf
@@ -32,7 +32,9 @@ variable "subnet_ids" {
type = list(string)
}

variable "postgres_credential_arn" {}
variable "postgres_uri_secret_arn" {
description = "ARN of the SSM parameter or Secrets Manager secret containing the URI of a Postgres instance"
}

variable "quickwit_image" {}

10 changes: 5 additions & 5 deletions distribution/ecs/quickwit/variables.tf
@@ -73,8 +73,8 @@ variable "quickwit_indexer" {
description = "Indexer service sizing configurations"
type = object({
desired_count = optional(number, 1)
memory = optional(number, 4096)
cpu = optional(number, 1024)
memory = optional(number, 8192)
cpu = optional(number, 2048)
ephemeral_storage_gib = optional(number, 21)
extra_task_policy_arns = optional(list(string), [])
})
@@ -95,7 +95,7 @@ variable "quickwit_searcher" {
description = "Searcher service sizing configurations"
type = object({
desired_count = optional(number, 1)
memory = optional(number, 2048)
memory = optional(number, 4096)
cpu = optional(number, 1024)
ephemeral_storage_gib = optional(number, 21)
})
@@ -131,7 +131,7 @@ variable "rds_config" {
default = {}
}

variable "external_postgres_uri_ssm_parameter_arn" {
description = "ARN of the SSM parameter containing the URI of a Postgres instance (postgres://{user}:{password}@{address}:{port}/{db_instance_name}). The Postgres instance should allow inbound connections from the subnets specified in `variable.subnet_ids`. If provided, the internal RDS will not be created and `var.rds_config` is ignored."
variable "external_postgres_uri_secret_arn" {
description = "ARN of the SSM parameter or Secrets Manager secret containing the URI of a Postgres instance (postgres://{user}:{password}@{address}:{port}/{db_instance_name}). The Postgres instance should allow inbound connections from the subnets specified in `variable.subnet_ids`. If provided, the internal RDS will not be created and `var.rds_config` is ignored."
default = ""
}
3 changes: 3 additions & 0 deletions distribution/lambda/Makefile
@@ -28,8 +28,11 @@ package:
then
pushd ../../quickwit/
rustc --version
# TODO: remove --disable-optimizations when upgrading to a release containing
# https://github.com/cargo-lambda/cargo-lambda/issues/649 (> 1.2.1)
cargo lambda build \
-p quickwit-lambda \
--disable-optimizations \
--release \
--output-format zip \
--target x86_64-unknown-linux-gnu
9 changes: 8 additions & 1 deletion distribution/lambda/cdk/cli.py
@@ -255,6 +255,13 @@ def get_logs(
last_event_id = ""
last_event_found = True
start_time = time.time()
while time.time() - start_time < timeout:
describe_resp = client.describe_log_groups(logGroupNamePrefix=log_group_name)
group_names = [group["logGroupName"] for group in describe_resp["logGroups"]]
if log_group_name in group_names:
break
print("log group not found, retrying...")
time.sleep(3)
while time.time() - start_time < timeout:
for page in paginator.paginate(
logGroupName=log_group_name,
@@ -268,7 +275,6 @@ def get_logs(
last_event_id = event["eventId"]
yield event["message"]
if event["message"].startswith("REPORT"):
lower_time_bound = int(event["timestamp"])
last_event_id = "REPORT"
break
if last_event_id == "REPORT":
@@ -454,3 +460,4 @@ def req(method, path, body=None, expected_status=200):
expected_status=400,
)
req("GET", f"/api/v1/_elastic/_search?q=animal", expected_status=501)
req("GET", f"/api/v1/indexes/{mock_sales_index_id}")
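
The loop added to `get_logs` in this file waits for the log group to exist before paginating events. A minimal sketch of that pattern is shown below. It is not the code in `cli.py`: `wait_for_log_group` and its `list_group_names` callback are hypothetical, and the callback stands in for the `describe_log_groups` call so the example runs without AWS access.

```python
import time


def wait_for_log_group(list_group_names, log_group_name, timeout,
                       poll_interval=3.0, sleep=time.sleep):
    """Poll until `log_group_name` is listed or `timeout` seconds elapse.

    `list_group_names(prefix)` is a hypothetical callback that returns the
    names of the log groups starting with `prefix`.
    Returns True if the group was found, False on timeout.
    """
    start_time = time.time()
    while time.time() - start_time < timeout:
        # A prefix query can match other groups, so check for the exact name.
        if log_group_name in list_group_names(log_group_name):
            return True
        print("log group not found, retrying...")
        sleep(poll_interval)
    return False
```

Unlike the loop in the diff, this sketch reports a timeout through its return value, so a caller could decide whether to continue or abort.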
6 changes: 5 additions & 1 deletion docs/configuration/metastore-config.md
@@ -65,7 +65,11 @@ By default, the File-Backed Metastore is only read once when you start a Quickwi

You can also configure it to poll the File-Backed Metastore periodically to keep a fresh view of it. This is useful for a Searcher instance that needs to be aware of new splits published by an Indexer running in parallel.

To configure the polling interval (in seconds only), add a URI fragment to the storage URI like this: `s3://quickwit/my-indexes#polling_interval=30s`
To configure the polling interval (in seconds), add a URI fragment to the storage URI as follows: `s3://quickwit/my-indexes#polling_interval=30s`

:::note
The polling interval can be configured in seconds only; other units, such as minutes or hours, are not supported.
:::
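
To illustrate the fragment syntax, here is a minimal Python sketch that extracts the interval from a storage URI. It is not Quickwit's parser: `parse_polling_interval` is a hypothetical helper, and the rule that only whole seconds with an `s` suffix are accepted is an assumption based on the note above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_polling_interval(uri: str) -> Optional[int]:
    """Return the polling interval in seconds, or None if the URI has none.

    Assumes the fragment looks like `polling_interval=<digits>s`.
    """
    fragment = urlsplit(uri).fragment
    values = parse_qs(fragment).get("polling_interval")
    if not values:
        return None
    value = values[-1]
    # Only a whole number of seconds is accepted in this sketch.
    if not value.endswith("s") or not value[:-1].isdigit():
        raise ValueError(f"unsupported polling interval: {value!r}")
    return int(value[:-1])


print(parse_polling_interval("s3://quickwit/my-indexes#polling_interval=30s"))
```

A value such as `1m` raises `ValueError` here, mirroring the seconds-only restriction.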

:::tip
Amazon S3 charges $0.0004 per 1000 GET requests. Polling a metastore every 30 seconds costs about $0.04 per index per month.
16 changes: 3 additions & 13 deletions docs/configuration/node-config.md
@@ -123,17 +123,7 @@ This section may contain one configuration subsection per available metastore im

### File-backed metastore configuration

| Property | Description | Default value |
| --- | --- | --- |
| `polling_interval` | Time interval between successive polling attempts to detect metastore changes. | `30s` |

Example of a metastore configuration for a file-backed implementation in YAML format:

```yaml
metastore:
file:
polling_interval: 1m
```
The file-backed metastore has no node-level configuration. You can configure the polling interval [at the index level](./metastore-config.md#polling-configuration).

### PostgreSQL metastore configuration

@@ -163,8 +153,8 @@ This section contains the configuration options for an indexer. The split store

| Property | Description | Default value |
| --- | --- | --- |
| `split_store_max_num_bytes` | Maximum size in bytes allowed in the split store for each index-source pair. | `100G` |
| `split_store_max_num_splits` | Maximum number of files allowed in the split store for each index-source pair. | `1000` |
| `split_store_max_num_bytes` | Maximum size in bytes allowed in the split store. | `100G` |
| `split_store_max_num_splits` | Maximum number of files allowed in the split store. | `1000` |
| `max_concurrent_split_uploads` | Maximum number of concurrent split uploads allowed on the node. | `12` |
| `merge_concurrency` | Maximum number of merge operations that can be executed on the node at one point in time. | `(2 x num threads available) / 3` |
| `enable_otlp_endpoint` | If true, enables the OpenTelemetry exporter endpoint to ingest logs and traces via the OpenTelemetry Protocol (OTLP). | `false` |