Report health checks to status #280

Merged
merged 1 commit into main from gh-151 on Oct 31, 2024

Conversation

@maksymvavilov (Contributor) commented Oct 24, 2024

Reports health checks in the status of the DNSRecord.
It also modifies the Ready status. The logic (see the sketch below) is:

  • If all checks pass, the record is healthy and ready
  • If we have both failing and healthy probes, the record is not healthy but is ready
  • If all checks fail, the record is not healthy and not ready
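
A minimal sketch of that decision logic, assuming a flat list of probe results; the type and function names and the no-probe default below are illustrative, not the actual controller code:

// Sketch only: illustrates the healthy/ready rules described above.
// probeResult and recordStatus are hypothetical names, not part of this PR.
package main

import "fmt"

// probeResult is the outcome of a single health check probe for a record.
type probeResult struct {
	Host    string
	Healthy bool
}

// recordStatus derives the Healthy and Ready flags for a DNSRecord:
//   all probes pass           -> healthy=true,  ready=true
//   mixed passing and failing -> healthy=false, ready=true
//   all probes fail           -> healthy=false, ready=false
func recordStatus(probes []probeResult) (healthy, ready bool) {
	if len(probes) == 0 {
		// Assumption: with no health checks configured the record stays ready.
		return true, true
	}
	passing := 0
	for _, p := range probes {
		if p.Healthy {
			passing++
		}
	}
	return passing == len(probes), passing > 0
}

func main() {
	probes := []probeResult{{Host: "t1a", Healthy: true}, {Host: "t1b", Healthy: false}}
	h, r := recordStatus(probes)
	fmt.Printf("healthy=%v ready=%v\n", h, r) // prints: healthy=false ready=true
}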

@maksymvavilov changed the title from "[WIP] Report health checks to status" to "Report health checks to status" on Oct 24, 2024
@maksymvavilov marked this pull request as ready for review on October 24, 2024 15:26
@maksymvavilov linked an issue on Oct 24, 2024 that may be closed by this pull request
@maksymvavilov force-pushed the gh-151 branch 4 times, most recently from bf50c50 to 5acb6c7 on October 25, 2024 10:16
@maksymvavilov force-pushed the gh-151 branch 2 times, most recently from 8b050df to 4bfff2e on October 30, 2024 12:59
endpoint: "/health"
port: 80
protocol: "HTTPS"
failureThreshold: 3

@maksymvavilov (author) commented:
This will fail e2e now, so removing it. It will be brought back in a follow-up PR.

@maleck13 (Collaborator) commented Oct 31, 2024

Tried with the following setup:

---
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: external
spec:
  gatewayClassName: istio
  listeners:
    - name: t1a
      port: 80
      hostname: 't1a.cb.hcpapps.net'
      protocol: HTTP
    - name: t1b
      port: 80
      hostname: 't1b.cb.hcpapps.net'
      protocol: HTTP
    - name: t1c
      port: 80
      hostname: 't1c.cb.hcpapps.net'
      protocol: HTTP
    - name: t1d
      port: 80
      hostname: 't1d.cb.hcpapps.net'
      protocol: HTTP

---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: toystore
  labels:
    app: toystore
spec:
  parentRefs:
    - name: external
  hostnames:
    [
      't1a.cb.hcpapps.net',
      't1b.cb.hcpapps.net',
      't1c.cb.hcpapps.net',
      't1d.cb.hcpapps.net',
    ]
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: '/toy'
          method: GET
        - path:
            type: Exact
            value: '/admin/toy'
          method: POST
        - path:
            type: Exact
            value: '/admin/toy'
          method: DELETE
        - path:
            type: Exact
            value: '/health'
          method: GET
      backendRefs:
        - name: toystore
          port: 80
---
apiVersion: kuadrant.io/v1alpha1
kind: DNSPolicy
metadata:
  name: dnspolicy-t1a
  labels:
    policy: 'gateway'
spec:
  loadBalancing:
    weight: 125
    geo: GEO-EU
    defaultGeo: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: external
    sectionName: t1a
  providerRefs:
    - name: aws-credentials
  healthCheck:
    path: /health
    protocol: HTTP
    failureThreshold: 3
    interval: 30s
---
apiVersion: kuadrant.io/v1alpha1
kind: DNSPolicy
metadata:
  name: dnspolicy-t1b
  labels:
    policy: 'gateway'
spec:
  loadBalancing:
    weight: 125
    geo: GEO-EU
    defaultGeo: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: external
    sectionName: t1b
  providerRefs:
    - name: aws-credentials
  healthCheck:
    path: /healthz
    protocol: HTTP
    failureThreshold: 3
    interval: 30s
---
apiVersion: kuadrant.io/v1alpha1
kind: DNSPolicy
metadata:
  name: dnspolicy
  labels:
    policy: 'gateway'
spec:
  loadBalancing:
    weight: 125
    geo: GEO-EU
    defaultGeo: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: external
  providerRefs:
    - name: aws-credentials
  healthCheck:
    path: /health
    protocol: HTTP
    failureThreshold: 3
    interval: 30s

This resulted in:

k get dnshealthcheckprobe -n t1 -o=wide
NAME                                                                                  HEALTHY   LAST CHECKED
external-t1a-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   true      18s
external-t1b-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   false     20s
external-t1c-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   true      14s
external-t1d-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   true      13s
k get dnsrecord.kuadrant.io -n t1 -o=wide
NAME           READY
external-t1a   True
external-t1b   False
external-t1c   True
external-t1d   True
k get dnspolicy -n t1 -o=wide
NAME            ACCEPTED   ENFORCED   TARGETREFKIND   TARGETREFNAME   AGE
dnspolicy       True       True       Gateway         external        2m30s
dnspolicy-t1a   True       True       Gateway         external        2m31s
dnspolicy-t1b   True       False      Gateway         external        2m30s

As expected, all but the t1b records were present in the provider.

Updating the dnspolicy-t1b DNSPolicy to have a valid endpoint resulted in:

k get dnspolicy -n t1 -o=wide
NAME            ACCEPTED   ENFORCED   TARGETREFKIND   TARGETREFNAME   AGE
dnspolicy       True       True       Gateway         external        6m39s
dnspolicy-t1a   True       True       Gateway         external        6m40s
dnspolicy-t1b   True       True       Gateway         external        6m39s

As expected, the t1b records plus the others were correct in the provider.

Scaling down the workload resulted in the records remaining in place, with all the health checks and DNSPolicies becoming unhealthy:

external-t1a-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   false     21s
external-t1b-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   false     19s
external-t1c-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   false     16s
external-t1d-aef422ea1007f49b19e8ca537252a20b-411358573.us-east-1.elb.amazonaws.com   false     15s

Although it took a while for the DNSPolicies to show as not enforced, they eventually did:

NAME            ACCEPTED   ENFORCED   TARGETREFKIND   TARGETREFNAME   AGE
dnspolicy       True       False      Gateway         external        14m
dnspolicy-t1a   True       False      Gateway         external        14m
dnspolicy-t1b   True       False      Gateway         external        14m

@maleck13 (Collaborator) commented:
@maksymvavilov all looks good. I do wonder, though, about the validity of saying the policy isn't enforced if it has discovered something unhealthy. Perhaps we should add a new condition, Healthy, and set it to true if no health checks are published, and true or false based on the health checks of the records?
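
A rough sketch of what that suggested Healthy condition could look like on the DNSRecord status, using the standard Kubernetes condition helpers; the condition type, reasons, and messages here are assumptions for illustration, not something this PR adds:

// Sketch only: a possible shape for the proposed "Healthy" condition,
// kept separate from Ready/Enforced. Reasons and messages are illustrative.
package status

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setHealthyCondition records whether the probes for a DNSRecord are passing
// without flipping the policy's Enforced state purely on health.
func setHealthyCondition(conditions *[]metav1.Condition, generation int64, allPassing bool) {
	cond := metav1.Condition{
		Type:               "Healthy",
		Status:             metav1.ConditionTrue,
		Reason:             "HealthChecksPassed",
		Message:            "all health checks are passing",
		ObservedGeneration: generation,
	}
	if !allPassing {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "HealthChecksFailed"
		cond.Message = "one or more health checks are failing"
	}
	meta.SetStatusCondition(conditions, cond)
}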

@maleck13 added this pull request to the merge queue on Oct 31, 2024
Merged via the queue into main with commit a8d4411 on Oct 31, 2024
13 of 15 checks passed
@maksymvavilov deleted the gh-151 branch on November 7, 2024 12:07
Development

Successfully merging this pull request may close these issues.

Add health checks to DNS Record status