Distributed DNS (RFC 0008) #55

Closed · wants to merge 13 commits
215 additions, 0 deletions: rfcs/0008-distributed-dns.md
# Distributed DNS Load Balancing

- Feature Name: Distributed DNS Load Balancing
- Start Date: 2024-01-17
- RFC PR: [Kuadrant/architecture#0008](https://github.com/Kuadrant/architecture/pull/55)
- Issue tracking: [Kuadrant/architecture#0008](https://github.com/Kuadrant/architecture/issues/56)

# Summary
[summary]: #summary

Enable the DNS Operator to manage DNS for one or more hostnames shared across one or more clusters. Remove the requirement for a central "hub" cluster or multi-cluster control plane.

# Motivation
[motivation]: #motivation

Currently, the DNS operator has limited functionality for a single cluster, and none for multiple clusters without OCM. Enabling the DNS operator to manage DNS in these situations significantly eases onboarding, allowing users to leverage DNSPolicy across both single- and multi-cluster deployments.

# Diagrams
[diagrams]: #diagrams

- ![Distributed DNS](0008-distributed-dns-assets%2F0008-distributed-dns.jpg)
- ![Load Balanced DNS Records](0008-distributed-dns-assets%2F0008-loadbalanced-dns-records.jpg)

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Single Cluster and Multiple Cluster
Every cluster requires a DNS Policy configured with a provider secret, and every zone will be treated as a distributed DNS zone. In each cluster, when a listener host defined in the gateway matches one of the available zones in the provider, the addresses from that gateway will be appended to the zone file in a non-destructive manner, allowing multiple clusters to add their gateway addresses to the same zone file without destroying each other's values.

### Health Checks
The health checks provided by the various DNS Providers will be used.

## OCM / ACM
This is very similar, with the caveat that the "cluster" is the OCM hub cluster and that the gateway will be a Kuadrant gatewayClass.

## Migrating from single/multiple clusters to OCM

This is not covered.

## Cleanup

Cleanup of a cluster's zone file entries is triggered by deleting the DNS Policy that caused the kuadrantRecords to be created.

The DNS Policy will have a finalizer; when it is deleted, the controller will mark the created kuadrantRecords for deletion.

The kuadrantRecords will have a finalizer added by the DNS Operator, which will clean the records from the zone before removing the finalizer.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Terminology
- zone / DNS Provider zone: the set of records listed in the DNS Provider (the live list of DNS records)
- KuadrantRecord: the DNSRecord created on the local cluster by Kuadrant; it contains only the DNS requirements for the local cluster
- ZoneRecord: the DNSRecord created on the local cluster by the DNS Operator; it reflects the DNSRecord as it appears in the zone
- Root Host: the listener on a gateway that caused the Kuadrant Operator to create a KuadrantRecord (i.e. the host that users will be interacting with)

## General Logic Overview

The general flow in the Kuadrant operator follows a single path: it always outputs a kuadrantRecord, which specifies everything needed for this workload on this cluster, and it is unaware of the concept of distributed DNS.

This kuadrantRecord is reconciled by the DNS Operator, which interacts with the DNS Provider's zone and ensures that the records in the zone relevant to the hosts defined in its cluster's kuadrantRecords are present. When cleaning up a kuadrantRecord, the DNS Operator will ensure all records owned by the policy for the gateway on the local cluster are removed, and will then also prune unresolvable records (a record with no logical value, such as an IP address or a CNAME to a different root hostname) related to the hosts defined in the kuadrantRecords.
> **Review comment:** "ensure that the records in the zone relevant to the hosts defined within gateways on its cluster are present": to me this reads as if the DNS Operator interacts with Gateways to obtain hostnames, which it should not. Also, "When cleaning up a DNSRecord the operator will ensure...": which DNSRecord is being cleaned up (kuadrantRecord or DNSRecord), and which operator ensures this?
>
> **Reply (Collaborator):** No, only the Kuadrant Operator will interact with the gateways.

### Configuration Updates
There will be a new configuration option that can be applied as a runtime argument to the kuadrant-operator (allowing emergency configuration rescue in case of an unexpected issue):
> **Review comment:** As written, the kuadrant-operator creates a kuadrantRecord that is unaware of other clusters; on the other hand, the kuadrant-operator now has to deal with something (the clusterID) which potentially distinguishes clusters. That scatters the distributed DNS functionality between the kuadrant-operator and the DNS Operator; I thought the only operator aware of distributed DNS was the DNS Operator.
>
> **Reply (Collaborator):** The Kuadrant Operator is only aware of how to construct the DNSRecord leveraging an ID. The DNS Operator is aware of the DNSRecord, the provider, and the fact that the record may exist in multiple clusters. The hard multi-cluster work is all done by the DNS Operator.

- `clusterID` (see below for how this is used): a unique string that identifies this cluster apart from all other clusters. When absent, it is generated in code (more information below).
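
As a rough illustration, the runtime argument might be wired up in the operator's entrypoint as below; the flag name `cluster-id` is an assumption, since the RFC only specifies that the option exists:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag name: the RFC only says clusterID can be supplied as
	// a runtime argument for emergency override; when empty it is generated
	// in code from the kube-system namespace UID (see below).
	var clusterID string
	flag.StringVar(&clusterID, "cluster-id", "",
		"unique string identifying this cluster; auto-generated when empty")
	flag.Parse()
	fmt.Println("clusterID:", clusterID)
}
```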

### Changes to DNS Policy reconciler (kuadrant operator)

The reconciler will primarily need to be made aware that it must propagate the DNS Policy health check block into any DNS Records that it creates or updates, using the provider-specific fields.

It will also need to update the format of the records in the KuadrantRecord, to make use of the new GUID and clusterID when formatting the record names and targets.

### DNS Operator

This component takes the kuadrantRecords output by the kuadrant-operator and ensures they are accurately reflected in the DNS Provider zone. It currently assumes the DNS Record is the source of truth and always ensures that the zone in the DNS Provider matches the local DNS Record CR.

There are several changes required in this component:

#### Generate a ClusterID

The cluster will need a deterministic way to generate an ID that is unique enough not to clash with other clusterIDs, and that can be regenerated if ever required.

This clusterID is used as a prefix to identify which A/CNAME records were created by the local cluster.
> **Review comment (Contributor):** I can imagine more usage for clusterIDs beyond DNS (e.g. for global rate limiting), so I wonder if generating the clusterID shouldn't be a function of the Kuadrant Operator. The relation with the DNS Operator would then invert: maybe the DNS Operator only needs to know about cluster IDs for the heartbeats, and all the rest of the info it needs is in the DNSRecord CR.


The suggestion [here](https://groups.google.com/g/kubernetes-sig-architecture/c/mVGobfD4TpY/m/nkdbkX1iBwAJ) is to use the UID of the `kube-system` namespace, as this [cannot be deleted easily](https://groups.google.com/g/kubernetes-sig-architecture/c/mVGobfD4TpY/m/lR4SnaqpAAAJ) and so is, practically speaking, indelible.

We can take the UID of that namespace and apply a hashing algorithm to produce a 6-7 character code, which can be used as the clusterID for the local cluster.
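
A minimal sketch of the derivation, assuming SHA-256 and base32; the RFC does not mandate a specific hash, only a deterministic 6-7 character code:

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// clusterIDFromUID derives a short, deterministic clusterID from the UID of
// the kube-system namespace. Hashing the same UID always yields the same ID,
// so the value can be regenerated if ever required.
func clusterIDFromUID(uid string) string {
	sum := sha256.Sum256([]byte(uid))
	// base32 without padding keeps the result DNS-safe once lowercased.
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return strings.ToLower(enc)[:7]
}

func main() {
	// In the operator the UID would be read from the kube-system Namespace
	// object via the Kubernetes API; this value is illustrative.
	fmt.Println(clusterIDFromUID("9cf3ce2b-06a4-4372-8c3e-4a3c0e979e4c"))
}
```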

#### Update GUID Logic

Currently, the DNS Operator builds the LB CNAME with a GUID based on the name of the gateway. As this could differ between clusters, it will need to be updated to generate the GUID from the zone's root hostname, ensuring it is consistently generated on all clusters that interact with this zone.
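
The same style of derivation works here; a sketch, assuming the GUID is a truncated hash of the root hostname (the exact digest is not specified by the RFC):

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// lbGUID derives the LB CNAME GUID from the zone's root hostname rather than
// the gateway name, so every cluster interacting with the zone computes the
// same value regardless of how its gateway is named.
func lbGUID(rootHost string) string {
	sum := sha256.Sum256([]byte(rootHost))
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return strings.ToLower(enc)[:7]
}

func main() {
	fmt.Println(lbGUID("app.example.com")) // identical on every cluster
}
```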

#### Ensure local records are accurate

When the local KuadrantRecord is updated, the DNS Operator will ensure those values are present in the zone by interacting directly with the DNS Provider. However, it will now only remove existing records whose name contains the local cluster's clusterID, and it will only remove CNAME values that contain the local cluster's clusterID (a sketch follows the review notes below).
> **Review comment:** "When the local KuadrantRecord is updated, the DNS Operator will ensure those values are present in the zone by interacting directly with the DNS Provider": isn't there another layer of DNSRecord involved? I.e. the KuadrantRecord is updated, the DNS Operator updates the matching DNSRecord, and the state of that DNSRecord is reflected to the DNS provider? Also, why does this only talk about deleting records; shouldn't it be generalized to the other cases (adding, updating, ...)?
>
> **Reply (Collaborator):** No, the Kuadrant Operator updates the DNSRecord. The DNS Operator ensures that these records are written to the provider and validates they are still present a short time afterwards.
>
> **Reply (maleck13, Feb 28, 2024):** On the second comment: the DNS Operator is removing records from the local copy of the zone before re-writing them based on the current spec of the DNSRecord.
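
A sketch of that ownership filter, using a simplified endpoint type (the real DNSRecord endpoints carry more fields); only names and CNAME targets carrying the local clusterID are touched:

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint is a simplified stand-in for a DNSRecord endpoint; the real type
// carries more fields (TTL, provider-specific metadata, and so on).
type endpoint struct {
	DNSName    string
	RecordType string
	Targets    []string
}

// removeOwned strips only what the local cluster owns: whole records whose
// name contains the local clusterID, and individual CNAME targets that
// contain the clusterID. Everything owned by other clusters is untouched.
func removeOwned(zone []endpoint, clusterID string) []endpoint {
	var out []endpoint
	for _, ep := range zone {
		if strings.Contains(ep.DNSName, clusterID) {
			continue // record created by this cluster: drop it before re-writing
		}
		if ep.RecordType == "CNAME" {
			var keep []string
			for _, t := range ep.Targets {
				if !strings.Contains(t, clusterID) {
					keep = append(keep, t)
				}
			}
			if len(keep) == 0 {
				continue // every target was ours; drop the whole record
			}
			ep.Targets = keep
		}
		out = append(out, ep)
	}
	return out
}

func main() {
	zone := []endpoint{
		{DNSName: "abc123.lb.example.com", RecordType: "A", Targets: []string{"192.0.2.1"}},
		{DNSName: "lb.example.com", RecordType: "CNAME",
			Targets: []string{"abc123.lb.example.com", "def456.lb.example.com"}},
	}
	fmt.Println(removeOwned(zone, "abc123"))
}
```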


Whenever the DNS Operator does a create or update on the zone (but not a delete), it will requeue the KuadrantRecord for TTL/2 to verify the results are present. A verification pass is identified by the observedGeneration and Generation being equal; the operator should then list the zone and ensure all expected records are present. If any are absent, it adds them and requeues to verify again.
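
A sketch of the verification pass, under the assumption of a controller-runtime style reconcile loop; the types and the zone listing here are stand-ins, not the operator's real API:

```go
package main

import (
	"fmt"
	"time"
)

// record is a stand-in for the KuadrantRecord status fields the check uses.
type record struct {
	Generation         int64
	ObservedGeneration int64
	TTL                time.Duration
}

// missing compares what the spec expects against what the provider zone
// currently returns; both maps are name -> target stand-ins for a zone list.
func missing(expected, zone map[string]string) []string {
	var out []string
	for name, target := range expected {
		if zone[name] != target {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	rec := record{Generation: 3, ObservedGeneration: 3, TTL: 60 * time.Second}
	expected := map[string]string{"abc123.lb.example.com": "192.0.2.1"}
	zone := map[string]string{} // simulate a write that did not stick

	// Generation == ObservedGeneration marks a verification pass: list the
	// zone, re-add anything absent, and requeue at TTL/2 to check again.
	if rec.Generation == rec.ObservedGeneration {
		if gone := missing(expected, zone); len(gone) > 0 {
			fmt.Printf("re-adding %v, requeue after %s\n", gone, rec.TTL/2)
		}
	}
}
```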

#### Cleanup

When a DNS Policy is marked for deletion the kuadrant operator will delete all relevant kuadrantRecords.

Whenever a deleted kuadrantRecord is reconciled by the DNS Operator, it will remove any relevant records from the DNS Provider (i.e. any record with the clusterID prefix in its name, and any target containing the clusterID) for the deleted kuadrantRecord's root host.

#### Prune dead branches from DNS Records

When a KuadrantRecord is being deleted, there is the potential to leave behind a dead branch, which will need to be recursively cleaned until none remain.

What is a dead branch? If a CNAME exists whose target record does not exist, that CNAME will not resolve; since our DNS records are structured similarly to a tree, this is referred to as a "dead branch".

The controller will need to work through the DNS Record and ensure that the path from hostname to A (or external CNAME) records is structurally sound and that any dead branches are removed. To do this, the DNS Operator will first need to read what is published in the provider zone (a sketch follows the review notes below):
> **Review comment:** What if one operator instance is in the process of removing a dead branch while another is in the process of adding its record to that branch? The first operator would remove the common DNS path for the second operator's record.
>
> **Reply (Collaborator):** This is handled by the DNS Operator. It will do the prune, and on the verify loop it will validate that only its own values/records are gone; it will not re-prune if no dead branch is found.
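
A sketch of the prune over a simplified name -> targets view of the zone; record names here are hypothetical, and a target counts as resolvable if it is an IP (an A value), a CNAME out of this tree, or an internal CNAME whose record still exists. The sweep repeats until no dead branch remains:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// prune sweeps the tree until it reaches a fixed point, deleting any record
// none of whose targets resolve and trimming targets that point at deleted
// records.
func prune(tree map[string][]string, rootHost string) {
	for changed := true; changed; {
		changed = false
		for name, targets := range tree {
			var live []string
			for _, t := range targets {
				switch {
				case net.ParseIP(t) != nil:
					live = append(live, t) // concrete A value
				case !strings.HasSuffix(t, rootHost):
					live = append(live, t) // external CNAME: out of our tree
				case len(tree[t]) > 0:
					live = append(live, t) // internal CNAME to a live record
				}
			}
			switch {
			case len(live) == 0:
				delete(tree, name) // dead branch: nothing below it resolves
				changed = true
			case len(live) != len(targets):
				tree[name] = live
				changed = true
			}
		}
	}
}

func main() {
	tree := map[string][]string{
		"app.example.com":       {"lb.app.example.com"},
		"lb.app.example.com":    {"eu.lb.app.example.com"},
		"eu.lb.app.example.com": {"gone.lb.app.example.com"}, // target was deleted
	}
	prune(tree, "app.example.com")
	fmt.Println(tree) // map[]: the whole chain was a dead branch
}
```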


#### Build ZoneRecord from the provider zone

When the DNS Operator has finished reconciling a kuadrantRecord that is being deleted, it will pull the most recent records from the DNS Provider and construct a separate zoneRecord for the same root host as that kuadrantRecord.

#### Prune the zoneRecord

It will recursively analyse the zoneRecord and ensure that it is structurally sound and that all records can be resolved to at least one external CNAME or A record.

If during this pruning period the zoneRecord is modified, then these modifications will need to be sent to the DNS Provider to clean the zone.

Once completed, the zoneRecord can be deleted (it may be possible to handle this entirely in memory).

#### Removing the finalizer from kuadrantRecords

When a kuadrantRecord is reconciled, the DNS Operator should apply a finalizer to it.

When a kuadrantRecord is being removed, the following must be successfully completed before the finalizer can be removed:
- Remove the local clusters records and targets from the relevant zone
- Perform a prune on the relevant root host
- Apply the results of the prune to the DNS Provider

Only once these actions have all resolved should the finalizer be removed, and the kuadrantRecord allowed to be deleted.
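
A sketch of that gate, with an illustrative finalizer name (not the operator's real one); each step must succeed in order before the finalizer comes off, otherwise the record stays for a later retry:

```go
package main

import "fmt"

// dnsFinalizer is an illustrative finalizer name, not the operator's real one.
const dnsFinalizer = "kuadrant.io/dns-record"

// finalize runs the cleanup steps in order and only strips the finalizer once
// every step succeeds; a failure leaves the record in place for a retry.
func finalize(finalizers []string, steps ...func() error) ([]string, error) {
	for _, step := range steps {
		if err := step(); err != nil {
			return finalizers, err
		}
	}
	var out []string
	for _, f := range finalizers {
		if f != dnsFinalizer {
			out = append(out, f)
		}
	}
	return out, nil
}

func main() {
	removeLocal := func() error { return nil } // strip clusterID-owned entries from the zone
	pruneRoot := func() error { return nil }   // prune dead branches for the root host
	applyPrune := func() error { return nil }  // write the pruned result back to the provider

	fs, err := finalize([]string{dnsFinalizer}, removeLocal, pruneRoot, applyPrune)
	fmt.Println(fs, err) // [] <nil>: safe to let the kuadrantRecord delete
}
```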

Should a deleted DNS Record be recreated, or a deleted DNS Policy be restored, all of the deleted records will be restored by the DNS Operator, at that point.

#### Aggregating Status

When the DNS Operator performs actions based on the state of a kuadrantRecord, it should update the kuadrantRecord status with the results of these actions, similar to how it is currently implemented.

When the Kuadrant operator sees these statuses on the kuadrantRecords related to a DNS Policy, it should aggregate them into a consolidated status on the DNS Policy status.

It will be possible from this DNS Policy status to determine that all the records for this cluster, related to this DNS Policy, have been accepted by the DNS Provider.
> **Review comment:** Will it be possible to see cross-cluster status as well? I.e. warnings that there is probably a loop going on, or that one cluster uses the simple strategy while others use loadbalanced?
>
> **Reply (Collaborator):** Yes, this will be reflected. If the validation loop fails up to a max back-off (think of it like an image pull back-off), we will flag in the status that the max back-off was hit (a flag that the state is constantly changing).


#### Routing Strategies

There are currently two routing strategies: "simple" and "loadbalanced".

##### Simple
The simple strategy is also compatible with ExternalDNS, which makes it useful in some use cases. This strategy outputs a very simplistic zone file and will only be functional with a single cluster.

##### Loadbalanced
This strategy outputs a DNS record tree with various levels of CNAMEs to allow for geo loadbalancing, as well as loadbalancing within a geo, and is compatible with multiple clusters.

This can be seen here: ![Load Balanced DNS Records](0008-distributed-dns-assets%2F0008-loadbalanced-dns-records.jpg)

##### Migration

These two strategies are not compatible; as such, the RoutingStrategy field will be set to immutable, requiring the deletion and recreation of the DNS Policy in order to update this value. This will result in the related records being removed from the zone before being created in the new format.
> **Review comment:** What will happen if one cluster has a DNSPolicy with the simple strategy and another with loadbalanced? Will the loadbalanced one fail with the reason that something else is using simple?
>
> **Reply (Collaborator):** This will result in a flip/flop scenario and an unresolvable state, which will be spotted by the max back-off validation check mentioned above and flagged in the status.
>
> **Reply (Collaborator):** The RFC will be updated to make this clearer.


#### Metrics

Some useful metrics have been determined (a sketch of their definitions follows the list):
- Emit a metric whenever the DNS Operator writes to the zone.
- Emit a metric whenever the kuadrant operator updates the kuadrantRecord spec.
- Emit a metric whenever the DNS Operator deletes an entry from the zone.
- Emit a metric whenever the DNS Operator prunes a shared record from the zone.
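
A sketch of how these could be exposed as Prometheus counters via client_golang; the metric names are assumptions, not finalized:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric names covering the four events listed above.
var (
	zoneWrites = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "dns_operator_zone_writes_total",
			Help: "Writes made by the DNS Operator to the provider zone.",
		},
		[]string{"zone"},
	)
	recordUpdates = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "kuadrant_operator_record_updates_total",
		Help: "Updates made by the kuadrant operator to kuadrantRecord specs.",
	})
	zoneDeletes = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "dns_operator_zone_deletes_total",
		Help: "Entries deleted by the DNS Operator from the provider zone.",
	})
	zonePrunes = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "dns_operator_zone_prunes_total",
		Help: "Shared records pruned by the DNS Operator from the provider zone.",
	})
)

func init() {
	prometheus.MustRegister(zoneWrites, recordUpdates, zoneDeletes, zonePrunes)
}

func main() {
	zoneWrites.WithLabelValues("example.com").Inc() // e.g. after a successful write
}
```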

# Drawbacks
[drawbacks]: #drawbacks

Cleanup in a disaster scenario for multi-cluster:
> **Review comment:** Could those leftover records prevent the reconciliation of a new cluster? It is not uncommon for a Kubernetes cluster to fail (i.e. to be taken down by force, without proper reconciliation in place); does this result in a branch without an owner that stays there forever?
>
> **Reply (maleck13, Feb 28, 2024):** Health checks will remove these records from the response. In a single cluster right now, if you nuked the cluster you would also have a situation where the records were left behind; I see no difference here between single and multi-cluster. Yes, there is the potential in a disaster scenario for records to be left behind (this is true of any Kubernetes API that interacts with an external service). Additionally, a heartbeat option would help allow other clusters to spot dead clusters.
>
> **Reply (Contributor):** I was thinking along those lines too. Could we establish a protocol for the leftover clusters to perform the cleanup on behalf of the nuked one? I suppose if the nuked cluster is ever reborn from the ashes, it will sanitise the records anyway.
>
> **Reply (Contributor):** Never mind, just read the heartbeat section 🙂

- Any cluster nuked without time to clean up will never be cleaned from the zone.
> **Review comment:** And when such a cluster shares a record with another cluster, deleting the shared record when the other cluster's DNSPolicy is removed won't work, since there will always be records left from the failed cluster. The hostname would then resolve to the failed cluster at all times, even with all DNSPolicies deleted. Edit: this may be resolved by the cluster heartbeat functionality?
>
> **Reply (Collaborator):** Yes.


API usage limits on DNS providers:
- [Google](https://cloud.google.com/service-usage/docs/quotas) - 240 read requests per minute, 60 writes per minute
- [Azure](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling) - 60 lists and 1000 gets per minute, 40 writes per minute
- [Route53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html) - 5 API requests per second read or write
- No health checks for private clusters

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- It is the most robust, and can handle the failure of any cluster
> **Review comment:** What makes this more robust than the previous OCM multicluster?

- It provides the most valuable and robust DNS health checking network
> **Review comment:** I would remove this, as we are no longer responsible for that.

- Allows MGC without OCM/ACM
- It provides the simplest migration procedure from single to multiple clusters
- It provides a simple and useful demo without excessive setup

# Prior art
[prior-art]: #prior-art

No prior art.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Dealing with significant portions of the cluster groups going offline unexpectedly.

# Future possibilities
[future-possibilities]: #future-possibilities

## Heartbeat Check

It may turn out that disappearing clusters create more headaches than currently anticipated. In that scenario, each cluster could regularly update a heartbeat TXT record (say, every 5 minutes) and, on the same timer, check all other clusters' heartbeats; if any have grown stale, all the records related to that heartbeat's clusterID can be removed.
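
A sketch of the staleness check, assuming each cluster writes its unix timestamp into a TXT value keyed by its clusterID; the record layout and cutoff are illustrative, not part of the RFC:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// staleAfter is an assumed cutoff, e.g. three missed 5-minute beats.
const staleAfter = 15 * time.Minute

// staleClusters returns the clusterIDs whose heartbeat TXT values are older
// than the cutoff (or unparseable); their records would then be removed.
func staleClusters(heartbeats map[string]string, now time.Time) []string {
	var stale []string
	for clusterID, v := range heartbeats {
		ts, err := strconv.ParseInt(v, 10, 64)
		if err != nil || now.Sub(time.Unix(ts, 0)) > staleAfter {
			stale = append(stale, clusterID)
		}
	}
	return stale
}

func main() {
	now := time.Now()
	beats := map[string]string{
		"abc123": strconv.FormatInt(now.Unix(), 10),                 // healthy
		"def456": strconv.FormatInt(now.Add(-time.Hour).Unix(), 10), // dead
	}
	fmt.Println(staleClusters(beats, now)) // [def456]
}
```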

## Migration from single / multi cluster to OCM/ACM

Once this work has completed, it is possible that a migration path from non-OCM to OCM will become apparent without having to start from scratch, e.g. convert back to a single cluster, install the OCM hub on that cluster, then convert the other clusters into spokes.

## Integration with local DNS servers (e.g. CoreDNS)

To further improve the simple demo of the MGC product, making the clusters work as nameservers removes the requirement for any DNS provider and simplifies the local development environment setup, so that the features of MGC can be explored without requiring excessive steps from the user.