-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: tls policy status to state of the world tasks #885
Conversation
controllers/state_of_the_world.go
Outdated
if errors.IsAlreadyExists(err) { | ||
// This error can happen when the operator is starting, and the create event for the topology has not being processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... We may expect more of these, and not only with this particular resource.
This is because, as a pattern, we check the presence of the resource in the topology, and not with the API server. But more importantly, because, even though it's "state of the world" reconciliation and therefore the controller already knows about all resources that exist (because it lists them all), the library watching for changes to the cache writes to the subscription channel resource by resource.
I think we should consider fixing this in the Policy Machinery instead, because now it may be defeating the purpose of "state of the world". We may be able to make the call to cache.Replace
truly atomic – assuming it won't break with the granularity of multiple event types. I have a couple of ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Kuadrant/policy-machinery#35 may have fixed this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still happens with the same error:
{"level":"error","ts":"2024-10-02T08:56:41+01:00","logger":"kuadrant-operator","msg":"reconciliation error","error":"configmaps \"topology\" already exists","stacktrace":"github.com/kuadrant/policy-machinery/controller.(*Controller).propagate\n\t/Users/chfan/dev/kuadrant-operator/vendor/github.com/kuadrant/policy-machinery/controller/controller.go:262\ngithub.com/kuadrant/policy-machinery/controller.(*Controller).subscribe.func2\n\t/Users/chfan/dev/kuadrant-operator/vendor/github.com/kuadrant/policy-machinery/controller/controller.go:312"}
{"level":"info","ts":"2024-10-02T08:56:41+01:00","logger":"kuadrant-operator.controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-10-02T08:56:41+01:00","logger":"kuadrant-operator","msg":"starting server","name":"health probe","addr":"[::]:8081"}
What looks to happens from debugging is that this code is ran on boot even when there has been no events and when the topology has been loaded with the cluster data yet 🤔
My guess is that since this is a task with no event matchers, the initial creation of the empty cache store is an event that causes these task to run 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the initial creation of the empty cache store is an event that causes these task to run
Oh! This makes total sense and explains why it's triggering before the controller finishes starting. By simply creating the cache we may be generating events already. Nice catch, @KevFan!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4c5cef5
to
84f8a86
Compare
controller.WithPredicates(ctrlruntimepredicate.TypedFuncs[*certmanagerv1.Certificate]{ | ||
CreateFunc: func(e event.TypedCreateEvent[*certmanagerv1.Certificate]) bool { | ||
return isCertificateOwnedByTLSPolicy(e.Object) | ||
}, | ||
UpdateFunc: func(e event.TypedUpdateEvent[*certmanagerv1.Certificate]) bool { | ||
return isCertificateOwnedByTLSPolicy(e.ObjectNew) | ||
}, | ||
DeleteFunc: func(e event.TypedDeleteEvent[*certmanagerv1.Certificate]) bool { | ||
return isCertificateOwnedByTLSPolicy(e.Object) | ||
}, | ||
GenericFunc: func(e event.TypedGenericEvent[*certmanagerv1.Certificate]) bool { | ||
return isCertificateOwnedByTLSPolicy(e.Object) | ||
}, | ||
})), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting approach. I wonder how it performs compared to labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I went with ownerReference instead since we already set the ownerRef to point to TLSPolicy
since we want the associated Certificates
deleted when the TLSPolicy
is deleted.
Also we don't currently set a common label across all created Certificate
's
kuadrant-operator/controllers/tlspolicy_certmanager_certificates.go
Lines 161 to 184 in 4793dc4
func commonTLSCertificateLabels(gwKey client.ObjectKey, p *v1alpha1.TLSPolicy) map[string]string { | |
common := map[string]string{} | |
for k, v := range policyTLSCertificateLabels(p) { | |
common[k] = v | |
} | |
for k, v := range gatewayTLSCertificateLabels(gwKey) { | |
common[k] = v | |
} | |
return common | |
} | |
func policyTLSCertificateLabels(p *v1alpha1.TLSPolicy) map[string]string { | |
return map[string]string{ | |
p.DirectReferenceAnnotationName(): p.Name, | |
fmt.Sprintf("%s-namespace", p.DirectReferenceAnnotationName()): p.Namespace, | |
} | |
} | |
func gatewayTLSCertificateLabels(gwKey client.ObjectKey) map[string]string { | |
return map[string]string{ | |
"gateway-namespace": gwKey.Namespace, | |
"gateway": gwKey.Name, | |
} | |
} |
c83f916
to
5e2d529
Compare
controllers/tlspolicies_validator.go
Outdated
// TODO: This should be only one target ref for now, but what should happen if multiple target refs is supported in the future? | ||
targetRefs := policy.GetTargetRefs() | ||
for _, targetRef := range targetRefs { | ||
// Find gateway defined by target ref | ||
_, ok := lo.Find(gws, func(item *machinery.Gateway) bool { | ||
if item.GetName() == targetRef.GetName() && item.GetNamespace() == targetRef.GetNamespace() { | ||
return true | ||
} | ||
return false | ||
}) | ||
|
||
// Can't find gateway target ref | ||
if !ok { | ||
logger.V(1).Info("tls policy cannot find target ref", "name", policy.Name, "namespace", policy.Namespace) | ||
s.Store(TLSPolicyAcceptedKey(policy.GetUID()), kuadrant.NewErrTargetNotFound(policy.Kind(), policy.GetTargetRef(), apierrors.NewNotFound(kuadrantv1alpha1.TLSPoliciesResource.GroupResource(), policy.GetName()))) | ||
continue | ||
} | ||
|
||
logger.V(1).Info("tls policy found target ref", "name", policy.Name, "namespace", policy.Namespace) | ||
s.Store(TLSPolicyAcceptedKey(policy.GetUID()), nil) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, we may not need to find the object the policy points to.
Because policies are linked to their targets already, a policy p
where len(p.GetTargetRefs()) > 0 && len(topology.Targetables().Children(p)) == 0
cannot be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, what I'm suggesting won't give enough details to report in the message which target cannot be found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, yeah that's a nicer way of knowing whether a policy can't find their targetRef, I'll update 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(p.GetTargetRefs()) != len(topology.Targetables().Children(p))
is probably what we expect instead? 🤔
I think this is fine as we currently don't support multiple target refs but this may not work if we want to report which is not found as you've said, and also not sure what the policy status should be in the case where at least one target ref in the list is not found 🤔
if err := t.isCertificatesReady(ctx, tlsPolicy, topology); err != nil { | ||
return kuadrant.EnforcedCondition(tlsPolicy, kuadrant.NewErrUnknown(tlsPolicy.Kind(), err), false) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLS is an interesting case. If the gateway is not Programmed, it doesn't mean we cannot provision the certificates for its listeners. Is that why you didn't include the status of the gateway here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, a gateway can transition to a Programmed
state from the certs created by the current tls policy controller.
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
…found logic Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
|
||
store, ok := s.Load(TLSPolicyAcceptedKey) | ||
if !ok { | ||
logger.Error(errors.New("TLSPolicyAcceptedKey not found, skipping update of tls policy statuses"), "sync map error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok == false
should never happen, given both reconcilers subscribe to the exact same event matchers, unless the validator panics half-way. Good call logging the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this should never happen. This can also happen if there is a difference in the subscriptions between the validator and status updater in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's how I'm solving for RateLimitPolicy the issue of updating the status after having previously validated the policy vs. when the policy validator is skipped but the status updater still has to run (e.g. due to different event matchers):
policyAcceptedFunc := rateLimitPolicyAcceptedStatusFunc(state) |
} | ||
|
||
// Nothing to do | ||
equalStatus := equality.Semantic.DeepEqual(newStatus, policy.Status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be careful to never allow unexported fields in the TLSPolicyStatus
type, otherwise this line will start panicking.
Signed-off-by: KevFan <[email protected]>
Signed-off-by: KevFan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested it, but the code LGTM.
I've left a comment about a pattern for picking up state from previous tasks vs falling back to the topology state, but I'd recommend considering another PR for it.
Description
Closes: #824
MissingDependency
status of TLS Policy for if CertManager is not installedVerification
MissingDependency
TLSPolicy status