Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add scaler for temporal #6191

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

add scaler for temporal #6191

wants to merge 26 commits into from

Conversation

Prajithp
Copy link

@Prajithp Prajithp commented Sep 26, 2024

Implement a temporal scaler

Checklist

Fixes #4724

Signed-off-by: Prajithp <[email protected]>
@Prajithp Prajithp requested a review from a team as a code owner September 26, 2024 08:17
@febinct febinct mentioned this pull request Sep 26, 2024
7 tasks
@cretz
Copy link

cretz commented Sep 26, 2024

See comment at temporalio/temporal#33 (comment). Temporal's KEDA approach may slightly differ. Will have the engineers review, but we may suggest slight differences.

@febinct
Copy link

febinct commented Sep 30, 2024

any suggestion/comments @cretz

@robholland
Copy link

We're currently discussing which use cases we would like the scaler to support, we'll be in a position to give some feedback/direction on Friday 4th.

@jhecking
Copy link

jhecking commented Oct 4, 2024

We rolled out this new scaler to one of our dev clusters. (Rebased on top of the v2.15 release branch.) Activation/deactivation is working as expected. BUT, what I'm seeing is that the kena-operator pod is eating up all its allocated CPU when the temporal trigger is active. When I pause the scaledObject with the temporal trigger, then the CPU utilisation goes back to near zero. There are several other scaledObjects with prometheus triggers which don't cause this problem.

I don't see anything relevant in the keda-operator logs, even on DEBUG log level. I enabled profiling and this is the flame graph I see when this is happening:

Screenshot 2024-10-04 at 5 56 47 PM

[go tool pprof -http=:8081 "http://localhost:8082/debug/pprof/profile?seconds=60"]

For reference, here is a "normal" flame graph when all the Temporal triggers are paused:

Screenshot 2024-10-04 at 5 57 31 PM

One detail that might be relevant is that keda is connecting to the Temporal server via our Consul service mesh, i.e. there is a consul proxy injected into the keda-operator pod and the Temporal scaler is configured to connect to localhost:7233. But Keda is able to connect to the Temporal server, i.e. there are no connection errors. And we use this same configuration for all the Temporal worker services in the same cluster that Keda is supposed to scale, and none of them show this same behaviour.

I'm a bit at a loss as to how to debug this further. Any suggestions?

tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
DeleteKubernetesResources(t, testNamespace, data, templates)
}

func testActivation(t *testing.T, kc *kubernetes.Clientset, data templateData) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will work on some test cases here, but I suspect that the Temporal scaler is not going to be able to provide accurate activation (specifically 1->0). It does not have sight of activity levels that are low enough not to cause a backlog. We may need to couple the Temporal scaler with the prometheus scaler for the activation aspect for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add any other test cases? @robholland

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we'll need a build ID test, where a new deployment comes up with a new build ID and the two deployments are shown to scale separately (older one down as tasks on old version complete, newer one up as they start on that queue). I'm also not convinced that activation based on backlog is safe when scaling to 0, it won't account for tasks that are in-flight or low enough throughput to not cause a backlog. Scaling to 0 and then having to scale back up to clear the queue when things get killed and re-scheduled isn't ideal. Not sure yet the best way to test that, or whether we should just remove the activation altogether and recommend users couple the Temporal scaler with a Prometheus scaler, where Prometheus handles the activation and Temporal the scaling. Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding activation based on backlog, I believe we should keep this option rather than removing it, as it can be partially controlled by customising scale-down behaviour in the HPA. This feature would be useful for workloads with very short execution times. @robholland

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an option in omes to specify a build ID for the test cases? @robholland

Copy link

@robholland robholland Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using the new worker versioning system you don't update a deployment to a new build ID, rather you create a new deployment for the new build ID. The two deployments will both be present until the workflows on the older build ID have completed, then you can tear the old one down. So we'd have a scaled object for each deployment/build id.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhecking this is the use case we are currently exploring, I'll be investigating prometheus metrics we might be able to use to tide us over until we can provide some equivalent of tasks_dispatch_rate that covers in-flight (if that's going to be possible).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland added a test case for BuildID. Can you please verify?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland

I'll be investigating prometheus metrics we might be able to use to tide us over until we can provide some equivalent of tasks_dispatch_rate that covers in-flight (if that's going to be possible).

I know you've been out for a week, but was wondering whether you've had any time to look into this? Maybe I'm missing something, but wouldn't tasks_dispatch_rate also only be useful for 1->0 scaling, but not for 0->1 scaling? I.e. if all the workers have been shut down, then no new tasks can be dispatched, so the task dispatch rate cannot go back above zero.

I still think a variant of the approximate_backlog_count metric that includes both pending and in-flight tasks would be best suited for 1->0 and 0->1 scaling.

I'm new to KEDA myself. Is it possible to use different metrics for 1->0/0->1 and 1->n/n->1 scaling decisions? Maybe by using two separate triggers?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes task dispatch rate would only be useful for de-activation (but not reliable, due to it's caveats as discussed).. approximate_backlog_count should be useable for activation. You can mix different triggers together, I believe you'd use https://keda.sh/docs/2.14/reference/scaledobject-spec/#scalingmodifiers for that.

pkg/scalers/temporal.go Show resolved Hide resolved
pkg/scalers/temporal.go Outdated Show resolved Hide resolved
pkg/scalers/temporal.go Outdated Show resolved Hide resolved
pkg/scalers/temporal.go Show resolved Hide resolved
@jhecking
Copy link

jhecking commented Oct 8, 2024

The issue with the high CPU utilization only happens when Keda is connecting to the Temporal server via the Consul sidecar proxy. In another cluster, without Consul, the Temporal scaler is running fine and Keda is using minimal CPU.

When the Consul sidecar is injected, the proxy itself shows minimal CPU usage; only the keda-operator container itself uses up all its allocated CPU resources. Neither the Consul logs nor the Keda logs show any connection errors. Also, Keda and the Temporal scaler are able to connect to the Temporal server and are activating/deactivating the targeted pod based on those metrics.

But one other thing I could observe is that at times the Temporal scaler's GetMetricsAndActivity is being called much more frequently than every 30 seconds, e.g.:

2024-10-08T08:54:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:54:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:55:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:55:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:56:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:56:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:57:57Z    INFO    scaleexecutor    Successfully updated ScaleTarget    {"scaledobject.Name": "connector-mysql-2d9c6761", "scaledObject.Namespace": "borneo", "scaleTarget.Name": "connector-mysql-2d9c6761", "Original Replicas Count": 0, "New Replicas Count": 1}
2024-10-08T08:57:58Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:13Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:28Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:43Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:57Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:58:58Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:59:13Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}
2024-10-08T08:59:27Z    INFO    temporal_scaler    Getting Temporal queue size    {"type": "ScaledObject", "namespace": "borneo", "name": "connector-mysql-2d9c6761", "queueName": "connector/devdp-breo/2d9c6761-7c8c-4958-8d94-767064f873f5"}

Before the scaleexecutor event above Keda is already running at 100% CPU, but the Temporal scaler only gets queried every 30 seconds, as expected. After that event, the scaler is being queried much more frequently than once every 30 seconds. (I added the "Getting Temporal queue size" log in the scaler's getQueueSize function, right before the s.tcl.DescribeTaskQueueEnhanced call.)

@jhecking
Copy link

jhecking commented Oct 9, 2024

I sniffed the network traffic between the Keda operator and the Consul proxy today, and that finally offered a clue as to the cause of the high CPU utilisation, and – more importantly – a solution. So in order to route the queries from the Temporal scaler to the Temporal server via the Consul proxy, I had configured the endpoint for the scaler to localhost:7233.

Somehow, and I don't quite understand this yet, this is causing Keda to try to connect to the proxy on port 7233 using both 127.0.0.1 as well as using the IPv6 address ::1. The IPv4 connections are successful, but the IPv6 connections are getting rejected by the proxy. What's worse, Keda sends these IPv6 connection attempts at a rate of more than 1M connection attempts in 60 seconds, i.e. ~16,666 connection attempts per second. That's what's causing the high CPU load. It doesn't seem to give up on this nor back off.

Changing the endpoint config for the scaler from localhost:7233 to 127.0.0.1:7233 has resolved the issue completely. Keda has been running stable, and working as expected, with half a dozen scaledObjects with Temporal triggers since this simple config change.

pkg/scalers/temporal.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
APIKey string `keda:"name=apiKey, order=authParams;resolvedEnv;triggerMetadata, optional"`

UnsafeSsl bool `keda:"name=unsafeSsl, order=triggerMetadata, optional"`
Cert string `keda:"name=cert, order=authParams;resolvedEnv, optional"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more standard in Temporal deployments to have env vars that point to files for the key/cert rather than have them present in the environment directly. It would be good to have CertPath/KeyPath/CaPath or similar.

Copy link
Author

@Prajithp Prajithp Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robholland That's currently not possible with KEDA. KEDA can only read from environment variables or secrets. Ideally, in k8s, certificates and keys are stored in secrets, so this can be easily managed using TriggerAuthentication. @JorTurFer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file paths are normally pointed to certs that are mounted into the pods from secrets, so presumably users can just point at the secrets directly for KEDA use.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be done with TriggerAuthentication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we must read them from Kubernetes API and not from file. This is because we can't restart KEDA to include new certs. Take into account that KEDA provides a self-service approach where admins can set up KEDA and users can deploy their own resources, so assuming that certs are part of the KEDA's filesystem isn't an option. If the SDK only support reading them from files, the scaler must pull the secret and store in a temporal file within the container

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our mTLS client certificates and keys auto rotate, live on the filesystem, at least every hour. Our worker code handles the rotation without restarting.

I think any code using mTLS to connect to Temporal needs to be able to handle certificates and keys that are rotated frequently. Getting the certificates and keys from the filesystem seems natural and normal, if there is a way for KEDA to support that it would be good.

We use a sidecar to write/rotate the certificates and keys. But there are other Kubernetes systems for mTLS certificate management that also use filesystems and not Kubernetes Secret resources, like cert-manager csi-driver-spiffe.

pkg/scalers/temporal.go Outdated Show resolved Hide resolved
pkg/scalers/temporal.go Outdated Show resolved Hide resolved
pkg/scalers/temporal.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
tests/scalers/temporal/temporal_test.go Outdated Show resolved Hide resolved
@JorTurFer
Copy link
Member

JorTurFer commented Oct 16, 2024

/run-e2e temporal
Update: You can check the progress here

Prajithp and others added 3 commits October 17, 2024 08:24
Co-authored-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Prajith <[email protected]>
Co-authored-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Prajith <[email protected]>
Co-authored-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Prajith <[email protected]>
@Prajithp
Copy link
Author

/run-e2e temporal Update: You can check the progress here

@JorTurFer build is failing. Should I include the vendor directory?

@Prajithp
Copy link
Author

@robholland Can you please review?

@robholland
Copy link

On this again today, please can you go mod tidy, go mod vendor and commit those so that the KEDA CI system will run.

Signed-off-by: Prajithp <[email protected]>
@Prajithp
Copy link
Author

Prajithp commented Nov 1, 2024

On this again today, please can you go mod tidy, go mod vendor and commit those so that the KEDA CI system will run.

Done @robholland

@JorTurFer Can you please trigger CI?

@Prajithp Prajithp marked this pull request as ready for review November 1, 2024 09:05
t.Log("--- testing worker versioning ---")

data.BuildID = "1.1.1"
KubectlApplyWithTemplate(t, data, "jobUpdateBuildID", jobUpdateBuildIDTemplate)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wrap these in a helper so the intent is clearer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the upgrade step?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to reduce cognitive load for something which isn't part of the what we want to test.

tests/scalers/temporal/temporal_test.go Show resolved Hide resolved
Prajithp and others added 2 commits November 1, 2024 15:07
Co-authored-by: Rob Holland <[email protected]>
Signed-off-by: Prajith <[email protected]>
Co-authored-by: Rob Holland <[email protected]>
Signed-off-by: Prajith <[email protected]>
@robholland
Copy link

The scale out test fails for me locally:

    temporal_test.go:268:
        	Error Trace:	/Users/robholland/Developer/github.com/kedacore/keda/tests/scalers/temporal/temporal_test.go:268
        	            				/Users/robholland/Developer/github.com/kedacore/keda/tests/scalers/temporal/temporal_test.go:241
        	Error:      	Should be true
        	Test:       	TestTemporalScaler
        	Messages:   	replica count should be 1 after 3 minutes

Waiting to see how it fairs on CI.

@JorTurFer
Copy link
Member

JorTurFer commented Nov 3, 2024

/run-e2e temporal
Update: You can check the progress here

@zroubalik zroubalik marked this pull request as draft November 5, 2024 21:08
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked this PR as draft, please change this once it is ready for a review from KEDA maintainers.

Thanks for the great collaboration everyone!

Signed-off-by: Prajithp <[email protected]>
@Prajithp
Copy link
Author

Prajithp commented Nov 9, 2024

The scale out test fails for me locally:

    temporal_test.go:268:
        	Error Trace:	/Users/robholland/Developer/github.com/kedacore/keda/tests/scalers/temporal/temporal_test.go:268
        	            				/Users/robholland/Developer/github.com/kedacore/keda/tests/scalers/temporal/temporal_test.go:241
        	Error:      	Should be true
        	Test:       	TestTemporalScaler
        	Messages:   	replica count should be 1 after 3 minutes

Waiting to see how it fairs on CI.

@robholland are we good with this? can we change the status to "Ready for Review" ?

Signed-off-by: Prajithp <[email protected]>
@robholland
Copy link

The tests are failing on CI, those should be fixed before its marked for review.

Signed-off-by: Prajithp <[email protected]>
Signed-off-by: Prajithp <[email protected]>
Signed-off-by: Prajithp <[email protected]>
@Prajithp
Copy link
Author

The tests are failing on CI, those should be fixed before its marked for review.

It should be fine now.

@robholland
Copy link

Cool, @Prajithp wanna mark as ready for review now?

@Prajithp
Copy link
Author

Cool, @Prajithp wanna mark as ready for review now?

Yes. Can you please review? @jhecking @zroubalik

@jhecking
Copy link

Yes. Can you please review?

We might eventually need the ability to use other task queue stats besides ApproximateBacklogCount for scaling purposes. Ref. #6191 (comment) and prior discussion on the same thread. But I'd prefer to get this PR merged now and add support for additional metrics later as and when needed.

Other than that, I don't have any further input on this PR. We are successfully using an earlier version of this branch in several production clusters to scale Temporal-based services down to zero. For 1->n scaling we mostly still rely on Prometheus metrics published by the Temporal workers. We are combining the two metrics using scaling modifiers, as was suggested by Rob. (Thanks, @robholland, for that suggestion, btw!)

@Prajithp Prajithp marked this pull request as ready for review November 18, 2024 04:38
@Prajithp
Copy link
Author

@zroubalik Can you please review this?

@black-snow
Copy link

Is there anything to help with? We're very interested in this :)

@robholland
Copy link

@JorTurFer are you able to review this now please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Temporal.io Scaler
9 participants