Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Worker Deployments 3.1 #1814

Merged

Conversation

antlai-temporal
Copy link
Contributor

@antlai-temporal antlai-temporal commented Feb 11, 2025

What was changed

This PR introduces Worker Deployments using the versioning 3.1 API

Systems tests need the new CLI to pass but they are almost complete.

Checklist

  1. Closes
    Rename Worker Deployment API interfaces #1778 Add ramp to Worker Deployments #1777
  2. How was this tested:

unit tests, system tests...

type DeploymentReachability = internal.DeploymentReachability

const (
// DeploymentReachabilityUnspecified - Reachability level not specified.
// NOTE: Experimental
// Deprecated: Use [WorkerDeploymentVersionDrainageStatus]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these APIs released as Public Preview? I am just wondering why we are deprecating instead of just removing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only pre-release. I think the plan was to give a heads up with one release deprecated and remove them in the next one for public preview.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussing with @ShahabT it seems that no customers have been enabled in Cloud, and it has been out for just 2 months. If you feel strongly about removing it, I can bring back the issue with the team...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if they stay for a short bit, but yes I think we should consider removing soon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as long as we aren't compromising on the API names because the best names are taken by the old API I am fine to leave the old API for a short bit. Can we commit to removing the old API for the next minor release after the new API is released? so v1.34.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. The prefix Worker_Deployment vs Deployment seems to work fine for the API names, and avoids the confusion with other deployments...

@@ -96,6 +97,10 @@ func (ntp *nexusTaskPoller) poll(ctx context.Context) (taskForWorker, error) {
UseVersioning: ntp.useBuildIDVersioning,
DeploymentSeriesName: ntp.deploymentSeriesName,
},
DeploymentOptions: workerDeploymentOptionsToProto(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking deployments do work for Nexus tasks right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should dispatch nexus tasks based on current/ramping versions. No testing so far though...

@antlai-temporal antlai-temporal marked this pull request as ready for review February 14, 2025 10:36
@antlai-temporal antlai-temporal requested a review from a team as a code owner February 14, 2025 10:36
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed the public API, most seems fine, only minor comments

type DeploymentReachability = internal.DeploymentReachability

const (
// DeploymentReachabilityUnspecified - Reachability level not specified.
// NOTE: Experimental
// Deprecated: Use [WorkerDeploymentVersionDrainageStatus]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if they stay for a short bit, but yes I think we should consider removing soon

internal/worker_deployment_client.go Outdated Show resolved Hide resolved
// NOTE: Experimental
//
// Exposed as: [go.temporal.io/sdk/client.WorkerDeploymentConflictToken]
WorkerDeploymentConflictToken struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, not too sure there is value in a struct wrapper for this byte slice. Why not just pass the byte slice around (it's normal for a zero-len (aka nil) byte slice to be the equivalent of unset, you don't need struct wrapper + pointer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this discussion last time, but here token is optional, and that forced me to use a pointer. So I think you are right here, moving to empty slice as nil will simplify the API...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optional byte slice in Go is very normal and is nil/empty (but you use operations like len == 0 regardless to determine unset)

// with that name in this namespace, methods like WorkerDeploymentHandle.Describe()
// will return an error.
// NOTE: Experimental
GetHandle(ctx context.Context, name string) WorkerDeploymentHandle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a worker deployment created? If it's done lazily on one of the setters, I am concerned about how conflict version works because I am not sure how a user can say "create only if not created" in a race-safe way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SDK does not create deployments, the server creates them as a side-effect of seeing pollers annotated for a new deployment when they poll. GetHandle does not do much, just local...

Copy link
Member

@cretz cretz Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can you confirm every call on a WorkflowDeploymentHandle for a workflow deployment that does not exist returns a not-found? Meaning none of these calls work without a server having implicitly created such a deployment? Is it not a bit confusing that you cannot do some worker deployment preparation/configuration without having started a poller on it first? That's a really confusing way to create a worker deployment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't like the "eventually consistent" approach for deployments/versions since poller existence (or not) changes policy, and before it was independent of pollers, but that's fundamental to how versioning 3.1 works... On the positive side, it can be easier to use, just annotate workers and server does the rest...

internal/worker_deployment_client.go Outdated Show resolved Hide resolved
@@ -87,6 +87,9 @@ jobs:
- name: Docker compose - integration tests
if: ${{ matrix.testDockerCompose }}
run: go run . integration-test
env:
# TODO(antlai-temporal): Remove this flag once server 1.27 released.
DISABLE_WORKER_DEPLOYMENT_TESTS: "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DISABLE_WORKER_DEPLOYMENT_TESTS: "1"
DISABLE_SERVER_1_27_TESTS: "1"

@rodrigozhou will also need this flag to test attaching nexus callbacks I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the flag

@antlai-temporal antlai-temporal merged commit 7613853 into temporalio:versioning-3.1 Feb 18, 2025
3 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants