Support scaling the entire database #1051

Merged: 7 commits merged into main on Feb 7, 2025
Conversation

roypaulin (Collaborator):
Until now, scaling has been based on a service name. This change allows the entire database to be considered for scaling: if serviceName is empty, all the pods are selected, and all the subclusters are considered when scaling is needed.
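A rough sketch of that selection rule in Go (illustrative only; apart from vertica.com/is-sandbox, which appears in the snippet below, the label keys and helper name are assumptions, not the operator's real code):

package example

// makePodSelector is a hypothetical helper sketching the rule described in the
// PR description: an empty serviceName selects every pod in the database, while
// a non-empty one narrows scaling to that subcluster service. Sandbox pods are
// filtered out via vertica.com/is-sandbox (see the snippet below); the other
// label keys here are illustrative assumptions.
func makePodSelector(dbName, serviceName string) map[string]string {
	sel := map[string]string{
		"vertica.com/database":   dbName,
		"vertica.com/is-sandbox": "false",
	}
	if serviceName != "" {
		sel["vertica.com/subcluster-svc"] = serviceName
	}
	return sel
}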

// Indicates if a pod belongs to the main cluster. It is mainly used as a selector in
// VAS to filter out sandbox pods.
IsSandboxLabel = "vertica.com/is-sandbox"
IsSandboxFalse = "false"
Collaborator:
Is this label only used by the VAS? If we do sandboxing for a subcluster, should we change this flag to true?

roypaulin (Collaborator, Author):

You are correct, we need to change this to true for sandboxes.
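A hedged sketch of that follow-up; IsSandboxTrue and the helper are hypothetical, only IsSandboxLabel and IsSandboxFalse appear in the snippet earlier in the thread:

package example

const (
	IsSandboxLabel = "vertica.com/is-sandbox"
	IsSandboxFalse = "false"
	IsSandboxTrue  = "true" // assumed constant; not in the quoted snippet
)

// sandboxLabelValue returns the pod label value depending on whether the
// pod's subcluster is part of a sandbox, per the exchange above.
func sandboxLabelValue(inSandbox bool) string {
	if inSandbox {
		return IsSandboxTrue
	}
	return IsSandboxFalse
}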

@@ -0,0 +1,5 @@
kind: Fixed
body: Routing traffic to a sandbox pod after restart
Collaborator:
This is for your previous PR?

roypaulin (Collaborator, Author):

Yes. I forgot to add a changie entry.

@@ -78,7 +78,7 @@ func (s *ScaledownReconciler) Reconcile(ctx context.Context, req *ctrl.Request)
s.Log.Info("Metric's value is lower than the scale-down threshold.", "metric", mStatus.name)
newMinReplicas = *s.Vas.Spec.CustomAutoscaler.MinReplicas
} else {
-		newMinReplicas = s.Vas.Spec.TargetSize
+		newMinReplicas = s.Vas.Status.CurrentSize
Collaborator:
What's this change for?

roypaulin (Collaborator, Author):

currentSize represents the number of existing pods/replicas, while targetSize is the desired state. In this case we want minReplicas to equal the current number of replicas so that we do not trigger a scale-down; currentSize is more accurate.
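A small worked example of that reasoning, with made-up sizes:

package main

import "fmt"

func main() {
	// Illustrative numbers only.
	currentSize := int32(6) // Status.CurrentSize: replicas that exist right now
	targetSize := int32(3)  // Spec.TargetSize: desired state still being reconciled

	// Setting minReplicas from targetSize (3) would allow the autoscaler to
	// shrink from 6 to 3 even though the metric never crossed the scale-down
	// threshold. Using currentSize (6) keeps the floor at what already exists.
	newMinReplicas := currentSize
	fmt.Println("newMinReplicas:", newMinReplicas, "instead of", targetSize)
}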

// We will prevent removing a primary if it would lead to a k-safety
// rule violation.
if primaryCountAfterScaling < minHosts {
continue
Collaborator:
Log a message here
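One possible shape for that log, assuming a logr-style logger is in scope as in the Reconcile diff earlier in this PR (the names here are guesses, not the actual code):

// Illustrative only: assumes a logr.Logger named log is in scope (as s.Log is
// in the Reconcile diff above); variable names mirror the quoted snippet.
if primaryCountAfterScaling < minHosts {
	log.Info("Skipping removal of primary subcluster to avoid a k-safety violation",
		"primaryCountAfterScaling", primaryCountAfterScaling,
		"minHosts", minHosts)
	continue
}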

apiVersion: v1
kind: Service
metadata:
name: v-autoscale-db-by-subcluster-as
Collaborator:
If we are using the template, all the new subclusters will be put behind one service?

roypaulin (Collaborator, Author):

Only if you set the serviceName in the template. If you don't, each subcluster will have its own service whose name is the subcluster's name.
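A minimal sketch of that naming rule; the type and method names are stand-ins, not the operator's actual API:

package example

// Subcluster is a hypothetical stand-in used only to illustrate the naming
// rule from the answer above.
type Subcluster struct {
	Name        string
	ServiceName string
}

// EffectiveServiceName returns the service the subcluster is exposed behind:
// the template's serviceName when it is set, otherwise a per-subcluster
// service named after the subcluster itself.
func (s *Subcluster) EffectiveServiceName() string {
	if s.ServiceName != "" {
		return s.ServiceName
	}
	return s.Name
}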

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: kubectl scale -n $NAMESPACE verticaautoscaler/v-autoscale-db --replicas 0
Collaborator:
Since ksafety is not 0, the primary subcluster will not be removed even if we set "--replicas 0", right? If we don't violate the ksafety check, will scale-down remove the existing subclusters? For example, say we have pri1 and sec1, which were created manually by the customer, and we set "--replicas 0":
a. if ksafety is 0, will both subclusters be removed?
b. if ksafety is not 0, will sec1 be removed?

roypaulin (Collaborator, Author) on Feb 7, 2025:

First of all, kubectl scale does manual scaling; it is used here just for testing, to trigger the scaling logic.
That being said, if the target size is lower than currentSize, then yes, we will remove sec1 (provided -1*(targetSize - currentSize) is <= sec1's size).
Scaling to zero will not be possible because it would violate ksafety, and minReplicas' minimum value is 1.
The goal of autoscaling is to keep only the resources that we need.
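A rough worked example of the arithmetic in that answer, with made-up sizes (the function is illustrative, not the operator's code):

package example

// podsToRemove sketches the arithmetic from the answer above. Say pri1 has 3
// pods and sec1 has 3 pods, so currentSize is 6. Scaling the target down to 3
// gives -1 * (3 - 6) = 3 pods to remove, which is <= sec1's size, so sec1 can
// be removed entirely. A target of 0 is rejected because minReplicas' minimum
// is 1 and removing the primaries would break k-safety.
func podsToRemove(targetSize, currentSize int32) int32 {
	if targetSize >= currentSize {
		return 0 // no scale-down needed
	}
	return -1 * (targetSize - currentSize)
}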

roypaulin merged commit 142e6c1 into main on Feb 7, 2025
41 checks passed
roypaulin deleted the roypaulin/vas2 branch on February 7, 2025 at 18:11