
reinitalise targets when parsing host list #313

Merged
merged 2 commits into vtgateproxy from esme-vtgateproxy-reinit-targets on Apr 23, 2024

Conversation


@dedelala commented Apr 23, 2024

Description

Fix an issue with discovery where we're appending to the list of targets but never reinitialising it. Dead hosts are never removed from the list and active ones are added multiple times.

Change the metric types for active hosts from counters to gauges; we should only be tracking the current number, not incrementing.

I have tested this with a local build, and discovery works correctly when the host list changes.
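For context, a minimal sketch of the pattern being fixed, with hypothetical type and field names (the real parser lives in the vtgateproxy discovery code): rebuild the target list on every parse instead of appending to the previous one.

package discovery

// Hypothetical stand-ins for the real vtgateproxy structures.
type host struct{ addr string }
type target struct{ addr string }

type builder struct {
	targets []target
}

// rebuildTargets reinitialises the target list from the freshly parsed
// host list. The bug was appending to b.targets without ever resetting
// it, so dead hosts lingered and active hosts were added repeatedly.
func (b *builder) rebuildTargets(hosts []host) {
	targets := make([]target, 0, len(hosts)) // fresh slice each parse
	for _, h := range hosts {
		targets = append(targets, target{addr: h.addr})
	}
	b.targets = targets // replace, rather than extend, the old list
}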

Comment on lines 175 to 179
 	contentsChanged, err := b.parse()
-	if err != nil || !contentsChanged {
+	if err != nil {
+		log.Error(err)
+		continue
+	}
@dedelala (Author)

We should be logging these errors. I'm not sure whether we care about incrementing unchangedCount in this case.

@henryr

This is going to log once a second if there's a permanent error in the file on disk. Can we slow it down?

Collaborator

Yeah - I had some logging in an earlier iteration and removed it because I was worried about spam volume.

I think we should set a formatError=true bit the first time we detect a parse error and log only once, when we enter that state; then, when we successfully parse the file, clear the bit so the next error gets logged again.
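A sketch of that latch, assuming the polling loop and b.parse from the snippet above and Vitess's log package; formatError is the only new state:

import (
	"time"

	"vitess.io/vitess/go/vt/log"
)

// watch polls the host file and logs a parse error only on the
// transition into the error state, clearing the latch once the file
// parses cleanly again so the next failure is logged anew.
func (b *builder) watch(tick <-chan time.Time) {
	formatError := false
	for range tick {
		contentsChanged, err := b.parse()
		if err != nil {
			if !formatError {
				log.Error(err) // first failure of this episode
				formatError = true
			}
			continue
		}
		formatError = false // a clean parse re-arms the latch
		if !contentsChanged {
			continue
		}
		// ... rebuild targets from the parsed host list ...
	}
}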

@henryr left a comment

This is a great catch. Just a concern about logging volume.


-affinityCount.Add("local", local)
-affinityCount.Add("remote", remote)
-poolTypeCount.Add(r.poolType, int64(len(targets)))
+affinityCount.Set("local", local)
Collaborator

As-is this isn't quite accurate, since the set is already filtered by target by this point. We could move the

I realize counters aren't quite right either, but using gauges across the fleet in Prometheus isn't great for aggregations.

Collaborator

One thing we could do is pivot the model: instead of counting whenever we update a target, use something like stats.NewGaugesFuncWithMultiLabels so that we generate the stats on demand, regardless of whether any clients are connected.

I think that would be a better model frankly, since it would exercise all the parsing code and we could verify the discovery layer is running as we expect before sending any traffic through the system.

We could also wire in a /debug/status page so we can see the full state as well.
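A sketch of that pull-based shape, using Vitess's go/stats package (the metric name, labels, and the builder fields here are assumptions, not the current code):

import "vitess.io/vitess/go/stats"

// registerDiscoveryStats installs a gauge whose callback walks the
// current discovery state on every scrape, so the stats stay accurate
// even when no clients are connected.
func registerDiscoveryStats(b *builder) {
	stats.NewGaugesFuncWithMultiLabels(
		"DiscoveryTargets", // hypothetical metric name
		"current discovered targets by affinity and pool type",
		[]string{"Affinity", "PoolType"},
		func() map[string]int64 {
			b.mu.Lock() // assumes the builder guards its target list
			defer b.mu.Unlock()
			counts := make(map[string]int64)
			for _, t := range b.targets {
				// keys for multi-label stats join the label values with "."
				counts[t.affinity+"."+t.poolType]++
			}
			return counts
		},
	)
}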

@demmer (Collaborator) commented Apr 23, 2024

High level -- great find! I feel somewhat sheepish I did it this way in the first place.

My .02 is that we should separate out the fix for the issue from the other changes to the logging / metrics, merge and deploy the fix ASAP, and then work on the observability separately.

@demmer (Collaborator) left a comment

Strong +1 :)

@dedelala merged commit 11ccb3a into vtgateproxy on Apr 23, 2024
152 of 241 checks passed
@dedelala deleted the esme-vtgateproxy-reinit-targets branch on April 23, 2024 at 23:49
dedelala added a commit that referenced this pull request Jul 30, 2024
* reinitalise targets when parsing host list

* remove metrics and logging changes
dedelala added a commit that referenced this pull request Nov 12, 2024
* reinitalise targets when parsing host list

* remove metrics and logging changes
dedelala added a commit that referenced this pull request Jan 8, 2025
* reinitalise targets when parsing host list

* remove metrics and logging changes
dedelala added a commit that referenced this pull request Jan 29, 2025
* reinitalise targets when parsing host list

* remove metrics and logging changes