-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reinitalise targets when parsing host list #313
Conversation
go/vt/vtgateproxy/discovery.go
Outdated
contentsChanged, err := b.parse() | ||
if err != nil || !contentsChanged { | ||
if err != nil { | ||
log.Error(err) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be logging these errors, not sure if we care about incrementing unchangedCount
in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to log once a second if there's a permanent error in the file on disk. Can we slow it down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I had some logging in an earlier iteration and removed it because I was worried about spam volume.
I think we should set a formatError=true
bit the first time we detect a parse error, and only log once when we enter that state, then when we properly parse the file, clear the bit so we log the next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great catch. Just a concern about logging volume.
go/vt/vtgateproxy/discovery.go
Outdated
contentsChanged, err := b.parse() | ||
if err != nil || !contentsChanged { | ||
if err != nil { | ||
log.Error(err) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to log once a second if there's a permanent error in the file on disk. Can we slow it down?
go/vt/vtgateproxy/discovery.go
Outdated
affinityCount.Add("local", local) | ||
affinityCount.Add("remote", remote) | ||
poolTypeCount.Add(r.poolType, int64(len(targets))) | ||
affinityCount.Set("local", local) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As-is this isn't quite accurate, since the set is already filtered by target by this point. We could move the
I realize counters aren't quiite right either, but using gauges across the fleet in prometheus isn't great for aggregations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we could do is pivot the model so that instead of counting whenever we update a target, we could instead use something like stats.NewGaugesFuncWithMultiLabels
so that we generate the stats on demand regardless of whether or not any clients are connected.
I think that would be a better model frankly, since it would exercise all the parsing code and we could verify the discovery layer is running as we expect before sending any traffic through the system.
We could also wire in a /debug/status page so we can see the full state as well.
High level -- great find! I feel somewhat sheepish I did it this way in the first place. My .02 is that we should separate out the fix for the issue from the other changes to the logging / metrics, merge and deploy the fix ASAP, and then work on the observability separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong +1 :)
* reinitalise targets when parsing host list * remove metrics and logging changes
* reinitalise targets when parsing host list * remove metrics and logging changes
* reinitalise targets when parsing host list * remove metrics and logging changes
* reinitalise targets when parsing host list * remove metrics and logging changes
Description
Fix an issue with discovery where we're appending to the list of targets but never reinitialising it. Dead hosts are never removed from the list and active ones are added multiple times.
Change metric types of active hosts from counters to gauges, we should only be tracking the current number not incrementing.I have tested this out with a local build and discovery is working correctly when the host list changes