Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invert the service discovery #268

Merged
merged 16 commits into from
Mar 23, 2024
Merged

Conversation

jscheinblum
Copy link

@jscheinblum jscheinblum commented Mar 22, 2024

This simplifies the service discovery and also is a little more flexible/generic.

The idea here is that we have one required param ("target") that defines which vtgate pool we're filtering from, then we can filter on any other discovery attribute.

This changes the logic to filter in the name resolution side, where we now filter down to the smallest possible list in the name resolution stage, then round robin between them later in the gate.

The purpose of this is that we only establish connections to the hosts we pick in name resolution - previously what was happening was we'd resolve to any matching host, but then only pick from matching hosts. That mean a larger than expected number of TCP connections, even if only a few were ever selected.

The new flow is:

grpc outbound connection: vtgate://<pool>/?filters....
-> vtgate:// discovery for $pool == conn
  -> get all matching service discovery entries == subconns
    [ wait for subconns to become active, then create a new picker]
    -> picker round robins between subconns

When the service discovery list changes, a new set of subconns is picked and used in the next round of outbound requests once they're active

			if (Str\contains($dbname, 'vifl')) {
				$target = 'vifl';
			} else if (Str\contains($dbname, 'aux')) {
				$target = 'aux';
			} else {
				$target = 'web';
			}
			$opts = new \AsyncMysqlConnectionOptions();
			$opts->setConnectTimeout($timeout_micros);
			$opts->setConnectionAttributes(dict[
				'target' => $target,
				'filter_az_id' => 'use1-az2',
			]);

@@ -18,6 +18,7 @@ import (

var (
jsonDiscoveryConfig = flag.String("json_config", "", "json file describing the host list to use fot vitess://vtgate resolution")
numConnectionsInt = flag.Int("num_connections", 4, "number of outbound GPRC connections to maintain")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit -- we don't usually include the type, e.g.Int in var names


data, err := os.ReadFile(r.jsonPath)
if err != nil {
return nil, nil, err
}

err = json.Unmarshal(data, &config)
err = json.Unmarshal(data, &pairs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another naming nit -- I think pairs is misleading. Can we change to hostInfo or something like that?

subConnsByAZ map[string][]balancer.SubConn
nextByAZ sync.Map
next uint32
subConns []balancer.SubConn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - comment above needs to be changed

})
}

// Must filter by type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is totally fine for the Slack use case, but I'm not sure is something we'll want to always have in the code for when we upstream it.

More fundamentally, I feel like the code as-written is trying to shoehorn two different behaviors into the same filters list.

The type is, in fact, a filter, since if that's specified and there aren't enough hosts that match, we will fail to route.

The other field(s) are actually an affinity but we're of course willing to route across AZs if there aren't enough available. Also we don't actually have any use case for more than one affinity field, and I'm having a hard time finding any reason for anyone upstream to really want that.

So, my suggestion is that we actually make this less generic, and rewrite this so that instead of the combined filters list, we change the signature of the resolver to something where we take in explicit configuration of something like:

type JSONGateConfigResolver struct {
        // If a field name is specified then filter the set of hosts to only those that match 
        // the given target, e.g. if targetTypeField is set to "type", then a route target of
        // vtgate://foo only matches hosts with "type: foo" in the host list
	targetTypeField string

        // If a field name is specified (e.g. "az_id"), then prefer routing to hosts that match
        // the affinity expressed in the target, e.g. if targetAffinityField is set to "az_id", 
        // then a target of vtgate://foo?az_id=1a will prefer hosts with "az_id: 1a" in the
        // host list, but may route to others if there are not enough candidates 
	targetAffinityField  hostFilters

	target   resolver.Target
	cc       resolver.ClientConn
	ticker   *time.Ticker
	rand     *rand.Rand // safe for concurrent use.

I think not only the comments but the more explicit separation between the two would make a lot of sense to me.

fmt.Printf("----\n")

// Nothing in the filtered list? Get them all
if len(filteredAddrs) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... what if you want 4 connections but there are only 2 in the same AZ. Seems like you would want to make sure to include those 2, but my read of the code below would ignore that and just fall back to the full set.

Implementation-wise I kinda think we could make this all a bit simpler if we did an approach like what I did here https://github.com/slackhq/vitess/pull/41/files#diff-7bddb8b37e422b4f434e7a7e7e55ee318a195c1235ca3284c18d5d11a640f3a4R186

Basically, take the whole json list, and first filter it to do the type matching, since that (as said above) is a real filter.

Then shuffle the whole thing randomly.

Then do another pass where we take the shuffled list and boost the ones that match the AZ affinity to the front of the list.

Then grab the first N.

This way you handle all the cases a bit more elegantly I think. Yes, it's a 2nd pass on the list, but this is small and since we only do this for the first connection to a given target I think it should be fine. If this is actually done each time we dial then we have bigger problems.

@demmer demmer changed the title Invert the service discoveryu Invert the service discovery Mar 22, 2024
demmer added 9 commits March 22, 2024 14:22
Change all the config so that instead of hard coded constants we set the
various connection attributes, json field names, etc using command line flags.

Then make the pool type and affinity arguments more explicit and less generic.
@jscheinblum jscheinblum merged commit de9cbc1 into vtgateproxy Mar 23, 2024
1 check passed
@jscheinblum jscheinblum deleted the jas_switch_to_filtering_subconns branch March 23, 2024 00:02
dedelala pushed a commit that referenced this pull request Jul 30, 2024
* Draft: very messy and doesn't compile

* Simplifyy

* less log, plz

* simplify more

* Simplified by a lot - much simpler

now pick fewer addresses

* fixy

* Account for infinite

* copyright nonsense

* clean up debug logging

* round_robin works!

* use rw mutex to serialize creation

* rework the filtering to make everything parameterized and more explicit

Change all the config so that instead of hard coded constants we set the
various connection attributes, json field names, etc using command line flags.

Then make the pool type and affinity arguments more explicit and less generic.

* no longer needed

* update comments

* only pass through the URL params we need

* affinity is actually optional

---------

Co-authored-by: Michael Demmer <[email protected]>
dedelala pushed a commit that referenced this pull request Nov 12, 2024
* Draft: very messy and doesn't compile

* Simplifyy

* less log, plz

* simplify more

* Simplified by a lot - much simpler

now pick fewer addresses

* fixy

* Account for infinite

* copyright nonsense

* clean up debug logging

* round_robin works!

* use rw mutex to serialize creation

* rework the filtering to make everything parameterized and more explicit

Change all the config so that instead of hard coded constants we set the
various connection attributes, json field names, etc using command line flags.

Then make the pool type and affinity arguments more explicit and less generic.

* no longer needed

* update comments

* only pass through the URL params we need

* affinity is actually optional

---------

Co-authored-by: Michael Demmer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants