CASSGO-4 Support of sending queries to the specific node #1793

worryg0d · 2024-08-05T08:39:05Z

Overview

This PR provides a mechanism that allows users to specify on which node the query will be executed. It is not a typical use case, but it makes sense with virtual tables.
For example, when we want to retrieve metrics for a specific node, we have to send queries to the associated system view of this node.

Implementation Overview

A new method SetHost() for the Query allows users to specify the host (node) on which the driver must send the query. It is implemented by adding a GetHost() to the ExecutableQuery interface. When the host is specified for the query the driver ignores any retry, speculative, and host selection policies.

Usage Example

Here is an example of retrieving metrics from system_view.cql_metrics for each node of the cluster:

cluster := NewCluster("127.0.0.1")
session, err := cluster.CreateSession()
if err != nil {
	panic(err)
}
defer session.Close()

hosts, err := session.GetHosts()
if err != nil {
	panic(err)
}

type cqlMetric struct {
	name  string
	value float64
}

// hostId to []cqlMetric
cqlMetrics := map[string][]cqlMetric{}

for _, host := range hosts {
	iter := session.Query("SELECT * FROM system_views.cql_metrics").
		SetHost(host).
		Iter()

	metrics := make([]cqlMetric, 0, iter.NumRows())
	metric := cqlMetric{}
	for iter.Scan(&metric.name, &metric.value) {
		metrics = append(metrics, metric)
	}
	cqlMetrics[host.HostID()] = metrics
}

martin-sucha · 2024-08-05T08:50:09Z

Should the API allow for other routing preferences? For example only a single data center/rack? Or be extensible to allow targeting a specific shard in the scylladb/gocql fork?

Should the API allow to route to a different host during retry, i.e. allow passing a HostSelectionPolicy or a similar interface/function?

worryg0d · 2024-08-06T10:48:32Z

Should the API allow for other routing preferences? For example only a single data center/rack? Or be extensible to allow targeting a specific shard in the scylladb/gocql fork?

Should the API allow to route to a different host during retry, i.e. allow passing a HostSelectionPolicy or a similar interface/function?

HostInfo provides API to retrieve information like dc or rack for the host, so users may filter retrieved hosts manually before binding them to the query.

In my opinion, re-routing queries bound to the specific hosts to others is not a good idea even during retries, at least as default behavior. Sending queries to a specific host is not a typical use case and if we try to do it, we assume that the specified host is reachable. Or, at least, we expect that the query was attempted to be sent to the specified host.

The idea of this API being extensible sounds good, but I am not sure about implementation. May you have some advice or ideas on how to properly implement this?

joao-r-reis · 2024-11-20T17:33:08Z

I think keeping this focused on the virtual table use case is best. Other routing preferences like shard awareness for scylla should probably be handled at the HostSelectionPolicy level. For this, we could add a function on Query and Batch that allows the HostSelectionPolicy to be provided instead of using the session level one but this can be handled on a separate JIRA and PR, it's a different use case than this one which is focused on virtual table queries that should be strongly associated with a specific host.

If a user sets the host using Query.SetHost() then either we disable retries and speculative executions altogether for that query or we allow retries and speculative executions only on that host (basically the host selection policy should never be used in the context of this query).

In the future if we add a way to provide a HostSelectionPolicy in addition to .SetHost() then the driver should probably return an error saying that you can't simultaneously set a host and provide a policy on the same query.

joao-r-reis · 2024-11-20T17:35:32Z

When the queryExecutor.executeQuery is called then it checks if the provided ExecutableQuery implements the hostGetter interface as well. We need this interface to not change the API ExecutableQuery. Also, the ExecutableQuery is implemented by Query and Batch as well, so it is good to avoid adding redundant methods for Batch as well.

So, if provided ExecutableQuery implements hostGetter, then it type-asserts it and calls the underlying method to get the host. If the host is not nil, it wraps the host into hostIter func which just returns the specified host.

The discussion around the ExecutableQuery interface and the Query API in general probably has a decent overlap with the work being dicussed and implemented in https://issues.apache.org/jira/browse/CASSGO-22 , justfyi.

joao-r-reis · 2024-11-20T17:50:40Z

query_executor.go

@@ -83,7 +83,28 @@ func (q *queryExecutor) speculate(ctx context.Context, qry ExecutableQuery, sp S
 }

 func (q *queryExecutor) executeQuery(qry ExecutableQuery) (*Iter, error) {
-	hostIter := q.policy.Pick(qry)
+	type hostGetter interface {


How about adding a private interface like:

type cqlQuery interface { ExecutableQuery getHost() *HostInfo }

And make gocql internal code reference cqlQuery (name subject to change?) instead of ExecutableQuery so that we don't have to worry about breaking the public ExecutableQuery interface when we add new functions to Query or Batch?

Alternatively, we could just add new functions to ExecutableQuery because there's no place where a user can provide a custom implementation of ExecutableQuery to the driver is there? If the driver API doesn't expose any function that receives an object of type ExecutableQuery then we should be fine changing this interface since there's no real use case where a user might implement it on their application.

From a quick glance at the code, ExecutableQuery is provided to implementations of HostSelectionPolicy but it's the driver that calls HostSelectionPolicy.Pick(ExecutableQuery) with Query or Batch objects as parameters so I don't see an issue in adding GetHost to ExecutableQuery

I don't mind extending ExecutableQuery interface, but adding private methods is not the best thing to me. The driver exposes HostSelectionPolicy interface and allows user to provide their own implementations, but users cannot properly test them because Pick() accepts ExecutableQuery that cannot be mocked for testing purposes due to private methods inside... Ofc they can just pass Query and Batch objects, but I'm unsure if it is enough for testing

These private methods would not be used in HostSelectionPolicy because HostSelectionPolicy declares the parameter as ExecutableQuery not cqlQuery or whatever the name would be for the private interface.

what private methods are you referencing?

Such methods as execute(), attempt(), etc.

Oh you're right, I didn't notice them. 👍 So it wouldn't actually make much of a difference if we added a private getHosts method there but we can keep it public

The only issue of making GetHost() public is a requirement to implement it for Batch only to satisfy the interface. So there will be a public API Batch.GetHost() which actually does noop. Maybe the first approach with internal query interface for internal use and the public one for exposing is better...

I think it's fine if Batch.GetHost() does a noop honestly, there's also a possibility that we might make it possible to set the host on batch queries in the future

Then let's leave this one as it is for now

joao-r-reis · 2024-11-21T14:20:17Z

query_executor.go


 	var lastErr error
 	var iter *Iter
 	for selectedHost != nil {
 		host := selectedHost.Info()
-		if host == nil || !host.IsUp() {
+		if (host == nil || !host.IsUp()) && specifiedHost == nil {
 			selectedHost = hostIter()
 			continue
 		}


we should probably add another if here to exit right away if specified host is != nil and the host is down

It makes sense not to send the query if the host is down. I added a check before this if that immediately returns an ErrNoConnections when the specified host is down.

Query.SetHost() allows users to specify on which node the Query will be executed. It is not a tipycal use case, but it makes sense with virtual tables which are available since C* 5.0.0. Patch by Bohdan Siryk; Reviewed by João Reis for CASSGO-4

worryg0d · 2024-11-26T09:57:22Z

Force pushed due to conflicts with the trunk

worryg0d force-pushed the query-to-specific-node branch from b2f7e6f to 0444ee3 Compare August 5, 2024 08:50

tengu-alt approved these changes Aug 5, 2024

View reviewed changes

OleksiienkoMykyta approved these changes Aug 5, 2024

View reviewed changes

joao-r-reis changed the title ~~Support of sending queries to the specific node~~ CASSGO-4 Support of sending queries to the specific node Oct 29, 2024

joao-r-reis reviewed Nov 20, 2024

View reviewed changes

worryg0d force-pushed the query-to-specific-node branch 2 times, most recently from d8ebae3 to c1e46be Compare November 21, 2024 11:25

joao-r-reis reviewed Nov 21, 2024

View reviewed changes

worryg0d force-pushed the query-to-specific-node branch from c1e46be to d83e8a2 Compare November 21, 2024 16:10

joao-r-reis approved these changes Nov 21, 2024

View reviewed changes

worryg0d force-pushed the query-to-specific-node branch from d83e8a2 to 33730fa Compare November 22, 2024 11:24

SetHost API for Query

63c6f5d

Query.SetHost() allows users to specify on which node the Query will be executed. It is not a tipycal use case, but it makes sense with virtual tables which are available since C* 5.0.0. Patch by Bohdan Siryk; Reviewed by João Reis for CASSGO-4

worryg0d force-pushed the query-to-specific-node branch from 33730fa to 63c6f5d Compare November 26, 2024 09:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CASSGO-4 Support of sending queries to the specific node #1793

CASSGO-4 Support of sending queries to the specific node #1793

worryg0d commented Aug 5, 2024 •

edited

Loading

martin-sucha commented Aug 5, 2024

worryg0d commented Aug 6, 2024

joao-r-reis commented Nov 20, 2024

joao-r-reis commented Nov 20, 2024

joao-r-reis Nov 20, 2024 •

edited

Loading

joao-r-reis Nov 20, 2024

joao-r-reis Nov 20, 2024

worryg0d Nov 20, 2024 •

edited

Loading

joao-r-reis Nov 20, 2024

worryg0d Nov 21, 2024

joao-r-reis Nov 21, 2024

worryg0d Nov 21, 2024

joao-r-reis Nov 21, 2024

worryg0d Nov 21, 2024

joao-r-reis Nov 21, 2024

worryg0d Nov 21, 2024

joao-r-reis Nov 21, 2024

worryg0d commented Nov 26, 2024

CASSGO-4 Support of sending queries to the specific node #1793

Are you sure you want to change the base?

CASSGO-4 Support of sending queries to the specific node #1793

Conversation

worryg0d commented Aug 5, 2024 • edited Loading

Overview

Implementation Overview

Usage Example

martin-sucha commented Aug 5, 2024

worryg0d commented Aug 6, 2024

joao-r-reis commented Nov 20, 2024

joao-r-reis commented Nov 20, 2024

joao-r-reis Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

worryg0d Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

worryg0d commented Nov 26, 2024

worryg0d commented Aug 5, 2024 •

edited

Loading

joao-r-reis Nov 20, 2024 •

edited

Loading

worryg0d Nov 20, 2024 •

edited

Loading