Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to control if remote replicas should be used #202

Merged
merged 1 commit into from
Jul 4, 2024

Conversation

sylwiaszunejko
Copy link
Collaborator

@sylwiaszunejko sylwiaszunejko commented Jul 1, 2024

Previously the driver would always failover to use replicas in the remote datacenters if there was no replicas available in local_dc when using DC/RackAwareRoundRobinPolicy. There was no control over it. This is not how the other driver handle it.

In python-driver there is an additional field in the DCAwareRoundRobinPolicy named used_hosts_per_remote_dc and it is 0 by default.

In java-driver 4.x there is also similar mechanism as in python-driver, there is a field usedHostsPerRemoteDc.

In rust driver there is permit_dc_failover option in the DefaultPolicy. If it is set to true driver can use replicas in dcs other than local. By default it is false.

This PR adds a field to dc/rackAwareRR to control if dc failover is permitted. To avoid breaking changes I decided not to change the default behavior.

Refs: #201 <- this issue is about wrong dc name in DCAwareRoundRobinPolicy, but the difference in gocql behavior in that case originates in handling remote replicas by default.

@sylwiaszunejko
Copy link
Collaborator Author

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

@dkropachev
Copy link
Collaborator

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

@sylwiaszunejko , thansk for PR, one question:

  1. Do you consider changing this option while policy is in use ?

@sylwiaszunejko
Copy link
Collaborator Author

I send it as a draft to discuss it, because changing default behavior is not something to do lightly for sure. Also I went with the behavior similar to rust driver not java or python-driver, that also can be discussed.

@sylwiaszunejko , thansk for PR, one question:

1. Do you consider changing this option while policy is in use ?

My intention was to specify it only at the beginning as in other drivers. I wanted to add this option to the constructor, but in golang there is no option to specify default value for an argument and I didn't want to do API breaking changes by adding extra parameter in https://github.com/scylladb/gocql/blob/master/policies.go#L960

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 1, 2024

@sylwiaszunejko , thansk for PR, one question:

1. Do you consider changing this option while policy is in use ?

My intention was to specify it only at the beginning as in other drivers. I wanted to add this option to the constructor, but in golang there is no option to specify default value for an argument and I didn't want to do API breaking changes by adding extra parameter in https://github.com/scylladb/gocql/blob/master/policies.go#L960

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change it on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)

@sylwiaszunejko
Copy link
Collaborator Author

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)

I wanted to do it that way initially, but I was wondering if changing DCAwareRoundRobinPolicy definition is what we want.

@dkropachev
Copy link
Collaborator

Let's than implement it in a different way that won't polute HostSelectionPolicy and won't suggest posibility to change on a run, let's do something like this:

policy := DCAwareRoundRobinPolicy("local", HostPolicyOptionEnableDCFailover)

I wanted to do it that way initially, but I was wondering if changing DCAwareRoundRobinPolicy definition is what we want.

It won't be breaking change, let me show you.

@sylwiaszunejko
Copy link
Collaborator Author

sylwiaszunejko commented Jul 2, 2024

@dkropachev What is your opinion about changing the default behavior regarding using remote hosts?

@sylwiaszunejko sylwiaszunejko self-assigned this Jul 2, 2024
@sylwiaszunejko sylwiaszunejko requested a review from roydahan July 2, 2024 06:51
@dkropachev dkropachev force-pushed the permit_dc_failover branch from ac748fa to 9724d07 Compare July 2, 2024 12:56
@sylwiaszunejko
Copy link
Collaborator Author

@dkropachev something went wrong with your push, there are unnecessary commits, if I should fix something please let me know, I will fix it

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 2, 2024

@dkropachev What is your opinion about changing the default behavior regarding using remote hosts?

I would classify it as breaking change, it is better to avoid that by defaulting to old behavior.

@dkropachev dkropachev force-pushed the permit_dc_failover branch from 9724d07 to 7339ae1 Compare July 2, 2024 13:06
@dkropachev
Copy link
Collaborator

@dkropachev something went wrong with your push, there are unnecessary commits, if I should fix something please let me know, I will fix it

I am sorry for doing it, could you please push your version of the branch again

@sylwiaszunejko sylwiaszunejko force-pushed the permit_dc_failover branch 3 times, most recently from 90add8e to 64d83f5 Compare July 2, 2024 13:27
@sylwiaszunejko
Copy link
Collaborator Author

v2:

  • I fixed the code to not change the default behavior but add an option to disable using hosts in remote datacenters.

policies.go Outdated
type dcAwarePolicyOption func(p dcFailoverEnabledPolicy)

func HostPolicyOptionEnableDCFailover(p dcFailoverEnabledPolicy) {
p.setDCFailoverEnabled()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change meaning of it on opposite (and name as well) to do not change default behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already pushed code with the changes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, but underlying variable did not change it's name and also let's make it's value as false by default and to be changed to true via this option

policies.go Outdated
}

func (d *dcAwareRR) setDCFailoverDisabled() {
d.permitDCFailover = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d.permitDCFailover = false
d.disableDCFailover = true

policies.go Outdated
@@ -1067,6 +1094,10 @@ func (d *rackAwareRR) MaxHostTier() uint {
return 2
}

func (d *rackAwareRR) setDCFailoverDisabled() {
d.permitDCFailover = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d.permitDCFailover = false
d.disableDCFailover = true

policies.go Outdated
Comment on lines 1135 to 1140
if d.permitDCFailover {
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get(), d.hosts[2].get())
} else {
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get())
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if d.permitDCFailover {
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get(), d.hosts[2].get())
} else {
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get())
}
if d.disableDCFailover {
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get())
}
return roundRobbin(int(nextStartOffset), d.hosts[0].get(), d.hosts[1].get(), d.hosts[2].get())

policies.go Outdated
Comment on lines 1056 to 1061
if d.permitDCFailover {
return roundRobbin(int(nextStartOffset), d.localHosts.get(), d.remoteHosts.get())
} else {
return roundRobbin(int(nextStartOffset), d.localHosts.get())
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if d.permitDCFailover {
return roundRobbin(int(nextStartOffset), d.localHosts.get(), d.remoteHosts.get())
} else {
return roundRobbin(int(nextStartOffset), d.localHosts.get())
}
if d.disableDCFailover {
return roundRobbin(int(nextStartOffset), d.localHosts.get())
}
return roundRobbin(int(nextStartOffset), d.localHosts.get(), d.remoteHosts.get())

policies.go Show resolved Hide resolved
Previously the driver would always failover to use replicas
in the remote datacenters if there was no replicas available
in local_dc when using DC/RackAwareRoundRobinPolicy. There was
no control over it. This is not how the other driver handle it.

This commit adds a option to dc/rackAwareRR to control if dc
failover is permitted. The default stays the same (use remote
datacenter).
@sylwiaszunejko
Copy link
Collaborator Author

I added a simple test

@wprzytula
Copy link
Collaborator

Please re-request review from me once it gets stable and not draft.

@wprzytula wprzytula removed their request for review July 3, 2024 07:25
@sylwiaszunejko sylwiaszunejko requested a review from dkropachev July 3, 2024 07:51
@roydahan
Copy link
Collaborator

roydahan commented Jul 3, 2024

The default behavior is something we should consider.
On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application.
(But then they may realize their queries are going to the wrong DC and hurting their performance).

On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it.
(In theory the failure that the user had, they could also just change the DC name).

I wonder what other maintainers think about it.
@fruch / @Lorak-mmk / @wprzytula ?

@wprzytula
Copy link
Collaborator

wprzytula commented Jul 3, 2024

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).

On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).

I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this.
We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

Having made this decision, it will guide us in multiple similar dilemmas: to break with good intentions or to preserve old behaviour.
Having chosen 1., we would avoid major releases and API-breaking changes, as well as we would preserve old defaults.
With 2., we would admit that we are actually developing gocql, not only maintaining it, and so we would issue next major releases.

@wprzytula
Copy link
Collaborator

wprzytula commented Jul 3, 2024

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

@mykaul
Copy link

mykaul commented Jul 3, 2024

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

You mean https://github.com/scylladb/scylla-go-driver ?

@mykaul
Copy link

mykaul commented Jul 3, 2024

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).
On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).
I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2.
Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

@Lorak-mmk
Copy link

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).
On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).
I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2. Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

Availability? I thought the discussion was about changing the default to prevent sending requests to other DCs, as is the default in other drivers? Isn't that decreasing availability?

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 3, 2024

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

I know that upstream maintainers were also thinking of v2 and looked at scylladb/scylla-go-driver as a base for it.
It would be amazing if we manage to connect with them and work on the project together.

@Lorak-mmk
Copy link

My personal opinion would be to keep gocql a maintained-only driver, with no breaking changes, and instead develop the Scylla Go driver, which has good design from the very beginning (inspired by the Rust driver) and would promote good practices.

I know that upstream maintainers were also thinking of v2 and looked at scylladb/scylla-go-driver as a base for it. It would be amazing if we manage to connect with them and work on the project together.

It would defnitely be good to cooperate instead of having forks.

@mykaul
Copy link

mykaul commented Jul 3, 2024

The default behavior is something we should consider. On one hand, I understand that changing the behavior under the users feet may cause confusions and potentially fail their application. (But then they may realize their queries are going to the wrong DC and hurting their performance).
On the other hand, just introducing the option "quietly", existing users won't notice it and it requires them to actively change it. (In theory the failure that the user had, they could also just change the DC name).
I wonder what other maintainers think about it. @fruch / @Lorak-mmk / @wprzytula ?

This is how I see this. We need to decide which is, in general, our higher priority in gocql:

  1. backward-compatibility, API stability etc. - being mature and stable driver with no changes unless absolutely necessary, or
  2. ensuring the most sensible default behaviour of the driver, as well as promoting good practices among the users.

I'm in favor of no. 2. Don't punish users with bad default choices in the spirit of 'backward compat.' And it's not an API breaking change. It's a minor item that will improve their lives - especially when there's a failure. The default should be focus on maintaining availability in the face of failures.

Availability? I thought the discussion was about changing the default to prevent sending requests to other DCs, as is the default in other drivers? Isn't that decreasing availability?

Previously, there was no control for it. Now there is, and we are considering what the default should be. I'm in favor of ALLOWING sending requests to other DCs. I think the default in other drivers is wrong.

@sylwiaszunejko
Copy link
Collaborator Author

Previously, there was no control for it. Now there is, and we are considering what the default should be. I'm in favor of ALLOWING sending requests to other DCs. I think the default in other drivers is wrong.

@mykaul That's exactly what the code in this PR does so far. The default behavior is the same as it was without controlling mechanism, but we have the control available. And this option is in fact in line with @wprzytula approach of not doing breaking changes.

@roydahan
Copy link
Collaborator

roydahan commented Jul 3, 2024

Just to make sure, if we leave the default as is, the user would have the exact same issue as they had when pointing the wrong DC name without ability to tell.

@sylwiaszunejko
Copy link
Collaborator Author

Just to make sure, if we leave the default as is, the user would have the exact same issue as they had when pointing the wrong DC name without ability to tell.

True, to fail if the wrong DC is provided they would have to use HostPolicyOptionDisableDCFailover

@sylwiaszunejko sylwiaszunejko merged commit e6d9bec into scylladb:master Jul 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants