-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rack/dc aware load balancing #40
Add rack/dc aware load balancing #40
Conversation
@nyh please take a look. |
If I understand correctly, the example you gave and the Demo1 you modified are for the old Java SDK (v1). Did you also check the newer SDK v2 (it was released 7 years ago - it isn't very new...)? If you didn't, please do. If you did, please mention that both are supported (or not) in the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks mostly good for SDK v1 (I think it breaks v2?), but I think but I have one objection about the need for a new start() which needlessly (I think) changes the API of this library. Wouldn't it be easier if we just had a constructor that takes in addition to a url, also dc and rack? Basically, exactly the same code you added to Demo1 (the constructor will take both dc and rack, but null means not to set one or both).
I am also not sure there is a need for all the complexity you added for checking if the rack/dc feature is supported, and retesting this every minute. After all, the rack/dc option is something the user explicitly chooses for a specific cluster - just like the initial address of one known node. Why would a user pass wrong parameters? Why does the library need to check and recheck it, instead of just saying - if you pass wrong parameters, you'll get wrong results?
I don't mind if we have a method like checkSupportForRackandDC() or something like that, maybe some user will find it convenient, but this can be a stand-alone function and doesn't need to be mixed into all the other functions.
String path = "/localnodes"; | ||
boolean paramsInitiated = false; | ||
if (this.isRackSpecified()) { | ||
path += "?rack=" + this.rack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: you may need to URL-encode the rack and dc names if they contain strange characters. Maybe not worth the hassle, but we may be suprised in the future if a customer uses strange characters in a rack/dc name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I did not want to mess with switching everything to URI, but it looks like we have to go that way.
return new URL(alternatorScheme, nextNode(), alternatorPort, file); | ||
} | ||
|
||
public RuntimeException checkIfRackAndDatacenterSetCorrectly() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for this method to return - not throw - an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, initially i was thinking that i can reuse this part of the code and fotgot to check on it later, switched to throwing them
return this; | ||
} | ||
|
||
public AlternatorRequestHandler checkIfRackAndDatacenterSetCorrectly() throws RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add documentation on what exactly this function checks about the setting being "correct".
If I understand correctly (but please correct me if I'm wrong), it doesn't really check if the rack and datacenter are "correct" but just if this feature is supported in the server. In this case I have several thoughts/questions:
- Why bother check - why would a user try to set a rack if the server doesn't support it?
- If a user does want to check, why not have a function called checkSupportsRackandDatacenter which the user can call? Why would the user call this after already setting datacenter and rack?
- But on the other hand, why does the user need to call this explicitly - setrack()/setdc() could check immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We have had exactly same problem for drivers, user missconfigure driver and then does not know why it does not work or behaive in a way it does not expect.
- I will do that.
- Agree setRack, is not needed at this point, i will remove it.
List<String> hostsWithFakeRack = getNodes(fakeRackUrl); | ||
List<String> hostsWithoutRack = getNodes(url); | ||
if (hostsWithoutRack.isEmpty()) { | ||
// If list of nodes is empty, it is impossible to conclude if it supports rack/datacenter filtering or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case may have some theoretical interest, but it practice it should not be possible and indicates a logic error in Scylla. The whole assumption of this load balancer is that we start with one known live node, and can contact it to get the list of all live nodes. If a node responds but says it itself isn't live, nothing will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, that is why exception is thrown here. In next version this part is going to be a bit different.
But this case still be handled in similar manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was more about your comment than about the if. The comment makes it sound like this is some sort of reasonable, possible, case, in which case "we can't conclude" and the function failed to do what it was supposed to do (figure out if this option is supported or not). But this case should never happen - it should not be possible for us to contact a live node and learn from it that there are no live nodes in the cluster. I thought maybe the comment should say something about this case not being possible.
private boolean isRackAndDatacenterSupported() { | ||
if (this.supportsRackAndDatacenter == null) { | ||
LocalDateTime now = LocalDateTime.now(); | ||
if (this.nextRackAndDatacenterFeatureSniffTime == null || now.isAfter(this.nextRackAndDatacenterFeatureSniffTime)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to recheck every minute that the dc/rack feature is still supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to drop it.
Collections.sort(hostsWithFakeRack); | ||
Collections.sort(hostsWithoutRack); | ||
if (hostsWithoutRack.equals(hostsWithFakeRack)) { | ||
// When rack filtering is not supported server returns same nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have simplified this check - if rack filtering is supported, the result of filtering with a fake rack name is always an empty list. No real need to sort or compare hostsWithoutRack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just protection against the case when fakeRack
is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean when there is an actual rack called "fakeRack"? In this case, this check is wrong anyway - it will return "false" (because of the equality) even when the rack option is supported.
To avoid the case where fakeRack is actually exists, you can use a longer ridiculous name like this_FAKE_rack_couldn't possibly exist!
;-) Of course that's not perfect either.
return this; | ||
} | ||
|
||
public AlternatorRequestHandler start() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the API of AlternatorRequestHandler - not only did you add setDataCenter()/setRack() methods, you also made it necessary to call the start() method - something that wasn't needed in the past.
This will cause problems to existing users of this code (if we have any...), but moreover - the README.md which explains how to use this will need to be changed to call this start().
Personally, I think it would be better not to introduce this start() - I would just make the datacenter/rack an (optional) constructor parameter, just like uri already is. Why would that not be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my concern is that when class instance invokes threading, opens a file, opens a network connection during initialization it makes it hard and hacky to test such classes.
Alternative is to implement AlternatorLiveNodes
with start
on it, while in AlternatorRequestHandler
and AlternatorEndpointProvider
call start
in a constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the implementation of a class (whether it relies on some background thread or not) makes it easier or harder to test, but my point was that it wasn't a necessary part of this patch, which had other goals.
But perhaps more important I commented that if you make this API change you also need to:
- Change the documentation - not just demo1.java - to explain the new way of creating this class.
- Change not just the SDK v1 class and demo1.java, but also the SDK v2 class and demo2.java, because those use the changed API as well.
Your new suggestion (having start() in the internal AlternatorLiveNodes but not in the external class) makes 1 above unnecessary, but 2 still is. But it still makes me wonder what you gain by this internal change, which just adds confusion (now there is an internal API which is different from the external API).
To make it clear, I'm not strongly against this change, only mildly against it (until I hear a convincing example of why the thread-in-constructor thing is problematic), but do insist that if you do this change you should do everything that it needs - including changing the README.md and the SDK v2 code to match the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkropachev I recommend not doing unnecessary changes at the moment.
I consider having thread running when you initialize class as bad practicce that does not allow properly test it and makes code hard to read and understand:
We had exactly same problem in drivers, user provides wrong rack, datacenter or something and can't figure out why queries are failing. Regarding check if feature is supported or not. same reason, if we can tell user that target cluster does not support feature, we should do that, otherwise code will produce unexpected results and user left to guess what is going on.
Makes sense, let's do it this way. |
I don't why the user needs to care if the object contains a thread or not, but if you decided to change this part (and you didn't strictly have to do this, you could have added a constructor parameter), you'll need to change the documentation - and hope nobody is using the existing version and will complain about the API change :-)
So the setrack() function (or the constructor, in the old style) can do the check. Why would a user even think to call a separate function to check this? And why does this feature need to be re-checked every minute - are we worried that the user will downgrade the cluster and lose this feature (I don't even think this is possible in Scylla)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added two small comments, since other important things are already covered by other reviewers.
private String rack; | ||
private String datacenter; | ||
private Boolean supportsRackAndDatacenter; | ||
private LocalDateTime nextRackAndDatacenterFeatureSniffTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Probably should use something different than LocalDateTime for measuring times between calls (long milis or nanos or Instant maybe), but I guess this works too.
if (datacenter != null && !datacenter.isEmpty()) { | ||
handler.setDatacenter(datacenter); | ||
} | ||
handler.checkIfRackAndDatacenterSetCorrectly().start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not be needed to manually call checks for various options before start. Validation of all options could be done on start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@nyh , @Bouncheck , let's split it into parts, i will create smaller PRs to address separate issues and then we proceed with this one. |
Yes, although note that the rack-aware load balancing is the most important part, I think it should be done first before all the other minor (?) improvements. Also, I asked why you decided to fix this only for the v1 version of the SDK and not v2, and didn't get a reply. |
583b124
to
b670028
Compare
@nyh, @Bouncheck , could you please take a look agian, I have addressed comments and rebased to the master, also I have fixed both |
@dkropachev in a single-patch pull-request, it is the single patch's commit message - NOT the cover letter - which gets inserted into the git history and there will be no merge and therefore no cover letter. So please make sure that this single patch's commit message has the full description - and not just a single line. |
Oh, and of course please ensure that the commit message is up-to-date with the code. I see for example that instead of the setrack() et al. that you described, now it's part of the constructor. |
java/src/main/java/com/scylladb/alternator/AlternatorEndpointProvider.java
Outdated
Show resolved
Hide resolved
java/src/main/java/com/scylladb/alternator/AlternatorEndpointProvider.java
Outdated
Show resolved
Hide resolved
java/src/main/java/com/scylladb/alternator/AlternatorLiveNodes.java
Outdated
Show resolved
Hide resolved
public AlternatorLiveNodes(String alternatorScheme, List<String> liveNodes, int alternatorPort) { | ||
this.alternatorScheme = alternatorScheme; | ||
this.initialNodes = new ArrayList<>(liveNodes); | ||
public AlternatorLiveNodes(List<URI> liveNodes, String datacenter, String rack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this change to the constructor... I mean the change to use List<URI>
:
The existing interface assumed that we have a list of node hostnames (each a String), and all of them share the same scheme and port. This assumption is a valid one because this is indeed how Scylla works (all nodes use the same scheme and port), and because "/localnodes" returns just the host names, not the port and scheme, and we need to know what it will be for all nodes.
Now you are changing this constructor to take a list of URIs. Theoretically, a caller can pass in each one a different port and scheme, but if you try to do that, nothing will work! In fact, your code saves the port and scheme of the first node of the list, and then assumes (without checking!) that all the rest have the same port and scheme. But if it's what you do, why even allow passing separate URIs?
I think you shouldn't change the original interface with the separate scheme, port and hostname. Internally, if you want to build a list of URIs from that, you can (I think it's a bit wasteful, but not that much) - but it shouldn't be the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upper API converts string to URI and then converts it back to String uri.getHost()
, which is a bit weird.
Is it ok to have it as public AlternatorLiveNodes(URI liveNode, String datacenter, String rack)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are now asking if it's ok to allow creating this class with just one known node instead of an array. I think it's better to allow the user to configure more than one known node (e.g., in case one of the known nodes is down!).
But your new API - that can accept a general list of URI objects but then assume (without even checking that) all of them the same scheme and port, is just wrong.
I think you should either keep the old signature of this constructor (it's just a constructor, performance is not important), or, if you want to keep the new API, please have the constructor verify that all given liveNodes have the same getPort() and getScheme() as the 0th one (which you saved in this.alternatorPort and this.alternator) - and if not throw an exception. You shouldn't just silently allow users to think such a use case might work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, fixed.
// Should never happen, uri is already validated at this point | ||
throw new RuntimeException(e); | ||
conn.setRequestMethod("GET"); | ||
} catch (ProtocolException ignored) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment what this catch is about. I assume it should never happen because the URI was already "validated" earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment, please take a look.
@@ -82,14 +90,53 @@ public static void main(String[] args) { | |||
logger.addHandler(handler); | |||
logger.setUseParentHandlers(false); | |||
|
|||
ArgumentParser parser = ArgumentParsers.newFor("Demo3").build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Demo1, not Demo3.
Please seriously reconsider why you want to introduce so many lines and a non-standard library I never heard of - net.sourceforge.argparse4j - just for a demo of how to use the library. A demo is supposed to be a short example, not something very user-friendly.
If you do feel strongly about adding this command-line parsing, can you please consider using something built-in into Java and not some uncommon external library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that @elcallio introduced this external library and the argument and made the demo code lengthier with parsing code in Demo3, in commit 27cf111. This means I probably can't complain when you now copied it into Demo1 and Demo2. I still don't like it in any of the demo codes, but my case has become weaker...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just copy-paste from the Demo3.java
, I agree it is better to use std library, but since we already have dependancy on it, it is not big of the problem, can we have an issue to switch from net.sourceforge.argparse4j
to something else ?
Reason why it is needed it makes testing code so much easier.
As of now I am talking about only local testing, but in future we will do integration tests to make sure all our demos work flawlessly and with expected result.
56ee542
to
50418ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one comment. Other things I saw were already covered.
fakeRackUrl = new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(), uri.getQuery(), "rack=fakeRack", ""); | ||
} catch (URISyntaxException e) { | ||
// Should not ever happen | ||
throw new FailedToCheck("Invalid URI: " + uri, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this should not ever happen then i believe unchecked exception would be more appropriate. This forces the caller of the function to handle this case. If this case does not happen I think there is no reason to force handling of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although there are other throws with this exception so the caller would need to do handle that anyway. I leave this to your judgement then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this should not ever happen then i believe unchecked exception would be more appropriate. This forces the caller of the function to handle this case. If this case does not happen I think there is no reason to force handling of that.
True.
The idea there is that since there is already FailedToCheck
present in the same method, which signals that it can't complete check properly it is better to use it, otherwise I would use RuntimeException
Done, please take a look at commit and PR description. |
50418ec
to
394fd3c
Compare
@nyh can you please prioritize this review so we can "release" it? |
Will review it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, looks mostly good, but I still have a couple of things I'd like you to please fix (and a few more comments that I don't insist on). Please take a look at all the live comment threads on your single commit.
java/src/main/java/com/scylladb/alternator/AlternatorEndpointProvider.java
Show resolved
Hide resolved
public AlternatorLiveNodes(String alternatorScheme, List<String> liveNodes, int alternatorPort) { | ||
this.alternatorScheme = alternatorScheme; | ||
this.initialNodes = new ArrayList<>(liveNodes); | ||
public AlternatorLiveNodes(List<URI> liveNodes, String datacenter, String rack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are now asking if it's ok to allow creating this class with just one known node instead of an array. I think it's better to allow the user to configure more than one known node (e.g., in case one of the known nodes is down!).
But your new API - that can accept a general list of URI objects but then assume (without even checking that) all of them the same scheme and port, is just wrong.
I think you should either keep the old signature of this constructor (it's just a constructor, performance is not important), or, if you want to keep the new API, please have the constructor verify that all given liveNodes have the same getPort() and getScheme() as the 0th one (which you saved in this.alternatorPort and this.alternator) - and if not throw an exception. You shouldn't just silently allow users to think such a use case might work.
@@ -132,7 +139,7 @@ public URI nextAsURI() { | |||
public URI nextAsURI(String file, String query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: (not relevant to this patch, I don't know when this introduced)
What is called "file" in this function is normally called a "path" in URL jargon - not a "file".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the fix. Maybe you forgot to push it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github UI does not pick up new commits on the branch, I am trying to fix it.
394fd3c
to
114708d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few minor objections to some of the improvements you made to the API and structure of this code, which I'm not sure improve anything, but I will NOT insist on any of them - I just want this patch to go in.
However, there is one last thing that seems wrong - the reference to "1.1.1.1" - either it's a typo or I don't understand what you tried to do there. This is why I'm chosing "Request changes" here. When you fix this error (?), we can merge this PR. You can think about the rest of my comments about changes you did and I didn't like, later :-)
@@ -20,7 +20,13 @@ public class AlternatorEndpointProvider implements DynamoDbEndpointProvider { | |||
|
|||
public AlternatorEndpointProvider(URI seedURI) { | |||
futureCache = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot we merged this futureCache. I still don't understand why it's correct, or optimizing anything :-( But it's irrelevant to this patch.
} catch (AlternatorLiveNodes.ValidationError e) { | ||
throw new RuntimeException(e); | ||
} | ||
liveNodes.start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: (not important, will not insist on this)
It seems this code snippet - calling the constructor, then validate, then start - was the only correct way to create this object, which is why the create() factory function existed. I don't understand why you wanted to get rid of it in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now yes, in future, when we address #46, having it started
is going to be optional.
As I mentioned before, if we ever made it to unit-test, it will be way easier to have it tested with this API, then when everything is done on initialization.
throw new RuntimeException(e); | ||
} | ||
} | ||
this.liveNodes.set(nodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why this code - converting initialNodes into liveNodes - isn't in the constructor. If you did that, you wouldn't need the silly used-once class field "initialNodes", and you wouldn't need to do exactly the same check in validate().
I mentioned this already and it seems I didn't convince you - so I just want to mention this one last time and I won't insist - the feature is more important than perfecting those unrelated changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This place was changed, but by some reason github UI does not pick up new commits on the branch, I am trying to fix it.
try { | ||
this.hostToURI(liveNode); | ||
} catch (MalformedURLException | URISyntaxException e) { | ||
throw new ValidationError(String.format("failed to validate initial node %s", liveNode), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message isn't helpful for the user who'll see it. What does it mean that we "failed to validate" a node? It seems you only check that it's a valid-looking URL - not that it, for example, is a live node. You should say what sort of validation it failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be included in the original exception, so they will see what exactly is wrong.
} catch (URISyntaxException e) { | ||
logger.log(Level.WARNING, "nextAsURI", e); | ||
return null; | ||
this.hostToURI("1.1.1.1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what you were trying to do here, but I doubt this 1.1.1.1 thing was intentional :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is, I just want to make sure here that alternatorScheme
and alternatorPort
are good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment that you are checking that the scheme and port are valid. I had no idea what you're doing here. It looked like a typo :-)
@@ -132,7 +139,7 @@ public URI nextAsURI() { | |||
public URI nextAsURI(String file, String query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the fix. Maybe you forgot to push it?
this.liveNodes = new AtomicReference<>(); | ||
this.alternatorPort = alternatorPort; | ||
this.nextLiveNodeIndex = new AtomicInteger(0); | ||
public AlternatorLiveNodes(URI liveNode, String datacenter, String rack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you are changing the API of this constructor again, after the previous patch already changed it.
Unlike the previous version of this pull request, the API you are suggesting here is not wrong (a single "liveNode" has just one port and scheme), but in my opinion, it's not better than the one before this patch - which was more general (allowed to add more than one initial node as the initial guess). I think it's a mistake to change it - it's not improving this class, it's making it less powerful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It addresses your prior comment in the following manner:
- It allows user to create an instance with only one initial nodes and reads schema and port from it.
- It also allows user to create an instance with multuple initial nodes, but in such case it does not assume schema and port from the first record, instead it forces user to pick correct schema and port for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agreed that this version is better than your previous change - at least it's not wrong - but I don't understand why you are changing this API.
Your point 2 seems wrong - after this change (the one in this patch) it is no longer possible to create an instance with multiple initial nodes, you can only specify one URI liveNode
. If this is what we had all along it wouldn't have been a disaster, but before this patch we could specify a list of hostnames (strings) - how is this patch improving anything?
UPDATE: Oh, I see you adeed two different constructors - one with one URI, and one with a list of URIs. I'll comment on that below.
@@ -1,5 +1,7 @@ | |||
package com.scylladb.alternator.test; | |||
|
|||
import com.amazonaws.services.dynamodbv2.xspec.B; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a mistake (e.g., you typed B and pressed control-space) and got this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, removed.
550d691
to
a0db766
Compare
scylladb/scylladb#12147 implements a feature for `/locanodes` to filter out nodes by rack and/or datacenter. Now we can make dynamodb client to target particular rack and/or datacenter. This PR allows user to target dynamodb client to a particular rack/datacenter or rack+datacenter, via following API: ``` AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, datacenter, rack); ``` Changes in auxiliary `AlternatorLiveNodes` API: 1. Constructor input argument is an single URI now 2. Validation is introduced 3. Start is moved out from the constructor ``` liveNodes = new AlternatorLiveNodes(Collections.singletonList(seedURI), datacenter, rack); try { liveNodes.validate(); liveNodes.checkIfRackAndDatacenterSetCorrectly(); } catch (AlternatorLiveNodes.ValidationError | AlternatorLiveNodes.FailedToCheck e) { throw new RuntimeException(e); } liveNodes.start(); ``` 1. Regular, old code: `AlternatorRequestHandler handler = new AlternatorRequestHandler(uri)` for `6.2.0` and `master` 2. Dc+Rack-aware: `AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, "dc1", "rack1")` for `6.2.0` and `master` 3. Dc-aware: `AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, "dc1", "")` for `6.2.0` and `master` 4. Rack-aware: `AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, "", "rack1")` for `6.2.0` and `master` All tests are done for following deployments: 1. MultiDc 2. MultiRack+MultiDc 3. SingleDc
a0db766
to
68cb118
Compare
@nyh, there was an issue with github UI not picking up new commits from the branch, I managed to fix it by pushing whole branch, as it is to the |
I'll review this again now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have some comments about the code and about some of the changes to the existing code which I found unnecessary and perhaps even a change for the worse, but I'm still clicking "approve" and will merge this PR because I want this feature in, and if I understand none of the changes in this version are completely wrong, and most importantly: Because this code keeps changing every time I review it, and I want to merge something and then you can continue making changes to it. Please consider the comments I left in this PR - even after it's merged - as ideas for your continuing changes to this code.
liveNodes.checkIfRackAndDatacenterSetCorrectly(); | ||
if (!datacenter.isEmpty() || !rack.isEmpty()) { | ||
if (!liveNodes.checkIfRackDatacenterFeatureIsSupported()) { | ||
logger.log(Level.SEVERE, String.format("server %s does not support rack or datacenter filtering", seedURI)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code keeps changing every time I review it...
I see now if the datacenter/rack feature is need (according to the user's options) but not supported, you print a message to the log, but don't throw any exception and the caller won't even know that there is a problem. What's the point of doing that?
import java.net.URISyntaxException; | ||
import java.net.HttpURLConnection; | ||
import java.net.ProtocolException; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this wildcard import, and don't understand why you had to replace the explicit imports that were already in this code (Collections, List, ArrayList, Scanner) by such a wildcard.
I think we should try to focus more on completing the desired feature, and less on improving (?) the style of the code in unrelated ways.
} | ||
|
||
public AlternatorLiveNodes(URI uri) { | ||
this(uri.getScheme(), Collections.singletonList(uri.getHost()), uri.getPort()); | ||
public AlternatorLiveNodes(List<URI> liveNodes, String scheme, int port, String datacenter, String rack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this API is not wrong but I find it strange: You need to explicitly pass "scheme" and "port", and all the nodes that will be learned later from the "/localnodes" API will get those same scheme and port tacked onto them - but the initial nodes won't - those can come with different scheme or port. Moreover, each time we retried "/localnodes", we replace - not append - the previous list, so if the initial nodes have the "wrong" port or scheme and happen to work anyway, they will be replaced immediately by nodes using the one same port or scheme.
All in all, this API is not "wrong", but doesn't seem better than the previous API we had.
throw new RuntimeException("liveNodes cannot be null or empty"); | ||
} | ||
this.alternatorScheme = scheme; | ||
this.initialNodes = liveNodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still scratch my head every time I see this "initialNodes" - they are set but then never used again ("livenodes" is the one used), so it didn't need to be a class member. You could set just livenodes, not initialNodes. I know you tried to explain to me how it's better for testing, or something, but it still doesn't make sense to me.
conn.setRequestMethod("GET"); | ||
} catch (ProtocolException e) { | ||
// It can happen only of conn is already connected or "GET" is not a valid method | ||
// Both cases not true, os it should happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it should not happen.
Despite I still have concerns and comments (that I'd like you to please consider in the future), I merged this PR, I think it's already good enough to be used. One thing your patch doesn't change is the documentation java/README.md. The patch does change the demo codes (we have 3 versions of that...) to include the new feature, but not the document - so please consider changing the document too. Namely, the document should explain the existance of the dc and rack options, and the possibilities it opens up, in particular:
I don't think users need to guess that these feature exist, so the documentation should be updated. |
Closes #35
Core PR that introduced rack and dc filtering on
/localnodes
: scylladb/scylladb#12147This PR allows user to target dynamodb client to a particular rack/datacenter or rack+datacenter, via following API:
Changes in auxiliary
AlternatorLiveNodes
API:Tested
AlternatorRequestHandler handler = new AlternatorRequestHandler(uri)
for6.2.0
andmaster
AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, "dc1", "rack1")
for6.2.0
andmaster
AlternatorRequestHandler handler = new AlternatorRequestHandler(uri, "dc1", "")
for6.2.0
andmaster
All tests are done for following deployments: