Container should exit with non-zero if auto-config key changed #177
Comments
That's because the Relay Proxy maintains a streaming connection to LaunchDarkly for auto-config information updates, not unlike the other streaming connection that is used to get flag data updates. This allows it to immediately detect changes such as adding or removing an environment from your auto-config profile on the dashboard. If you change the key, LaunchDarkly immediately closes that connection, since you would presumably not want anyone who had previously connected with the old key to keep receiving SDK keys and other information via the auto-config stream. That would be a security violation.
Anyway, regarding your suggestion that the Relay Proxy should quit if this happens: I think we'll have to discuss it internally. Personally I'm not convinced that this is right. On the one hand, the Relay Proxy is not going to be able to keep functioning in any useful way if this has happened; on the other hand, we have never before defined any circumstance in which it would terminate the process after a previously successful startup, no matter what happened, so this might be surprising or undesirable behavior for some users. What I'm not sure about is whether this condition is currently detectable via external monitoring. The error log message is meant to help get your attention in that regard, but if for instance the …
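To make the mechanism described above concrete, here is a minimal sketch (not the Relay Proxy's actual code) of a streaming client that reconnects on transient failures but treats a 401 as permanent, which is the effect a changed auto-config key has on the stream. The endpoint URL, header usage, and key value are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func streamAutoConfig(url, key string) error {
	for {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("Authorization", key)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			// Network-level failure: back off and reconnect.
			time.Sleep(5 * time.Second)
			continue
		}
		if resp.StatusCode == http.StatusUnauthorized {
			resp.Body.Close()
			// The key was rejected (for example, because it was reset):
			// retrying will not help, so report a permanent failure.
			return fmt.Errorf("auto-config key rejected (401); no further config updates")
		}

		// ... read SSE events from resp.Body until the stream drops ...
		resp.Body.Close()
		time.Sleep(time.Second) // stream dropped for some other reason; reconnect
	}
}

func main() {
	err := streamAutoConfig("https://stream.launchdarkly.example/autoconfig", "rel-xxxx")
	fmt.Println("streaming stopped:", err)
	// Whether the process should now exit non-zero, or keep serving the
	// environments it already knows about, is the question in this issue.
}
```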
Well, there are three different issues here. First, yes, we want this failure mode to be discoverable in some way. Second, there are the HTTP semantics of the /status resource; it has consistently been the behavior of the Relay Proxy that the … However, since the Relay Proxy is capable of doing more than one thing, this raises a third question of how broken it would have to be before we say "it's bad, kill it."
I think in fact I was wrong in my previous comment to say that "the Relay Proxy is not going to be able to keep functioning in any useful way if this has happened [i.e. if the auto-config key has been changed]". It still possesses valid SDK keys for whatever environments it was using up to that point (assuming those keys have not also been rotated). It has only lost the ability to automatically detect changes to the configuration. It is not self-evident that it ought to stop serving flags to existing clients in this case; maybe that's what you want, but it isn't necessarily what everyone would want. There is already a different way to say "I don't want existing services that possess this SDK key to be able to continue to get flags," and that is to rotate the SDK key.
So I think we will need to do some more internal discussion, and the answer may be that this should be configurable behavior. Our customers have many different approaches to deployment and monitoring.
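On the monitoring point above, one concrete option is an external probe that polls the Relay Proxy's /status resource and treats anything other than a healthy report as a failure. This is a sketch only: it assumes the default port 8030 and status JSON field names ("status", "environments") that should be checked against the documentation for the Relay Proxy version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// relayStatus models only the fields this probe cares about; the names are
// assumptions about the /status JSON and should be verified for your version.
type relayStatus struct {
	Status       string                     `json:"status"`
	Environments map[string]json.RawMessage `json:"environments"`
}

func main() {
	resp, err := http.Get("http://localhost:8030/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "relay unreachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var st relayStatus
	if err := json.NewDecoder(resp.Body).Decode(&st); err != nil {
		fmt.Fprintln(os.Stderr, "could not parse status response:", err)
		os.Exit(1)
	}
	if st.Status != "healthy" {
		fmt.Fprintf(os.Stderr, "relay reports %q across %d environment(s)\n", st.Status, len(st.Environments))
		os.Exit(1) // non-zero so an orchestrator or alerting job can react
	}
	fmt.Println("relay reports healthy")
}
```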
I tested the ability of the LDRP to relay feature flags once the auto-config key has changed, using this example repo: … The LDRP is not able to serve feature flag values, and requests return with an error.
Well, at this point I'm pretty confused. I'm the primary maintainer of the Relay Proxy code, but there's a lot of it, so I certainly may have missed or forgotten something, and I haven't yet tried to replicate the test you described. But from my reading of the code right now, I do not see a way for it to do what you're saying it does. It looks to me like it only deactivates an environment if LaunchDarkly specifically told it that the environment was removed from the configuration, or that its SDK key (as in the environment's key, not the auto-config key) was deactivated. I don't see any code path from "the auto-config stream was disconnected and then got a 401 error" to deactivating the environments, even if it would arguably be logical for it to do so. The code I'm looking at seems to match the intended behavior as I remembered it, which was "quit on startup if the auto-config key is bad, but after that, any errors on the auto-config stream just mean we won't get updates any more."
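To restate that described behavior in code form: the following is a paraphrase only, with hypothetical type and method names, not the Relay Proxy's real implementation. Environments are deactivated only on explicit instructions from the auto-config stream, while a stream failure (including a 401) merely stops further updates.

```go
package main

import "log"

// autoConfigEvent and the handler below are hypothetical names used to
// paraphrase the behavior described in the comment above.
type autoConfigEvent struct {
	Kind       string // e.g. "environmentRemoved", "sdkKeyDeactivated", "streamError"
	EnvID      string
	HTTPStatus int // only meaningful for "streamError"
}

type environmentManager struct{}

func (m *environmentManager) deactivate(envID string) {
	log.Println("deactivating environment", envID)
}

func (m *environmentManager) stopExpectingUpdates() {
	log.Println("auto-config stream lost; existing environments keep serving flags")
}

func handle(m *environmentManager, ev autoConfigEvent) {
	switch ev.Kind {
	case "environmentRemoved", "sdkKeyDeactivated":
		// Only an explicit instruction from LaunchDarkly removes an environment.
		m.deactivate(ev.EnvID)
	case "streamError":
		// A dropped stream (even a 401 after the auto-config key is changed)
		// does not deactivate anything; it only means no further config updates.
		m.stopExpectingUpdates()
	}
}

func main() {
	m := &environmentManager{}
	handle(m, autoConfigEvent{Kind: "streamError", HTTPStatus: 401})
}
```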
I am happy to get on a Zoom call to test together. |
@barakbd Thank you but I do need to begin with some testing on my own here, as well as further review of the code— and also, it's not clear yet how high a priority this is, not to mention that there is still not agreement on the desired behavior. So I can't carve out time to collaborate closely like that. |
Any updates please? |
Hello @barakbd, thank you for reaching out to us. I think it will be helpful to understand your use case a bit more through a support ticket; do you mind creating one with us?
Support ticket created: #20701
@eli-darkly - I just retested and you are correct. The LDRP continues to serve requests until the container is killed and tries to restart, at which point it fails with a non-zero exit code. It seems this is the desired behavior; however, I can see this causing confusion when someone updates the configuration (adds or removes a Project/Environment) and the LDRP does not respect the new configuration.
Describe the bug
Container does not exit when the auto-config key is incorrect due to a configuration reset.
To reproduce
Expected behavior
The container should exit with a non-zero code. See related issue: #165
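For illustration only, a minimal sketch of the requested behavior (this is not what the Relay Proxy currently does, and the channel and error here are hypothetical): once an unrecoverable auto-config error is reported, log it and terminate with a non-zero exit code so the container is restarted or flagged.

```go
package main

import (
	"errors"
	"log"
	"os"
)

func main() {
	// Channel on which a (hypothetical) auto-config stream handler would
	// report an unrecoverable error such as a 401 after a key reset.
	fatal := make(chan error, 1)

	// Simulate that report for the sake of the example.
	fatal <- errors.New("auto-config key rejected with 401")

	err := <-fatal
	log.Println("shutting down:", err)
	os.Exit(1) // non-zero exit so the orchestrator can restart or alert
}
```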
Logs
**Docker Image**
https://hub.docker.com/layers/launchdarkly/ld-relay/latest/images/sha256-8932028a51c815a016b73a265c5102ec9bfa99125464a9470e6e44768601a4df?context=explore
Notes
I noticed that the ERROR log was generated even without making a connection or a feature flag request from an SDK. Does the Relay Proxy send a heartbeat, or does the LaunchDarkly server make a call to the LDRP? In other words, how does the LDRP know the auto-config key changed?