[native] Replace fixed test worker port with ephemeral ports #22748
Just want to leave my $.02 here -- We had similar problems at my old job and attempted this type of solution to grab free worker ports. Ultimately it ended up being more reliable to pick a fixed port number that is usually not used by the OS for our E2E integration tests. This type of port selection didn't work because, after releasing the socket back to the OS, a race condition occurred quite often: we would get assigned the same port back to back or in close succession, before the port was actually bound by the new server's socket. I think in this case, since we don't launch workers in parallel, we probably won't run into this situation as often, but I do have a PR which parallelizes the launching and would probably cause issues (#22212). I think a better solution would be to let the worker bind to port 0, and then query the process internally for its assigned port once the socket is returned by the OS to the worker.
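To make the race described above concrete, here is a minimal sketch of the probe-then-release pattern using plain POSIX sockets. The helper name `probeFreePort` is hypothetical and is not taken from this PR or from Presto; it only illustrates why a port obtained this way is not reserved.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Hypothetical helper: asks the OS for a free port and then releases it.
// The returned number is only a hint. Between close() and the moment the
// real server binds, another process (or a second call to this function)
// may be handed the same port.
std::optional<uint16_t> probeFreePort() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return std::nullopt;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0); // Port 0 lets the OS choose an ephemeral port.

  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    ::close(fd);
    return std::nullopt;
  }

  // Read back the port the OS actually assigned.
  socklen_t len = sizeof(addr);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    return std::nullopt;
  }
  const uint16_t port = ntohs(addr.sin_port);

  // The port is released here; nothing keeps it reserved for the caller.
  ::close(fd);
  return port;
}
```

Launching workers one at a time narrows the window between `close()` and the server's own bind, but does not remove it, which is why parallel launches would likely make collisions more frequent.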
@ZacBlanco Thank you for your comment. I also thought of the prestissimo side. We don't need to define a fixed port in the config. The worker will tell the coordinator how to reach it during the announcement. But I'm not sure what would be needed for the HttpServer - we pass in the http/https config. I would need to look into it a bit more.
@aditi-pandit FYI. I was thinking to add something to the C++ documentation about the |
@czentgr : Didn't entirely follow the reference in the question. What do you want to document about http-server.port?
@aditi-pandit I was asking about the documentation to say that you can set the port options in the config to 0 and the OS will pick a port automatically.
@majetideepak @aditi-pandit Do you think this can gain traction?
@czentgr : I like the direction you are proposing in the fix. Though I don't really hear any complaints about the current behavior either. So I feel that we can skip it.
@aditi-pandit I found myself in need of this behavior--just for example, running E2E tests and a query runner simultaneously would fail without picking random ports. If there are no objections, I'd like to get this change in.
@czentgr : Please add documentation for the ephemeral port configuration.
We could use the PortUtil utility. If the user specifies http.port=0 and/or https.port=0, we could call the utility to obtain two ports (also from the ephemeral pool) and use them for the announcer and the task URI. However, those ports may already be in use by the time the HTTP server comes up. The HTTP server itself passes the port number (0) to the OS when creating the listener, and the OS then picks an ephemeral port. This assigns the port the same way the utility does, except that there is no time gap. It does necessitate deferring the creation of the announcement message body and the task URI, where the ports are used, until after the HTTP server has started.
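The "no time gap" point can be sketched with plain POSIX sockets: bind to port 0, start listening, and only then ask the socket which port it received. This is an illustration under assumptions, not the HTTP server code from this PR; `Listener` and `listenOnEphemeralPort` are hypothetical names.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Hypothetical result type: the listening socket and the port the OS chose.
struct Listener {
  int fd;
  uint16_t port;
};

// Binds to port 0 and keeps the socket open. Because the socket that
// receives the port is the one that goes on to serve requests, the port is
// never released and cannot be handed to anyone else in between.
std::optional<Listener> listenOnEphemeralPort() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return std::nullopt;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0); // Let the OS pick an ephemeral port.

  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::listen(fd, 16) != 0) {
    ::close(fd);
    return std::nullopt;
  }

  // Query the port only after the socket is bound and listening.
  socklen_t len = sizeof(addr);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    return std::nullopt;
  }
  return Listener{fd, ntohs(addr.sin_port)};
}
```

Anything that needs the port number (an announcement body, a task URI) has to be built from `Listener::port` after this call returns, which mirrors the deferral described above. The caller owns `fd` and is responsible for closing it.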
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
Thanks @czentgr
@amitkdutta @spershin : Would be great to get Meta approval as well.
@amitkdutta @spershin Can you please take a look?
@amitkdutta @tangjiangling @xiaoxmeng Do you have any comments on this, or do you think we can proceed?
@czentgr thanks for the change % comments.
} else {
taskUri = fmt::format(kTaskUriFormat, kHttp, address_, httpPort);
}
auto setTaskUri = [&](bool useHttps, int port) {
Can we move this to where it is first used in the code? thanks!
nit: s/setTaskUri/setTaskUriCb or setTaskUriFn/
I'm moving this (and the other lambda) before the httpServer start.
I'm thinking of moving the httpServer lambda into its own function too if we have to move this inside, because things get a bit less readable if we put these (longer) lambdas into the server lambda as well.
uint64_t heartbeatFrequencyMs = systemConfig->heartbeatFrequencyMs();
if (heartbeatFrequencyMs > 0) {
heartbeatManager_ = std::make_unique<PeriodicHeartbeatManager>(
auto startAnnouncerAndHeartbeatManager = [&](bool useHttps, int port) {
ditto
nit: s/startAnnouncerAndHeartbeatManager/startAnnouncerAndHeartbeatManagerCb/
break;
}
if (coordinatorDiscoverer_ != nullptr) {
VELOX_CHECK(
VELOX_CHECK_NOT_NULL
setTaskUri(httpsPort.has_value(), address.address.getPort());
break;
}
if (coordinatorDiscoverer_ != nullptr) {
Leave an empty line?
"The announcer is expected to have been created but wasn't.");
const auto heartbeatFrequencyMs = systemConfig->heartbeatFrequencyMs();
if (heartbeatFrequencyMs > 0) {
VELOX_CHECK(
ditto
uint64_t heartbeatFrequencyMs = systemConfig->heartbeatFrequencyMs();
if (heartbeatFrequencyMs > 0) {
heartbeatManager_ = std::make_unique<PeriodicHeartbeatManager>(
auto startAnnouncerAndHeartbeatManager = [&](bool useHttps, int port) {
Why do we need this change to support ephemeral ports? Thanks!
The announcer constructs the announcement message, including the port the coordinator should use, when it is instantiated. The coordinator then uses this port to communicate with the worker. That means we have to construct the announcer after we know which port to put in the announcement message.
Similarly, the HeartbeatManager sets the address and port of the listener in the header of a request via HTTP_HEADER_HOST. Since the port is determined on listener start (if not specified), we have to construct it later and provide the known port at that time.
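The ordering constraint can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins, not the real Announcer or HTTP server from the codebase, and the value 49152 merely simulates a port chosen by the OS. The point is only that an object which bakes the port into its state at construction has to be created from a callback that runs after the server knows its port.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in: like the announcer described above, it fixes the
// port into its message at construction time and never updates it.
class SketchAnnouncer {
 public:
  SketchAnnouncer(const std::string& address, bool useHttps, int port)
      : uri_(std::string(useHttps ? "https://" : "http://") + address + ":" +
             std::to_string(port)) {}

  const std::string& uri() const {
    return uri_;
  }

 private:
  std::string uri_;
};

// Hypothetical stand-in for a server that reports its bound port on start.
class SketchHttpServer {
 public:
  explicit SketchHttpServer(int configuredPort)
      : configuredPort_(configuredPort) {}

  void start(const std::function<void(int)>& onStarted) {
    // A real server would bind here. 49152 is an assumed placeholder for
    // "whatever the OS picked" when the configured port is 0.
    const int boundPort = configuredPort_ == 0 ? 49152 : configuredPort_;
    onStarted(boundPort);
  }

 private:
  int configuredPort_;
};

// Creates the announcer only once the bound port is known.
std::unique_ptr<SketchAnnouncer> startWorker(
    const std::string& address,
    int configuredPort) {
  std::unique_ptr<SketchAnnouncer> announcer;
  SketchHttpServer server(configuredPort);
  server.start([&](int boundPort) {
    announcer = std::make_unique<SketchAnnouncer>(
        address, /*useHttps=*/false, boundPort);
  });
  return announcer;
}
```

Constructing the announcer before `start()` with a configured port of 0 would announce port 0 to the coordinator, which is unreachable.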
@amitkdutta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Same comments as @xiaoxmeng. Otherwise imported internally to run e2e tests, looks good to me.
Previously the listener ports for the native workers in the E2E tests were hard-coded to be 1234 + worker count. The change asks the OS for an available ephemeral port and uses this value when spawning the native workers. The native worker must defer some configuration until the port selected by the OS is known.
Previously the listener ports for the native workers in the E2E tests were hard-coded to be 1234 + worker count.
The change asks the OS for an available ephemeral port and uses this value when spawning the native workers.
Description
Motivation and Context
On my Mac I encountered problems running the E2E native tests. The worker was up and running and listening on the port, yet for some reason the HTTP request timed out. The connection was set up but there was no response.
Connections could be established, but each request was swallowed. Changing to a different port resolved the issue.
and in the logs
continuously until the test case fails.
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.