-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for opting out of streaming HTTP bodies #68
Conversation
Would like to avoid exposing the buffer size in the API since those are low level implementation details. Personally I prefer a new initializer such as |
One terminology for this in a recent draft is "incremental", e.g. https://datatracker.ietf.org/doc/draft-kazuho-httpbis-incremental-http/00/ So maybe |
Alright, I understand that we want to hide the parameter details of the streamed version. I'll adjust the PR. As for naming, while "incremental" is a good name, it makes naming the other case difficult - I am not the biggest fan of "negative" naming, which "nonIncremental" would be. Why not stick to My suggestion: /// Specifies the mode in which HTTP request and response bodies are processed.
public enum HTTPBodyProcessingMode {
/// Processes the HTTP body incrementally as bytes become available.
///
/// Use this mode to handle large payloads efficiently or to begin processing
/// before the entire body has been received. Will throw a `URLSessionTransportError.streamingNotSupported`
/// error if not available on the platform.
case streamed
/// Waits until the entire HTTP body has been received before processing begins.
///
/// Use this mode when it's necessary or simpler to handle complete data payloads at once.
case buffered
} Then create a new public init, additionally to the existing one: In the init, we then either initialize with the .buffered Implementation or with the .streaming Implementation with the hardcoded buffer size values accordingly. This way developers can choose between .streamed and .buffered themselves, but cannot use bad buffer sizes by mistake. The existing init remains and handles the case where devs didn't specify anything, in which case the platformDefault continues to be selected, so nothing changes for existing users or users that do not manually provide a choice. |
@guoye-zhang adjusted the PR based on your comments. Chose buffered and streamed as the names based on my previous comment, but if you prefer incremental and nonIncremental I can adjust it again. |
Looks fine. Please run |
Done. |
@simonjbeaumont What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjbeaumont What do you think?
I'm not sure about the motivation as a debugging mechanism since it changes the codepath significantly from what would actually be used by default. I'm also concerned about offering such a tempting option for folks who don't consider the implications properly since buffering the whole response body can have a significant impact for large requests.
Do we have an understanding of why this is necessary before we make it?
I haven't used Pulse myself, but it seems to have several documented operating mode. FWICT it provides both a URLSessionProxy
and a URLSessionProxyDelegate
—presumably you have tried providing a session that has this delegate installed?
Potentially the issue here is that the extension that this OpenAPI transport provides, URLSession.bidirectionalStreamingRequest(...)
is installing it's own BidirectionalStreamingURLSessionDelegate
as the task delegate. Presumably this means that there are methods on the Pulse delegate that are no longer called.
@guoye-zhang might have some advice here, but we might be able to layer our delegate with whatever existing delegate might be there too, which might avoid the need for this.
If we do go this route, then I provided some feedback on the use of an enum in public API.
It's also changed the behaviour for users trying to use streaming on unsupported platforms. Before, this would fail at the initializer and now it fails on the first request. This is because you provided a new public initializer that bypassed the existing initializer. I'd prefer we refactor to delegate all the calls through one initializer to consolidate the platform support checks.
@@ -70,6 +70,31 @@ public struct URLSessionTransport: ClientTransport { | |||
/// - Parameter session: The URLSession used for performing HTTP operations. | |||
/// If none is provided, the system uses the shared URLSession. | |||
public init(session: URLSession = .shared) { self.init(session: session, implementation: .platformDefault) } | |||
/// Specifies the mode in which HTTP request and response bodies are processed. | |||
public enum HTTPBodyProcessingMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally do not use enums in public API because, right now, they are not extensible without breaking API. Instead for enum-like public API, we wrap an internal enum in a public struct with public static properties.
I'm a bit surprised about the pushback here, so maybe I'm misunderstanding something here: To my knowledge, iOS urlsession data tasks use "buffered" mode per default. If you go to typical tutorial sites such as https://www.avanderlee.com/concurrency/urlsession-async-await-network-requests-in-swift/ or even apple's own tutorial code from https://developer.apple.com/videos/play/wwdc2021/10095 developers are taught to do this: ![]() which as far as I understand is a buffered request? For pretty much any non-big request type, this makes perfect sense: things like json aren't usually live parsed, they are only parsed once all data arrives? So for me, the "streaming" case, the one that swift-openapi-urlsession prefers, is the outlier. Don't get me wrong, I understand that it is better - but shouldn't we allow developers to choose the other option if they need it, especially because its the usual client default for iOS developers using URLSession? If you're really worried we could add more documentation, but developers have to go out of their way to change modes, when no mode is specified, the "best" option is chosen for them as it always was. This change has no effect on existing projects. I agree with you that finding a way of making the streaming path work with things like pulse and wormholy is an even better option - but I'm worried that no-one has the time to actually go down that rabbit hole - and so blocking this "quick fix" will lead to no solution for the near future? https://github.com/kean/Pulse has 6.5k stars on Github, wormholy has 2.5k - so I'd argue they are not some niche thing no-one uses - and therefore a fix for this would be good... All my arguments hinge on my understanding of iOS's Happy to the make adjustments to remove the public enum if this PR has a chance to be merged. |
Before we go further, I didn't mean my response to be considered pushing back on getting a solution here, or getting one in a timely manner. Asking if we this could/should work with the default is the responsible thing to do. This also does not mean it's an either-or: it would be good to investigate a solution regardless of whether we merge this PR.
The Swift OpenAPI ecosystem chose a streaming-first approach so it could be suitable for both servers and clients since there are plenty of use cases where streaming will be a requirement, both for server-side and on-device use cases. Putting aside this discovery that the current implementation is not composing with other tooling, I'm not aware why anyone would express buffering as a requirement over streaming. Returning to this issue: that this doesn't work is meaningful and if there are ways to address it holistically (by altering the implementation) we should.
I'm sympathetic to offering an escape hatch, in the event we identify that the streaming mode cannot be fixed or even if we identify that such a fix would take too much time.
You have understood that API correctly, but that API is just one of many in URLSession for making requests. There are APIs for doing streaming too and, for the reasons above, we chose to build on those.
Thank you. If there's no timely solution for streaming, then of course. |
Oh yeah, forgot about the problems with enums. It's probably better to add a dedicated initializer |
Thanks for the clarification. I have adjusted the PR to remove the introduced public enum and use public struct instead, and unified the init as requested. |
@hactar thank you for some patience, since I had never tried to use Pulse before. I have now followed the documentation and been able to integrate it in the HelloWorldiOSClientApp example without making any changes to the Swift OpenAPI URLSession transport. There appears to be several documented approaches in Pulse:
Which one you can use will depend on what other components and libraries are in your stack. Using the Using the Using
This currently doesn't work because we do not call the session-wide delegate in addition to the task delegate. The Pulse I was however able to get this working with a patch to the OpenAPI URLSessionTransport, where we call into the session-installed delegate in addition to the internal task-installed delegate. This may be something we want to explore. For example, adding the following to the (session.delegate as? any URLSessionDataDelegate)?.urlSession?(session, dataTask: dataTask, didReceive: data) I also note one more option in the Pulse documentation, which is to use the To summarise I think we have two options here:
We'd need to consider the implications of (1) since it might not make sense to support all callbacks and for (2) we could provide the callbacks that would support this. WDYT @czechboy0 @hactar @guoye-zhang EDIT: looking closer at the docs (https://kean-docs.github.io/pulse/documentation/pulse/networklogging-article#Option-4-Manual) I think we'd only need to call the following methods from the session delegate:
So I'm more in the option (1) camp. I'll see if that works. |
The URLSession documentation says that the task delegate is called before the session delegate:
Does this mean that URLSession itself should call the session delegate after the task delegate for each of the callbacks or that the task delegate should do this? //cc @guoye-zhang |
Looks like these need to be manually called: func test() async throws {
final class Delegate: NSObject, URLSessionDelegate, URLSessionTaskDelegate, URLSessionDataDelegate {
let name: String
let finished: XCTestExpectation
init(name: String) {
self.name = name
self.finished = XCTestExpectation(description: name)
}
func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
print("\(name): \(#function)")
guard let outie = (session.delegate as? any URLSessionDataDelegate), outie !== self else { return }
outie.urlSession?(session, dataTask: dataTask, didReceive: data)
}
func urlSession(_ session: URLSession, task: URLSessionTask, didFinishCollecting metrics: URLSessionTaskMetrics) {
print("\(name): \(#function)")
guard name == "innie" else { return }
guard let outie = (session.delegate as? any URLSessionTaskDelegate), outie !== self else { return }
outie.urlSession?(session, task: task, didFinishCollecting: metrics)
}
func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) {
print("\(name): \(#function)")
self.finished.fulfill()
guard let outie = (session.delegate as? any URLSessionTaskDelegate), outie !== self else { return }
outie.urlSession?(session, task: task, didCompleteWithError: error)
}
}
let outie = Delegate(name: "outie")
let innie = Delegate(name: "innie")
let session = URLSession(configuration: .default, delegate: outie, delegateQueue: nil)
let task = session.dataTask(with: URL(string: "http://localhost:8080/api/greet?name=Si")!)
task.delegate = innie
task.resume()
await fulfillment(of: [innie.finished, outie.finished])
}
|
OK, so I think this can be addressed with this pretty targeted patch to make Pulse work for streaming:
If we determined this a sensible thing to do, with or without a config option, presumably that would suffice @hactar ? |
@simonjbeaumont Thank you for taking the time to do the deep dive. It's very appreciated.
I've been using the PulseProxy module via I did also try the URLSessionProxy option for testing, because it does expose its session via .session, and I tried feeding that to the URLSessionTransport.Configuration - but all requests end up in a pending state here. I did not try URLSessionProxyDelegate as I didn't see a way of injecting this delegate into the OpenAPI URLSession - and the delegate actually being called. Not in front of my dev computer at the moment, but will try your |
Are you saying this did not work for you? This one worked fine for me with no changes. |
The phrasing is a bit misleading. If a method is implemented on the task delegate, it won't be called on the session delegate. |
Let's also forward the |
So I've tested adding @simonjbeaumont and @guoye-zhang proposed outie (is this a Severence reference? 😊) changes to BidirectionalStreamingURLSessionDelegate. The good news is that the requests are now no longer pending in the log, but are correctly marked as complete or failed - so thats fixed. The bad news is that, as with the PulseProxy option, request data and response data does not correctly appear in the PulseUI - like in the .streamed version of the screenshots above. |
@hactar thanks for giving it a try and reporting back. I have now reproduced what you're seeing. When I tested previously it was with a This all adds up since we have this logic inside the OpenAPI URLSession transport: let task: URLSessionTask
if requestBody != nil {
task = uploadTask(withStreamedRequest: urlRequest)
} else {
task = dataTask(with: urlRequest)
} IMO we've now found the local maximum for how we can play nicely with Pulse but that it won't allow the same visibility as using buffered URLSession APIs. I'm pretty confident based on what we've found so far that this would be the same if an adopter was driving the URLSession streamed APIs manually though, and that the situation is not worsened by the use of the OpenAPI transport layer. So, I think the current open questions are:
I'm open to both of these changes. If you can confirm that (1) makes no meaningful improvement over doing nothing (because folks can still use the I appreciate you giving us the time to think this through. WDYT @guoye-zhang @czechboy0 ? |
I have no strong opinion here. It's an improvement, but doesn't fully work. As Pulse is a debugging tool, developers would most likely need to see the whole request - a solution that goes half the way is of course better than nothing, but we appear to have an alternative to offer "full" support via buffering. Also, streaming really becomes the better option for larger amounts of data - usually in binary form. Seeing that in Pulse is probably less useful, Pulse is more about debugging human readable data thats being transmitted. I therefore I would try to answer two questions:
If the answer is no to both, then sure, why not. But else, maybe the disadvantages would then outweigh the advantages for something that might not be that useful to developers anyway?
I would argue yes, especially as we found no full solution via path (1). I don't think its likely that developers would enable buffered mode "by mistake" as streaming is still the default where available - and even they do, unless they are transmitting large amounts of data, I'd assume the actual performance difference is negligible - especially for clients. I don't think that when someone is posting 100 KB json, and receiving a 100 KB json response, buffering is going them hit me hard (especially since thats what most devs are taught to do client side on iOS - I was actually pleasantly surprised to see that openapi generator streams here). For some platforms there isn't a streamed implementation, and in these cases buffered seems to be doing a good enough job - so I don't think buffering is that much of a dangerous option.
For my projects, fixing (1) would have the advantage that I wouldn't have to remember to switch to .buffered mode for debugging - but thats pretty much it. The underlying issue (debugging network calls with a good UI) is fully solved by (2) - and I can even add a switch to my apps to enable "buffered debug mode" when I want to debug network calls, while using .streamed in production mode where I don't want to enable Pulse anyway.
The appreciation is mutual. It's very interesting watching the discussion and seeing the proposed changes for (1), I'm learning a lot about the fine details of URLSession. |
Now that we seem to have a pretty good understanding of the options, I'm happy for us to add a config option to force buffering. I didn't love the spelling of the enum, which would allow for things like asking for streaming even on platforms that don't support it, but if we spelled it e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's go ahead and support opting out of streaming.
I left one comment on the current state of the patch, which I think has something left over from an earlier design.
The remaining question of is around the API and whether we allow users to request something that would cause a runtime error.
Previously this enum was entirely internal and, while selecting streaming on a platform that doesn't support it would throw an error, there was no public API to do so, so no user code could lead to this.
Should we prefer an enum-like option that supports just .platformDefault
, and .buffering
; and NOT .streaming
in this case?
This comment is expressing similar, presuming I read it correctly, @czechboy0?
return .defaultStreaming | ||
} | ||
static var defaultStreaming: Self { | ||
.streaming( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the patch seems unnecessary now?
Yup that's great. |
Cool. I'll make the proposed changes. |
Done. |
Tests should be fixed now - I re-introduced the private init, as thats required for the tests which want for force .streamed, which is not possible via our public init. BTW, on my mac, to be able to run tests locally, I had to add |
@@ -67,32 +67,52 @@ public struct URLSessionTransport: ClientTransport { | |||
public var session: URLSession | |||
|
|||
/// Creates a new configuration with the provided session. | |||
/// - Parameter session: The URLSession used for performing HTTP operations. | |||
/// If none is provided, the system uses the shared URLSession. | |||
public init(session: URLSession = .shared) { self.init(session: session, implementation: .platformDefault) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding new parameters to existing functions is an API break, as caught by CI.
1 breaking change detected in OpenAPIURLSession:
💔 API breakage: constructor URLSessionTransport.Configuration.init(session:) has been removed
We'd need to preserve this symbol that calls through to the new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the issue.
The new API provides public init(session: URLSession = .shared, httpBodyProcessingMode: HTTPBodyProcessingMode = .platformDefault)
- and as there is a default parameter for httpBodyProcessingMode, doesn't Swift auto generate a public init(session: URLSession)
for us?
The following code in my app compiles fine:
func testFunction() async throws {
let sessionConfiguration = URLSessionConfiguration.default
let otherSession = URLSession(configuration: sessionConfiguration)
let transportConfiguration = URLSessionTransport.Configuration.init(session: .shared)
let otherTransportConfiguration = URLSessionTransport.Configuration.init(session: otherSession)
let transport1 = URLSessionTransport(configuration: transportConfiguration)
let transport2 = URLSessionTransport(configuration: otherTransportConfiguration)
let client1 = Client(serverURL: URL(string: "https://example.com")!, transport: transport1)
let client2 = Client(serverURL: URL(string: "https://example.com")!, transport: transport2)
_ = try await client1.get_sol_Appversions_sol_All()
_ = try await client2.get_sol_Appversions_sol_All()
}
So to me, there still is an (implicit) URLSessionTransport.Configuration.init(session:)
- isn't this API breakage error a false positive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a false positive. It's possible to refer to the function itself (e.g. let initializer = URLSessionTransport.Configuration.init(session:)
) which won't work if the function is no longer present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Interesting, I wasn't aware of this limitation of default parameter values. I have added back the old single parameter init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thanks for contributing here @hactar — both with the change and to the discussion.
Motivation
swift-openapi-urlsession offers both a streaming and buffering implementation for sending/receiving data. Which implementation is used is currently based on the platform its running on, mainly because "streaming" is not supported on all platforms. It is currently not possible to "force" buffering, even if one wants/needs it.
While adding https://github.com/kean/Pulse to my app, a tool for debugging network requests, I noticed that Pulse was not showing request bodies (apple/swift-openapi-generator#749) - because Pulse appears not do work with the streamed data mode. Neither does https://github.com/pmusolino/Wormholy.
I have now made switching .buffered mode public, allowing developers to choose which mode they want to work with. This fixes the issue I have with Pulse - which previously was not showing any request body.
Modifications
I made Implementation, platformDefault and an already existing init public. I had to make the Implementation enum Sendable, else Xcode produced a warning.
Result
There are no functional changes, other than making previously private functionality public. Technically someone could now manually try to set "streaming" mode on a platform that doesn't support it, but then an error is thrown correctly.
Test Plan
Tested the change in my own app: switching to buffered mode was now possible, and now tools like Pulse work correctly.