Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly close files when uploading them. #174

Closed
wants to merge 1 commit into from

Conversation

plietar
Copy link
Member

@plietar plietar commented Sep 3, 2024

When uploading files to outpack, orderly was consistently leaking file handles. While these handles would eventually be garbage collected and closed, the GC would print a lot of noise onto the console:

Warning messages:
1: closing unused connection 10 (/tmp/RtmpN8ck5L/filecc11f7514a5e7)
2: closing unused connection 9 (/tmp/RtmpN8ck5L/filecc11f1c3c766d/archive/data/20240903-173208-56e33176/data.rds)
3: closing unused connection 8 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/c/20240903-173208-03afd21d/script.R)
4: closing unused connection 7 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/c/20240903-173208-03afd21d/data.rds)
5: closing unused connection 6 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/b/20240903-173207-e4d60aab/script.R)
6: closing unused connection 5 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/b/20240903-173207-e4d60aab/data.rds)
7: closing unused connection 4 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/a/20240903-173207-d2210a12/data.rds)

This seems to be like an httr/httr2 bug (the behaviour was present both before and after the upgrade), but I cannot find much reference to this issue online. There's a brief mention of it in the httr2 source, saying it "leaks connection if [the] request doesn't complete", however I am seeing this behaviour even for succesful requests.

The httr2 implementation does not make much sense to me actually: it only closes the connections on short reads, but as far as I can tell libcurl always provides an accurate nbytes argument to the callback, meaning short reads never actually occur.

This works around that issue by not using httr2's implementation of file upload, and instead implements an alternative where the caller provides and already opened connection. It is the caller's responsibility to close it after the request is complete. The only place we use this, orderly_location_http$put_file, does so using a withr::defer call to ensure this happens across all flow paths, even errors.

I don't think we can easily write tests for this, since the garbage collector closes the handles for us already. The GC warnings don't get detected by evaluate_promise or expect_silent.

When uploading files to outpack, orderly was consistently leaking file
handles. While these handles would eventually be garbage collected and
closed, the GC would print a lot of noise onto the console:

```
Warning messages:
1: closing unused connection 10 (/tmp/RtmpN8ck5L/filecc11f7514a5e7)
2: closing unused connection 9 (/tmp/RtmpN8ck5L/filecc11f1c3c766d/archive/data/20240903-173208-56e33176/data.rds)
3: closing unused connection 8 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/c/20240903-173208-03afd21d/script.R)
4: closing unused connection 7 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/c/20240903-173208-03afd21d/data.rds)
5: closing unused connection 6 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/b/20240903-173207-e4d60aab/script.R)
6: closing unused connection 5 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/b/20240903-173207-e4d60aab/data.rds)
7: closing unused connection 4 (/tmp/RtmpN8ck5L/filecc11f22b55a39/archive/a/20240903-173207-d2210a12/data.rds)
```

This seems to be like an httr/httr2 bug (the behaviour was present both
before and after the upgrade), but I cannot find much reference to this
issue online. There's a brief mention of it in the
[httr2 source][req-body.R], saying it "leaks connection if [the] request
doesn't complete", however I am seeing this behaviour even for succesful
requests.

The httr2 implementation does not make much sense to me actually: it
only closes the connections on short reads, but as far as I can tell
libcurl always provides an accurate `nbytes` argument to the callback,
meaning short reads never actually occur.

This works around that issue by not using httr2's implementation of file
upload, and instead implements an alternative where the caller provides
and already opened connection. It is the caller's responsibility to
close it after the request is complete. The only place we use this,
`orderly_location_http$put_file`, does so using a `withr::defer` call to
ensure this happens across all flow paths, even errors.

I don't think we can easily write tests for this, since the garbage
collector closes the handles for us already. The GC warnings don't get
detected by `evaluate_promise` or `expect_silent`.

[req-body.R]: https://github.com/r-lib/httr2/blob/0b795b664ce652c5bfeaf81b7048df389ec05689/R/req-body.R#L210
@plietar plietar force-pushed the no-leak-connections branch from 27c3dbb to 000860c Compare September 3, 2024 18:08
@plietar
Copy link
Member Author

plietar commented Sep 3, 2024

Standalone reproduction of the httr2 bug:

> httr2::request("https://example.com") |> httr2::req_body_file("/dev/null") |> httr2::req_perform()
<httr2_response>
POST https://example.com/
Status: 200 OK
Content-Type: text/html
Body: In memory (1256 bytes)

> gc()

           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  6221410 332.3    9245604 493.8  9245604 493.8
Vcells 37136790 283.4   64900073 495.2 42431065 323.8
Warning message:
In .Internal(gc(verbose, reset, full)) :
  closing unused connection 3 (/dev/null)

I guess I should report that upstream

Upstream issue: r-lib/httr2#534

@plietar plietar requested a review from M-Kusumgar September 4, 2024 10:01
@plietar
Copy link
Member Author

plietar commented Sep 4, 2024

I'm actually gonna close this. Reported upstream and there's a fix on the way.

Until a new httr2 version is released it's only a mild inconvenience anyway, and from what I can tell only happens if the user has a very recent version of libcurl.

@plietar plietar closed this Sep 4, 2024
@plietar plietar deleted the no-leak-connections branch September 4, 2024 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant