Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 16k file bucket #143

Merged
merged 4 commits into from
Feb 25, 2024
Merged

Fix 16k file bucket #143

merged 4 commits into from
Feb 25, 2024

Conversation

bdon
Copy link
Member

@bdon bdon commented Feb 21, 2024

@msbarry

currently archives less than 16K don't work with file buckets (maybe same for HTTP depending on server)

The initial IO is for 16384 bytes to save roundtrips over the network. This is a useful optimization, and it doesn't seem worth special-casing non-network archives to do something different (do one IO for a few bytes to then get the exact length...)

However doing a ReadAt for 16384 bytes will read the entire archive and then raise an EOF error. This PR special cases range requests at offset 0 to allow shorter responses. I don't love this because it adds hidden hardcoded behavior based on offset=0.

Alternative: add another boolean parameter to NewRangeReaderEtag that's like allowShortResponse bool to make this behavior explicit. Thoughts?

@bdon bdon mentioned this pull request Feb 21, 2024
@msbarry
Copy link
Contributor

msbarry commented Feb 21, 2024

It looks like HTTP/gocloud buckets don't care if your byte range request goes past the end of the file? To match that behavior, maybe we could just remove the && offset == 0 check?

Also make sure you update mockBucket, and you can remove the extra logic I added to fakeArchive to pad if <16kb.

@bdon
Copy link
Member Author

bdon commented Feb 22, 2024

To match that behavior, maybe we could just remove the && offset == 0 check?

It should be OK for files, however on HTTP we need to special-case it - right now it's caught by isRefreshRequiredCode

If we are treating ETags correctly and the server correctly implements conditional requests we should never see a 416, only 412s when etags are actually expired. The only case where we see a 416 is the <16kb case, and we don't even get the N successfully read bytes. So our HTTP bucket should probably detect 416 and retry. A bit ugly...

@msbarry
Copy link
Contributor

msbarry commented Feb 22, 2024

I didn't think it was an etag thing, I meant that HTTP server range requests return the overlap of the requested range and the actual data range. I've tried on a few different HTTP servers and when you request from within the file past the end of it, it always returns a 206. 416 is only when the start of the range is > the file length. Are you saying that caddy returns that error when backed by an HTTP bucket or file bucket?

The 416 error code spec indicates that it should only be thrown when none of the ranges "overlap the current extent"

@bdon bdon force-pushed the fix-16k-file-bucket branch from 0567099 to e0f80f9 Compare February 25, 2024 14:34
@bdon
Copy link
Member Author

bdon commented Feb 25, 2024

@msbarry ok, updated

@@ -60,11 +60,15 @@ func (m mockBucket) NewRangeReaderEtag(_ context.Context, key string, offset int
if len(etag) > 0 && resultEtag != etag {
return nil, "", 412, &RefreshRequiredError{}
}
if offset+length > int64(len(bs)) {
if offset > int64(len(bs)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be >=

@bdon bdon merged commit 5e5daa7 into main Feb 25, 2024
4 checks passed
@bdon bdon deleted the fix-16k-file-bucket branch February 25, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants