-
-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle when etag changes on remote archive #130
Conversation
pmtiles/server.go
Outdated
r, _, err := server.bucket.NewRangeReaderEtag(ctx, name+".pmtiles", int64(header.MetadataOffset), int64(header.MetadataLength), rootValue.etag) | ||
if isRefreshRequredError(err) && len(purgeEtag) == 0 { | ||
purgeEtag = rootValue.etag | ||
goto start // TODO cleaner way to handle the retry? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think we need to re-architect this a little bit to make the retry logic cleaner
what the JS code does: https://github.com/protomaps/PMTiles/blob/main/js/index.ts#L826
- in the
Get
for a single tile/tilejson/metadata request, we could expose the 412 error directly in the response code. - If that Get code sees a 412, it sends a new type of "invalidate" message to the single server thread. To optimize this should be a "promise" - it waits on a channel to know when that invalidation is done
- Once that goroutine receives a message on that channel it proceeds to make a 2nd attempt from scratch as if there was fresh information.
It remains the possibility that the 2nd attempt returns a 412, in that case I think it's legitimate to surface that to the end client. Otherwise we could easily create an infinite loop or DDOS with a badly behaving ETag system that returns a different strong Etag on every request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, splitting out getHeaderMetadataAttempt
and getTileAttempt
makes sense and would be a cleaner way to handle the 1 retry.
What do you think about having NewRangeReaderEtag
return a struct with reader, etag, reloadRequired? I'm hesitant to just pass through a response code since the on-disk implementation is going to need to check a different way.
And I think the promise invalidation works similarly, the second request gets made with purgeEtag
set to the etag that just turned out to be invalidated, so the first request purges it from the cache and kicks off a new request, the subsequent ones will use the cached new request. Do you think that's sufficient?
pmtiles/bucket.go
Outdated
return body, err | ||
} | ||
|
||
func (b HTTPBucket) NewRangeReaderEtag(_ context.Context, key string, offset, length int64, etag string) (io.ReadCloser, string, error) { | ||
reqURL := b.baseURL + "/" + key | ||
|
||
req, err := http.NewRequest("GET", reqURL, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this shouldn't use NewRequestWithContext
and pass through the context variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably use that instead.
pmtiles/server.go
Outdated
r, _, err := server.bucket.NewRangeReaderEtag(ctx, name+".pmtiles", int64(header.MetadataOffset), int64(header.MetadataLength), rootValue.etag) | ||
if isRefreshRequredError(err) && len(purgeEtag) == 0 { | ||
purgeEtag = rootValue.etag | ||
goto start // TODO cleaner way to handle the retry? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, splitting out getHeaderMetadataAttempt
and getTileAttempt
makes sense and would be a cleaner way to handle the 1 retry.
What do you think about having NewRangeReaderEtag
return a struct with reader, etag, reloadRequired? I'm hesitant to just pass through a response code since the on-disk implementation is going to need to check a different way.
And I think the promise invalidation works similarly, the second request gets made with purgeEtag
set to the etag that just turned out to be invalidated, so the first request purges it from the cache and kicks off a new request, the subsequent ones will use the cached new request. Do you think that's sufficient?
Change
pmtiles serve
to keep track of etag from the initial root request on remote archives (HTTP or S3) and make all subsequent requests to that archive withIf-Match: ${root_etag}
. If any of those requests fail with a 412 or 416 http status code (the remote archive changed) then purge all entries from the cache with that etag and refetch the root and any directories needed for that request at most once.Followups not included in this PR:
Fixes #17