Handle when etag changes on remote archive #130

msbarry · 2024-02-04T20:13:35Z

Change pmtiles serve to keep track of the ETag from the initial root request on remote archives (HTTP or S3) and make all subsequent requests to that archive with If-Match: ${root_etag}. If any of those requests fails with a 412 or 416 HTTP status code (meaning the remote archive changed), purge all cache entries with that ETag, then refetch the root and any directories needed for that request, at most once.

Followups not included in this PR:

- dedicated file handler that doesn't use gocloud
- integration test in CI that spins up MinIO and runs tests against it while the file changes
- handle Azure ETags
- Prometheus counters to track ETag reload events

Fixes #17

bdon · 2024-02-05T08:29:28Z

pmtiles/server.go

+	r, _, err := server.bucket.NewRangeReaderEtag(ctx, name+".pmtiles", int64(header.MetadataOffset), int64(header.MetadataLength), rootValue.etag)
+	if isRefreshRequredError(err) && len(purgeEtag) == 0 {
+		purgeEtag = rootValue.etag
+		goto start // TODO cleaner way to handle the retry?


Yeah, I think we need to re-architect this a little to make the retry logic cleaner.

What the JS code does: https://github.com/protomaps/PMTiles/blob/main/js/index.ts#L826

In the Get for a single tile/tilejson/metadata request, we could expose the 412 error directly in the response code.

If that Get code sees a 412, it sends a new type of "invalidate" message to the single server thread. To optimize, this should be a "promise": the caller waits on a channel to know when that invalidation is done.

Once that goroutine receives a message on that channel, it makes a second attempt from scratch, as if it had fresh information.
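A rough sketch of the "invalidate as a promise" idea, under the assumption that one goroutine owns the cache and everything else talks to it over a channel. The type invalidateRequest and the functions cacheLoop and invalidate are made up for illustration and do not mirror the actual server code.

```go
package main

// invalidateRequest is a hypothetical message asking the cache owner to drop
// every entry that was loaded under the given ETag.
type invalidateRequest struct {
	etag string
	done chan struct{} // closed once the purge has been applied
}

// cacheLoop is the only goroutine that touches the cache while it is running.
// The cache here maps an entry key to the ETag it was fetched under.
func cacheLoop(cache map[string]string, invalidations <-chan invalidateRequest) {
	for req := range invalidations {
		for key, etag := range cache {
			if etag == req.etag {
				delete(cache, key) // deleting during range is allowed in Go
			}
		}
		close(req.done) // resolve the "promise"
	}
}

// invalidate sends the purge request and blocks until the owner confirms it,
// so the caller can safely start its second attempt afterwards.
func invalidate(invalidations chan<- invalidateRequest, etag string) {
	done := make(chan struct{})
	invalidations <- invalidateRequest{etag: etag, done: done}
	<-done
}
```

Closing the done channel rather than sending on it means any number of waiters could share the same promise if that ever became useful.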

There remains the possibility that the second attempt also returns a 412; in that case I think it's legitimate to surface that to the end client. Otherwise we could easily create an infinite loop or a DDoS with a badly behaving ETag system that returns a different strong ETag on every request.

msbarry · 2024-02-05T10:47:35Z

pmtiles/bucket.go

+	return body, err
+}
+
+func (b HTTPBucket) NewRangeReaderEtag(_ context.Context, key string, offset, length int64, etag string) (io.ReadCloser, string, error) {
 	reqURL := b.baseURL + "/" + key

 	req, err := http.NewRequest("GET", reqURL, nil)


Is there a reason why this shouldn't use NewRequestWithContext and pass through the context variable?

It should probably use that instead.

msbarry · 2024-02-05T10:55:38Z

pmtiles/server.go

+	r, _, err := server.bucket.NewRangeReaderEtag(ctx, name+".pmtiles", int64(header.MetadataOffset), int64(header.MetadataLength), rootValue.etag)
+	if isRefreshRequredError(err) && len(purgeEtag) == 0 {
+		purgeEtag = rootValue.etag
+		goto start // TODO cleaner way to handle the retry?


Sounds good. Splitting out getHeaderMetadataAttempt and getTileAttempt makes sense and would be a cleaner way to handle the single retry.

What do you think about having NewRangeReaderEtag return a struct with reader, etag, reloadRequired? I'm hesitant to just pass through a response code, since the on-disk implementation is going to need to check in a different way.

I think the promise invalidation works similarly: the second request gets made with purgeEtag set to the etag that just turned out to be invalidated, so the first such request purges it from the cache and kicks off a new fetch, and subsequent ones use the newly cached result. Do you think that's sufficient?
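The purgeEtag behavior described above can be illustrated with a deliberately simplified, single-entry cache. This is a sketch under assumptions: rootCache and its get method are invented for illustration, there is no locking, and the real server caches directories as well as the root.

```go
package main

// rootCache is a hypothetical single-entry cache holding the ETag of the
// most recently loaded root. It is not safe for concurrent use.
type rootCache struct {
	etag   string
	loaded bool
}

// get returns the cached ETag, refetching only when nothing is cached yet or
// when the cached entry is the very one the caller just saw fail (purgeEtag).
// If another request has already replaced the stale entry, the cached value
// no longer equals purgeEtag and is reused without a second fetch.
func (c *rootCache) get(purgeEtag string, fetch func() string) string {
	if !c.loaded || (purgeEtag != "" && c.etag == purgeEtag) {
		c.etag = fetch()
		c.loaded = true
	}
	return c.etag
}
```

With this shape, many requests that all failed against the same stale ETag trigger a single reload between them.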

msbarry added 4 commits February 4, 2024 09:59

optional etag range reader

5ffb014

handle refresh required

c8db6fa

got it working

b5ed486

todo

1143be4

bdon reviewed Feb 5, 2024

bdon reviewed Feb 5, 2024

bdon reviewed Feb 5, 2024

msbarry commented Feb 5, 2024

msbarry added 2 commits February 6, 2024 06:53

add server tests

07bb740

clean up retry logic

1b0377c

msbarry commented Feb 6, 2024

msbarry commented Feb 6, 2024

msbarry added 4 commits February 6, 2024 07:15

built-in json assert

af2ef10

tests for http bucket

49bf172

tweak test

2de57ac

500 on retry loop

c8ad5cf

msbarry marked this pull request as ready for review February 7, 2024 11:53

use context on http request

2ddb448

msbarry commented Feb 7, 2024

revive

eeeec21

bdon merged commit 1f898fd into protomaps:main Feb 8, 2024
1 check passed

bdon mentioned this pull request Aug 6, 2024

Azure Reloading pmtiles header on file change #176

Closed
