-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: jwt cache is not purged #3801
base: main
Are you sure you want to change the base?
Conversation
test/io/test_io.py
Outdated
# the cache should now have three tokens | ||
response = postgrest.admin.get("/metrics") | ||
assert response.status_code == 200 | ||
assert "pgrst_jwt_cache_size 3.0" in response.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taimoorzaeem Is it possible to have the size measured on bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. There is no built-in function for this in Data.Cache
. I have tried using https://hackage.haskell.org/package/ghc-datasize to calculate size in bytes at runtime, but I have run into problems with that too.
We can try manually writing some logic to calculate the size in bytes, but I guess that would require some effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn't seem to be much code in that library: https://github.com/def-/ghc-datasize/blob/master/src/GHC/DataSize.hs. So maybe we can just vendor the code and use unsafePerformIO
if it's cumbersome to integrate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, this metric could be a new PR so it's easier to review.
src/PostgREST/Auth.hs
Outdated
checkCache <- C.lookup (getJwtCache appState) token | ||
|
||
-- Call first to delete the expired JWTs | ||
C.purgeExpired jwtCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if it is a good idea to add this in the hot path of request processing, unless we are sure it is a constant time operation (and fast).
Otherwise it should be moved to a separate thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkleczek Good catch. The Data.Cache
says:
purgeExpired :: (Eq k, Hashable v) => Cache k v -> IO ()
Delete all items that are expired. This is one big atomic operation.
This indeed means that it is not a constant time operation so we should definitely move this to a separate thread.
5444935
to
7ffca04
Compare
src/PostgREST/Auth.hs
Outdated
checkCache <- C.lookup (getJwtCache appState) token | ||
|
||
-- Purge expired tokens in a separate thread | ||
_ <- forkIO $ C.purgeExpired jwtCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is going to start a new thread on every request? That doesn't look efficient.
I think we should have a background thread that is notified for doing the purge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest - for starters I would implement a simple solution that executes purgeExpired periodically ( frequency subject to configuration possibly - but not necessarily ).
Triggering purge upon every request might mean the purging thread is running constantly wasting cycles as there is nothing to do most of the times.
The ultimate solution would be to have some kind of scheduler that triggers purging upon next nearest expiry - but that would require more complex bookkeeping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it some more - it seems to me the best moment to trigger purge is upon a cache miss - just before/after inserting a new entry.
The reasoning is that:
- We expect it to be rare (otherwise there is no point of the cache)
- It makes sure the cache is not growing (as inserting new entries does garbage collection)
- Since this is time expiration based cache there is no real risk of starvation - sooner or later we are going to have a cache miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a much better idea. This way, we wouldn't need to do maintain any new thread state, hence avoiding extra overhead. @steve-chavez WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, looks like great idea 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taimoorzaeem not sure how to help with that - do you think it might be ready soon? Seems to me without it JWT caching is kind of useless (ie. has memory leak). I think it is a pretty good candidate for quick patch release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkleczek Yes, I agree that JWT caching is not production ready yet. @steve-chavez WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should definitely do a patch release. We need to merge #3802 first for testing, I'll review it tomorrow.
7ffca04
to
4a0d255
Compare
Looking at the latest loadtest results https://github.com/PostgREST/postgrest/actions/runs/13016585172. There's a dropdown in perf wrt v12.2.5, but this is because of #3882, compared to |
I'm trying to manually test this by cherry-picking #3802 (which won't be merged before this because #3802 (comment)). So then I do: curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 1 postgrest_test_author)"
# .. 3 more times
# then
curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 480.0 I wait for like a minute, do new requests and the jwt cache size never goes down.
@taimoorzaeem Can you double check this and see what's the problem? |
For some reason, doing the request with the same JWT also increase the jwt cache size every time:
And that shouldn't happen at all |
Fixes #3788.
Fixed the JWT cache not purging expired tokens in cache. This is tested by adding a new metric
pgrst_jwt_cache_size
.@steve-chavez I can also divide this into two commits, one for adding the new metric and the other for bug fix. Please review and let me know.
Edit: I'll continue development once #3802 gets merged.