Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: jwt cache is not purged #3801

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

taimoorzaeem
Copy link
Collaborator

@taimoorzaeem taimoorzaeem commented Nov 22, 2024

Fixes #3788.

Fixed the JWT cache not purging expired tokens in cache. This is tested by adding a new metric pgrst_jwt_cache_size.

@steve-chavez I can also divide this into two commits, one for adding the new metric and the other for bug fix. Please review and let me know.

Edit: I'll continue development once #3802 gets merged.

# the cache should now have three tokens
response = postgrest.admin.get("/metrics")
assert response.status_code == 200
assert "pgrst_jwt_cache_size 3.0" in response.text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taimoorzaeem Is it possible to have the size measured on bytes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. There is no built-in function for this in Data.Cache. I have tried using https://hackage.haskell.org/package/ghc-datasize to calculate size in bytes at runtime, but I have run into problems with that too.

We can try manually writing some logic to calculate the size in bytes, but I guess that would require some effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be much code in that library: https://github.com/def-/ghc-datasize/blob/master/src/GHC/DataSize.hs. So maybe we can just vendor the code and use unsafePerformIO if it's cumbersome to integrate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this metric could be a new PR so it's easier to review.

checkCache <- C.lookup (getJwtCache appState) token

-- Call first to delete the expired JWTs
C.purgeExpired jwtCache
Copy link
Contributor

@mkleczek mkleczek Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it is a good idea to add this in the hot path of request processing, unless we are sure it is a constant time operation (and fast).

Otherwise it should be moved to a separate thread.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkleczek Good catch. The Data.Cache says:

purgeExpired :: (Eq k, Hashable v) => Cache k v -> IO ()

Delete all items that are expired. This is one big atomic operation.

This indeed means that it is not a constant time operation so we should definitely move this to a separate thread.

checkCache <- C.lookup (getJwtCache appState) token

-- Purge expired tokens in a separate thread
_ <- forkIO $ C.purgeExpired jwtCache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is going to start a new thread on every request? That doesn't look efficient.

I think we should have a background thread that is notified for doing the purge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest - for starters I would implement a simple solution that executes purgeExpired periodically ( frequency subject to configuration possibly - but not necessarily ).

Triggering purge upon every request might mean the purging thread is running constantly wasting cycles as there is nothing to do most of the times.

The ultimate solution would be to have some kind of scheduler that triggers purging upon next nearest expiry - but that would require more complex bookkeeping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it some more - it seems to me the best moment to trigger purge is upon a cache miss - just before/after inserting a new entry.

The reasoning is that:

  1. We expect it to be rare (otherwise there is no point of the cache)
  2. It makes sure the cache is not growing (as inserting new entries does garbage collection)
  3. Since this is time expiration based cache there is no real risk of starvation - sooner or later we are going to have a cache miss.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a much better idea. This way, we wouldn't need to do maintain any new thread state, hence avoiding extra overhead. @steve-chavez WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, looks like great idea 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taimoorzaeem not sure how to help with that - do you think it might be ready soon? Seems to me without it JWT caching is kind of useless (ie. has memory leak). I think it is a pretty good candidate for quick patch release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkleczek Yes, I agree that JWT caching is not production ready yet. @steve-chavez WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should definitely do a patch release. We need to merge #3802 first for testing, I'll review it tomorrow.

@taimoorzaeem taimoorzaeem marked this pull request as ready for review January 28, 2025 18:01
@steve-chavez
Copy link
Member

Looking at the latest loadtest results https://github.com/PostgREST/postgrest/actions/runs/13016585172. There's a dropdown in perf wrt v12.2.5, but this is because of #3882, compared to main the numbers are roughly the same, so no perf loss in this PR.

@steve-chavez
Copy link
Member

I'm trying to manually test this by cherry-picking #3802 (which won't be merged before this because #3802 (comment)).

So then I do:

curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 1 postgrest_test_author)"
# .. 3 more times

# then
curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 480.0

I wait for like a minute, do new requests and the jwt cache size never goes down.

curl localhost:3000/authors_only -H "Authorization: Bearer $(postgrest-gen-jwt --exp 1 postgrest_test_author)"
# 3 more times

curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 840.0

@taimoorzaeem Can you double check this and see what's the problem?

@steve-chavez
Copy link
Member

For some reason, doing the request with the same JWT also increase the jwt cache size every time:

export JWT=$(postgrest-gen-jwt --exp 1 postgrest_test_author)

curl localhost:3000/authors_only -H "Authorization: Bearer $JWT"
[]

curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 120.0

curl localhost:3000/authors_only -H "Authorization: Bearer $JWT"
[]

curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 240.0

curl localhost:3000/authors_only -H "Authorization: Bearer $JWT"
[]
curl localhost:3001/metrics
# HELP pgrst_jwt_cache_size_bytes The JWT cache size in bytes
# TYPE pgrst_jwt_cache_size_bytes gauge
pgrst_jwt_cache_size_bytes 360.0

And that shouldn't happen at all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

JWT cache is not purged?
3 participants