Add API for comparing ZedTokens #1162

Open
jason-who-codes opened this issue Feb 10, 2023 · 9 comments
Assignees
Labels
area/api v1 Affects the v1 API area/datastore Affects the storage system priority/4 maybe This might get done someday state/needs discussion This can't be worked on yet

Comments

@jason-who-codes

We have some use cases involving event streams, where the events would include ZedTokens from SpiceDB (e.g. the results of the watch API). Consumers of these events will call SpiceDB to look up additional information in response, providing the event's ZedToken to get data at_least_as_fresh as the event, and persist the results somewhere (e.g. a database). Ideally, event consumers would be able to compare the ZedToken from an async event against a ZedToken already stored in the DB, determine which one is "newer", and call SpiceDB with the more recent ZedToken.

To support this use case, we would need a client library or gRPC endpoint for comparing ZedTokens. We recognize that some datastores allow for concurrent updates, so it may not be possible to conclusively say one ZedToken is "before" another. We could work around a case of "concurrent" ZedTokens by simply making a fully_consistent request to SpiceDB. So if we had a function like compare(ZT1, ZT2) it could return (for example):

  • DEFINITELY_BEFORE: in which case we call SpiceDB with at_least_as_fresh(ZT2)
  • DEFINITELY_AFTER: in which case we call SpiceDB with at_least_as_fresh(ZT1)
  • INCONCLUSIVE/CONCURRENT: in which case we call SpiceDB with fully_consistent
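
The decision table above could be sketched roughly as follows. Neither `compare` nor the `Ordering` enum exists in SpiceDB today; both are hypothetical names for the API this issue asks for, and the returned tuples are plain labels for the consistency modes rather than real client objects.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Ordering(Enum):
    """Hypothetical result of the proposed compare(ZT1, ZT2) call."""
    DEFINITELY_BEFORE = "before"   # ZT1 is older than ZT2
    DEFINITELY_AFTER = "after"     # ZT1 is newer than ZT2
    CONCURRENT = "concurrent"      # no order can be established


def choose_consistency(
    zt1: str,
    zt2: str,
    compare: Callable[[str, str], Ordering],
) -> Tuple[str, Optional[str]]:
    """Map a comparison result to the consistency mode to request.

    Returns (mode, token); token is None for fully_consistent.
    `compare` is injected because no such API exists yet (assumption).
    """
    result = compare(zt1, zt2)
    if result is Ordering.DEFINITELY_BEFORE:
        return ("at_least_as_fresh", zt2)
    if result is Ordering.DEFINITELY_AFTER:
        return ("at_least_as_fresh", zt1)
    # Inconclusive or concurrent: fall back to the strongest guarantee.
    return ("fully_consistent", None)
```

Passing the comparator in as a parameter keeps the sketch independent of whether the comparison ends up as a client library function or a gRPC endpoint.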

Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for at_least_as_fresh (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.

Note: this capability for comparing ZedTokens is mentioned in footnote 3 of the Tiger Cache Proposal #207

@jzelinskie jzelinskie added priority/4 maybe This might get done someday area/api v1 Affects the v1 API area/datastore Affects the storage system state/needs discussion This can't be worked on yet labels Feb 17, 2023
@josephschorr josephschorr self-assigned this May 11, 2023
@josephschorr
Member

Alternatively/additionally, the existing Consistency parameter for SpiceDB operations could be modified to allow passing in a list of ZedTokens for at_least_as_fresh (so the operation would be performed on data at least as fresh as the "newest" of all the provided tokens) to avoid an extra roundtrip for comparison.

@jason-who-codes would you prefer this approach vs a comparison API? Are there any other areas where a comparison API would make sense/provide value?

@jason-who-codes
Author

Yep - passing in a list of tokens for at_least_as_fresh would generally be preferable, as it eliminates a service call round-trip. It also means we don't have to decide whether to make a fully_consistent request when the tokens are concurrent (I suspect that SpiceDB internally could do something "smarter" to guarantee results at least as fresh as both tokens without resorting to full consistency).

@croemmich

croemmich commented Aug 1, 2023

This would be beneficial to us as well. We have a large model and quite a few relationships, so uncached performance can be a bit rough. We've extended the quantization window to 24 hours and rely heavily on at_least_as_fresh to ensure consistency.

We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent.

To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.

@josephschorr
Member

@croemmich can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all? If you are storing a ZedToken for an updated object, then at_least_as_fresh should "just work" when sent that ZedToken.

@geropl

geropl commented Oct 11, 2023

We've implemented a middleware layer that stores consistency tokens for various objects when relationships are written and then query with the most recent. To implement this, we've done a bit of a naughty by un-opaquing the ZedToken. With CockroachDB and memdb, it's just a base64 encoded integer timestamp. However, I'd love to be able to pass multiple tokens and let SpiceDB take care of it.

Same here. We are using MySQL, where we generate code for the (internal) DecodedZedToken (source) to make sure to properly parse it.

can you expand on why, exactly, your middleware layer needs to compare ZedTokens at all?

After reading the documentation, the original Zookie paper, and this blog post, and asking for clarification here, our understanding is that there is no guarantee of "reading our own writes" when that involves traversing a hierarchy of objects.

E.g. take the following example:
Schema:

  • User, Organization, Document
  • Organization has members and documents

T0: Alice(T0) is member of organization O1(T0)
T1: Alice adds document D1(T1)
T2: Alice adds Bob as member of organization O1(T2), Bob(T2)

When reading document D1 now, we would use ZedToken T1, which could lead to us not seeing Bob being a member of O1.
To avoid this, we want to make sure that we are always using the most recent ZedToken, for which we either need to compare tokens locally (quickly) or be able to pass a list of tokens.
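
To make the scenario above concrete, here is a small simulation. It is only a sketch under stated assumptions: integers stand in for the opaque ZedTokens T0..T2, the relationship strings are invented for illustration, and `snapshot` models the worst case in which a read is served exactly at the requested revision (at_least_as_fresh may serve something newer).

```python
from typing import List, Set, Tuple

# Each write is (revision, relationship). Integers stand in for ZedTokens
# purely for illustration; real tokens are opaque.
WRITES: List[Tuple[int, str]] = [
    (0, "organization:O1#member@user:alice"),       # T0
    (1, "organization:O1#document@document:D1"),    # T1
    (2, "organization:O1#member@user:bob"),         # T2
]


def snapshot(at_revision: int) -> Set[str]:
    """Relationships visible to a read served exactly at `at_revision`."""
    return {rel for rev, rel in WRITES if rev <= at_revision}


# Token remembered per object at the time that object was last written.
stored = {"document:D1": 1, "organization:O1": 2}

BOB = "organization:O1#member@user:bob"

# Reading D1 with only D1's own token can miss Bob's membership.
stale = snapshot(stored["document:D1"])
# Reading with the newest of all stored tokens includes it.
fresh = snapshot(max(stored.values()))

print(BOB in stale)  # False
print(BOB in fresh)  # True
```

The `max` call is exactly the step that is unavailable with opaque tokens, which is why either a comparison function or a list-of-tokens parameter is needed.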

@josephschorr Honestly, the local comparison of the integer would be great. What exactly is the reason this is not part of the API? My current understanding is that global ordering should be possible as long as we use a common persistence layer for all SpiceDB instances and use that to source the integer in the first place (which seems to be the case for postgres and mysql at least). Happy to learn more, though! 🙏
Also, if there are cases where this is not doable, this could be signaled with a flag à la comparable bool? 🤔

@vroldanbet
Contributor

vroldanbet commented Oct 11, 2023

I think we agree on the need to either compare zedtokens or have SpiceDB accept multiple and have it pick the most recent. Each datastore may have a different underlying representation of zedtokens, so it's not just a timestamp - this is the case for the postgres implementation, which uses PG internal datatypes. Exposing those internals via a client library would turn them into API and make it impossible to evolve the underlying datastore implementation without breaking clients. For example PG implementation was also a timestamp before it started using PG's MVCC xid, xmin and xmax types.

Would having the APIs accepting multiple zedtokens so that SpiceDB picks up the most recent satisfy your requirements?

@geropl

geropl commented Oct 11, 2023

For example PG implementation was also a timestamp before it started using PG's MVCC xid, xmin and xmax types.

Ok, thanks for the explanation! Missed that.

Would having the APIs accepting multiple zedtokens so that SpiceDB picks up the most recent satisfy your requirements?

Yes, that would work. 👍

@mgagliardo91

mgagliardo91 commented Jan 3, 2024

We recently ran into an issue similar to the one described here, in our GraphQL API. GraphQL resolvers are inherently asynchronous, and multiple mutations/queries can be made in the same "request". A single GraphQL request can create multiple asynchronous SpiceDB writes and then resolve the underlying GraphQL query (within a mutation, let's say), which could end up hitting SpiceDB for a permission. To handle this, we were also trying to issue subsequent requests with the latest token across all of the async writes.

If we could instead pass a list of ZedTokens and have SpiceDB determine and use the latest as the consistency value, that would address our issue of not being able to decode/order tokens on the client side.

This is not blocking us - instead we are issuing fullyConsistent requests whenever any write has occurred within the lifecycle described above, but it would be convenient to use.
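
A minimal sketch of that workaround, assuming one tracker object is created per GraphQL request. The class and method names are hypothetical, the returned tuples are labels rather than real client objects, and the `minimize_latency` fallback for requests without writes is an assumption not stated above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RequestConsistency:
    """Tracks whether any SpiceDB write happened during one request."""
    wrote: bool = False

    def note_write(self) -> None:
        # Call after every SpiceDB write issued by a resolver.
        self.wrote = True

    def for_read(self, token: Optional[str] = None) -> Tuple[str, Optional[str]]:
        """Pick the consistency mode for the next read in this request."""
        if self.wrote:
            # Tokens from concurrent writes cannot be ordered client-side,
            # so fall back to full consistency.
            return ("fully_consistent", None)
        if token is not None:
            return ("at_least_as_fresh", token)
        # Assumed default when nothing was written and no token is known.
        return ("minimize_latency", None)
```

With a list-of-tokens parameter, `note_write` could instead collect each write's ZedToken and `for_read` could pass them all along, avoiding the fully consistent fallback.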

@benvernier-sc

Adding some context as to cases where making ZedTokens a repeated field might not be enough.

When consuming changes from the Watch API I get a ZedToken for when the change happened (let's call it ZT1). This might prompt me to recompute some expanded permissions using LookupResources or LookupSubjects, the result of which also gives me a ZedToken (ZT2). If I then consume another change from the Watch API with its own ZedToken (ZT3) and it affects the same resource or subject as the previous change, I might want to know whether ZT2 is fresher than ZT3, and if it is I can bypass making another call to LookupResources as it can be quite expensive.

Making the field repeated means I would still have to make the expensive call, and it would only guarantee better freshness of the results, which is not really what I'm after here.
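
The check described above might look like the sketch below. The `is_at_least_as_fresh` comparator is hypothetical: it stands for whatever comparison API SpiceDB might eventually expose, and is injected so the sketch does not assume any token format.

```python
from typing import Callable


def needs_recompute(
    cached_token: str,
    event_token: str,
    is_at_least_as_fresh: Callable[[str, str], bool],
) -> bool:
    """Decide whether a Watch event requires a new LookupResources call.

    cached_token: ZedToken returned by the previous lookup (ZT2).
    event_token:  ZedToken attached to the new Watch event (ZT3).
    is_at_least_as_fresh(a, b): hypothetical comparator, True when `a`
    is known to be at least as fresh as `b`. It should return False
    when the order cannot be established.
    """
    # If the cached result already reflects the event, skip the lookup.
    return not is_at_least_as_fresh(cached_token, event_token)
```

Treating an inconclusive comparison as "not fresh enough" errs on the side of recomputing, so the expensive call is only skipped when it is provably unnecessary.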

8 participants