Sharing 1.5 - Single Whitelist Persistence #351

jbee · 2024-10-29T10:34:05Z


Motivation

The current sharing system is multi-dimensional, which makes it unnecessarily hard to understand and work with.
This impacts both complexity and performance.
However, for practical reasons it is hard to switch to a fundamentally different system.
Ideally, the user experience and interaction with sharing should not change, to avoid the costs of transitioning.

Practical Example

With the current sharing model there are multiple fields to check.
Two of them are maps of UID => access pattern.
The SQL to evaluate this therefore has multiple parts, one per field.
For the maps, each group a user is a member of adds another expression to the query, which makes the SQL long and complex; it is fair to assume that this is also costly in terms of performance.

A filter looks something like this:

{sharing.external matches X} 
OR {sharing.public matches Y} 
OR {sharing.user matches Z} 
OR {sharing.group matches G1} 
OR {sharing.group matches G2}
...
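To make the growth concrete, here is a minimal in-memory sketch of the current multi-field check. The class and field names (LegacySharing, external, publicAccess, users, userGroups) are hypothetical stand-ins for the fields in the filter above, and treating an access pattern as readable when it starts with "r" is an assumption. The number of clauses is 3 plus one per group the user belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the current multi-field sharing.
// Assumption: an access pattern grants read if its first character is 'r'.
class LegacySharing {
    boolean external;                          // externally accessible
    String publicAccess = "--------";          // pattern applied to everyone
    Map<String, String> users = Map.of();      // user UID => access pattern
    Map<String, String> userGroups = Map.of(); // group UID => access pattern

    static boolean readable(String access) {
        return access != null && !access.isEmpty() && access.charAt(0) == 'r';
    }

    // One branch per field, plus one lookup per group of the user.
    boolean canRead(String userUid, List<String> groupUids) {
        if (external || readable(publicAccess)) return true;
        if (readable(users.get(userUid))) return true;
        for (String g : groupUids) {
            if (readable(userGroups.get(g))) return true;
        }
        return false;
    }

    // Mirrors the OR filter: the clause list grows with every group.
    static List<String> filterClauses(String userUid, List<String> groupUids) {
        List<String> clauses = new ArrayList<>();
        clauses.add("sharing.external");
        clauses.add("sharing.public");
        clauses.add("sharing.user " + userUid);
        for (String g : groupUids) clauses.add("sharing.group " + g);
        return clauses;
    }
}
```

A user in three groups already produces six clauses; with the proposal below the same check becomes a single expression regardless of group count.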

Proposal

This proposal only changes how the sharing information is stored and processed; it should still be possible to reconstruct the current API layout from the new structure.

Sharing becomes a whitelist of UIDs and special tokens, kept separately for read and for write.
Empty sets are omitted.

{
 "r": ["{uid1}", "{uid2}", ...], 
 "w": ["{uid1}", "{uid2}", ...]
}

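In Java:

import java.util.Set;

class Sharing {
  Set<String> r; // UIDs and tokens granted read access
  Set<String> w; // UIDs and tokens granted write access
}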
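A minimal sketch of how the class could produce the JSON shape above while omitting empty sets. The class name WhitelistSharing and the toJson helper are hypothetical; the JSON is built by hand only to keep the example dependency-free, and entries are sorted solely to get stable output.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical sketch of the proposed structure with JSON output.
class WhitelistSharing {
    final Set<String> r = new LinkedHashSet<>(); // UIDs/tokens with read access
    final Set<String> w = new LinkedHashSet<>(); // UIDs/tokens with write access

    // Empty sets are omitted, so an object nobody is shared with becomes "{}".
    String toJson() {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        if (!r.isEmpty()) fields.add("\"r\": " + array(r));
        if (!w.isEmpty()) fields.add("\"w\": " + array(w));
        return fields.toString();
    }

    private static String array(Set<String> values) {
        StringJoiner items = new StringJoiner(", ", "[", "]");
        for (String v : new TreeSet<>(values)) { // sorted for stable output
            items.add("\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return items.toString();
    }
}
```

For example, adding "uid2" and "uid1" to r and nothing to w yields {"r": ["uid1", "uid2"]}.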

The proposal also drops the data vs. metadata distinction; both are always implied.
If the distinction should be maintained, more sets could be used, or UIDs could be extended with a prefix character (see Tokens).
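Dropping the distinction means existing entries have to be collapsed into r and w. The sketch below shows one possible mapping, not a decided one: it assumes access patterns where positions 0/1 are metadata read/write and 2/3 are data read/write, grants r if either read flag is set and w if either write flag is set, and represents public access with a hypothetical "p" token. The class name LegacyToWhitelist is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical migration from the current layout to the r/w whitelists.
// Assumption: pattern index 0/1 = metadata read/write, 2/3 = data read/write.
class LegacyToWhitelist {
    static final String PUBLIC = "p"; // assumed public-access token

    final Set<String> r = new LinkedHashSet<>();
    final Set<String> w = new LinkedHashSet<>();

    static LegacyToWhitelist convert(String publicAccess,
                                     Map<String, String> users,
                                     Map<String, String> userGroups) {
        LegacyToWhitelist out = new LegacyToWhitelist();
        out.add(PUBLIC, publicAccess);
        users.forEach(out::add);      // user and group UIDs end up mixed
        userGroups.forEach(out::add);
        return out;
    }

    private void add(String uid, String access) {
        if (has(access, 0) || has(access, 2)) r.add(uid); // any read flag
        if (has(access, 1) || has(access, 3)) w.add(uid); // any write flag
    }

    private static boolean has(String access, int i) {
        return access != null && access.length() > i && access.charAt(i) != '-';
    }
}
```

Note that this widens access for entries that only had data read, which matches the idea that data access implies metadata access.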

Lookup

For a lookup it is always known up front which of the two lists (four, if data and metadata stay separate) to check.
So the check itself is always a set intersection check:
the user's set of associated UIDs and tokens is checked against the whitelist of the sharing.
If at least one entry is contained in both, access is allowed.
This way the SQL needed to perform a sharing check is a single overlap test between X and Y, where X and Y are JSON arrays of strings.
For an in-memory check this is equally a containsAny check on a Set<String>.
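The in-memory variant of that check fits in one line with the JDK's Collections.disjoint. The class and method names are hypothetical; the inputs are assumed to be the relevant whitelist (for example r) and the user's own UID plus the UIDs of their groups.

```java
import java.util.Collections;
import java.util.Set;

class WhitelistCheck {
    // Access is granted if whitelist and user tokens share at least one entry.
    static boolean containsAny(Set<String> whitelist, Set<String> userTokens) {
        return !Collections.disjoint(whitelist, userTokens);
    }
}
```

Usage: containsAny(sharing.r, Set.of(userUid, groupUid1, groupUid2)) for a read check, and the same call with sharing.w for a write check.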

A filter would always look like this:

{sharing.r intersectsWith [UID1, UID2, ...]}
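A sketch of what that single filter could look like in PostgreSQL, assuming the whitelist is stored in a jsonb column with the {"r": [...], "w": [...]} shape (the column name sharing is an assumption). The jsonb ?| operator tests whether any string of a text[] occurs as an element of the JSON array, which is exactly the intersection check. The helper only builds SQL text; values are inlined as quoted literals for readability, whereas real code would bind them as parameters.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical SQL builder for the read check on a jsonb column.
class WhitelistSql {
    static String readFilter(String column, List<String> userTokens) {
        StringJoiner items = new StringJoiner(", ", "array[", "]::text[]");
        for (String t : userTokens) {
            items.add("'" + t.replace("'", "''") + "'"); // escape as SQL literal
        }
        // column->'r' selects the read list; ?| is true if any token is in it
        return column + "->'r' ?| " + items;
    }
}
```

readFilter("sharing", List.of("U1", "G1")) produces sharing->'r' ?| array['U1', 'G1']::text[]. A GIN index on the expression (sharing->'r') with the default jsonb_ops operator class can support ?|.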

Tokens

In the whitelist sets, UIDs of users and user groups would be mixed.
The sets would also allow non-UID tokens with special meaning. For example, to allow public access, the token "p" could be added to the set instead of a UID.
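For a token like "p" to work, every user's lookup set has to carry it as well. A minimal sketch, assuming "p" as the public token and a hypothetical UserTokens helper: the user's UID, their group UIDs and "p" go into one set, so a whitelist that contains "p" matches any user.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UserTokens {
    static final String PUBLIC = "p"; // hypothetical public-access token

    // Everything that may grant this user access, mixed in one set.
    static Set<String> of(String userUid, List<String> groupUids) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(userUid);
        tokens.addAll(groupUids);
        tokens.add(PUBLIC); // every user matches a public whitelist
        return tokens;
    }

    static boolean canRead(Set<String> r, String userUid, List<String> groupUids) {
        return !Collections.disjoint(r, of(userUid, groupUids));
    }
}
```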

UIDs could also be prefixed with a token to mark them as read/write/data read/data write instead of having multiple lists, e.g. r{uid} to give metadata read access to the UID.
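A sketch of the prefix idea, limited to the r and w prefixes mentioned in the text (any data prefixes would be further assumptions). The sharing is one flat set of prefixed tokens; a check prefixes each of the user's UIDs with the access being asked for and then runs the same intersection test. PrefixTokens and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class PrefixTokens {
    enum Access {
        READ("r"), WRITE("w");
        final String prefix;
        Access(String prefix) { this.prefix = prefix; }
    }

    static String token(Access access, String uid) {
        return access.prefix + uid; // e.g. "r" + uid
    }

    // Prefix each of the user's UIDs/tokens with the requested access.
    static Set<String> userTokens(Access access, Set<String> uids) {
        Set<String> out = new LinkedHashSet<>();
        for (String uid : uids) out.add(token(access, uid));
        return out;
    }

    static boolean allowed(Set<String> sharing, Access access, Set<String> uids) {
        return !Collections.disjoint(sharing, userTokens(access, uids));
    }
}
```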

Performance

From a performance standpoint, it makes most sense to use an actual JSON array as the sole structure, as that can be indexed in Postgres AFAIK.

In such a form it is clear that tokens need to be used, e.g. r{uid} (user/group can read), w{uid} (user/group can write), etc.
As JSON:

[token1, token2]
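Because the proposal should still allow reconstructing the current API layout, the flat array has to be convertible back into per-access sets. A minimal round-trip sketch, assuming the one-character prefixes "r" and "w"; FlatTokens, flatten and unprefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Converts between {"r": [...], "w": [...]} and one flat token array.
class FlatTokens {
    static List<String> flatten(Set<String> r, Set<String> w) {
        List<String> tokens = new ArrayList<>();
        for (String uid : r) tokens.add("r" + uid);
        for (String uid : w) tokens.add("w" + uid);
        return tokens;
    }

    // Inverse: keep tokens with the given prefix and strip it again.
    static Set<String> unprefix(List<String> tokens, char prefix) {
        Set<String> uids = new LinkedHashSet<>();
        for (String t : tokens) {
            if (!t.isEmpty() && t.charAt(0) == prefix) uids.add(t.substring(1));
        }
        return uids;
    }
}
```

flatten(Set.of("U1"), Set.of("G1")) gives ["rU1", "wG1"], and unprefix on that list with 'r' gives back {"U1"}.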

Open Issues

These issues are not related to the representation but to the sharing concept itself. They are good to keep in mind when changing the design, so that some of them may be solved in the process.

At the moment, a user can see information that should not be accessible according to sharing, as long as the user can start from another root object that references the object. In that case the user can use fields=ref[a,b,c] to see a, b and c of the referenced object via the other root.
Does data sharing always imply metadata sharing? If not, this is another loophole that is hard to get right. For example, when filtering metadata objects using sharing, users with data read access could accidentally get access to metadata they do not have metadata read access to. Data access always implying metadata access by definition would greatly help to avoid falling into that trap. If this semantic is chosen, the split into data and metadata read/write might be better represented as access levels that build upon each other on a single shared axis. Again, that would also help to represent them in a single list that represents such an axis.
Applying sharing consistently is hard, as there are many code paths that read objects (e.g. single object vs. list views).
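A sketch of how access levels on one shared axis could still be stored in a single list. The order of the levels below is purely an assumption for illustration; the point is the mechanism. Each grant stores one token (level ordinal as prefix digit plus UID), and a check for level L expands the user's UIDs into tokens for L and every higher level, so higher grants automatically imply lower ones. AccessLevels and its members are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class AccessLevels {
    // Assumed order, lowest to highest; each level implies all lower ones.
    enum Level { METADATA_READ, DATA_READ, METADATA_WRITE, DATA_WRITE }

    // A grant is a single token: ordinal digit + UID, e.g. "1U1".
    static String grant(Level level, String uid) {
        return level.ordinal() + uid;
    }

    // Checking level L matches grants at L or above.
    static Set<String> userTokens(Level required, Set<String> uids) {
        Set<String> tokens = new LinkedHashSet<>();
        for (Level l : Level.values()) {
            if (l.compareTo(required) >= 0) {
                for (String uid : uids) tokens.add(grant(l, uid));
            }
        }
        return tokens;
    }

    static boolean allowed(Set<String> sharing, Level required, Set<String> uids) {
        return !Collections.disjoint(sharing, userTokens(required, uids));
    }
}
```

With this encoding a data read grant also satisfies a metadata read check, which is the semantic that closes the loophole described above.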
