Timestamps and ACL caching #37

Open
wants to merge 14 commits into
base: main

kjetilk
Member

@kjetilk kjetilk commented Feb 18, 2019

I started thinking about the design of an ACL cache, and figured that some metadata set on the authorizations, using DC properties, could help with that.

This proposal covers both caching of individual authorizations, as a specialized reverse proxy or a Solid app could do, and recommendations for Solid servers to implement so that legacy HTTP caches can cache ACL resources.

I think this should be in the WAC spec, so I submit it as a proposal for future consideration.

@gobengo

gobengo commented Mar 28, 2019

wrt adding timestamp rdf properties, using them for HTTP cache headers: I agree, and is this unique to Authorizations? To me it seems like this advice is good for any type of thing coming out of a solid server. Curious if others agree.

Without blocking this particular PR, should this advice be 'lifted' into a more general part of the solid specs?

@kjetilk
Member Author

kjetilk commented Mar 28, 2019

> wrt adding timestamp rdf properties, using them for HTTP cache headers: I agree, and is this unique to Authorizations? To me it seems like this advice is good for any type of thing coming out of a solid server. Curious if others agree.

Yeah, actually, I agree. :-) But, I note that for the WAC spec, we can specify how the RDF graph that specifies the authorization looks, but for other data, that is not so much the case...

> Without blocking this particular PR, should this advice be 'lifted' into a more general part of the solid specs?

...so, in the interest of orthogonal specifications, I think this should be considered independently from other specs.

@kjetilk kjetilk added this to the Spec Pull Requests milestone Apr 15, 2019
@michielbdejong
Contributor

I agree that any RDF source should produce an ETag header, so that clients can request it with an If-None-Match header, or use the similar, less granular mechanism based on Last-Modified and If-Modified-Since.
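
A minimal sketch of the ETag / If-None-Match exchange mentioned above, as a server-side decision function. The name `conditional_get` and the `(status, headers, body)` return shape are illustrative assumptions, not taken from any Solid server. The header is parsed with a naive comma split, which is a simplification.

```python
from typing import Dict, Optional, Tuple


def _strip_weak(tag: str) -> str:
    """Drop a leading W/ so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(
    current_etag: str, if_none_match: Optional[str], body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on a resource with a known ETag."""
    headers = {"ETag": current_etag}
    if if_none_match is not None:
        # Simplification: split on commas; a full parser would honour quoting.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        known = {_strip_weak(tag) for tag in candidates}
        # If-None-Match is evaluated with weak comparison; "*" matches anything.
        if "*" in candidates or _strip_weak(current_etag) in known:
            return 304, headers, b""
    return 200, headers, body
```

A client that already holds the current representation gets a 304 with an empty body. Any other client gets the full 200 response.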

I don't see why you would put those timestamps inside the data, though?

@TallTed
Contributor

TallTed commented Apr 25, 2019

@michielbdejong - It's worth noting that many systems do not properly track "Modified" dates for files, blurring lines with "Touched" and "Opened" (among other actions). Tracking modification datetime info explicitly within the data thus can have value.

That said, having worked with multiple systems that use such internal tracking, I can also say that relying on humans to (remember to) (accurately) do the work of changing those dates is similarly fraught with peril, so it would be good if increasingly intelligent technology could be brought to bear on it.

and usability. Since servers must always use the most recent
authorizations for operations, discrepancies between a client/proxy
cache and what the server uses may arise if an application uses a
stale authorization. That will not be security critical (since the the
Member

Tiny typo - 'the' twice

@dmitrizagidulin
Member

@kjetilk I like this proposal. Especially the 'issued' and 'modified' attributes.

I'm a bit unsure about the 'valid' predicate, though. Is the intention to just use it for cache control? In which case, maybe an explicit cache control header from an http-headers ontology would be clearer?

If the intention is broader, I suspect this might be a bit confusing in terms of user interface and usability. What's the pain point that the 'valid' term is solving?

@michielbdejong
Contributor

> many systems do not properly track "Modified" dates for files

That's irrelevant for the Solid spec, right? We can just warn against that in the spec, saying, beware if you implement your storage layer directly on a file system, looking at the mtime might not be good enough to implement proper ETags. The server should probably generate the ETag in code, and store it explicitly alongside the data?
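
One way to read "generate the ETag in code, and store it explicitly alongside the data" is sketched below. It is a sketch under assumptions: the ETag is derived from a SHA-256 hash of the serialized bytes, and `EtagStore` is a hypothetical in-memory stand-in for whatever storage layer a server really uses.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content itself, independent of file mtime."""
    digest = hashlib.sha256(body).hexdigest()
    # Entity tags are quoted strings; truncating the digest keeps the header short.
    return f'"{digest[:32]}"'


class EtagStore:
    """Hypothetical store that keeps each body together with its ETag."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[bytes, str]] = {}

    def put(self, path: str, body: bytes) -> str:
        etag = make_etag(body)
        self._items[path] = (body, etag)
        return etag

    def get(self, path: str) -> Tuple[bytes, str]:
        return self._items[path]
```

Because the tag depends only on the bytes, touching or opening a file without changing it leaves the ETag unchanged.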

I'm not (yet) convinced that the possible advantages of the changes suggested in this PR merit their cost.

In any case, and on a separate note, I think we should use a versioned spec, not a living document, so the 0.7 spec as it stands now will forever keep pointing at its snapshot versions of the various sub-specs. At some point we need to do a triage round to establish which proposals would be eligible for going into the 0.8 spec (and I'm hoping we can postpone that until at least the end of 2018).

@kjetilk
Member Author

kjetilk commented Apr 26, 2019

> I don't see why you would put those timestamps inside the data, though?

I tried to explain that in the spec itself, but I'll be happy to clarify further. There are a few reasons for this. It gives increased granularity, as you can cache individual authorizations, not just at an "ACL file level". It also provides orthogonality to the HTTP protocol: you don't need to rely on things being served over HTTP to use caching. I think this will be very important soon in an IoT world. Another aspect is that you don't need a separate layer of storage to manage these times; they are right there in the authorizations.

@kjetilk
Member Author

kjetilk commented Apr 26, 2019

> > many systems do not properly track "Modified" dates for files
>
> That's irrelevant for the Solid spec, right? We can just warn against that in the spec, saying, beware if you implement your storage layer directly on a file system, looking at the mtime might not be good enough to implement proper ETags. The server should probably generate the ETag in code, and store it explicitly alongside the data?

I think we should keep mtime and ETag completely separate. They are two different things. ETags can be computed in many ways, and it is very important to get them right, but that is orthogonal to mtime.

So, there are two topics of importance here, one is caching generally, and one is an implementation detail of an ACL cache.

As I said above, caching generally should not have to rely on the "ACL file" as the smallest unit of caching, it should be possible for a specialized cache to consider each individual authorization as the smallest unit.

For the ACL cache that needs to be present for performance reasons in the actual authorization process, it is also a matter of practicality: you don't want to look up the mtime on the backend if you can rely on the times in your ACL cache being correct, and your ACL cache should not consider your backend. The ACL cache should basically be an in-memory quad store that can be queried for authorizations really fast, and getting the mtimes from the ACL cache should also be a really fast operation.

So, either you store it with the authorization itself, or you store it in a separate resource, but I think that would be a bad design. To say that an authorization itself has been modified at a certain time is exactly what you should say, and this is saying it.
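
A rough sketch of such a cache follows, with each authorization as the smallest cached unit and its modification time stored with it. It is a plain dictionary rather than a real quad store. The names `CachedAuthorization` and `AclCache`, and the example IRI, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class CachedAuthorization:
    iri: str            # IRI of the individual authorization
    issued: datetime    # value of dct:issued
    modified: datetime  # value of dct:modified


class AclCache:
    """Memory-resident cache keyed by authorization IRI (not by ACL file)."""

    def __init__(self) -> None:
        self._by_iri: Dict[str, CachedAuthorization] = {}

    def store(self, auth: CachedAuthorization) -> bool:
        """Keep `auth` only if it is newer than what is cached; report whether it was kept."""
        existing = self._by_iri.get(auth.iri)
        if existing is not None and existing.modified >= auth.modified:
            return False
        self._by_iri[auth.iri] = auth
        return True

    def modified_time(self, iri: str) -> Optional[datetime]:
        """Answer from the cache alone, without consulting the backend."""
        cached = self._by_iri.get(iri)
        return cached.modified if cached is not None else None


cache = AclCache()
stamp = datetime(2019, 4, 26, 10, 0, tzinfo=timezone.utc)
cache.store(CachedAuthorization("https://example.org/.acl#owner", stamp, stamp))
```

Looking up the modification time is then a single dictionary access, with no file system or backend call involved.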

@kjetilk
Member Author

kjetilk commented Apr 26, 2019

> @kjetilk I like this proposal. Especially the 'issued' and 'modified' attributes.
>
> I'm a bit unsure about the 'valid' predicate, though. Is the intention to just use it for cache control? In which case, maybe an explicit cache control header from an http-headers ontology would be clearer?
>
> If the intention is broader, I suspect this might be a bit confusing in terms of user interface and usability. What's the pain point that the 'valid' term is solving?

Yeah, well, I'm not sure the max-age has a good place in the HTTP headers ontology, but if it did, it would indeed be more precise.

So, I mostly chose the dct:valid predicate for consistency with the others, and possibly for some unexpected reuse. You never know what people might use it for if an authorization is said to be valid up to a certain time. :-)

Now, it was motivated by my observation that parsing a simple ACL file cost about 200 ms. That's quite a lot. We will be looking up ACL files for pretty much everything, and every millisecond will count once UX is based on the integration of a lot of resources. Not having to look up an ACL at all, but using it without further ado from an ACL cache, could save a lot of milliseconds. :-)

It is certainly possible to use a different predicate for it, but I think it is a good fit myself.
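
To illustrate how a validity time could map onto max-age, here is a small sketch. It assumes the dct:valid value is read as a single expiry instant. The function names are hypothetical, and nothing here is taken from the proposal text itself.

```python
from datetime import datetime, timezone


def max_age_seconds(valid_until: datetime, now: datetime) -> int:
    """Seconds an authorization may still be served from cache (never negative)."""
    remaining = (valid_until - now).total_seconds()
    return max(0, int(remaining))


def cache_control_value(valid_until: datetime, now: datetime) -> str:
    """Render the remaining validity as a Cache-Control max-age directive."""
    return f"max-age={max_age_seconds(valid_until, now)}"


now = datetime(2019, 4, 26, 12, 0, tzinfo=timezone.utc)
valid_until = datetime(2019, 4, 26, 12, 5, tzinfo=timezone.utc)
print(cache_control_value(valid_until, now))  # max-age=300
```

An ACL cache could use the same comparison to decide whether an authorization can be used directly or has to be fetched again.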

@TallTed
Contributor

TallTed commented Jul 31, 2019

Needs a round of conflict resolution, @kjetilk

7 participants