-
Notifications
You must be signed in to change notification settings - Fork 103
Clarify and detail the definition of globbing to only apply to folders #148
base: master
Are you sure you want to change the base?
Conversation
bb1d331
to
35474c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to remove globbing altogether, but this is indeed an improvement. I think this PR is a clarification of the text, not a change of the intent of the text, right?
I would change the example, adding file3.ttl
to which the user has no access.
Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which the have read access, it would return an empty RDF doc. Also, how would the prefixes be done? What if they clash? Is it just a text-concatenation, or should it understand Turtle?
What would the server do if one of the documents has a syntax error?
Another consideration w/r/t globbing is the issue of blank nodes. Consider the case of two documents in a container:
<> a ldp:RDFSource ;
skos:prefLabel "resource 1" ;
prov:wasGeneratedBy [
a prov:Activity, as:Create ;
prov:wasAssocatedWith <https://example.com/my-webid> ;
prov:atTime "2019-03-27T19:23:00.512Z"^^xsd:dateTime ] . and also:
<> a ldp:RDFSource ;
skos:prefLabel "resource 2" ;
prov:wasGeneratedBy [
a prov:Activity, as:Create ;
prov:wasAssocatedWith <https://example.com/another-webid> ;
prov:atTime "2019-03-27T20:13:00.000Z"^^xsd:dateTime ] . If the underlying RDF serialization library converts each resource's blank node to I am no fan of blank nodes in the context of linked data, but if this is specified behavior, this general case would need to be considered. As an alternative to globbing, I wonder whether LDPath would satisfy the existing use cases. |
Not just editorial, we exclude filename globs. Can you recheck? Agreed with
rest.
Blank nodes are not an issue (will be parsed to different nodes). Note that
text says union of datasets (not concat of files).
|
The formal specification for merging RDF graphs are here: |
The Web Annotation specification makes use of Prefer headers, specifically |
Yes sorry, I agree, and I also applaud that simplifying change. My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs. I think we should prescribe merging instead of concatenating, and refer to the document @kjetilk mentioned. Just to make sure I understand blank nodes correctly, should we assume that the graphs have no blank nodes in common? For instance, when merging two graphs that both state that Alice drank a glass of wine, both using a blank node, then we have no ground to assume that these two docs describe the same event, right? So the result should be that we just keep both blank nodes (not mark them as identical), and so we in the resulting merge we say Alice drank two glasses of wine, right? |
Agree with all of the above; Will do it for this section, but a more rigorous version of this spec is needed indeed.
Up to the implementation.
From the current version:
Not really as per the above, but will add spec reference as per @kjetilk. |
35474c1
to
8d0ee13
Compare
Applied the suggested changes. This part of the spec is now very precise, giving clear expectations for a server (and also for a client library that would assume this functionality). Contrasts a bit with the rest of the spec, but good to start somewhere. @michielbdejong Could you re-review? @melvincarvalho Given that you are the main stakeholder for globbing, could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
I'd take a look at gold as node solid server was based on gold for globbing. The write up in the spec should be considered a really good try based on someone writing up a documentation having not created that spec. So while the write up might not be perfect, to an extent gold should lead the text. What I recall is
I think globbing should be used sparingly, but is a great tool to take solid apps to the new level. It's anticipated that new techniques will emerge over time -- things like http2 and websockets, proxies, caching, databases, and out of band streaming were expected to emerge over time and therefore the need for globbing would grow less and less. These techniques did not emerge yet, because we've focused resource on gettng a stable node solid server. So tl;dr let's support what work in gold which the |
Prefixes are just a per file short hand to save you typing out characters over and over. They are designed not to clash. For example if you expand the prefixes, you get consistent URIs which are also UUIDs, if that makes sense. |
@michielbdejong as background consider a chat app that may have 50 friends, with 1000s of messages. What do you do at startup time? It's a reasonable, if not good design decision to put each chat item in its own file. Now at startup you'll need to get 1000 files. Just watch stuff crash until you queue or throttle it. Then think about async type JS frameworks which will update a UI when a new file is received. You can get incredible lag if you are pulling in a lot of files (think about thats also a problem with http2). Globbing makes some of these problems go away. And make more than the simplest apps more feasible. Trust me you are hitting so many walls with solid at that stage, that things like globbing a god send. The current generation of app builders have not hit these walls yet, but I look forward to people doing so, appreciating how useful this feature can be to get you to prototype (where otherwise people would just give up) as app data grows. I think as new techniques become available we'll only then be able to evaluate apps which have 100s of files or even 1000s. Also remember globbing is not just about http, you can get files from a cache, from indexeddb, from a file system and sometimes these operations are much slower one a time etc. |
Thanks @melvincarvalho!
To the best of my knowledge, this is what the current proposal contains.
First, I disagree (no trouble in the way I currently specified it, which is a main reason for specifying). Second and more importantly: what do you propose alternatively? That we a) skip files with blank nodes b) throw an error on blank nodes c) not include triples with blank nodes d) canonicalize them e) not spec it and leave up to implementers f) … ? |
@RubenVerborgh well I dont use bnodes, so dont have a problem with any merge strategy. |
Thanks, I will consider the above as no objection, unless I hear otherwise. |
But creates new ones. Do we really want a single stream with an unknown number of messages, that might be very long and cause server and client pressure? As opposed to fetching resources individually (which, under HTTP/2, comes with virtually zero overhead). This is not a problem we can solve with theoretical arguments, but which requires experimental measurements. (Take it from a researcher who happens to specialize in the partitioning of Linked Data documents to tackle that specific situation 😉 http://linkeddatafragments.org/publications/jws2016.pdf) But let's discuss removal in #145. |
Why is the BasicContainer stuff in there with contains? Does gold / databox do that? Not that I think it's a terrible idea, just dont remember seeing that before, so wonder where it came from. If we're cleaning up the spec, then gold should be the reference imho. |
I haven't seen activities around solid repos from @deiu for quite some time. If you wan't his feedback you might need to make effort to reach out to him via some more direct channel. |
@elf-pavlik he may be busy. Let's give him a bit of time to respond, should he wish to. Or possibly Tim could weigh in. The main concern is regarding the timing. Once the timing is clarified, I could get behind it. |
@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline, or preferred set of time lines for changes to the spec. I think that may alleviate potential misunderstandings. I'm feeling quite pressured to take time out from other urgent matters, which is not an optimal way to make specification changes. |
I didn't even formally propose it, so no need to object. There are currently four described behaviors, and at least two of them are incompatible:
The goal of this PR was to ensure that 2. is reflected in 1.
That's it right there. No need to doubt my word.
It's a private conversation that I hence cannot share.
For all we know, this is a spec correction.
That's exactly what this PR does. The
That's fine, but we might not necessarily wait for that. All I'm doing here is aligning the spec with behavior that you want, GOLD has, and NSS has. Instead of all the tangential discussions, a simple “thank you” could also have worked.
However, given that we only have one stakeholder and two apps that use globbing, I (personally) am inclined to drop 3, possibly 2, possibly 1. Timeline: ASAP. If there were more stakeholders, my answer would be different. |
No actual changes requested (only additional reviews).
Solid lives through its specifications. Contentious changes to the spec cannot be carried out unilaterally on such a short time line. Globbing came about iteratively through many years of work on solid, and produced critical apps in its evolution. It's an important part of solid. Your proposal remains problematic, not least because there are some factual errors that could be expanded upon, and I will take some time go to through the reasons for that. As such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :) We also could use some time for Tim or Andrei to weigh in if they choose to. I appreciate the review from @michielbdejong however I cant help but reflect that we had almost this exact same conversation c. 6 years ago in unhost on the unhosted spec. When basti and I wanted to add HEAD to the GET part of the REST spec of remote storage, and michiel in this case wanted a freeze due to "slippery slope". Somewhat amusing that we are now in reverse roles, but I do fondly recall our discussion, and hope that you can empathize with that meta principle :)
@RubenVerborgh I do thank you for this and other contributions to the topic. I simply ask that contentious spec changes have a longer time line on which to discuss. Spec changes should have a high bar for changes. Hope this is making at least some sense. |
Or the GOLD implementation apparently. There is a bug in the spec, this fixes it. It would be very different if globbing were actually used by more than one stakeholder. Given that we have identified all stakeholders, and that this issue correctly captures globbing as defined by them, I don't understand the resistance. |
Discussed out of band with @melvincarvalho; this should be on hold until he and @timbl can discuss. |
Also reduces globbing in scope to all files in one folder, as in solid/solid-spec#148
Also reduces globbing in scope to all files in one folder, as in solid/solid-spec#148
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should merge this because it describes our understanding of the 0.8 spec better than the current text. This is regardless of whether globbing will be removed altogether in the next spec version or not.
Closes #147.
8d0ee13
to
0dfc010
Compare
Resolved conflict; mergeable again. |
OK by me, and my assumption is that it is also OK by @timbl , but perhaps we should have a go from him? |
There is a suspicion that globbing is currently defined too loosely, and that this loose behavior is not depended upon (#147).
This PR restricts globbing to concatenating RDF files in a single folder.