Clarify and detail the definition of globbing to only apply to folders #148

RubenVerborgh · 2019-03-26T21:13:23Z

There is a suspicion that globbing is currently defined too loosely, and that this loose behavior is not depended upon (#147).

This PR restricts globbing to concatenating RDF files in a single folder.

michielbdejong

I would prefer to remove globbing altogether, but this is indeed an improvement. I think this PR is a clarification of the text, not a change of the intent of the text, right?

I would change the example, adding file3.ttl to which the user has no access.

Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which they have read access, it would return an empty RDF doc. Also, how would the prefixes be done? What if they clash? Is it just a text-concatenation, or should it understand Turtle?

What would the server do if one of the documents has a syntax error?
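
The access rules suggested above could be sketched as follows. This is only an illustration in Python with hypothetical names (`Container`, `Document`, `glob`, `Forbidden`); it is not taken from the spec or from any server. It models access as plain sets of user names, treats documents as already-parsed sets of triples without blank nodes, and leaves the questions about prefixes and syntax errors open.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Forbidden(Exception):
    """Raised when the user may not read the container itself (hypothetical)."""


@dataclass
class Document:
    readable_by: Set[str]   # users with read access to this document
    triples: Set[Triple]    # already-parsed content, no blank nodes here


@dataclass
class Container:
    readable_by: Set[str]
    documents: Dict[str, Document] = field(default_factory=dict)


def glob(container: Container, user: str) -> Set[Triple]:
    """Union of all Turtle documents in the container that the user can read."""
    if user not in container.readable_by:
        # No read access to the container: the whole request fails.
        raise Forbidden("no read access to the container")
    result: Set[Triple] = set()
    for name, document in container.documents.items():
        # Documents the user cannot read are silently skipped.
        if name.endswith(".ttl") and user in document.readable_by:
            result |= document.triples
    return result  # empty set if no document was readable


data = Container(
    readable_by={"alice", "bob"},
    documents={
        "file1.ttl": Document({"alice"}, {("ex:a", "ex:p", "ex:b")}),
        "file2.ttl": Document({"alice"}, {("ex:c", "ex:p", "ex:d")}),
        "file3.ttl": Document({"owner"}, {("ex:e", "ex:p", "ex:f")}),
    },
)
```

With this data, `glob(data, "alice")` contains the triples of file1.ttl and file2.ttl only, `glob(data, "bob")` is empty, and a user without container access gets `Forbidden`.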

acoburn · 2019-03-27T13:31:42Z

Another consideration w/r/t globbing is the issue of blank nodes. Consider the case of two documents in a container:

/container/resource1.ttl

<> a ldp:RDFSource ;
      skos:prefLabel "resource 1" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssociatedWith <https://example.com/my-webid> ;
           prov:atTime             "2019-03-27T19:23:00.512Z"^^xsd:dateTime ] .

and also:

/container/resource2.ttl

<> a ldp:RDFSource ;
      skos:prefLabel "resource 2" ;
      prov:wasGeneratedBy [
           a prov:Activity, as:Create ;
           prov:wasAssociatedWith <https://example.com/another-webid> ;
           prov:atTime             "2019-03-27T20:13:00.000Z"^^xsd:dateTime ] .

If the underlying RDF serialization library converts each resource's blank node to _:b0 (anything, really, but the issue is for any blank node label conflicts), then the resulting graph would be semantically different than requesting each resource individually.

I am no fan of blank nodes in the context of linked data, but if this is specified behavior, this general case would need to be considered.
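
To make the label-conflict case concrete, here is a minimal Python sketch. It is not taken from any Solid server: it assumes triples are plain string tuples, that blank nodes are strings starting with `_:`, and that both documents were serialized with the label `_:b0`. The helper names `naive_union`, `rdf_merge`, and `bnodes` are hypothetical. The second function standardizes blank nodes apart before taking the union, which is how RDF Semantics defines a merge.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rename_bnodes(graph: Set[Triple], tag: str) -> Set[Triple]:
    """Give every blank node in the graph a label unique to that graph."""
    def rename(term: str) -> str:
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


def naive_union(*graphs: Set[Triple]) -> Set[Triple]:
    """Plain set union: equal blank node labels collapse into one node."""
    result: Set[Triple] = set()
    for graph in graphs:
        result |= graph
    return result


def rdf_merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Union after standardizing blank nodes apart, one tag per graph."""
    result: Set[Triple] = set()
    for index, graph in enumerate(graphs):
        result |= rename_bnodes(graph, f"g{index}")
    return result


def bnodes(graph: Set[Triple]) -> Set[str]:
    """All blank node labels used as subject or object."""
    return {t for s, _, o in graph for t in (s, o) if t.startswith("_:")}


resource1 = {
    ("/container/resource1.ttl", "prov:wasGeneratedBy", "_:b0"),
    ("_:b0", "prov:wasAssociatedWith", "https://example.com/my-webid"),
}
resource2 = {
    ("/container/resource2.ttl", "prov:wasGeneratedBy", "_:b0"),
    ("_:b0", "prov:wasAssociatedWith", "https://example.com/another-webid"),
}
```

In the naive union both resources point at one activity that is associated with both WebIDs; in the merge each resource keeps its own activity node.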

As an alternative to globbing, I wonder whether LDPath would satisfy the existing use cases.

RubenVerborgh · 2019-03-27T13:48:06Z

Not just editorial, we exclude filename globs. Can you recheck? Agreed with rest. Blank nodes are not an issue (will be parsed to different nodes). Note that text says union of datasets (not concat of files).

kjetilk · 2019-03-27T15:01:36Z

The formal specification for merging RDF graphs is here:
https://www.w3.org/TR/rdf-mt/#graphdefs
I think that should be the basis of this feature, if it is implemented. E.g. INSERT DATA is specified in terms of RDF Merge.

acoburn · 2019-03-27T15:16:11Z

The Web Annotation specification makes use of Prefer headers, specifically oa:PreferContainedDescriptions and a resulting ActivityStream-based container to fetch an entire container of resources. This interaction fits naturally within the existing LDP specification without defining a new semantic for /*.

michielbdejong · 2019-03-28T09:12:14Z

Not just editorial, we exclude filename globs.

Yes sorry, I agree, and I also applaud that simplifying change.

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs. I think we should prescribe merging instead of concatenating, and refer to the document @kjetilk mentioned.

Just to make sure I understand blank nodes correctly, should we assume that the graphs have no blank nodes in common? For instance, when merging two graphs that both state that Alice drank a glass of wine, both using a blank node, then we have no ground to assume that these two docs describe the same event, right? So the result should be that we just keep both blank nodes (not mark them as identical), and so in the resulting merge we say Alice drank two glasses of wine, right?

RubenVerborgh · 2019-03-28T09:22:28Z

@michielbdejong

I would change the example, adding file3.ttl to which the user has no access.

Also maybe we can specify that if the user has no read access to the container, then the request would fail, but if there are no Turtle documents to which they have read access, it would return an empty RDF doc.

What would the server do if one of the documents has a syntax error?

Agree with all of the above;
however, a note here that the spec currently is not at this level of detail at all.
It should be, don't get me wrong, but I did not omit any details that were previously specified.

Will do it for this section, but a more rigorous version of this spec is needed indeed.

Also, how would the prefixes be done?

Up to the implementation.

Is it just a text-concatenation, or should it understand Turtle?

From the current version:

Then a request to /data/*
will return the union of the datasets in file1.ttl and file2.ttl

My only issue then with this PR is that it talks about 'concatenating', not 'merging' Turtle docs.

Not really as per the above, but will add spec reference as per @kjetilk.

RubenVerborgh · 2019-03-28T09:24:00Z

@acoburn Interesting, created #149 to track that idea.

RubenVerborgh · 2019-03-28T10:13:20Z

Applied the suggested changes. This part of the spec is now very precise, giving clear expectations for a server (and also for a client library that would assume this functionality). Contrasts a bit with the rest of the spec, but good to start somewhere.

@michielbdejong Could you re-review?

@melvincarvalho Given that you are the main stakeholder for globbing, could you please review?

michielbdejong

Great!

melvincarvalho · 2019-03-28T12:50:19Z

I'd take a look at gold, as node solid server was based on gold for globbing. The write-up in the spec should be considered a really good attempt by someone documenting a spec they did not create. So while the write-up might not be perfect, to an extent gold should lead the text.

What I recall is

- just use /* type URIs is what I thought globbing was
- I'm unsure that the BasicContainer thing is part of globbing ... is the /* a container even?
- Unsure what the merge spec is, and don't have time to read that doc right now, but I'd avoid bnodes and globbing in a design

I think globbing should be used sparingly, but is a great tool to take solid apps to the new level.

It's anticipated that new techniques will emerge over time -- things like http2 and websockets, proxies, caching, databases, and out of band streaming were expected to emerge over time and therefore the need for globbing would grow less and less. These techniques did not emerge yet, because we've focused resources on getting a stable node solid server.

So tl;dr let's support what works in gold, which is the /* pattern. And take the triples. If you mix bnodes and globbing you're asking for trouble. Globbing is very useful but other web servers such as Apache don't use it, so it's slightly bespoke. We should avoid feature creep, and keep it basic but useful.

melvincarvalho · 2019-03-28T12:55:07Z

Also, how would the prefixes be done? What if they clash?

Prefixes are just a per file short hand to save you typing out characters over and over.

They are designed not to clash.

For example, if you expand the prefixes, you get consistent URIs, which are also globally unique identifiers, if that makes sense.
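
A small Python sketch of that point, under the assumption that each file carries its own prefix map. The function `expand` and the example namespaces are hypothetical and only illustrate the expansion step; a real Turtle parser does this as part of parsing.

```python
from typing import Dict


def expand(name: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefixed name such as 'ex:thing' using one file's prefix map."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# Two files bind the same prefix label to different namespaces.
file1_prefixes = {"ex": "https://a.example/ns#"}
file2_prefixes = {"ex": "https://b.example/ns#"}

# A third file uses a different label for the first namespace.
file3_prefixes = {"a": "https://a.example/ns#"}

iri1 = expand("ex:thing", file1_prefixes)
iri2 = expand("ex:thing", file2_prefixes)
iri3 = expand("a:thing", file3_prefixes)
```

Here `iri1` and `iri2` differ even though both files wrote `ex:thing`, while `iri1` and `iri3` are the same IRI despite the different labels. Once every file is expanded with its own map, the prefix labels no longer matter for the merged result.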

melvincarvalho · 2019-03-28T13:02:54Z

@michielbdejong as background consider a chat app that may have 50 friends, with 1000s of messages. What do you do at startup time?

It's a reasonable, if not good design decision to put each chat item in its own file.

Now at startup you'll need to get 1000 files. Just watch stuff crash until you queue or throttle it. Then think about async type JS frameworks which will update a UI when a new file is received. You can get incredible lag if you are pulling in a lot of files (note that this is also a problem with HTTP/2).

Globbing makes some of these problems go away. And makes more than the simplest apps more feasible. Trust me, you are hitting so many walls with solid at that stage that things like globbing are a godsend. The current generation of app builders have not hit these walls yet, but I look forward to people doing so, appreciating how useful this feature can be to get you to prototype (where otherwise people would just give up) as app data grows.

I think as new techniques become available we'll only then be able to evaluate apps which have 100s of files or even 1000s. Also remember globbing is not just about http, you can get files from a cache, from indexeddb, from a file system and sometimes these operations are much slower one at a time etc.

RubenVerborgh · 2019-03-28T13:03:55Z

Thanks @melvincarvalho!

So tl;dr let's support what works in gold, which is the /* pattern.

To the best of my knowledge, this is what the current proposal contains.
Do you see any issues/inconsistencies? Any reason not to merge?

If you mix bnodes and globbing you're asking for trouble.

First, I disagree (no trouble in the way I currently specified it, which is a main reason for specifying). Second and more importantly: what do you propose alternatively? That we a) skip files with blank nodes b) throw an error on blank nodes c) not include triples with blank nodes d) canonicalize them e) not spec it and leave up to implementers f) … ?

melvincarvalho · 2019-03-28T13:06:02Z

@RubenVerborgh well I don't use bnodes, so don't have a problem with any merge strategy.

RubenVerborgh · 2019-03-28T13:06:31Z

Thanks, I will consider the above as no objection, unless I hear otherwise.

RubenVerborgh · 2019-03-28T13:07:37Z

Globbing makes some of these problems go away.

But creates new ones. Do we really want a single stream with an unknown number of messages, that might be very long and cause server and client pressure? As opposed to fetching resources individually (which, under HTTP/2, comes with virtually zero overhead).

This is not a problem we can solve with theoretical arguments, but which requires experimental measurements. (Take it from a researcher who happens to specialize in the partitioning of Linked Data documents to tackle that specific situation 😉 http://linkeddatafragments.org/publications/jws2016.pdf)

But let's discuss removal in #145.

melvincarvalho · 2019-03-28T13:08:18Z

Any reason not to merge?

Why is the BasicContainer stuff in there with contains? Does gold / databox do that?

Not that I think it's a terrible idea, just don't remember seeing that before, so wonder where it came from. If we're cleaning up the spec, then gold should be the reference imho.

elf-pavlik · 2019-03-28T17:05:53Z

I would like to see feedback from Tim and Andrei before a substantive spec change.

I haven't seen activity around solid repos from @deiu for quite some time. If you want his feedback, you might need to make an effort to reach out to him via some more direct channel.

melvincarvalho · 2019-03-28T18:07:54Z

@elf-pavlik he may be busy. Let's give him a bit of time to respond, should he wish to. Or possibly Tim could weigh in.

The main concern is regarding the timing. Once the timing is clarified, I could get behind it.

melvincarvalho · 2019-03-28T18:36:53Z

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline, or preferred set of time lines for changes to the spec.

I think that may alleviate potential misunderstandings. I'm feeling quite pressured to take time out from other urgent matters, which is not an optimal way to make specification changes.

RubenVerborgh · 2019-03-28T22:59:10Z

@melvincarvalho

of first moving it to a "deprecated" section

You have my formal objection to this.

I didn't even formally propose it, so no need to object.
My point is that even if we decide to start the removal process, it would be good to document the current behavior.

There are currently four described behaviors, and at least two of them are incompatible:

1. the spec
2. @melvincarvalho's interpretation of what the spec should be (Write a client-side alternative for globbing solid#253 (comment))
3. the GOLD implementation
4. the NSS implementation

The goal of this PR was to ensure that 2. is reflected in 1.

in a discussion with @timbl, he expressed his preference for removing globbing altogether

Citation required.

That's it right there. No need to doubt my word.

I'd like to see the context of this. And also the time line.

It's a private conversation that I hence cannot share.
Timeline: today, pointing to these issues and asking for input.

Not now. The bar for spec changes is extremely high, and by default should be unanimous.

For all we know, this is a spec correction.

A bug fix that aligns the spec with gold test indicated I think would be fine.

That's exactly what this PR does. The TestGlob you pointed to tests only one star at the end of a container path: https://github.com/linkeddata/gold/blob/b000d003f9e2aa40e4977839ca063f09435f80c8/server_test.go#L1193
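
As a hedged illustration of that restriction (a sketch, not the GOLD or NSS code), the accepted pattern could be checked like this in Python. The function name `glob_container` is hypothetical, and the sketch assumes the request path has already been separated from any query string.

```python
from typing import Optional


def glob_container(path: str) -> Optional[str]:
    """Return the container path if `path` is a supported glob, else None.

    Supported here means: exactly one star, as the last character,
    directly after the slash that ends a container path.
    """
    if path.endswith("/*") and "*" not in path[:-1]:
        return path[:-1]  # strip the star, keep the trailing slash
    return None
```

Under this check `/data/*` maps to the container `/data/`, while filename globs such as `/data/*.ttl` and paths with several stars such as `/da*/x/*` are rejected.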

I will aim to give a more specific review to this work. But not this week. More likely in the month of April.

That's fine, but we might not necessarily wait for that.

All I'm doing here is aligning the spec with behavior that you want, GOLD has, and NSS has. Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh perhaps the easiest thing would be, could you state your preferred timeline

1. We fix the incorrect description of globbing (this PR).
2. We implement client-side globbing.
3. [optional] We deprecate globbing.
4. We remove globbing.

However, given that we only have one stakeholder and two apps that use globbing, I (personally) am inclined to drop 3, possibly 2, possibly 1.

Timeline: ASAP. If there were more stakeholders, my answer would be different.

No actual changes requested (only additional reviews).

melvincarvalho · 2019-03-29T06:39:04Z

Timeline: today, pointing to these issues and asking for input.

Solid lives through its specifications.

Contentious changes to the spec cannot be carried out unilaterally on such a short time line. Globbing came about iteratively through many years of work on solid, and produced critical apps in its evolution. It's an important part of solid.

Your proposal remains problematic, not least because there are some factual errors that could be expanded upon, and I will take some time to go through the reasons for that. As such, I've re-added the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

We also could use some time for Tim or Andrei to weigh in if they choose to.

I appreciate the review from @michielbdejong; however, I can't help but reflect that we had almost this exact same conversation c. 6 years ago in unhost on the unhosted spec, when basti and I wanted to add HEAD to the GET part of the REST spec of remote storage, and michiel in this case wanted a freeze due to "slippery slope". Somewhat amusing that we are now in reversed roles, but I do fondly recall our discussion, and hope that you can empathize with that meta principle :)

Instead of all the tangential discussions, a simple “thank you” could also have worked.

@RubenVerborgh I do thank you for this and other contributions to the topic. I simply ask that contentious spec changes have a longer time line on which to discuss. Spec changes should have a high bar for changes. Hope this is making at least some sense.

RubenVerborgh · 2019-03-29T07:00:03Z

As such, I've readded the "on-hold" tag. It would be good not to have a "tag war", which sort of defeats the purpose of the on-hold principle too! :)

on-hold is for technical blockers; I have been made aware of none. Could you please remove? (or link to another issue that blocks this one?)

Solid lives through its specifications.

Or the GOLD implementation apparently. There is a bug in the spec, this fixes it.

It would be very different if globbing were actually used by more than one stakeholder. Given that we have identified all stakeholders, and that this issue correctly captures globbing as defined by them, I don't understand the resistance.

RubenVerborgh · 2019-03-29T08:20:35Z

Discussed out of band with @melvincarvalho; this should be on hold until he and @timbl can discuss.

michielbdejong

I think we should merge this because it describes our understanding of the 0.8 spec better than the current text. This is regardless of whether globbing will be removed altogether in the next spec version or not.

Closes #147.

RubenVerborgh · 2019-07-01T12:22:42Z

Resolved conflict; mergeable again.

kjetilk · 2019-07-01T13:11:44Z

OK by me, and my assumption is that it is also OK by @timbl, but perhaps we should get a go-ahead from him?

RubenVerborgh added the enhancement label Mar 26, 2019

RubenVerborgh requested review from melvincarvalho and timbl March 26, 2019 21:13

RubenVerborgh self-assigned this Mar 26, 2019

This was referenced Mar 26, 2019

Specify exactly what globbing patterns are allowed #147

Open

Write a client-side alternative for globbing solid/solid#253

Closed

RubenVerborgh force-pushed the feature/restrict-globbing branch from bb1d331 to 35474c1 Compare March 26, 2019 21:15

RubenVerborgh requested a review from michielbdejong March 26, 2019 21:15

RubenVerborgh mentioned this pull request Mar 26, 2019

Remove globbing from the spec #145

Open

michielbdejong suggested changes Mar 27, 2019

RubenVerborgh mentioned this pull request Mar 28, 2019

Consider whether oa:PreferContainedDescriptions could replace globbing #149

Open

RubenVerborgh force-pushed the feature/restrict-globbing branch from 35474c1 to 8d0ee13 Compare March 28, 2019 10:10

RubenVerborgh requested a review from michielbdejong March 28, 2019 10:10

michielbdejong approved these changes Mar 28, 2019

RubenVerborgh removed the on-hold label Mar 28, 2019

RubenVerborgh mentioned this pull request Mar 28, 2019

Remove globbing #151

Open

melvincarvalho added the on-hold label Mar 29, 2019

RubenVerborgh assigned melvincarvalho and timbl and unassigned RubenVerborgh Mar 29, 2019

RubenVerborgh mentioned this pull request Mar 29, 2019

Mark globbing as "at risk" #152

Merged

RubenVerborgh assigned michielbdejong Mar 29, 2019

Mitzi-Laszlo added this to the Spec Pull Requests milestone May 7, 2019

RubenVerborgh added a commit to nodeSolidServer/node-solid-server that referenced this pull request Jun 2, 2019

Avoid globbing hack.

3d43f72

Also reduces globbing in scope to all files in one folder, as in solid/solid-spec#148

RubenVerborgh added a commit to nodeSolidServer/node-solid-server that referenced this pull request Jun 2, 2019

Avoid globbing hack.

bb5ba24

Also reduces globbing in scope to all files in one folder, as in solid/solid-spec#148

michielbdejong previously approved these changes Jul 1, 2019

michielbdejong mentioned this pull request Jul 1, 2019

Version 0.8 Changelog #199

Closed

Restrict globbing to /*

0dfc010

Closes #147.

RubenVerborgh dismissed michielbdejong’s stale review via 0dfc010 July 1, 2019 12:22

RubenVerborgh force-pushed the feature/restrict-globbing branch from 8d0ee13 to 0dfc010 Compare July 1, 2019 12:22

RubenVerborgh removed the on-hold label Jul 1, 2019

dmitrizagidulin mentioned this pull request Sep 24, 2019

Should globbing be included as a server-side feature? solid/specification#64

Closed

Potherca mentioned this pull request Oct 2, 2020

Fix typo in "Solid HTTPS REST API Spec" #231

Open

NoelDeMartin mentioned this pull request Aug 14, 2021

Add migration instructions for older apps solid/solidproject.org#607

Closed
