Are root-relative paths valid? #1681

Closed
mattgarrish opened this issue May 25, 2021 · 6 comments · Fixed by #1725
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation

Comments

@mattgarrish
Member

This was raised in the epubcheck tracker: w3c/epubcheck#1252 (comment)

We don't say anything about resolving a path that starts with a slash.

Is the root the root of the container or is it the location of the package document? If it's the former, the paths will be problematic for reading systems that serve unzipped content using the location of the package document as the root.

It seems like we should formally disallow root-relative paths unless we want to spec out the behaviour and are sure that all reading systems already do the same thing.
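
A minimal sketch of the ambiguity, using Python's standard `urllib.parse`. The function name `container_path`, the host `rs.example`, and the file layout are hypothetical, and the idea that a reading system exposes exactly one container folder as its web root is an assumption made for illustration, not something the specification defines.

```python
from urllib.parse import urljoin, urlsplit

def container_path(href: str, package_path: str, served_root: str) -> str:
    """Map an href from the package document to a path inside the container.

    package_path: location of the package document in the container,
                  e.g. "EPUB/package.opf".
    served_root:  container folder that the (hypothetical) reading system
                  exposes as "/"; "" means the container root itself.
                  package_path is assumed to start with served_root.
    """
    # URL path of the package document as the reading system's web view sees it.
    visible = package_path[len(served_root):].lstrip("/")
    base = "https://rs.example/" + visible
    # Standard reference resolution against that base URL.
    resolved = urlsplit(urljoin(base, href)).path.lstrip("/")
    # Translate the URL path back into a container path.
    return f"{served_root}/{resolved}" if served_root else resolved

pkg = "EPUB/package.opf"

# Root-relative: the target depends on what the reading system serves as "/".
print(container_path("/images/cover.jpg", pkg, ""))      # container root as "/"
print(container_path("/images/cover.jpg", pkg, "EPUB"))  # package folder as "/"

# Path-relative: the target is the same in both setups.
print(container_path("images/cover.jpg", pkg, ""))
print(container_path("images/cover.jpg", pkg, "EPUB"))
```

In this sketch the root-relative href lands on `images/cover.jpg` in one setup and `EPUB/images/cover.jpg` in the other, while the path-relative href always lands on `EPUB/images/cover.jpg`.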

@mattgarrish mattgarrish added the Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation label May 25, 2021
@rdeltour
Member

This is also linked to #1374: until the URL of the package document is known, there is no way to know how a path-absolute or path-relative URL will be parsed.

@dauwhe dauwhe added the Agenda+ Issues that should be discussed during the next working group call. label Jun 9, 2021
@iherman
Member

iherman commented Jun 11, 2021

The issue was discussed in a meeting on 2021-06-10

List of resolutions:

  • Resolution No. 2: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688

2. URLs and the package document

See github issue #1681, #1374, #1688, #1686.

Dave Cramer: this is a bunch of issues that revolve around how you interpret URLs in the package document, especially if they're absolute URLs
… came from an issue in epubcheck
… and there's also an older issue about what the IRI of the package document is
… or what if there are file scheme URLs in the manifest
… and what happens if two URLs resolve to the same item in the manifest?

Matt Garrish: in epubcheck there was a root-relative URL that caused an error, and that spawned all of this
… e.g. "/something/thing"
… so what is the root of the epub?
… to me it doesn't make sense that we even allow these root-relative URLs
… the root differs based on the RS
… and Romain mentioned that we require that all resources resolve to something inside container, but depending on what RS does, there is even ambiguity about what that even is

Dave Cramer: in issue 1688 Romain suggests that manifest items should have one of the special schemes (except file:)

Matt Garrish: there are edge cases where file scheme items make sense, but not generally for epub

Dave Cramer: it goes against epub as a portable format, and the file scheme ties the epub to a specific file system
… how much out there does have file URLs on purpose, not by accident?

Matt Garrish: never heard of one
… and they'd end up being remote resources

Dave Cramer: okay, so what if we just say no file URLs in epub?
… what is the risk that we break something?
… maybe this is something where we try to enforce it and see if anyone complains

Matt Garrish: most RS probably won't do anything with file URL
… probably security concern

Wendy Reid: depending on platform you might not even be able to access parts of the file system (e.g. iOS apps)

Dave Cramer: can we start by resolving on this point from 1688?

Proposed resolution: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688 (Wendy Reid)

Dave Cramer: +1

Matthew Chan: +1

Matt Garrish: +1

Wendy Reid: +1

Toshiaki Koike: +1

Shinya Takami (高見真也): +1

Ben Schroeter: +1

Resolution #2: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688
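
A rough sketch of what a check for this resolution could look like. It assumes that "special scheme" refers to the special schemes listed in the WHATWG URL Standard (ftp, file, http, https, ws, wss); the function name `manifest_href_scheme_ok` is hypothetical and is not taken from epubcheck or the specification.

```python
from urllib.parse import urlsplit

# Assumption: the WHATWG "special" schemes, with file: removed per the resolution.
ALLOWED_ABSOLUTE_SCHEMES = {"ftp", "http", "https", "ws", "wss"}

def manifest_href_scheme_ok(href: str) -> bool:
    """Return True if href is relative or uses an allowed absolute scheme."""
    scheme = urlsplit(href).scheme.lower()
    if not scheme:
        # No scheme: a relative reference, which this check does not cover.
        return True
    return scheme in ALLOWED_ABSOLUTE_SCHEMES

for href in ("text/ch1.xhtml",
             "https://example.org/video/intro.mp4",
             "file:///home/author/book/ch1.xhtml"):
    print(href, manifest_href_scheme_ok(href))
```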

Dan Lazin: is there a use case for some of these other schemes? Why would you have an FTP in your epub?

Matt Garrish: if we go too far, do we prevent future stuff? will we have to come back and re-add this in the future?
… FTP kind of fits within the web framework
… maybe we just leave it to authors to stick with HTTP, HTTPS, etc.

Ben Schroeter: is the idea that if we disallow file scheme, then we also disallow "slash URLs"?

Matt Garrish: not sure those are the same
… i think 1681 is contingent on us forcing RS to unpack epub in a certain way
… otherwise we can't say there is a single consistent root that can be referenced
… and we don't tell RS how/where to unpack right now
… this kind of came up 5 years ago with multiple rendition, but we left it buried in the discussions we had

Dave Cramer: what would be the consequences of forbidding root-relative paths?

Matt Garrish: not sure there are any, because epubcheck had forbidden these until a recent update
… we're reasonably safe from backwards compatibility point of view

Dave Cramer: and this is just for href on manifest?

Matt Garrish: no, this would be anywhere, e.g. in content docs too
… all the "../" stuff would still be okay
… i proposed somewhere that we say all content must be below the package document
… if we could enforce an authoring requirement that made a root, then we could enforce these relative paths
… but maybe it's cleaner to just disallow them

Dan Lazin: do we support the base tag?
… and does that have implications for the handling of these issues?

Dave Cramer: we've been phasing out xml:base, it's been forbidden from the package file for example

Dan Lazin: the base tag allows you to define what the relative path is relative to
… so if we're allowing or disallowing certain types of URLs, maybe we should take a stance on base too
… not sure what stance though

Matt Garrish: base would force you to have all external resources, right? It exists, but I don't imagine anyone really going there

Dan Lazin: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

Marisa DeMeglio: there was a resolution a few weeks ago about dumping xml:base from the spec

Dave Cramer: and that's separate from the HTML base element
… i think i just want to say no root-relative URLs

Dan Lazin: if you set base to some website, and then use root-relative URLs, your URLs would appear to be relative, when they are actually absolute
… but maybe that's too far of a stretch
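
To illustrate Dan's point, here is a small sketch of how a base URL changes what a root-relative reference points to. The host `cdn.example` and the helper names are hypothetical; `urljoin` stands in for the browser's resolution against the `base` element's `href`.

```python
from urllib.parse import urljoin, urlsplit

def resolve_with_base(base_href: str, href: str) -> str:
    """Resolve href the way a document with <base href=base_href> would."""
    return urljoin(base_href, href)

def is_remote(url: str) -> bool:
    """True if the resolved URL uses a network scheme (http or https)."""
    return urlsplit(url).scheme in {"http", "https"}

base_href = "https://cdn.example/books/moby-dick/"  # hypothetical <base> value

# The root-relative form drops the base path and keeps only the host.
root_relative = resolve_with_base(base_href, "/css/style.css")
# The path-relative form is appended to the base path.
path_relative = resolve_with_base(base_href, "css/style.css")

print(root_relative, is_remote(root_relative))
print(path_relative, is_remote(path_relative))
```

Both references look relative in the markup, yet with such a base both resolve to remote resources, which matches Matt's remark that base would force all resources to be external.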

Dave Cramer: but can we really say anything about base because it's part of HTML?

Matt Garrish: so you must not use root-relative URLs unless you use a base?
… but it also applies to SVGs, to the package document...

Dan Lazin: what was the harm in not banning root-relative?

Matt Garrish: because the RS might treat zip root as the root, but they could also treat location of package doc as root
… so no consistent root

Dan Lazin: maybe permit it, but use SHOULD NOT?
… is it acceptable for an author to write an epub for a specific RS?
… and where it has undefined behavior for other RS
… probably acceptable, right?

Dave Cramer: yes, e.g. with books that only work with iBooks because of scripting support

Matt Garrish: maybe just a note that root-relative could cause issues if authors use it?

Dave Cramer: so does that mean that there are epubs that could be built to work in some RS, but expose an interop issue if opened in another RS?

Matt Garrish: right
… usually this happens in epubs that try to go from one folder to a sibling folder
… but when all content is below the package document it's fine
… but we don't specify that right now, only that content must be below the root
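
The difference between "below the root" and "below the package document" can be sketched as follows. Both helper names are hypothetical, and plain POSIX path arithmetic is used here as a simplification of real URL resolution.

```python
import posixpath
from typing import Optional

def resolve_in_container(package_path: str, href: str) -> Optional[str]:
    """Resolve a path-relative href against the package document's folder.

    Returns the normalized container path, or None if the href is
    root-relative or climbs above the container root.
    """
    joined = posixpath.normpath(
        posixpath.join(posixpath.dirname(package_path), href))
    if joined == ".." or joined.startswith("../") or joined.startswith("/"):
        return None
    return joined

def below_package_document(package_path: str, href: str) -> bool:
    """True if the resolved target sits in or under the package document's folder."""
    resolved = resolve_in_container(package_path, href)
    if resolved is None:
        return False
    pkg_dir = posixpath.dirname(package_path)
    return pkg_dir == "" or resolved.startswith(pkg_dir + "/")

pkg = "EPUB/package.opf"
for href in ("text/ch1.xhtml", "../shared/fonts/a.otf", "../../outside.txt"):
    print(href, resolve_in_container(pkg, href), below_package_document(pkg, href))
```

The sibling-folder case (`../shared/fonts/a.otf`) stays inside the container but is not below the package document, which is the situation Matt describes as the usual source of trouble.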

Dave Cramer: not sure what the right course of action is, but maybe we can continue this another time with Romain present

Wendy Reid: we need RS people here on next call that know exactly what RSes are doing right now

Marisa DeMeglio: one of the github threads has a sample, but I wasn't able to download it
… maybe if we wrote to the mailing list Romain could provide samples
… would also love to have a list of epubs that must absolutely continue to work

Dan Lazin: I have filed #1699

Matt Garrish: also, there's not much hand authoring, and most tools will put all the content into one folder
… we only ran into an issue with this with multiple renditions, and that hasn't really gone anywhere
… so is this maybe more of a theoretical issue

@iherman
Member

iherman commented Jun 18, 2021

The issue was discussed in a meeting on 2021-06-18

  • no resolutions were taken

2. What is the relationship between URLs and the package doc (what is home?)

See github issue #1681, #1374, #1687, #1686.

Wendy Reid: we started this discussion last week. Core question is: Where is home (given we allow both relative and absolute URLs) in the epub context

Romain Deltour: we have to keep in mind: 1) what things have to be put in epub core spec, and 2) what are the rules for epub RS spec
… the latter is more important because we can say whatever we want in core, but authors may deviate, and then it is up to RS to decide how to react
… also, i think we should look into question of what is home first, and that will inform what to do with root-relative URLs

Wendy Reid: okay, so what is the IRI of the package document then?

Ivan Herman: we can't really answer what the IRI of the package is, and i'm not sure we should try
… rather, what do we expect RS to do conceptually?
… the whole epub structure relies on the idea that epub is kind of a frozen website
… i think we say this is the conceptual model within which epub exists, and we should not say exactly how RS can do that
… just as long as the observable behavior is identical
… so as long as after epub is unpacked there is a root that we can refer to, it is fine
… and whether this root is the same IRI of the package or not is none of our business

Matt Garrish: we have 2 issues, 1) are these resources within the container and how do we determine that? 2) what happens when you unpack, and where do these resources go?
… so I don't think there can be a consistent root unless we start to enforce these things
… inside epub resources can be within the container, but that might not be true once the epub is unpacked
… e.g. do you have to unpack everything in the zip? Or just whatever is in the epub under the package?

Brady Duga: so absolute URIs are not allowed, and a relative IRI is interpreted by the language in question (e.g. HTML, or CSS, depending on what type of document it is)
… so why do we have to define what root is if we don't allow absolute URIs?

Matt Garrish: i think the issue is root-relative is still a relative path, so do we have to say "all relative is allowed, except root relative"

Romain Deltour: even with regular relative URLs, the spec is silent on what happens if the relative URL tries to go below the container root?
… and is it possible to look at RSes today and test what they do?

Ivan Herman: i was surprised to find that some RS don't automatically unpack the whole zip
… i thought this was obvious
… but then what if there is a relative URL that is not on manifest, but also happens to be in zip?

Matt Garrish: we have requirement in OCF that all relative resources must resolve to something in container
… i don't think that was the issue

Gregorio Pellegrino: i know that Colibrio streams files out of zip without unzipping

Wendy Reid: yes, there are more examples of RS doing that beyond that

Ivan Herman: but conceptually an RS unpacks the whole zip file onto a domain (as if it were a file system). If we do that then all these concepts become clear
… but i'm not sure if a streaming based solution meets that conceptual model

Hadrien Gardeur: streaming from zip is what Readium does by default
… unzipping is a problem for DRM. Some expectation that you keep the epub zipped. And we've done some optimizations with this in mind
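
For readers unfamiliar with the streaming approach, a minimal sketch using Python's standard `zipfile` module follows. It only illustrates the general idea of serving a resource straight from the archive; it makes no claim about how Colibrio or Readium are actually implemented, and the helper names and demo content are hypothetical.

```python
import io
import zipfile

def build_demo_container() -> bytes:
    """Create a tiny in-memory zip with an EPUB-like layout (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("EPUB/package.opf", "<package/>")
        zf.writestr("EPUB/text/ch1.xhtml", "<html/>")
    return buf.getvalue()

def stream_resource(container: bytes, container_path: str) -> bytes:
    """Read one resource directly from the archive without extracting it to disk.

    Raises KeyError if the path is not present in the archive.
    """
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        with zf.open(container_path) as member:
            return member.read()

demo = build_demo_container()
print(stream_resource(demo, "EPUB/text/ch1.xhtml"))
```

With this approach there is no unpacked directory tree at all, so any notion of a "root" exists only in how the reading system maps request URLs onto archive paths.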

Romain Deltour: i'm surprised that resources that are not in the same directory tree as the OPF would not be accessible in the epub
… going back to the point about defining what should happen conceptually, the spec could say that we define a URL that must be used as the base when resolving relative URLs (e.g., https://ocf.example.org)

Ivan Herman: +1 to romain

Romain Deltour: this defines unambiguously how relative URLs are to be resolved
… and we can say this URL is the root of the OCF
… this makes it so that relative URLs cannot go outside of the container
… and then RSes know what relative URLs point to
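
Romain's proposal can be sketched with `urllib.parse.urljoin`, which follows RFC 3986 reference resolution. The URL `https://ocf.example.org/` is the example from the discussion; the helper name `resolve_in_ocf` and the file paths are hypothetical.

```python
from urllib.parse import urljoin

# Example container root URL taken from the discussion above.
CONTAINER_ROOT_URL = "https://ocf.example.org/"

def resolve_in_ocf(package_path: str, href: str) -> str:
    """Resolve href against the package document's URL under the container root."""
    base = urljoin(CONTAINER_ROOT_URL, package_path)
    return urljoin(base, href)

pkg = "EPUB/package.opf"
print(resolve_in_ocf(pkg, "text/ch1.xhtml"))        # path-relative
print(resolve_in_ocf(pkg, "/fonts/a.otf"))          # root-relative
print(resolve_in_ocf(pkg, "../../../outside.txt"))  # too many ".." segments
```

Because surplus `..` segments cannot climb above the root of a URL, every relative reference in this model resolves to something under `https://ocf.example.org/`, that is, inside the container.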

Wendy Reid: going back to romain's point about testing, there are a variety of ways that RSes handle these URLs
… we are especially unsure what happens when files are outside the container
… so this is good reason to do some testing

Ivan Herman: would some sort of conceptual model clash with how things are implemented?

Hadrien Gardeur: we treat OPF as base, and that seems to work in most cases. Seems to make more sense to us than treating zip as base
… but these two are most common implementations

Matt Garrish: this originally came up in multiple renditions when we had issues referencing across sibling directories
… not sure if this is still an obstacle, worth testing

Romain Deltour: drawback of conceptual solution is that sometimes adding this layer of abstraction makes spec harder to use
… so we want to respect people who are actually having to implement it

Wendy Reid: is the best way forward at this point for us to do some sort of testing? (e.g. OPF as base, zip as base, examples of files living outside when OPF is base)

Ivan Herman: i think we should also test environment where multiple renditions is implemented
… if we end up with something that makes multiple renditions impossible, then we should just remove the multiple rendition note

Wendy Reid: do we know of a functioning implementation of multiple renditions?

Hadrien Gardeur: barnes and noble were using multiple renditions for newspapers and magazines
… not sure if they still use it

Wendy Reid: okay, so maybe we test on Nook app
… okay, so for now we test. Will have to ask Dan and the rest of the testing folk to help
… for now we don't have consensus on any sort of language, right?

@iherman
Member

iherman commented Jun 25, 2021

The issue was discussed in a meeting on 2021-06-24

List of resolutions:

  • Resolution No. 1: Provide a note in the core spec that this is a known issue, include non-normative advice about what to do, close issue 1687
  • Resolution No. 2: Declare root relative paths not recommended (should not be used), close 1681

1. Refine the requirements on how RS must process the container structure

See github issue #1687, #1681, #1686.

Wendy Reid: per discussion last week, mgarrish made us a test epub for this
… we've put it through various RS, Apple, Thorium, Colibrio, Kobo Desktop, Kobo iOS, ADE, more...
… aside from Apple and ADE, the test epub has worked
… it seems like most RS are flexible in their sourcing, but with our two fail cases, there is some variability in implementation

Brady Duga: and most of this was done via sideloading, and publisher pipelines are often different
… if publisher sent apple a book, we might have gotten a different result

Matt Garrish: we still have the problem that the spec doesn't say anything about this. There is no authoring requirement for where to put your content (other than below the root). And for RS there is no requirement for how to unpack, etc.
… it seems like it should be common sense. But beyond what we've already said, not sure what we should do. Maybe note it as a potential issue?

Wendy Reid: it probably doesn't hurt to refine language, but at this point creating a firm requirement would impact some existing RS implementations
… and it might make authors uncomfortable
… do we note that there is some confusion as to implementation, but clarify that we aren't going to enforce anything?

Matt Garrish: easiest solution is probably an authoring requirement. Esp. because most authors have probably never tried to do anything like the test epub
… so say authors should put their content under the package document

Brady Duga: this has been an issue forever, and the only time we noticed was with multiple renditions, which hasn't been implemented really. So is a 3rd solution to just leave it?
… if some publisher creates an epub that just doesn't work on Apple, maybe that can just be between that specific publisher and Apple...

Matt Garrish: this whole thing really only came up because of that root-relative thing, so on that issue maybe we just say not to use those

Wendy Reid: right, so we advise not to use root-relative, and we can't say specifically how RS will behave if you do it

Matt Garrish: can we resolve just to use something similar to the note we were going to have for multiple renditions?

Proposed resolution: Provide a note in the core spec that this is a known issue, include non-normative advice about what to do, close issue 1687 (Wendy Reid)

Brady Duga: +1

Wendy Reid: +1

Matthew Chan: +1

Matt Garrish: +1

Masakazu Kitahara: +1

Ben Schroeter: +1

Toshiaki Koike: +1

Shinya Takami (高見真也): +1

Resolution #1: Provide a note in the core spec that this is a known issue, include non-normative advice about what to do, close issue 1687

Wendy Reid: on the other two related issues, the first is: are root-relative paths valid? is this now moot?

Matt Garrish: i think we are on safer ground to just disallow those, especially because in the past epubcheck has had those come up as an error
… it may work on some RS, but that's fine

Proposed resolution: Declare root relative paths not recommended (should not be used), close 1681 (Wendy Reid)

Wendy Reid: +1

Matthew Chan: +1

Matt Garrish: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Ben Schroeter: +1

Brady Duga: +1

Resolution #2: Declare root relative paths not recommended (should not be used), close 1681

Wendy Reid: the second one: what should RS do when manifest item has duplicate entries?
… this is worth testing (and should be easy enough to test)

Matt Garrish: i think the issue with this was that if there were multiple copies of the same item in manifest, then RS might not know which manifest item to go to when one copy is referenced
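
A sketch of how duplicate manifest targets could be detected for such a test. The function name `find_duplicate_targets`, the item ids, and the base URL are hypothetical; hrefs are compared after standard reference resolution, so differently spelled paths that reach the same resource are grouped together.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urljoin

def find_duplicate_targets(package_url: str,
                           items: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group manifest item ids by the URL their href resolves to.

    items: (id, href) pairs as they would appear on manifest item elements.
    Returns only the resolved URLs referenced by more than one item.
    """
    targets: Dict[str, List[str]] = defaultdict(list)
    for item_id, href in items:
        targets[urljoin(package_url, href)].append(item_id)
    return {url: ids for url, ids in targets.items() if len(ids) > 1}

manifest = [
    ("ch1", "text/ch1.xhtml"),
    ("ch1-again", "./text/../text/ch1.xhtml"),  # different spelling, same target
    ("css", "css/style.css"),
]
print(find_duplicate_targets("https://ocf.example.org/EPUB/package.opf", manifest))
```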

@iherman
Member

iherman commented Jul 2, 2021

The issue was discussed in a meeting on 2021-07-02

  • no resolutions were taken

2. Are root-relative paths valid?

See github issue #1681, #1374.

See github pull request #1725.

Dave Cramer: What more needs to happen or can happen in the spec for root-relative paths?

Ivan Herman: one problem we need to address is that we have a problem with iBooks and others that rely on Adobe ADE, namely that they rely on a specific way of organizing the files, which is not in the standard.
… Matt's test was done according to the standard, but iBooks and others get it wrong. We can either acknowledge that problem as a warning and keep the standard as is (iBooks doesn't conform), or we reverse-engineer and put into standard a restricted version of how files can be organized, in order to conform with iBooks. We need to decide if this will harm current eBooks.
… I personally would hate to put restrictions in the standard, but that's just me

Romain Deltour: the test was done with valid ePub with shared resources - there is still the issue of root-relative URL paths and paths that would go outside the container. I think we need the spec to address that.
… some kind of language defining the root is likely necessary.
… and review interoperability with reading systems.

John Foliot: Is an unintended consequence that a publisher would have to create two versions, one for iBooks and another for other reading systems?

Dave Cramer: I don't see huge problems around interoperability because EPUBs are consistent with folder structure, generally.

Ivan Herman: Whatever works for iBooks works for others - but there are perfectly valid ePubs that iBooks doesn't take.
… As for the questions of Romain, we have decided that path relative URLs shouldn't be used, and paths shouldn't go outside the package. We need to make this clear in the documentation but there is not a fundamental technical problem with this.

Romain Deltour: these are edge cases, we don't see this problem often if ever.
… What we have is a recommendation for authors, but we need a recommendation for reading systems on how to process URLs.
… How should a reading system behave when authors don't follow the recommendation?

Ivan Herman: it would be helpful to have a clearly-worded proposal for reading systems. Hoping for Romain's help with this.

Dave Cramer: everyone seems to agree that having ../.. etc. to outside the package is not a good practice.

Hadrien Gardeur: from a reading system perspective, they need to resolve URIs, and expose the HTML resource (or any resource) to web view.
… reading systems have different ways of doing this, but you need to get the web view to do what you want, and how this is achieved can impact what we are discussing.

Ivan Herman: What precisely should the recommendation in the reading system spec be to cover all implementations?

Hadrien Gardeur: we don't know how each RS works behind the scenes, we can only speculate.

Ivan Herman: If we put something in the spec, it's up to RS how to implement
… we don't have to define that.
… Whatever we do, the author of an EPUB should have a clear mental model of what's happening. The RS implementation is not under the author's influence. If we are saying EPUB is a website in a box, we should be able to clearly define the root, and stop there.

Hadrien Gardeur: On the web, we don't think about files and root containers. For reading systems, we are deciding how an EPUB behaves. So I'm wary of this conceptual approach.

Dave Cramer: we are really talking about edge cases here. Hoping that we can build some tests based on the write-up and what we are trying to achieve.
… hoping we can get clear enough to cover our edge cases without restricting RS implementation.

Hadrien Gardeur: difficult to test everywhere - gets tricky when you have to consider different CSS, etc

Dave Cramer: let's get some proposals down with Romain's help, and get Matt to take a look at them, and proceed from there.

Ivan Herman: Must have a clear statement somewhere whether we intend to restrict EPUB content and define organization of EPUB package.

@dauwhe dauwhe added Agenda+ F2F Possible agenda item for F2F and removed Agenda+ Issues that should be discussed during the next working group call. labels Aug 25, 2021
@iherman
Member

iherman commented Oct 30, 2021

The issue was discussed in a meeting on 2021-10-29

  • no resolutions were taken

2.2. Are root-relative paths valid? (issue epub-specs#1681)

See github issue epub-specs#1681.

Romain Deltour: the issue is on how we resolve relative URLs in EPUB.
… this creates a lot of questions, since the URL parser algorithm takes two arguments as input.
… the relative URL and the root URL.
… so we started to question which is the root URL for EPUBs.
… the spec already has a paragraph on URLs in EPUBs (in the OCF section).
… we required for those URLs to be resolved as root relative URLs.
… so I don't understand why it is OK to define it for OCF, but not for EPUB content.
… I think we should concentrate on how to resolve URLs in a general way.

Dave Cramer: See epubcheck issue.

Ivan Herman: are you suggesting to use the OCF section on URLs for the other documents?

Romain Deltour: not really, I think there are other issues on URLs.
… because the OCF section is underspecified.

@mattgarrish mattgarrish added EPUB33 Issues addressed in the EPUB 3.3 revision and removed Agenda+ F2F Possible agenda item for F2F labels Nov 11, 2021