file URLs in manifest items #1688

Closed
rdeltour opened this issue May 27, 2021 · 2 comments · Fixed by #1703
Labels
EPUB33 Issues addressed in the EPUB 3.3 revision PR required Spec-ReadingSystems The issue affects the EPUB Reading Systems 3.3 Recommendation Topic-OCF The issue affects the OCF section of the core EPUB 3 specification

Comments

@rdeltour
Member

Manifest items can identify resources with absolute URLs. So an EPUB can theoretically use local file system resources with file URLs.

Using file URLs in the manifest is not a good practice, but has never been strictly forbidden (as far as I know). I think it's probably often used by mistake rather than intentionally. But there might be legit use cases, like an internal documentation system (as @mattgarrish pointed out in #1374).

Would it be reasonable to say that absolute URLs SHOULD have a special scheme that is not file?
In parallel we can also make it explicit what Reading Systems can/should do with absolute URLs as manifest entries.
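
To illustrate the proposal, here is a minimal sketch of how a checker might classify manifest `href` values. It assumes the "special schemes" are those listed in the WHATWG URL Standard (ftp, file, http, https, ws, wss); the function name and the return labels are hypothetical and are not taken from EPUBCheck or the specification.

```python
from urllib.parse import urlsplit

# Special schemes as listed in the WHATWG URL Standard.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def classify_manifest_href(href: str) -> str:
    """Classify a manifest item href (hypothetical helper, for illustration).

    Returns "relative" for URLs without a scheme, "ok" for absolute URLs
    with a special scheme other than file, and "discouraged" otherwise.
    """
    scheme = urlsplit(href).scheme.lower()
    if not scheme:
        return "relative"
    if scheme in SPECIAL_SCHEMES and scheme != "file":
        return "ok"
    return "discouraged"


for href in (
    "Text/chapter1.xhtml",
    "https://example.org/video.mp4",
    "file:///home/user/book/chapter1.xhtml",
):
    print(href, "->", classify_manifest_href(href))
```

Under these assumptions the three sample values are reported as relative, ok, and discouraged respectively. A "SHOULD" rule would map "discouraged" to a warning rather than an error.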

@iherman
Member

iherman commented May 28, 2021

> Would it be reasonable to say that absolute URLs SHOULD have a special scheme that is not file?

+1 to that!

@mattgarrish mattgarrish added Spec-ReadingSystems The issue affects the EPUB Reading Systems 3.3 Recommendation Topic-OCF The issue affects the OCF section of the core EPUB 3 specification labels May 28, 2021
@dauwhe dauwhe added the Agenda+ Issues that should be discussed during the next working group call. label Jun 9, 2021
@iherman
Member

iherman commented Jun 11, 2021

The issue was discussed in a meeting on 2021-06-10

List of resolutions:

  • Resolution No. 2: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688

2. URLs and the package document

See github issue #1681, #1374, #1688, #1686.

Dave Cramer: this is a bunch of issues that revolve around how you interpret URLs in the package document, especially if they're absolute URLs
… came from an issue in epubcheck
… and there's also an older issue about what the IRI of the package document is
… or what if there are file scheme URLs in the manifest
… and what happens if two URLs resolve to the same item in the manifest?

Matt Garrish: in epubcheck there was a root-relative URL that caused an error, and that spawned all of this
… e.g. "/something/thing"
… so what is the root of the epub?
… to me it doesn't make sense that we even allow these root-relative URLs
… the root differs based on the RS
… and Romain mentioned that we require that all resources resolve to something inside container, but depending on what RS does, there is even ambiguity about what that even is

Dave Cramer: in issue 1688 Romain suggests that manifest items should have one of the special schemes (except file:)

Matt Garrish: there are edge cases where file scheme items make sense, but not generally for epub

Dave Cramer: it goes against epub as a portable format, and the file scheme ties the epub to a specific file system
… how much out there does have file URLs on purpose, not by accident?

Matt Garrish: never heard of one
… and they'd end up being remote resources

Dave Cramer: okay, so what if we just say no file URLs in epub?
… what is the risk that we break something?
… maybe this is something where we try to enforce it and see if anyone complains

Matt Garrish: most RS probably won't do anything with file URL
… probably security concern

Wendy Reid: depending on platform you might not even be able to access parts of the file system (e.g. iOS apps)

Dave Cramer: can we start by resolving on this point from 1688?

Proposed resolution: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688 (Wendy Reid)

Dave Cramer: +1

Matthew Chan: +1

Matt Garrish: +1

Wendy Reid: +1

Toshiaki Koike: +1

Shinya Takami (高見真也): +1

Ben Schroeter: +1

Resolution #2: Absolute URLs for manifest items should have a special scheme that is not file:, close issue 1688

Dan Lazin: is there a use case for some of these other schemes? Why would you have an FTP in your epub?

Matt Garrish: if we go too far, do we prevent future stuff? will we have to come back and re-add this in the future?
… FTP kind of fits within the web framework
… maybe we just leave it to authors to stick with HTTP, HTTPS, etc.

Ben Schroeter: is the idea that if we disallow file scheme, then we also disallow "slash URLs"?

Matt Garrish: not sure those are the same
… i think 1681 is contingent on us forcing RS to unpack epub in a certain way
… otherwise we can't say there is a single consistent root that can be referenced
… and we don't tell RS how/where to unpack right now
… this kind of came up 5 years ago with multiple rendition, but we left it buried in the discussions we had

Dave Cramer: what would be the consequences of forbidding root-relative paths?

Matt Garrish: not sure there are any, because epubcheck had forbidden these until a recent update
… we're reasonably safe from backwards compatibility point of view

Dave Cramer: and this is just for href on manifest?

Matt Garrish: no, this would be anywhere, e.g. in content docs too
… all the "../" stuff would still be okay
… i proposed somewhere that we say all content must be below the package document
… if we could enforce an authoring requirement that made a root, then we could enforce these relative paths
… but maybe it's cleaner to just disallow them

Dan Lazin: do we support the base tag?
… and does that have implications for the handling of these issues?

Dave Cramer: we've been phasing out xml:base, it's been forbidden from package file for example

Dan Lazin: the base tag allows you to define what the relative path is relative to
… so if we're allowing or disallowing certain types of URLs, maybe we should take a stance on base too
… not sure what stance though

Matt Garrish: base would force you to have all external resources, right? It exists, but I don't imagine anyone really going there

Dan Lazin: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

Marisa DeMeglio: there was a resolution a few weeks ago about dumping xml:base from the spec

Dave Cramer: and that's separate from the HTML base element
… i think i just want to say no root-relative URLs

Dan Lazin: if you set base to some website, and then use root-relative URLs, your URLs would appear to be relative, when they are actually absolute
… but maybe that's too far of a stretch
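
The effect described here can be sketched with standard URL resolution. The base URL below is a hypothetical value standing in for an HTML `<base href="…">`; the host and paths are invented for illustration.

```python
from urllib.parse import urljoin

# Hypothetical value of <base href="..."> in a content document.
base_href = "https://cdn.example.com/book/"

# A root-relative reference replaces the whole path of the base URL.
root_relative = urljoin(base_href, "/images/cover.png")

# A plain relative reference is resolved inside the base URL's folder.
plain_relative = urljoin(base_href, "images/cover.png")

print(root_relative)   # https://cdn.example.com/images/cover.png
print(plain_relative)  # https://cdn.example.com/book/images/cover.png
```

In both cases the reference looks local in the markup but resolves to a remote resource once a remote base is set.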

Dave Cramer: but can we really say anything about base because it's part of HTML?

Matt Garrish: so you must not use root-relative URLs unless you use a base?
… but it also applies to SVGs, to the package document...

Dan Lazin: what was the harm in not banning root-relative?

Matt Garrish: because the RS might treat zip root as the root, but they could also treat location of package doc as root
… so no consistent root
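
A minimal sketch of this ambiguity, assuming two hypothetical Reading Systems that serve the unpacked EPUB over HTTP. The host names, the `EPUB/` folder, and the `container_path` helper are all invented for illustration; real Reading Systems may work quite differently.

```python
from urllib.parse import urljoin, urlsplit


def container_path(package_url: str, href: str, root_in_container: str) -> str:
    """Resolve href against the package document URL, then map the result
    back to a path inside the ZIP container.

    root_in_container is the container folder that the Reading System
    exposes as the URL root ("" means the ZIP root itself).
    """
    resolved_path = urlsplit(urljoin(package_url, href)).path
    return root_in_container + resolved_path.lstrip("/")


href = "/images/cover.png"

# Reading System A: the URL root is the ZIP root.
path_a = container_path("http://rs-a.example/EPUB/package.opf", href, "")

# Reading System B: the URL root is the package document's folder.
path_b = container_path("http://rs-b.example/package.opf", href, "EPUB/")

print(path_a)  # images/cover.png
print(path_b)  # EPUB/images/cover.png
```

The same root-relative reference ends up at two different container paths, whereas the plain relative form `images/cover.png` would map to `EPUB/images/cover.png` in both setups.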

Dan Lazin: maybe permit it, but use SHOULD NOT?
… is it acceptable for an author to write an epub for a specific RS?
… and where it has undefined behavior for other RS
… probably acceptable, right?

Dave Cramer: yes, e.g. with books that only work with iBooks because of scripting support

Matt Garrish: maybe just a note that root-relative could cause issues if authors use it?

Dave Cramer: so does that mean that there are epubs that could be built to work in some RS, but expose an interop issue if opened in another RS?

Matt Garrish: right
… usually this happens in epubs that try to go from one folder to a sibling folder
… but when all content is below the package document it's fine
… but we don't specify that right now, only that content must be below the root
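
The "all content below the package document" condition could be checked roughly as follows. This is a sketch under stated assumptions: paths are container-relative with `/` separators, percent-encoding is ignored, and the helper name is hypothetical rather than anything EPUBCheck provides.

```python
import posixpath


def is_below_package_document(package_path: str, href: str) -> bool:
    """Return True if href, resolved from the package document's folder,
    stays in or below that folder (hypothetical helper)."""
    package_dir = posixpath.dirname(package_path)
    resolved = posixpath.normpath(posixpath.join(package_dir, href))
    if package_dir == "":
        # Package document sits at the container root: anything that
        # does not climb out of the root or start with "/" is below it.
        return not (
            resolved == ".."
            or resolved.startswith("../")
            or resolved.startswith("/")
        )
    return resolved.startswith(package_dir + "/")


print(is_below_package_document("EPUB/package.opf", "images/cover.png"))       # True
print(is_below_package_document("EPUB/package.opf", "../Shared/font.otf"))    # False
print(is_below_package_document("EPUB/package.opf", "/images/cover.png"))     # False
```

The second case is the sibling-folder pattern mentioned above; the third is a root-relative reference, which this sketch treats as falling outside the package document's folder.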

Dave Cramer: not sure what the right course of action is, but maybe we can continue this another time with Romain present

Wendy Reid: we need RS people here on next call that know exactly what RSes are doing right now

Marisa DeMeglio: one of the github threads has a sample, but I wasn't able to download it
… maybe if we wrote to the mailing list Romain could provide samples
… would also love to have a list of epubs that must absolutely continue to work

Dan Lazin: I have filed #1699

Matt Garrish: also, there's not much hand authoring, and most tools will put all the content into one folder
… we only ran into an issue with this with multiple renditions, and that hasn't really gone anywhere
… so is this maybe more of a theoretical issue

@wareid wareid removed the Agenda+ Issues that should be discussed during the next working group call. label Jun 16, 2021
iherman added a commit that referenced this issue Jun 16, 2021
The [WG resolution](https://www.w3.org/publishing/groups/epub-wg/Meetings/Minutes/2021-06-10-epub#resolution2) says "SHOULD NOT" and not "MUST NOT", so this is what the PR uses. There may be a question whether "MUST NOT" is more appropriate here.

Fixes #1688.
@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label Jun 17, 2021