Refine the requirements on how RS must process the container directory structure #1687
Comments
Copying previous comments made in #1374: from @mattgarrish
and:
and @iherman (about Matt's last paragraphs):
also from @mattgarrish
|
Picking from @mattgarrish:
I do not think it would be particularly shocking to require this. |
If we require all the content be unpacked, then conceivably the reading system has to produce the "/EPUB" or "/OEBPS" directory that the vast majority of EPUBs use - so you'd always have this directory on top of whatever subdomain you serve the publication from (if that's how you serve the epub). For the majority of EPUBs, the only thing this accomplishes is making the generally useless meta-inf directory available. That's why I wonder if the requirement should be changed to authoring - publication resources may be located anywhere in the abstract container provided they are at or below the directory that contains the package file(s). Multiple renditions already suggests you do this, and I suspect most single-rendition publications don't encounter the problem, so it shouldn't be backwards-breaking. But doing this also makes the rules on relative paths more complex, as there would be a rule that the references not resolve outside the container and, for content, that they not resolve to a directory above the package document. |
To finish up the thought, if we disallow root-relative paths, then with such an authoring restriction it doesn't matter if the reading system unpacks the ocf root or only the directory with the content. The resources will all be there and the relative paths to them in the content will always work. Another option, anyway. |
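To make the proposed authoring rule concrete, here is a minimal sketch of how a checker might classify a reference. It is only an illustration of the rule as described above, not epubcheck's logic or anything in the spec: the function name, the result labels, and the sample paths are all hypothetical, and it ignores fragments, queries, and URLs with a scheme.

```python
import posixpath


def classify_reference(package_path, content_path, href):
    """Classify where `href`, found in `content_path`, lands in the container.

    All paths are container-relative (e.g. "EPUB/package.opf").
    The labels returned are hypothetical names chosen for this sketch.
    """
    # Root-relative references depend on where the reading system puts "home".
    if href.startswith("/"):
        return "root-relative"

    # Resolve against the directory of the referencing document.
    base_dir = posixpath.dirname(content_path)
    resolved = posixpath.normpath(posixpath.join(base_dir, href))

    # normpath keeps leading ".." segments when the path escapes the root.
    if resolved == ".." or resolved.startswith("../"):
        return "outside-container"

    # The proposed authoring rule: stay at or below the package directory.
    package_dir = posixpath.dirname(package_path)
    if package_dir and not (resolved + "/").startswith(package_dir + "/"):
        return "outside-package-dir"

    return "ok"


print(classify_reference("EPUB/package.opf", "EPUB/xhtml/ch1.xhtml", "../images/a.png"))
print(classify_reference("EPUB/package.opf", "EPUB/xhtml/ch1.xhtml", "../../shared/a.png"))
```

Under these assumptions, a reference classified as "ok" works whether the reading system exposes the whole OCF root or only the directory holding the package document.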
The requirement on reading system is conceptual and not a hard requirement on how they MUST implement things. If they want to optimize things, they are free to do so. But by making this conceptual requirement we have a clear framework to define things unambiguously and that, at the end of the day, what counts.
I do not think we should introduce a structural requirement at this point. We do not know how EPUB files are structured in the wild, and we do not want to create backward incompatibilities... |
But that's essentially saying that what they do right now is fine, no? If there's no requirement, then unpacking and discarding everything but the folder where the package document is located is as legitimate as not unpacking anything but the directory where the package document is located. You net access to the same set of files. |
Well... I may be lost. What I understood in this and the other thread was that, in some cases, content referenced by relative URLs is not found because it is not unpacked. My reaction was that everything should be (conceptually) unpacked, i.e., if that is what happens then it is an RS bug. But maybe I need a hard reset to understand the problem :-( |
Right, this is one of the problems of the current state of affairs. I think we're maybe talking across each other right now. The other problem is why do we say that authors can locate their content anywhere below the root but then have no rules on unpacking or requirement that all the content in the container be available? It's a major gotcha for authoring that isn't explained anywhere. I'm not proposing this as a resolution to the question of how do we determine what is in the container. I'm suggesting, given the known state of the world, that we may want to have a recommendation/requirement for authors to make sure their content is structured in a way that won't cause them unexpected grief in the reading systems that don't unpack everything. That's probably more realistic than expecting reading systems to change. We'll still need to figure out how to consistently check what is in the container and/or what is below the package document, but that may be more of a conceptual problem to work out, as you say. |
Ah. So we should probably, beyond clarifying the full conceptual unpacking:
Would that cover our issues? |
The issue was discussed in a meeting on 2021-06-18
2. What is the relationship between URLs and the package doc (what is home?)
See GitHub issues #1681, #1374, #1687, #1686.
Wendy Reid: we started this discussion last week. Core question is: Where is home (given we allow both relative and absolute URLs) in the epub context
Romain Deltour: we have to keep in mind: 1) what things have to be put in epub core spec, and 2) what are the rules for epub RS spec
Wendy Reid: okay, so what is the IRI of the package document then?
Ivan Herman: we can't really answer what the IRI of the package is, and i'm not sure we should try
Matt Garrish: we have 2 issues, 1) are these resources within the container and how do we determine that? 2) what happens when you unpack, and where do these resources go?
Brady Duga: so absolute URIs are not allowed, and what relative IRI is interpreted by the language in question (e.g. HTML, or CSS, depending on what type of document it is)
Matt Garrish: i think the issue is root-relative is still a relative path, so do we have to say "all relative is allowed, except root relative"
Romain Deltour: even with regular relative URLs, the spec is silent on what happens if the relative URL tries to go below the container root?
Ivan Herman: i was surprised to find that some RS don't automatically unpack the whole zip
Matt Garrish: we have requirement in OCF that all relative resources must resolve to something in container
Gregorio Pellegrino: i know that Colibrio streams files out of zip without unzipping
Wendy Reid: yes, there are more examples of RS doing that beyond that
Ivan Herman: but conceptually an RS unpacks the whole zip file onto a domain (as if it were a file system). If we do that then all these concepts become clear
Hadrien Gardeur: streaming from zip is what Readium does by default
Romain Deltour: i'm surprised that resources that are not in the same directory tree as the OPF would not be accessible in the epub
Romain Deltour: this defines unambiguously how relative URLs are to be resolved
Wendy Reid: going back to romain's point about testing, there are a variety of ways that RSes handle these URLs
Ivan Herman: would some sort of conceptual model clash with how things are implemented?
Hadrien Gardeur: we treat OPF as base, and that seems to work in most cases. Seems to make more sense to us than treating zip as base
Matt Garrish: this originally came up in multiple renditions when we had issues referencing across sibling directories
Romain Deltour: drawback of conceptual solution is that sometimes adding this layer of abstraction makes spec harder to use
Wendy Reid: is the best way forward at this point for us to do some sort of testing? (e.g. OPF as base, zip as base, examples of files living outside when OPF is base)
Ivan Herman: i think we should also test environment where multiple renditions is implemented
Wendy Reid: do we know of a functioning implementation of multiple renditions?
Hadrien Gardeur: barnes and noble were using multiple renditions for newspapers and magazines
Wendy Reid: okay, so maybe we test on Nook app |
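As a rough illustration of the "where is home" question under the conceptual model mentioned in the meeting (the whole container unpacked onto a domain), the sketch below resolves two references with Python's standard `urljoin`. The origin `https://reader.example/book-123/` and the file layout are hypothetical; real reading systems may choose a different base entirely.

```python
from urllib.parse import urljoin

# Hypothetical URL at which a reading system exposes the container root.
CONTAINER_ROOT = "https://reader.example/book-123/"


def resolve(content_path, href):
    """Resolve `href` as a browser would, from a document inside the container."""
    content_url = urljoin(CONTAINER_ROOT, content_path)
    return urljoin(content_url, href)


def stays_in_container(url):
    """True if the resolved URL is still under the container root."""
    return url.startswith(CONTAINER_ROOT)


relative = resolve("EPUB/xhtml/ch1.xhtml", "../images/a.png")
root_relative = resolve("EPUB/xhtml/ch1.xhtml", "/images/a.png")

print(relative, stays_in_container(relative))
print(root_relative, stays_in_container(root_relative))
```

With this particular base, the ordinary relative reference stays inside the container while the root-relative one resolves against the origin itself, which is one way to see why root-relative paths depend on a choice the spec does not currently make.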
I've hacked up one of the multiple rendition samples to very basically test whether resources in a sister directory to the opf file can be accessed (attaching with .zip to post, so just delete the extra extension). First test in Apple Books and the images and css were not rendered (kind of a big problem). Thorium displays the book fine, as did the dropbox viewer. I'll try some other apps tomorrow, but anyone who wants to try it out feel free to post what results you get. (Interestingly, I did open an issue about this after the AHL work finished but it got reassigned back to the MR spec to address in 3.1: #619. To show how shot my brain is when it comes to multiple renditions, the note about the problem was added earlier in this revision as part of clearing off the open issues.) |
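For anyone who wants to reproduce the idea without the attached file, the following sketch builds a tiny in-memory archive with a stylesheet in a sister directory of the package document and reads it straight from the zip, resolving against the container rather than the OPF directory. The layout, file names, and the `read_resource` helper are made up for this illustration and are not the contents of the attached test book; the archive is also not a valid EPUB.

```python
import io
import posixpath
import zipfile


def read_resource(archive, referencing_path, href):
    """Read `href` from the zip, resolved relative to `referencing_path`.

    Nothing is unpacked to disk; a missing entry raises KeyError.
    """
    base_dir = posixpath.dirname(referencing_path)
    target = posixpath.normpath(posixpath.join(base_dir, href))
    return archive.read(target)


# Build a minimal archive in memory (hypothetical layout).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("EPUB/package.opf", "<package/>")
    archive.writestr("EPUB/xhtml/ch1.xhtml", "<html/>")
    archive.writestr("shared/style.css", "body { margin: 0 }")

# The stylesheet lives in a sister directory of the one holding the OPF.
with zipfile.ZipFile(buffer) as archive:
    css = read_resource(archive, "EPUB/xhtml/ch1.xhtml", "../../shared/style.css")

print(css.decode("utf-8"))
```

A reading system that only exposes the directory containing the package document would have no entry to return for that last lookup, which matches the missing css and images reported in some apps.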
Ouch. It works with colibrio (on my mac) and on the Firefox extension to read epub. On my iPad it does not run with the (old) bluefire reader, on aldiko, and, as you said, on Apple Books. It works with Marvin; I do not remember how to side-load books to the Kobo or Google player. It is a mess. |
Ya, I just tried the Adobe Digital Editions app on Windows and it didn't render the css or images, but the Google Play chrome app did. I was sort of hoping with time I'd be proven wrong and we wouldn't have to deal with this, but looks like the same state of affairs we discovered with multiple renditions. |
Tried this book on the Kobo iOS app and the desktop application, both worked just fine. I also tested this on VoiceDream Reader, it also rendered fine. |
The issue was discussed in a meeting on 2021-06-24 List of resolutions:
1. Refine the requirements on how RS must process the container structure
See GitHub issues #1687, #1681, #1686.
Wendy Reid: per discussion last week, mgarrish made us a test epub for this
Brady Duga: and most of this was done via sideloading, and publisher pipelines are often different
Matt Garrish: we still have the problem that the spec doesn't say anything about this. There is no authoring requirement for where to put your content (other than below the root). And for RS there is no requirement for how to unpack, etc.
Wendy Reid: it probably doesn't hurt to refine language, but at this point creating a firm requirement would impact some existing RS implementations
Matt Garrish: easiest solution is probably an authoring requirement. Esp. because most authors have probably never tried to do anything like the test epub
Brady Duga: this has been an issue forever, and the only time we noticed was with multiple renditions, which hasn't been implemented really. So is a 3rd solution to just leave it?
Matt Garrish: this whole thing really only came up because of that root-relative thing, so on that issue maybe we just say not to use those
Wendy Reid: right, so we advise not to use root-relative, and we can't say specifically how RS will behave if you do it
Matt Garrish: can we resolve just to use something similar to the note we were going to have for multiple renditions?
Wendy Reid: the other two related issues first are root relative paths valid? is this now moot?
Matt Garrish: i think we are on safer ground to just disallow those, especially because in the past epubcheck has had those come up as an error
Wendy Reid: the second one: what should RS do when manifest item has duplicate entries?
Matt Garrish: i think the issue with this was that if there were multiple copies of the same item in manifest, then RS might not know which manifest item to go to when one copy is referenced |
Test book works in Kindle Previewer 3 for Mac. Fails in ADE 4.5 on Mac. Fails in iBooks as we already knew. Sigh. |
The EPUB core spec says in the File and Directory Structure section:
But @mattgarrish reports in #1374 that some reading systems do not handle that correctly. They don't allow content that is not in the directory of the Package Document (or any descendant directory).
There are at least two (exclusive) options: