Define what RS should do when the manifest has duplicate entries #1686
Copying what @mattgarrish said in #1374
Although, on consideration, this is probably more complicated than simply ignoring the entries, as the spine could have separate references to each entry. It's more of a lookup consideration: the reading system should use the first manifest item in document order that matches the resource in the case of duplicate entries that resolve to the same resource. It would be helpful to know what, if anything, existing reading systems do with duplicate references to a resource, though. @wareid @danielweck @bduga?
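The lookup suggested above (the first manifest item in document order wins) could be sketched as follows. This is a minimal Python illustration under stated assumptions, not spec text: hrefs are assumed to resolve against the package document's URL, and the base `https://epub.invalid/OEBPS/package.opf` plus the function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urldefrag, urljoin


def build_resource_index(package_url: str, manifest: list[tuple[str, str]]) -> dict[str, str]:
    """Map each resolved resource URL to the id of the FIRST manifest item
    (in document order) that refers to it. Later duplicates are ignored."""
    index: dict[str, str] = {}
    for item_id, href in manifest:
        # Resolve the href against the package document and drop any fragment
        resolved, _fragment = urldefrag(urljoin(package_url, href))
        # setdefault only stores the first id seen for this resource
        index.setdefault(resolved, item_id)
    return index


def item_for_link(index: dict[str, str], document_url: str, link_href: str) -> Optional[str]:
    """Return the manifest id a link should target, or None if the
    resource is not in the manifest."""
    resolved, _fragment = urldefrag(urljoin(document_url, link_href))
    return index.get(resolved)


# Hypothetical package location; no spec assigns this base URL.
PACKAGE_URL = "https://epub.invalid/OEBPS/package.opf"
manifest = [
    ("c1", "chapter1.xhtml"),
    ("c1-dup", "./chapter1.xhtml"),  # non-conforming duplicate of c1
    ("c2", "text/../chapter2.xhtml"),
]
index = build_resource_index(PACKAGE_URL, manifest)
print(item_for_link(index, "https://epub.invalid/OEBPS/chapter2.xhtml", "chapter1.xhtml#sec2"))  # prints "c1"
```

Keying on the resolved URL rather than the raw href means `chapter1.xhtml` and `./chapter1.xhtml` collapse to one entry, which is what makes "first match in document order" well defined.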
The issue was discussed in a meeting on 2021-06-10.
2. URLs and the package document
See GitHub issues #1681, #1374, #1688, #1686.
Dave Cramer: This is a bunch of issues that revolve around how you interpret URLs in the package document, especially if they're absolute URLs.
Matt Garrish: In epubcheck there was a root-relative URL that caused an error, and that spawned all of this.
Dave Cramer: In issue 1688, Romain suggests that manifest items should have one of the special schemes (except
Matt Garrish: There are edge cases where file scheme items make sense, but not generally for EPUB.
Dave Cramer: It goes against EPUB as a portable format, and the file scheme ties the EPUB to a specific file system.
Matt Garrish: Never heard of one.
Dave Cramer: Okay, so what if we just say no file URLs in EPUB?
Matt Garrish: Most RSes probably won't do anything with file URLs.
Wendy Reid: Depending on the platform, you might not even be able to access parts of the file system (e.g. iOS apps).
Dave Cramer: Can we start by resolving on this point from 1688?
Dan Lazin: Is there a use case for some of these other schemes? Why would you have FTP in your EPUB?
Matt Garrish: If we go too far, do we prevent future stuff? Will we have to come back and re-add this in the future?
Ben Schroeter: Is the idea that if we disallow the file scheme, then we also disallow "slash URLs"?
Matt Garrish: Not sure those are the same.
Dave Cramer: What would be the consequences of forbidding root-relative paths?
Matt Garrish: Not sure there are any, because epubcheck had forbidden these until a recent update.
Dave Cramer: And this is just for href on the manifest?
Matt Garrish: No, this would be anywhere, e.g. in content docs too.
Dan Lazin: Do we support the base tag?
Dave Cramer: We've been phasing out
Dan Lazin: The base tag allows you to define what the relative path is relative to.
Matt Garrish: Base would force you to have all external resources, right? It exists, but I don't imagine anyone really going there.
Marisa DeMeglio: There was a resolution a few weeks ago about dumping
Dave Cramer: And that's separate from the HTML base element.
Dan Lazin: If you set base to some website and then use root-relative URLs, your URLs would appear to be relative when they are actually absolute.
Dave Cramer: But can we really say anything about base, since it's part of HTML?
Matt Garrish: So you must not use root-relative URLs unless you use a base?
Dan Lazin: What was the harm in not banning root-relative?
Matt Garrish: Because the RS might treat the zip root as the root, but it could also treat the location of the package doc as the root.
Dan Lazin: Maybe permit it, but use SHOULD NOT?
Dave Cramer: Yes, e.g. with books that only work with iBooks because of scripting support.
Matt Garrish: Maybe just a note that root-relative could cause issues if authors use it?
Dave Cramer: So does that mean that there are EPUBs that could be built to work in some RS, but expose an interop issue if opened in another RS?
Matt Garrish: Right.
Dave Cramer: Not sure what the right course of action is, but maybe we can continue this another time with Romain present.
Wendy Reid: We need RS people on the next call who know exactly what RSes are doing right now.
Marisa DeMeglio: One of the GitHub threads has a sample, but I wasn't able to download it.
Matt Garrish: Also, there's not much hand authoring, and most tools will put all the content into one folder.
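To make the root-relative interop concern concrete: the same href lands on different container paths depending on whether an RS treats the zip root or the package document's directory as "/". A minimal Python sketch; the function name and the `OEBPS/package.opf` layout are hypothetical, and queries and fragments are not handled.

```python
from __future__ import annotations


def root_relative_interpretations(package_path: str, href: str) -> tuple[str, str]:
    """Return the container path a root-relative href maps to under two
    readings: (zip root as root, package document directory as root)."""
    if not href.startswith("/") or href.startswith("//"):
        raise ValueError("expected a root-relative path such as '/images/a.jpg'")
    path = href[1:]
    # Reading 1: "/" means the root of the ZIP container
    zip_root_reading = path
    # Reading 2: "/" means the directory holding the package document
    package_dir = package_path.rsplit("/", 1)[0] if "/" in package_path else ""
    package_root_reading = f"{package_dir}/{path}" if package_dir else path
    return zip_root_reading, package_root_reading


print(root_relative_interpretations("OEBPS/package.opf", "/images/cover.jpg"))
# ('images/cover.jpg', 'OEBPS/images/cover.jpg')
```

The two readings only agree when the package document sits at the container root, which is one reason the ambiguity rarely surfaces with tool-generated content.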
The issue was discussed in a meeting on 2021-06-18.
2. What is the relationship between URLs and the package doc (what is home?)
See GitHub issues #1681, #1374, #1687, #1686.
Wendy Reid: We started this discussion last week. The core question is: where is home (given we allow both relative and absolute URLs) in the EPUB context?
Romain Deltour: We have to keep in mind: 1) what things have to be put in the EPUB core spec, and 2) what the rules are for the EPUB RS spec.
Wendy Reid: Okay, so what is the IRI of the package document then?
Ivan Herman: We can't really answer what the IRI of the package is, and I'm not sure we should try.
Matt Garrish: We have 2 issues: 1) are these resources within the container, and how do we determine that? 2) What happens when you unpack, and where do these resources go?
Brady Duga: So absolute URIs are not allowed, and relative IRIs are interpreted by the language in question (e.g. HTML or CSS, depending on what type of document it is).
Matt Garrish: I think the issue is that root-relative is still a relative path, so do we have to say "all relative is allowed, except root-relative"?
Romain Deltour: Even with regular relative URLs, the spec is silent on what happens if a relative URL tries to go below the container root.
Ivan Herman: I was surprised to find that some RSes don't automatically unpack the whole zip.
Matt Garrish: We have a requirement in OCF that all relative resources must resolve to something in the container.
Gregorio Pellegrino: I know that Colibrio streams files out of the zip without unzipping.
Wendy Reid: Yes, there are more examples of RSes doing that beyond that.
Ivan Herman: But conceptually an RS unpacks the whole zip file onto a domain (as if it were a file system). If we do that, then all these concepts become clear.
Hadrien Gardeur: Streaming from the zip is what Readium does by default.
Romain Deltour: I'm surprised that resources that are not in the same directory tree as the OPF would not be accessible in the EPUB.
Romain Deltour: This defines unambiguously how relative URLs are to be resolved.
Wendy Reid: Going back to Romain's point about testing, there are a variety of ways that RSes handle these URLs.
Ivan Herman: Would some sort of conceptual model clash with how things are implemented?
Hadrien Gardeur: We treat the OPF as the base, and that seems to work in most cases. It seems to make more sense to us than treating the zip as the base.
Matt Garrish: This originally came up in multiple renditions, when we had issues referencing across sibling directories.
Romain Deltour: The drawback of a conceptual solution is that sometimes adding this layer of abstraction makes the spec harder to use.
Wendy Reid: Is the best way forward at this point for us to do some sort of testing? (e.g. OPF as base, zip as base, examples of files living outside when the OPF is the base)
Ivan Herman: I think we should also test an environment where multiple renditions is implemented.
Wendy Reid: Do we know of a functioning implementation of multiple renditions?
Hadrien Gardeur: Barnes & Noble were using multiple renditions for newspapers and magazines.
Wendy Reid: Okay, so maybe we test on the Nook app.
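The "OPF as base" model, combined with the OCF requirement that relative references resolve inside the container, can be checked by walking path segments. A minimal Python sketch under these assumptions: only path-relative hrefs are handled, `package_path` is the zip-entry path of the package document, and `resolve_in_container` is a hypothetical name. Unlike `urllib.parse.urljoin`, which silently drops excess `..` segments at the root (RFC 3986 behaviour), this reports the escape.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit


def resolve_in_container(package_path: str, href: str) -> Optional[str]:
    """Resolve a path-relative href against the package document's directory.
    Return the container path, or None if the href climbs above the root."""
    parts = urlsplit(href)
    if parts.scheme or href.startswith("/"):
        raise ValueError("only path-relative hrefs are handled here")
    # Directory of the package document, e.g. ["OEBPS"] for "OEBPS/package.opf"
    segments = package_path.split("/")[:-1]
    # urlsplit(...).path excludes any query or fragment
    for seg in parts.path.split("/"):
        if seg == "..":
            if not segments:
                return None  # would leave the container
            segments.pop()
        elif seg not in ("", "."):
            segments.append(seg)
    return "/".join(segments)


print(resolve_in_container("OEBPS/package.opf", "../../outside.xhtml"))  # None
```

Returning `None` rather than clamping leaves the policy decision (error, warning, or ignore) to the caller, which matches the thread's uncertainty about what an RS should do.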
The issue was discussed in a meeting on 2021-06-24.
1. Refine the requirements on how RS must process the container structure
See GitHub issues #1687, #1681, #1686.
Wendy Reid: Per discussion last week, mgarrish made us a test EPUB for this.
Brady Duga: And most of this was done via sideloading, and publisher pipelines are often different.
Matt Garrish: We still have the problem that the spec doesn't say anything about this. There is no authoring requirement for where to put your content (other than below the root). And for RSes there is no requirement for how to unpack, etc.
Wendy Reid: It probably doesn't hurt to refine the language, but at this point creating a firm requirement would impact some existing RS implementations.
Matt Garrish: The easiest solution is probably an authoring requirement, especially because most authors have probably never tried to do anything like the test EPUB.
Brady Duga: This has been an issue forever, and the only time we noticed was with multiple renditions, which hasn't really been implemented. So is a third solution to just leave it?
Matt Garrish: This whole thing really only came up because of that root-relative thing, so on that issue maybe we just say not to use those.
Wendy Reid: Right, so we advise not to use root-relative, and we can't say specifically how an RS will behave if you do it.
Matt Garrish: Can we resolve just to use something similar to the note we were going to have for multiple renditions?
Wendy Reid: The other two related issues. First: are root-relative paths valid? Is this now moot?
Matt Garrish: I think we are on safer ground to just disallow those, especially because in the past epubcheck has had those come up as an error.
Wendy Reid: The second one: what should an RS do when the manifest has duplicate entries?
Matt Garrish: I think the issue with this was that if there were multiple copies of the same item in the manifest, then the RS might not know which manifest item to go to when one copy is referenced.
How about something like: When presented with a single manifest item that is repeated multiple times in the linear flow of the spine, reading systems should do their best to display that content at the correct location in that linear flow. The reading system should treat these as distinct pages for UI purposes (for example, each occurrence could be independently bookmarked or annotated), but when following an internal link to that item, the reading system should move to the position of the first occurrence of the document in the linear flow.
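The proposed wording separates two concerns: every spine occurrence is its own position for UI purposes, while internal links resolve to the first occurrence. A minimal Python sketch of that split; `SpinePosition`, `build_spine` and `link_target` are hypothetical names, and the spine is simplified to a list of `idref` values in the linear flow.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpinePosition:
    """One occurrence of an item in the spine. Each occurrence has its own
    index, so bookmarks and annotations can be stored per occurrence."""
    index: int
    idref: str


def build_spine(idrefs: list[str]) -> tuple[list[SpinePosition], dict[str, int]]:
    positions = [SpinePosition(i, idref) for i, idref in enumerate(idrefs)]
    first_occurrence: dict[str, int] = {}
    for pos in positions:
        # Keep only the earliest spine index for each item
        first_occurrence.setdefault(pos.idref, pos.index)
    return positions, first_occurrence


def link_target(first_occurrence: dict[str, int], idref: str) -> int:
    """Internal links to a repeated item go to its first occurrence."""
    return first_occurrence[idref]


positions, first = build_spine(["cover", "ad", "ch1", "ad", "ch2"])
print(len(positions), link_target(first, "ad"))  # 5 1
```

Forward/backward navigation simply walks `positions`, so repeated items are displayed where the author placed them; only link jumps consult `first`.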
The issue was discussed in a meeting on 2021-10-29.
2.1. Define what RS should do when the manifest has duplicate entries (issue epub-specs#1686)
See GitHub issue epub-specs#1686.
Dave Cramer: Question: how should a reading system handle this situation?
Romain Deltour: I think that we need to hear from reading systems.
Hadrien Gardeur: For reading systems, duplicated resources are not really an issue when moving forward/backward in the spine. It becomes an issue when you need to "jump" to a resource (link or ToC), since we don't know where to jump to.
Dave Cramer: We have a proposal that, if there are duplicate items, we just ignore all but the first one.
Brady Duga: I think we don't have EPUBs like that.
Romain Deltour: Yes, now it is picked up by EPUBCheck.
Rick Johnson: We also filter EPUBs via EPUBCheck.
Brady Duga: It may be an issue for linking.
Dan Lazin: Since we don't have a definition in the spec, but we are blocking it via EPUBCheck... I think it is not really important.
Matt Garrish: I think there should be consistent handling of non-conformant EPUBs.
Dave Cramer: My question is: is there any interoperability problem to solve?
Hadrien Gardeur: I think we don't know enough about how RSes handle this problem.
Laurence Zaysser: I think it's an authoring problem.
Ivan Herman: As an editor, I think the core spec document has to say that the elements must not be repeated.
Matt Garrish: It's difficult to answer... I think that adding a guideline in the RS spec would be good.
Brady Duga: From an RS point of view, I have multiple questions: links, display, annotations, bookmarks, etc.
Laurence Zaysser: I think this case may happen in textbooks.
Ivan Herman: Maybe the solution is to remove the duplicate content.
Dave Cramer: I think we should not remove content.
Matt Garrish: I don't like the idea of hiding the content, but I don't want to allow them on the authors' side.
Rick Johnson: I agree with the comments from Brady and Matt that showing it benefits the user (they see what the author wanted); the only issue is that clicking on a link will take them to an unexpected place in the reading order (which is the bug/issue we can point to).
Brady Duga: I may write something as a proposal for the note.
Dave Cramer: OK, we ask Brady to propose a text, and then we can have a resolution in another meeting.
My understanding is that PR #1889 only tackles one part of the issue: the same item referenced multiple times in the spine. But there is another aspect (in fact, the primary issue presented in the OP), which is two different manifest item elements identifying the same resource. For instance, the OPF has:
This is disallowed in EPUB too, but this trick could be used to circumvent logic based only on spine-level references. From the spine point of view, you refer to two different
The EPUB core spec forbids two item elements from identifying the same resource. But the RS spec could (and should, IMO) define how an RS must handle such non-conforming EPUBs.
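Logic keyed only on spine `idref`s misses the case described above, where two different `item` elements point at the same resource. One hedged way an RS could cope is to key both duplicate detection and first-occurrence lookup on the resolved resource URL rather than on the item id. A minimal Python sketch; the base URL and function names are hypothetical, and the manifest is simplified to an id-to-href dict in document order.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urldefrag, urljoin

# Hypothetical base for the package document; no spec assigns this URL.
PACKAGE_URL = "https://epub.invalid/OEBPS/package.opf"


def resolve(href: str) -> str:
    """Resolve a manifest href against the package document, minus fragment."""
    return urldefrag(urljoin(PACKAGE_URL, href))[0]


def find_duplicate_items(manifest: dict[str, str]) -> dict[str, list[str]]:
    """Group manifest ids by resource; report resources with more than one item."""
    by_resource: dict[str, list[str]] = defaultdict(list)
    for item_id, href in manifest.items():  # dicts preserve document order
        by_resource[resolve(href)].append(item_id)
    return {url: ids for url, ids in by_resource.items() if len(ids) > 1}


def first_spine_index_by_resource(manifest: dict[str, str], spine: list[str]) -> dict[str, int]:
    """Map each resource to the first spine position showing it, regardless
    of which manifest item the spine entry references."""
    first: dict[str, int] = {}
    for index, idref in enumerate(spine):
        first.setdefault(resolve(manifest[idref]), index)
    return first


manifest = {"c1": "chapter1.xhtml", "c1b": "text/../chapter1.xhtml", "c2": "chapter2.xhtml"}
spine = ["c1", "c2", "c1b"]  # c1 and c1b identify the same file
print(find_duplicate_items(manifest))
print(first_spine_index_by_resource(manifest, spine)[resolve("chapter1.xhtml")])  # 0
```

Because lookups go through `resolve`, a link to `chapter1.xhtml` lands on spine position 0 whether the spine references `c1` or `c1b`, which closes the loophole that spine-level logic alone leaves open.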