How to support Structural Navigation #2320

Open
brittnylapierre opened this issue Oct 25, 2024 · 7 comments

Comments

@brittnylapierre

brittnylapierre commented Oct 25, 2024

Edit: This issue was moved from the Cookbook Recipes repo. We are looking for technical specifications writers to help us determine the best solution in the IIIF context for detailed structural navigation. See the big comment I added for detailed context: #2320 (comment)

Recipe Name

Structural navigation using Ranges: 2 Ways

Use case

You can enhance your manifest by adding Ranges that reference specific parts of canvases. By using the label field to describe these references with text, you can create structured navigation similar to a tagged PDF.
Learn more about structural navigation here.

Way 1: Attaching Annotations to a Range via a supplementary AnnotationCollection
Example manifest: https://upcdn.io/kW15cD4/raw/html-annots-3-anns-in-range1.json

Pros

  • most OCR/PDF-to-IIIF manifest workflows create annotations
  • paging abilities
  • can make HTML annotations for rendering (HTML is a very accessible format)

Cons

  • Viewers do not appear to be set up to display the supplementary annotations under the range titles, the way tagged PDFs display structural navigation

Example of what the viewers should support - displaying annotations with the range titles:
image
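
A minimal sketch of what Way 1 might look like in a Presentation 3 manifest, assuming the Range uses the `supplementary` property to point at an AnnotationCollection. All URLs and the helper name `range_with_supplementary` are hypothetical and are not taken from the example manifest above.

```python
import json


def range_with_supplementary(range_id, label, canvas_ids, collection_id):
    """Build a Range whose detailed content lives in an AnnotationCollection."""
    return {
        "id": range_id,
        "type": "Range",
        # The label plays the role of the section heading.
        "label": {"en": [label]},
        # Canvases covered by this section.
        "items": [{"id": cid, "type": "Canvas"} for cid in canvas_ids],
        # Reference to the annotations that hold the section's content.
        "supplementary": {"id": collection_id, "type": "AnnotationCollection"},
    }


# Hypothetical identifiers, for illustration only.
chapter = range_with_supplementary(
    "https://example.org/iiif/book/range/1",
    "Chapter 1",
    ["https://example.org/iiif/book/canvas/1"],
    "https://example.org/iiif/book/annotations/range-1",
)
print(json.dumps(chapter, indent=2))
```

The Range stays small because it only references the collection; the annotations themselves can be fetched page by page.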

Way 2: Using ranges with label fields, referencing positional canvas URLs
Example manifest: https://upcdn.io/kW15cD4/raw/partial-ranges-nav.json
Pros

  • simpler to set up
  • supported by viewers already

Cons

  • current OCR and PDF-to-manifest workflows usually use annotations to transcribe text

Theseus screenshot:
image
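
A rough sketch of Way 2, assuming each piece of text is carried in a Range label and the Range's items reference part of a canvas with an `xywh` fragment. The helper names and URLs are hypothetical and the coordinates are made up.

```python
def canvas_region(canvas_id, x, y, w, h):
    """Reference a rectangular part of a canvas using an xywh fragment."""
    return {"id": f"{canvas_id}#xywh={x},{y},{w},{h}", "type": "Canvas"}


def labelled_range(range_id, label, items):
    """Build a Range whose label holds the text shown in the navigation."""
    return {
        "id": range_id,
        "type": "Range",
        "label": {"en": [label]},
        "items": items,
    }


# Hypothetical section containing one paragraph-level Range.
section = labelled_range(
    "https://example.org/iiif/book/range/intro",
    "Introduction",
    [
        labelled_range(
            "https://example.org/iiif/book/range/intro-p1",
            "First paragraph of the introduction, transcribed as the label.",
            [canvas_region("https://example.org/iiif/book/canvas/1", 100, 200, 800, 150)],
        )
    ],
)
```

Nesting Ranges this way gives the hierarchy, but every transcribed paragraph ends up inside the manifest itself.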

@brittnylapierre
Author

Related: IIIF/cookbook-recipes#28

@brittnylapierre changed the title from "Structural Navigation with Ranges: 2 ways" to "Structural Navigation with Ranges: 2 Ways" on Oct 25, 2024
@kirschbombe
Contributor

kirschbombe commented Oct 25, 2024

Hi, @brittnylapierre - Thanks for creating the new issue. This is useful for documenting some of the use cases that we should be thinking about for Presi 4 and accessibility. It might be helpful if you could begin the use case with a bit more about what you are trying to achieve and why: more of the problem you are trying to solve rather than the proposed solution. For example, is this to assist screen readers? How is this meant to enhance accessibility?

@brittnylapierre
Author

Thank you! I will edit it - but for a short answer it is for screen reader accessibility, yes :)

@glenrobson
Member

Questions:

  • What functionality do tagged PDFs give with a screen reader?
  • What would happen if there are two tables of contents, one for chapters and one for accessibility?
  • What is the desired outcome of having this data in a manifest?

@brittnylapierre
Author

brittnylapierre commented Oct 27, 2024

Desired outcome of having this data in a manifest

Enable IIIF viewer developers to provide a universal experience of exploring IIIF content, regardless of a user's mode of access. (Ideally, having content rendered as hierarchical, accessible HTML with text-based data.)

All of this background info results in people using screen readers having to download a tagged PDF to access the content in manifests for library and archive print materials, instead of being able to use the content from object pages on access websites directly in IIIF viewers, like other users.

What is the ideal way for IIIF to support very detailed, hierarchical, screen reader friendly representations of IIIF images within manifests?

What functionality do tagged PDFs give with a screen reader?

About screen readers:

  1. Screen reader users navigate content linearly, moving from one element to the next using arrow keys or the tab key.
  2. HTML is the most compatible format for screen reader technology. Many screen readers can scan HTML effectively, giving users hot-keys and other helpful tools for navigating web pages more efficiently.

Through IIIF, some adopters currently use ranges to provide a table of contents, and annotations created from OCR and AI solutions to expose the content of their IIIF images as screen-readable text.

Some adopters also use the rendering field to provide alternative representations of their content, for example, tagged PDFs. Tagged PDFs are very labor intensive to produce, and still not as ideal as HTML for screen reader technology.

With all of their faults, tagged PDFs, similar to current annotation based OCR-correction workflows, offer digitization librarians GUI based technologies to do their tagging and text correction, and that is why they are commonly used.

Tagged PDFs are also used for accessible representations, because in theory, they provide the following:

  1. Logical Reading Order: Tagged PDFs establish a clear structure, enabling screen readers to read content in the intended order; poorly tagged documents may lead to confusion or inaccessibility.
  2. Efficient Navigation: Tags make it possible for users to quickly jump between headings and sections, similar to skimming a document visually, with their screen reading software.
  3. Element Identification: Tags indicate the type of content (paragraphs, lists, tables), providing context for screen reader users.
  4. Accessible Tables: Proper tagging helps screen readers interpret complex tables by identifying rows, columns, and headers.
  5. Alternative Text for Images: Figure tags include alt text, ensuring that non-text elements are described for users relying on assistive technologies.

An interesting thing to note is that tags in PDFs do not alter the visual appearance but augment a structure in PDFs to make them more compatible with assistive technologies.

Essentially, tagged PDFs are PDFs which are marked up with HTML-like tags for screen reader compatibility.

More on tagged PDFs.
More on HTML vs Tagged PDFs

To read more about how AI is being used to create document structures in scanned images and PDFs, see: Azure Doc Layout Intelligence. Most AI services are similar to this.

What would happen if there are two tables of contents, one for chapters and one for accessibility?

For this question, I think we would need a way for tool developers to tell the two apart, so they can handle them differently in their GUI. I think someone in the meeting we had discussed the possibility of adding the 'provides' property to ranges in V4, since the accessibilityFeature property on schema.org supports both tableOfContents and structuralNavigation.

Other thoughts/considerations
A non-IIIF-native solution would be to recommend that people put HTML files in the rendering field to support accessibility, and that tool developers display this HTML in the browser for end users. However, this does not support our community's current OCR correction workflows, or the idea that they can use the Content Search API.

Some things IIIF API creators could consider when thinking about how to support the functionality tagged PDFs provide to end users, natively in IIIF are…

  1. How can we implement a hierarchical document structure, with pagination, to support better accessibility of the pages displaying IIIF content?
  2. In addition to the higher level hierarchical document structure, how can we add detailed hierarchical screen-reader friendly markup to different parts of the document, with pagination?
  3. What is the best way to provide detailed descriptions of complex visual elements, ensuring that screen reader users receive equivalent information to sighted users, but also including this in the hierarchical document structure?

Other pain-points for current range and annotation based solutions include the following:

  1. Manifests with a long list of ranges can be cumbersome to navigate with screen readers and keyboard tabbing. If the ranges are not paginated, they require excessive keystrokes and can cause fatigue or frustration.
  2. Manifests with OCR annotations don’t provide a hierarchical display of content, which can be challenging for screen reader users. Without proper structuring, this can lead to excessive keystrokes for navigation, similar to websites with many unlabeled links. Users also don’t get the same contextual flow as they would with more structured documents, which makes it hard to understand their current location within the document itself.

More IIIF Considerations I can think of/were discussed when we met:

  1. The spec currently restricts the HTML that can be used in IIIF - to augment IIIF image content with highly structured HTML natively in IIIF, this would need some revision.
  2. Annotations are already the basis of the Content Search API, which would make it easy for users to search the accessibility markup of our documents.
  3. Annotations are already at the core of OCR (Optical Character Recognition) creation and correction workflows in many institutions, and tools like Madoc.
  4. Any structured markup developed through layout analysis AI will need to be human-corrected through tooling; see the point above that tools like Madoc work at the annotation level for this.
  5. In v3, using AnnotationCollections in conjunction with Ranges makes it possible to create more manageable and context-rich document structures. Any content that would map to an HTML heading would be a range's label, and the content within that heading's section would be found in an AnnotationCollection. This is a good approach for long books, etc. Annotations themselves can have HTML as content, which lets viewers render accessible HTML for an element within an image, for example a table. (As a side note, we will still need to develop solutions for range label correction that work within annotation-based OCR correction tools. We would also want a cookbook recipe or 'rule' saying that clients should render the range label as an HTML heading at the appropriate level, e.g. h1, h2, h3.)
  6. But for v4 - should we include structuralNavigation as an option for ‘provides’ on annotations, if annotations are to be used in Ranges this way, to allow tools developers to handle these annotations appropriately?
  7. After looking here, it doesn't look like the web annotation model supports any hierarchy type structures for annotations, but could annotations, AnnotationPages, or AnnotationCollections be extended to provide hierarchical functionality, too? Is it wise to keep ranges as a functionality for higher level table of contents structures, and instead support and promote some annotation native hierarchy for detailed document markup? Or does that muddy the waters and limit the use case of the existing range object too much?
  8. Should ranges and range labels (no annotations) be used and promoted for detailed structural navigation, even though current digitization and OCR workflows tend to use annotations? How could ranges handle complex elements like tables?
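
To make the idea in point 5 of rendering range labels as headings more concrete, here is a small sketch of how a client might walk nested Ranges and emit one heading per label, with the level taken from the nesting depth. The function names and the sample data are hypothetical, and capping the level at h6 is my own assumption, not something stated in this thread or the spec.

```python
from html import escape


def first_label(resource, lang="en"):
    """Return the first label value, preferring the requested language."""
    label = resource.get("label", {})
    values = label.get(lang) or next(iter(label.values()), [])
    return values[0] if values else ""


def range_headings(rng, depth=1):
    """Emit an HTML heading for this Range and each nested Range."""
    level = min(depth, 6)  # HTML only defines h1 to h6
    lines = [f"<h{level}>{escape(first_label(rng))}</h{level}>"]
    for item in rng.get("items", []):
        # Canvases and other item types are skipped; only Ranges nest.
        if item.get("type") == "Range":
            lines.extend(range_headings(item, depth + 1))
    return lines


# Hypothetical structure: a book with one chapter.
toc = {
    "type": "Range",
    "label": {"en": ["Book"]},
    "items": [
        {"id": "https://example.org/iiif/book/canvas/1", "type": "Canvas"},
        {"type": "Range", "label": {"en": ["Chapter 1 & 2"]}, "items": []},
    ],
}
headings = range_headings(toc)
```

A real viewer would place the content from each Range's annotations between the headings; this sketch only covers the heading levels.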

Related cookbooks

Thank you! I hope this helps provide more background and context (and not too much all at once!) If the spec writers want to have a Q/A with people who are deep into the accessibility space for print materials let me know and I can also arrange one to support these efforts.

@glenrobson transferred this issue from IIIF/cookbook-recipes on Nov 5, 2024
@brittnylapierre changed the title from "Structural Navigation with Ranges: 2 Ways" to "How to support Structural Navigation" on Nov 7, 2024
@azaroth42
Member

My naive understanding, given the above, is that the use case is:

As a visually challenged user, I want to be able to navigate the textual content depicted in images in a manifest in a different way to a fully sighted user and in particular being able to navigate via parts of the full text down to the individual paragraph level, in order to have a better experience of the content.

The full text content currently lives in annotations that are rendered somehow by the viewer. There are no requirements for any full text, nor for the granularity of the text per annotation (hence the text granularity work previously). Also, annotations can have HTML bodies, either embedded or by reference to an external resource with its own URI. Those bodies can be as complex as desired. That would then allow a screen reader to have full access to all of the accessibility features within the HTML without requiring a change to the IIIF navigation. A sighted user would also benefit from the more functional bodies of the annotations (one imagines). So I'm not sure that there's anything that needs to be done in the specs to allow content based navigation, if I've understood correctly?
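
As an illustration of the two kinds of HTML body mentioned here, the following sketch builds one annotation with an embedded HTML body and one that references an external HTML resource. The helper names, URLs, the `supplementing` motivation and the `en` language tag are assumptions chosen for the example, not something prescribed in this thread.

```python
def embedded_html_annotation(anno_id, canvas_id, html, region=None):
    """Annotation whose HTML body is embedded as a TextualBody."""
    target = canvas_id
    if region is not None:
        # region is (x, y, w, h) in canvas coordinates
        target = f"{canvas_id}#xywh={','.join(str(v) for v in region)}"
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {
            "type": "TextualBody",
            "format": "text/html",
            "language": "en",
            "value": html,
        },
        "target": target,
    }


def external_html_annotation(anno_id, canvas_id, html_url):
    """Annotation whose body is an external HTML resource with its own URI."""
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {"id": html_url, "type": "Text", "format": "text/html"},
        "target": canvas_id,
    }


# Hypothetical table transcribed as accessible HTML.
table_anno = embedded_html_annotation(
    "https://example.org/iiif/book/anno/1",
    "https://example.org/iiif/book/canvas/1",
    "<table><tr><th scope=\"col\">Year</th></tr><tr><td>1901</td></tr></table>",
    region=(0, 0, 500, 300),
)
```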

The HTML elements/tags can and should be updated to make the embedded content more accessible (in a different issue) as that would affect more than just the navigation or the content, but also descriptions and other content within the manifest. Note also that it does NOT affect the body of annotations, which can contain anything you want. In particular the spec says:

Minimal HTML markup MAY be included for processing in the summary property and the value property in the metadata and requiredStatement objects.

Conversely, I would be hesitant to try to duplicate full text content into the navigation structure (per the second screenshot) as for most full text content that would bulk out the manifest to an unusable level. The same reason that we don't include the full text of the items as annotations within the manifest directly.

@brittnylapierre
Author

Thank you @azaroth42! The way you articulated the user need is correct.

I think you've covered my uncertainties/questions with your response. I am happy to know that annotation body HTML can be as complex as needed. This fact, plus the pagination features of annotation collections and the use of annotations in current transcription tools, makes it clear, at least to me, that annotations are the way to go to provide structural navigation of documents modeled through IIIF.

Then I think for accessibility-based best practices we can suggest using ranges to create a table of contents for the entire document structure, and using HTML annotations to create paginated, structured markup representations of the document.
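
A rough sketch of the pagination side of this suggestion: splitting a list of HTML annotations into AnnotationPages linked with `prev` and `next` inside one AnnotationCollection. The function name and URL pattern are hypothetical, and the reference shapes used here (objects with `id` and `type`, `partOf` as a list) are my reading of Presentation 3 and should be checked against the spec before being relied on.

```python
def paginate(collection_id, annotations, page_size):
    """Split annotations into linked AnnotationPages under one collection."""
    coll_ref = {"id": collection_id, "type": "AnnotationCollection"}
    pages = []
    for start in range(0, len(annotations), page_size):
        pages.append({
            "id": f"{collection_id}/page/{len(pages) + 1}",
            "type": "AnnotationPage",
            "partOf": [coll_ref],
            "startIndex": start,
            "items": annotations[start:start + page_size],
        })

    def ref(page):
        return {"id": page["id"], "type": "AnnotationPage"}

    # Link each page to its neighbours so a client can step through them.
    for n, page in enumerate(pages):
        if n > 0:
            page["prev"] = ref(pages[n - 1])
        if n < len(pages) - 1:
            page["next"] = ref(pages[n + 1])

    collection = dict(coll_ref, total=len(annotations))
    if pages:
        collection["first"] = ref(pages[0])
        collection["last"] = ref(pages[-1])
    return collection, pages


# Five placeholder annotations, two per page.
annos = [{"id": f"https://example.org/anno/{i}", "type": "Annotation"} for i in range(5)]
collection, pages = paginate("https://example.org/iiif/book/annotations/structure", annos, 2)
```

A Range in the table of contents could then point at such a collection, so a viewer loads one page of structured HTML at a time.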

Thanks again, if there are no other opinions on the matter I think this issue could be marked as resolved and closed!
