The next generation of screen readers
I ask:
Preliminary note: you also have to know how to ask :-), but the answer is delightful...
"Do you have information about the next generation of screen readers for blind people? Currently, most, if not all, screen readers read web pages by interpreting the HTML code. Do you know of any screen readers that recognize the visual layout of the page, including the text and different semantic elements?"
ChatGPT's answer:
Most current screen readers, like JAWS, NVDA, and VoiceOver, do primarily rely on HTML code to provide meaningful information about the structure of a webpage. They interpret semantic elements like headings, links, buttons, and form fields from the page’s underlying structure, but they don’t "visually recognize" content the way humans see it.
However, emerging developments could push screen readers into the next generation by using AI and computer-vision technologies. These advances, particularly in deep learning, aim to provide a more holistic, context-aware reading experience and to improve screen readers' ability to "understand" content. Some examples:
- Microsoft Seeing AI: This app uses AI and computer vision to describe the physical environment, including recognizing objects, text, people, and more. While it's not exactly a screen reader, it provides a more nuanced understanding of both physical and digital content, helping to bridge gaps in traditional screen reader technology.
- Google's Lookout: Another AI-powered tool that reads text from physical environments, signs, labels, and more.
Recent advances in Optical Character Recognition (OCR) technology, coupled with AI, allow screen readers to recognize text that may not be coded in HTML, such as within images or dynamically generated content that might otherwise be inaccessible. AI systems are being integrated to:
- Detect images with embedded text and read it aloud.
- Provide alternative descriptions for images without metadata.
- Infer the structure of documents, even when they are not coded with correct semantics, to offer better reading experiences.
For instance, JAWS includes features that integrate OCR capabilities for inaccessible PDFs and images.
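The structure-inference step can be illustrated with a small heuristic: given OCR output as (text, pixel-height) pairs, lines noticeably taller than the median are treated as headings. This is a simplified sketch with an invented 1.5x threshold, not the heuristic any particular screen reader or OCR engine uses:

```python
from statistics import median

def infer_structure(ocr_lines):
    """Classify OCR'd lines as headings or body text by relative size.

    ocr_lines: list of (text, height_px) pairs, as an OCR engine might
    report them. A line more than 1.5x taller than the median height
    is assumed (heuristically) to be a heading.
    """
    base = median(h for _, h in ocr_lines)
    return [("heading" if h > 1.5 * base else "text", text)
            for text, h in ocr_lines]

doc = [("Annual Report", 36), ("Revenue grew 4%.", 14),
       ("Costs were flat.", 14), ("Outlook", 30), ("We expect...", 14)]
print(infer_structure(doc))
# [('heading', 'Annual Report'), ('text', 'Revenue grew 4%.'),
#  ('text', 'Costs were flat.'), ('heading', 'Outlook'),
#  ('text', 'We expect...')]
```

A real pipeline would also use font weight, position, and whitespace, but even this crude size cue recovers structure that a purely code-based reader would miss in a scanned PDF.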
Using computer vision, future screen readers might be able to "see" pages in a way similar to human perception. This would allow them to:
- Identify different types of elements visually (such as separating charts from text or recognizing embedded videos).
- Understand the layout of the content and provide more contextual navigation options.
- Recognize and interpret dynamic content or non-standard web elements (like complex JavaScript-based content or heavily stylized content).
NVAccess (NVDA), for example, is experimenting with AI technologies to improve navigation and interaction with complex content, but these features are still relatively early in their development.
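The layout-understanding idea can be sketched as a geometry problem: once a vision model has detected page elements as labeled boxes, the reader must order them the way a sighted person would scan them. The sketch below assumes hypothetical detector output of (label, x, y) top-left coordinates and groups boxes into rows before reading left to right:

```python
def reading_order(boxes, row_tolerance=10):
    """Order visually detected elements as a sighted reader would.

    boxes: list of (label, x, y) tuples, where (x, y) is the top-left
    corner in pixels (hypothetical detector output). Elements whose y
    coordinates differ by at most row_tolerance are treated as one
    visual row and read left to right.
    """
    ordered = sorted(boxes, key=lambda b: (b[2], b[1]))  # top-down first
    rows, current = [], [ordered[0]]
    for box in ordered[1:]:
        if abs(box[2] - current[-1][2]) <= row_tolerance:
            current.append(box)           # same visual row
        else:
            rows.append(sorted(current, key=lambda b: b[1]))
            current = [box]
    rows.append(sorted(current, key=lambda b: b[1]))
    return [label for row in rows for label, _, _ in row]

page = [("sidebar", 10, 105), ("headline", 200, 20),
        ("chart", 400, 100), ("caption", 400, 300)]
print(reading_order(page))
# ['headline', 'sidebar', 'chart', 'caption']
```

Note how the sidebar is read before the chart despite having a slightly larger y coordinate, because the two sit on the same visual row; ordering by raw coordinates alone would get this wrong.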
Ongoing work in web standards aims to improve the way web content is semantically described, making it easier for screen readers to interpret the content. The Web Accessibility Initiative – Accessible Rich Internet Applications (WAI-ARIA) specifications allow developers to add roles, properties, and states that better convey content structure to assistive technologies. In the future, this could be enhanced with AI and machine learning, allowing screen readers to better infer the meaning of elements not explicitly marked.
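The WAI-ARIA mechanism described above can be illustrated by collecting the `role` and `aria-*` attributes a page exposes, which is roughly the information the browser passes on to assistive technology. This is an illustrative sketch using the standard library's `html.parser`, not a model of any browser's accessibility tree:

```python
from html.parser import HTMLParser

class AriaInspector(HTMLParser):
    """Collect WAI-ARIA roles and states as exposed to assistive tech."""
    def __init__(self):
        super().__init__()
        self.announcements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        role = attrs.get("role")
        if role:
            # aria-* properties and states refine what is announced,
            # e.g. whether a tab is currently selected.
            states = {k: v for k, v in attrs.items()
                      if k.startswith("aria-")}
            self.announcements.append((role, states))

inspector = AriaInspector()
inspector.feed('<div role="tab" aria-selected="true">Prices</div>'
               '<div role="tabpanel" aria-hidden="false">...</div>')
print(inspector.announcements)
# [('tab', {'aria-selected': 'true'}), ('tabpanel', {'aria-hidden': 'false'})]
```

Without these attributes, both elements would be anonymous `<div>`s; with them, a screen reader can announce "tab, selected" and expose the associated panel.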
What's next?
The next generation of screen readers will likely combine these approaches—AI, OCR, and enhanced semantic web technologies—to better "see" and understand both physical and digital environments, thus offering a more natural, intuitive reading experience. AI-based models, much like the ones used in chatbots or image recognition, will likely play a bigger role in improving contextual understanding, helping screen readers transition from reading raw code to interpreting meaningful, visual representations of web pages.
Are you looking for information on any specific screen reader projects, or are you more curious about these technological trends?
2022/2025 - Book A11Y - A notebook on digital accessibility