Wrap download_list_html in <p> so VoiceRSS sees it #918

BryceStevenWilley · 2024-01-03T21:50:24Z

Originally found in mplp/docassemble-mlhframework#62, docassemble only sends parts of the page to Voice RSS to be read. It has a fairly complicated set of rules, including the fact that it skips over divs and the contents of any tag that isn't only a string. All that means that the download list buttons and document titles aren't read out.

It's fairly minor, and the voice RSS has a bunch of other issues as well (see mplp/docassemble-mlhframework#62 (comment)), but wrapping the entire download_list_html in paragraph tags stops docassemble from skipping over those buttons. It's a bit hacky, but it's not something that's worth fixing upstream, and this is a fairly simple change.

Originally found in mplp/docassemble-mlhframework#62, docassemble only sends parts of the page to Voice RSS to be read. It has a fairly complicated set of rules, including the fact that it skips over `div`s and the contents of any tag that isn't only a string. All that means that the download list buttons and document titles aren't read out. It's fairly minor, and the voice RSS has a bunch of other issues as well (see mplp/docassemble-mlhframework#62 (comment)), but wrapping the entire `download_list_html` in paragraph tags stops docassemble from skipping over those buttons. It's a bit hacky, but it's not something that's worth fixing upstream, and this is a fairly simple change.

nonprofittechy · 2024-01-03T21:52:34Z

Are other block level elements allowed to be in a <p> tag?

BryceStevenWilley · 2024-01-03T22:09:16Z

Hm, not technically. Browsers seem to handle it by automatically closing the paragraph tag before the div, and ignoring the later closing p tag. But yeah, not to spec.

I don't think there's another way around the issue other than tricking docassemble like this or trying to change the screenreader code extensively, which itself is finicky (I tried to format everything such that the docassemble wouldn't skip the text, but it either skips almost everything, or duplicates what is read due to how it reads text content). I'm convinced that turning off VoiceRSS is the better option, but eh. I'll close this.

nonprofittechy · 2024-01-03T22:28:02Z

Hm, not technically. Browsers seem to handle it by automatically closing the paragraph tag before the div, and ignoring the later closing p tag. But yeah, not to spec.

I don't think there's another way around the issue other than tricking docassemble like this or trying to change the screenreader code extensively, which itself is finicky (I tried to format everything such that the docassemble wouldn't skip the text, but it either skips almost everything, or duplicates what is read due to how it reads text content). I'm convinced that turning off VoiceRSS is the better option, but eh. I'll close this.

I'm open to hearing more about voicerss and other ways it might break things... we've not gotten a lot of favorable feedback on it. Can you say more?

plocket · 2024-01-04T03:27:33Z

We might want to change this just a tad upstream in download_list_html if this is a pervasive problem. We'll then also be able to adjust styles if there's a problem with those.

BryceStevenWilley · 2024-01-04T04:41:01Z

Less that it breaks other things, and more that it's just not going to be great at reading everything on the page. I didn't know how it parsed the page until yesterday when I looked into it.

Docassemble first tries to exclude things that are hidden on the page, but it doesn't use browser semantics, it just finds things that have aria-hidden. This still reads out things that have the HTML attribute hidden, or display: none, both of which screen readers will skip over. For example, it reads the full content of collapsed templates (might be okay), notes that are hidden by show ifs (not okay), and each of the in-interview review widget pages ("Did we help you?", "Thank you for your feedback", "Thank you for your review") despite the latter two being hidden with display: none (less okay).
- this can't be fixed until it starts gathering the styles of all of the elements on the page and includes a full CSS ruleset, which isn't going to happen on the backend where screenreader.py is getting everything.

The relevant part of the existing code looks something like this (paraphrased from screenreader.py):

for element in all_html_tags:
   ...
   if element.name in ['div', 'option']:
       return False
   if element.name in ['img', 'input'] and element.has_attr('alt'):
       return True
   if element.name in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'button', 'textarea', 'note', 'label', 'li']:
       return True
   if element.parent and element.parent.name in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'button', 'textarea', 'note', 'label', 'li']:
       return False
   if element.string:
       return True
   return False

This will read image alt text and all of the text content in p, h1, etc, or if the only content in an HTML tag is text. Otherwise, it skips it. As mentioned, I tried changing this a bit, but it seemed fairly fragile; I couldn't get it to read the "Download" "View" buttons without it also finding a lot of other duplicate or unnecessary content on the page. It looks like that's because we have extra fontawesome icons at the front, so docassemble skips the entire link name (this happens on review screens as well).

There's also the existing issue about the control buttons being read out by the screen reader: Remove "Listen to page" screen reader from tab order docassemble-AssemblyLine#621.

We might want to change this just a tad upstream in download_list_html

I thought about that. We could; changing the divs to tables and putting icons under an additional span might help. But this is just one case, and the more I look into it, the more screenreader.py doesn't behave great in other places as well.

Essentially, I'm not confident that it'll do a good job the more components we use on a page. To fix things would mean we'd start iteratively implement browser accessibility rules, which seems ironic given that we're saying that this isn't an accessibility feature. We could just say it's not intended to read all page content, but why have a feature that doesn't actually read you the full page? There might be some point between where it is now and something much too complicated, but I don't know where that is. All of that pushes me to say it's not worth the effort to polish the feature.

BryceStevenWilley closed this Jan 3, 2024

BryceStevenWilley deleted the voice_rss_download_list branch January 8, 2024 17:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrap download_list_html in <p> so VoiceRSS sees it #918

Wrap download_list_html in <p> so VoiceRSS sees it #918

BryceStevenWilley commented Jan 3, 2024

nonprofittechy commented Jan 3, 2024

BryceStevenWilley commented Jan 3, 2024

nonprofittechy commented Jan 3, 2024

plocket commented Jan 4, 2024

BryceStevenWilley commented Jan 4, 2024 •

edited

Loading

Wrap download_list_html in <p> so VoiceRSS sees it #918

Wrap download_list_html in <p> so VoiceRSS sees it #918

Conversation

BryceStevenWilley commented Jan 3, 2024

nonprofittechy commented Jan 3, 2024

BryceStevenWilley commented Jan 3, 2024

nonprofittechy commented Jan 3, 2024

plocket commented Jan 4, 2024

BryceStevenWilley commented Jan 4, 2024 • edited Loading

BryceStevenWilley commented Jan 4, 2024 •

edited

Loading