Non "body" tags filtered #77

1JackBlack1 · 2025-02-12T06:27:15Z

There are a few issues in the function apply_content_annotation in filter.php, which can result in serious issues with the page display.

The use of DOMDocument->loadHTML, followed immediately by getting the body does not always work as expected.
loadHTML will only generate a body and put things inside it if it needs to be in the body.

Some locations this filter activates contain content which can exist outside the body, such as user added scripts and HTML comments.
These will not be found in the body of the generated DOMDocument.
If the text only contains such elements, there will be no body and warnings will be thrown, and then it will attempt to replace a > symbol with the annotation.

If there is non body HTML, followed by a single div with the class no-overflow, it wont wrap it, it will alter the non-body HTML, and not touch the single div at all.

This can be very problematic if the first non-body HTML is a HTML comment, as the text replacement will then break the end of comment tag, significantly altering the page contents.

This can be seen for example in an assignment by making the description contain only the html:
This will trigger PHP warnings, and it will convert it to something like
<!-- hi there -- data-ally-richcontent = assign:assign:intro:3478947548 >

I think one possible solution would be to change the code to this:
$doc = local_content::build_dom_doc('<body>' . $text . '</body>'); if (!$doc) { return $text; } $shouldwrap = true; $bodyChildren = $doc->getElementsByTagName('body')->item(0)->childNodes; if($bodyChildren->length === 1) { $node = $bodyChildren->item(0); $shouldwrap = $node->nodeName !== 'div' || strpos($node->getAttribute('class'), 'no-overflow') === false; }
The change to enclose it in body tags will force everything to be in the body, so there will no longer be an issue of no body tag, or the body tag only containing some of the items instead of all.
Setting shouldwrap to true and then updating it will mean by default it will wrap and only when it confirms it shouldn't it wont, which means errors will have it wrap.
Changing it to use nodeName instead of tagName will still produce div for divs, but for text nodes and the comments and the like, instead of being an error it will just report the corresponding node Name.

And here are some test cases which can be used (as a php array of arrays, indicating the input string and if it should be wrapped:
$tests = [ ['', true], ['<div class="no-overflow">Test</div>', true], ['<div class="no-overflow">Test</div>', false], ['<p class="no-overflow">Test</p>', true], ['<div>Test</div>', true], ['<div class="no-overflow">Test</div><p>Test</p>', true], ];
With the current code, the first 2 tests fail and the other 4 pass. The first three also produce warnings.
With my suggested changes, all 6 tests pass.

The text was updated successfully, but these errors were encountered:

gthomas2 · 2025-02-14T19:41:36Z

Many thanks for your clear explanation of the issue, suggestion for fixing it and test examples. This should be included in the next release. Or if you would like to create a PR to master I will try to get it merged for you. (I don't have the ability to merge PRs but I can recommend that they should be)

1JackBlack1 · 2025-02-17T20:58:42Z

I will try to make a PR when I have some time.

In the meantime, I thought of another example which causes a problem:
<div title="Hover > Get text" class="no-overflow">Hover for more</div>

In this case, the > in the attribute gets replaced, also breaking the HTML

I think a simple way to possibly fix that is to change the pattern that is matched to /<div/mU

However, this would mean if for whatever reason the annotation is applied multiple times, to the same input text, the last annotation that is applied will take precedence rather than the first annotation that is applied.
I'm not sure if this would ever happen.
If it does, it can be addressed by also using $node->getAttribute('data-ally-richcontent') to see if it has already been annotated.

Would this be a breaking change?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non "body" tags filtered #77

Non "body" tags filtered #77

1JackBlack1 commented Feb 12, 2025

gthomas2 commented Feb 14, 2025

1JackBlack1 commented Feb 17, 2025

Non "body" tags filtered #77

Non "body" tags filtered #77

Comments

1JackBlack1 commented Feb 12, 2025

gthomas2 commented Feb 14, 2025

1JackBlack1 commented Feb 17, 2025