Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non "body" tags filtered #77

Open
1JackBlack1 opened this issue Feb 12, 2025 · 2 comments
Open

Non "body" tags filtered #77

1JackBlack1 opened this issue Feb 12, 2025 · 2 comments

Comments

@1JackBlack1
Copy link

There are a few issues in the function apply_content_annotation in filter.php, which can result in serious issues with the page display.

The use of DOMDocument->loadHTML, followed immediately by getting the body does not always work as expected.
loadHTML will only generate a body and put things inside it if it needs to be in the body.

Some locations this filter activates contain content which can exist outside the body, such as user added scripts and HTML comments.
These will not be found in the body of the generated DOMDocument.
If the text only contains such elements, there will be no body and warnings will be thrown, and then it will attempt to replace a > symbol with the annotation.

If there is non body HTML, followed by a single div with the class no-overflow, it wont wrap it, it will alter the non-body HTML, and not touch the single div at all.

This can be very problematic if the first non-body HTML is a HTML comment, as the text replacement will then break the end of comment tag, significantly altering the page contents.

This can be seen for example in an assignment by making the description contain only the html:
This will trigger PHP warnings, and it will convert it to something like
<!-- hi there -- data-ally-richcontent = assign:assign:intro:3478947548 >

I think one possible solution would be to change the code to this:
$doc = local_content::build_dom_doc('<body>' . $text . '</body>'); if (!$doc) { return $text; } $shouldwrap = true; $bodyChildren = $doc->getElementsByTagName('body')->item(0)->childNodes; if($bodyChildren->length === 1) { $node = $bodyChildren->item(0); $shouldwrap = $node->nodeName !== 'div' || strpos($node->getAttribute('class'), 'no-overflow') === false; }
The change to enclose it in body tags will force everything to be in the body, so there will no longer be an issue of no body tag, or the body tag only containing some of the items instead of all.
Setting shouldwrap to true and then updating it will mean by default it will wrap and only when it confirms it shouldn't it wont, which means errors will have it wrap.
Changing it to use nodeName instead of tagName will still produce div for divs, but for text nodes and the comments and the like, instead of being an error it will just report the corresponding node Name.

And here are some test cases which can be used (as a php array of arrays, indicating the input string and if it should be wrapped:
$tests = [ ['<!--hi there-->', true], ['<!--hi there--><div class="no-overflow">Test</div>', true], ['<div class="no-overflow">Test</div>', false], ['<p class="no-overflow">Test</p>', true], ['<div>Test</div>', true], ['<div class="no-overflow">Test</div><p>Test</p>', true], ];
With the current code, the first 2 tests fail and the other 4 pass. The first three also produce warnings.
With my suggested changes, all 6 tests pass.

@gthomas2
Copy link
Contributor

Many thanks for your clear explanation of the issue, suggestion for fixing it and test examples. This should be included in the next release. Or if you would like to create a PR to master I will try to get it merged for you. (I don't have the ability to merge PRs but I can recommend that they should be)

@1JackBlack1
Copy link
Author

I will try to make a PR when I have some time.

In the meantime, I thought of another example which causes a problem:
<div title="Hover > Get text" class="no-overflow">Hover for more</div>

In this case, the > in the attribute gets replaced, also breaking the HTML

I think a simple way to possibly fix that is to change the pattern that is matched to /<div/mU

However, this would mean if for whatever reason the annotation is applied multiple times, to the same input text, the last annotation that is applied will take precedence rather than the first annotation that is applied.
I'm not sure if this would ever happen.
If it does, it can be addressed by also using $node->getAttribute('data-ally-richcontent') to see if it has already been annotated.

Would this be a breaking change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants