Parser removes a single < (less-than) character from the given HTML string #250
Comments
In an earlier investigation I noted that a small tweak to the library seems to fix the issue, though I haven't fully tested the change. From comments elsewhere:
Hi, if the solution mentioned there works, can you provide a PR?
I have not fully tested the proposed solution (and, honestly, only have a high-level understanding of the code), so I can't say if there are any potential ill effects. But I'm happy to put together a PR.
Hi, did you find a solution? Thanks.
Per the spec:
> Parse error. Switch to the data state. Emit a U+003C LESS-THAN SIGN character token. Reconsume the current input character.
https://www.w3.org/TR/2014/REC-html5-20141028/syntax.html#tag-open-state
fixes Masterminds#250
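To illustrate the spec rule quoted above, here is a minimal sketch of a toy tokenizer. It is written in Python rather than the library's PHP, it is not the library's code or the change in the PR, and the function name tokenize and the ("text", ...) / ("tag", ...) token shapes are hypothetical. It only models the "anything else" branch of the tag open state: a < that is not followed by an ASCII letter, /, ! or ? is kept as character data and the next character is reconsumed.

```python
import string


def tokenize(html: str):
    """Toy model: split input into ("text", str) and ("tag", str) tokens."""
    tokens = []
    text = []  # pending character data
    i = 0
    n = len(html)
    while i < n:
        ch = html[i]
        if ch != "<":
            # Data state: ordinary character.
            text.append(ch)
            i += 1
            continue
        # Tag open state: decide based on the character after "<".
        nxt = html[i + 1] if i + 1 < n else ""
        if nxt and (nxt in string.ascii_letters or nxt in "/!?"):
            # Simplification (assumption): everything up to the next ">"
            # is treated as a single tag token.
            end = html.find(">", i)
            if end == -1:
                end = n - 1
            if text:
                tokens.append(("text", "".join(text)))
                text = []
            tokens.append(("tag", html[i:end + 1]))
            i = end + 1
        else:
            # "Anything else": parse error, emit "<" as a character token,
            # then reconsume the next character in the data state.
            text.append("<")
            i += 1
    if text:
        tokens.append(("text", "".join(text)))
    return tokens
```

With this rule in place, the lone < in an input such as <p>1 < 2</p> stays in the text token instead of being dropped, so a serializer would still have a character to encode.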
I created a PR with a change I think will address this issue with regard to the data state (content parsing). There is a similar problem with tag/attribute parsing, though I think that could reasonably be addressed through a separate issue.
When parsing an HTML string that contains a single < character, the parser removes that character from the returned value. For example, the output of $html5->saveHTML($dom) should still contain the character in its encoded form, but instead the encoded &lt; for the < character is missing.
This is a continuation of symfony/symfony#57597, where it affects the sanitization process of html-sanitizer.