-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #166 #182
Fixes #166 #182
Conversation
@@ -152,6 +156,10 @@ public function hasErrors() | |||
*/ | |||
public function parse($input, array $options = array()) | |||
{ | |||
if (isset($options['normalize']) && $options['normalize']) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (! empty($options['normalize']) {
would be more concise.
@goetas any thoughts on this? |
Hi and thanks for working on this. I'm generally against the feature for the following reasons:
|
https://github.com/tgalopin/html-sanitizer is built on top of this library and is meant to sanitize external html |
What's the status of the issue marked as bug? If you're against fixing it? I also think there's so many people building there own implementations to work around this issue that it would benefit from a solution built into the library. That ensures collaboration and a higher quality of implementation to work around theb issue. I don't think that html-santizer is relevant in the context of the bug. |
@goetas below are copied from the issue, so I really think this should be reopened.
|
I understand.
<html>Hello, This is a test.<br />Does it work this time?</html>
What we need is to hook into some class in https://github.com/Masterminds/html5-php/tree/master/src/HTML5/Parser and add on the fly the I'm not against solving this issue if is a valid html5 document. @bytestream one of the reason for rejecting this PR is also that builds a parser/tokenizer within a library that has already a parser. |
This PR aims to resolve #166. It adds missing
<html>
<head>
and<body>
to the input HTML so fragments are parsed correctly.There's several different solutions:
This class resolves cases that don't work correctly in the above.
It wouldn't mimic
DOMDocument
behaviour as written here; #166 (comment). That would require adding some context so it knows that<title>
should go in<head>
. I figure the input HTML is already invalid so it's an edge case that would require a lot more computation to handle.What do you think?