readable_html doesn't properly escape text strings #62

Open
Tracked by #61
vkryukov opened this issue Nov 12, 2024 · 1 comment

vkryukov commented Nov 12, 2024

For example, the following excerpt from test case 001:

"<blockquote><p>Drinking game for web devs:<br/>(1) Think of a noun<br/>(2) Google &quot;&lt;noun&gt;.js&quot;<br/>(3) If a library with that name exists - drink</p>— Shay Friedman (@ironshay) <a href=\"https://twitter.com/ironshay/statuses/370525864523743232\">August 22, 2013</a></blockquote>" 
|> Readability.article 
|> Readability.readable_html

returns:

"<div><div><p>Drinking game for web devs:<br/>(1) Think of a noun<br/>(2) Google \"<noun>.js\"<br/>(3) If a library with that name exists - drink</p>— Shay Friedman (@ironshay) <a href=\"https://twitter.com/ironshay/statuses/370525864523743232\">August 22, 2013</a></div></div>"

Notice that the angle brackets around noun are not HTML-escaped in the output, so <noun> is emitted as a literal tag rather than as text.

This also causes a problem with the HTML code fragment further down the page.

@vkryukov

I suspect that commit bbe8f6a, which was made to get the tests passing, should be reverted.

If you think about it, you do want special characters to be encoded; otherwise you can end up with HTML that has a different meaning than the original.
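
To illustrate the expected behavior, here is a minimal sketch of text escaping in plain Elixir. `EscapeSketch` and `escape_text/1` are hypothetical names used only for this example; this is not how Readability or its HTML serializer is implemented, just a demonstration of what the text node from test case 001 should look like after encoding.

```elixir
defmodule EscapeSketch do
  # Hypothetical helper, not part of Readability: escapes the characters
  # that are significant in HTML text and attribute values.
  def escape_text(text) when is_binary(text) do
    text
    # "&" must be replaced first so the entities added below are not re-escaped.
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

# The decoded text from the excerpt round-trips back to its escaped form.
IO.puts(EscapeSketch.escape_text("(2) Google \"<noun>.js\""))
# prints: (2) Google &quot;&lt;noun&gt;.js&quot;
```

With escaping like this applied to text nodes, the output would keep `&lt;noun&gt;` as text instead of turning it into an unknown `<noun>` element.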
