Skipping tables #245
Comments
Telegraph lacks support for HTML tables too.
Any way to extract text from tables? Or make an image?
I don't think it will be readable, especially when there are rowspan/colspan attributes or cells with long text.
It is possible... technically, and impossible... practically. There is a table-to-image converter module powered by matplotlib in RSStT, but it can handle neither rowspan/colspan nor cells with long text. To render an HTML table "perfectly", a browser or a browser-like renderer is required; some projects, e.g. wkhtmltopdf and html2image, can achieve that. The problem, however, is that rendering is not the only consideration: security always matters more. Passing untrusted HTML to a browser or a browser-like renderer could result in RCE (Remote Code Execution) or DoS (Denial of Service), two well-known classes of vulnerability.
I guess in our case, having the content as text matters more than readability. Also, I have another script in use that converts emails to Telegram messages, and there I found the following part, which converts HTML to plaintext, ignoring all tags.
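For illustration only, here is a minimal sketch of what "convert HTML to plaintext, ignoring all tags" could look like using Python's standard `html.parser`. It is not the script mentioned above and not RSStT's code; the names `TextExtractor`, `html_to_plaintext`, and the `BREAK_TAGS` set are hypothetical choices made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and drop every tag (hypothetical helper)."""

    # Assumption: these tags start a new line so rows and paragraphs stay apart.
    BREAK_TAGS = {"br", "p", "div", "tr", "li"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")
        elif tag in ("td", "th"):
            # Separate neighbouring cells with a space.
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plaintext(html):
    """Return the text of `html` with all tags removed, one row per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>"
print(html_to_plaintext(sample))
```

This keeps every cell's text but loses the column alignment, and it ignores rowspan/colspan entirely, so merged cells end up on whichever row declares them. It also keeps the text inside `<script>` and `<style>` elements, which a real converter would probably want to skip.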
Not everyone will be pleased when such an unreadable chunk messes up their messages.
I know exactly how to convert a table to plain text or an image, but the problem is not "how to" but "should we". At least for me, converting a table into plain text is never what I want. I will keep looking for a way to convert it to an image securely.
Ah, I understand now, sorry!
I've seen that you've added table support! That's great! Thank you very much!
Hi!
Bot is skipping tables in RSS posts and all their content.
E.g. here is the code of RSS:
and the bot is just skipping all the contents.