CSS: support for pseudo elements ::before & ::after #345

Merged
merged 11 commits into from
Jun 5, 2020


Contributor

@poire-z poire-z commented Jun 4, 2020

See individual commit messages for details.

GIF decoding: avoid crash on some images

See koreader/koreader#6215

Top progress bar: avoid re-computing when not needed

See koreader/koreader#6191

Top progress bar: allow external filling of marks

Will allow frontend to feed crengine with TOC markers, so we can have the same markers in the top bar and in the bottom bar. See koreader/koreader#5848 (comment), #335.

CSS/Text: properly inherit and handle text-align-last

getRenderedWidths(): fix handling of text-indent

Reorder some flags to make the sets clearer

(chores)

CSS: support more white-space named values

Least fancy CSS value... but needed to properly size some table cells and floats (some :before pseudo elements are styled with float: left; white-space: pre-wrap)
image
Each colored segment is an element with white-space: nowrap:
image
image
References:
https://www.w3.org/TR/CSS2/text.html#white-space-prop
https://developer.mozilla.org/en-US/docs/Web/CSS/white-space
https://www.xul.fr/en/css/white-space.php

Text: fix standalone BR not making an empty line (rework)

Needed by next commit, where pseudo element content is handled as generated content, which might not be PRE.

CSS: support for pseudo elements ::before & ::after

Just a bit more fancy than white-space, but not by much :)
Handle content: with tokens: none, strings "blah", attributes attr(blah), open-quote/close-quote.
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/content
https://developer.mozilla.org/en-US/docs/Web/CSS/string
https://www.w3.org/TR/CSS2/syndata.html#parsing-errors

image

Often used for cosmetics in books, but sometimes it can reveal some content, that it retrieves from the element attribute - like this paragraph number:
image
Or some references:
image
(The pseudo element added text will not be part of text selection - Firefox does the same by default.)

CSS: content: open-quote support via TextLangMan

Each language can have its own two pairs of quotes to use with open/close-quote.
image
I'm not happy with French getting the same quotes in both pairs ! « » « »
@NiLuJe : what do you think, should we ignore HTML5 suggestion and use « » “ ” like Italian ?

CSS/Text selection: adds a few "-cr-hint:" tweaks

Might be helpful in some cases, see koreader/koreader#6223 (comment)



With some images, we would be writing outside rev_buf
array bounds.
That's supposed to be driven by some other data, and
should not happen - so there might be a bug somewhere
else and we might have crap image data.
Anyway, avoid this crash.
koreader/koreader#6215
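
The kind of guard described in this commit message can be sketched as follows. This is only an illustration, not the actual crengine GIF decoder: `LzwEntry`, `expandCode` and `kRevBufSize` are hypothetical names, and only `rev_buf` comes from the commit message. The idea is to stop as soon as the reversed output would exceed the buffer, whatever the reason for the bad data.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LZW table entry: a prefix code plus one suffix byte.
struct LzwEntry {
    int prefix;          // index of the previous entry, or -1 for a root code
    std::uint8_t suffix; // byte appended by this entry
};

constexpr std::size_t kRevBufSize = 4096; // assumed capacity (12-bit LZW codes)

// Expands `code` into `out` by walking the prefix chain backwards into rev_buf.
// Returns false instead of writing past the end of rev_buf when the chain is
// too long (for example a cycle caused by corrupt image data).
bool expandCode(const std::vector<LzwEntry>& table, int code,
                std::vector<std::uint8_t>& out) {
    std::array<std::uint8_t, kRevBufSize> rev_buf{};
    std::size_t n = 0;
    while (code >= 0) {
        const std::size_t idx = static_cast<std::size_t>(code);
        if (idx >= table.size() || n >= rev_buf.size())
            return false; // bad data: bail out rather than crash
        rev_buf[n++] = table[idx].suffix;
        code = table[idx].prefix;
    }
    while (n > 0)
        out.push_back(rev_buf[--n]); // bytes were collected in reverse order
    return true;
}
```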
LVDocView::getSectionBounds(), used to compute marks
to show in the top progress bar, which isn't cheap,
could be called (with KOReader) on each page turn.
Have it be trashed only when a re-rendering is really
done.
Note that m_imageCache might be used by some frontends,
and not by others.
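
A minimal sketch of the caching idea in this commit message, under stated assumptions: `SectionBoundsCache` and its members are hypothetical and do not mirror the real `LVDocView` code. The expensive computation runs once, and its result is dropped only when a real re-rendering happens, not on each page turn.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cache wrapper around an expensive "compute section bounds" call.
class SectionBoundsCache {
public:
    explicit SectionBoundsCache(std::function<std::vector<int>()> compute)
        : compute_(std::move(compute)) {}

    // May be called on every page turn; cheap unless the cache was invalidated.
    const std::vector<int>& get() {
        if (!valid_) {
            bounds_ = compute_();
            valid_ = true;
        }
        return bounds_;
    }

    // To be called only when the document has actually been re-rendered.
    void invalidate() { valid_ = false; }

private:
    std::function<std::vector<int>()> compute_;
    std::vector<int> bounds_;
    bool valid_ = false;
};
```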
crengine builds its top progress bar markers from the
start of each DocFragments (each html file in an EPUB).
This will allow KOReader to manage it and fill it with
markers made from the TOC, similarly to its bottom bar.
Also properly measure table captions as they are just
like erm_final nodes.
Only white-space 'normal' and 'pre' were supported, other
values were ignored and handled as 'normal'.
This adds support (possibly limited or approximated) for:
'nowrap', 'pre-line', 'pre-wrap' and 'break-spaces'.
Fix pre & nowrap handling in text formatting and
rendered width measuring.
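
For reference, the behaviour of each white-space value as defined by the CSS specs can be summarised in a small lookup. This is a sketch of the spec table only; `WhiteSpaceBehavior` and `whiteSpaceBehavior` are hypothetical names, and crengine's own support is described above as possibly limited or approximated.

```cpp
#include <string>

// How a white-space value treats the source text, per the CSS text specs.
struct WhiteSpaceBehavior {
    bool collapse_spaces; // runs of spaces and tabs collapse into one space
    bool keep_newlines;   // newlines in the source force a line break
    bool wrap;            // lines may wrap at break opportunities
};

// Unknown values fall back to 'normal'.
WhiteSpaceBehavior whiteSpaceBehavior(const std::string& value) {
    if (value == "nowrap")       return {true,  false, false};
    if (value == "pre")          return {false, true,  false};
    if (value == "pre-wrap")     return {false, true,  true};
    if (value == "pre-line")     return {true,  true,  true};
    // 'break-spaces' is like 'pre-wrap', except that preserved spaces always
    // take up room and a line may break after any of them.
    if (value == "break-spaces") return {false, true,  true};
    return {true, false, true}; // 'normal'
}
```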
Rework 89af063: we might want our added content to get space
collapsing. We have to provide LTEXT_FLAG_PREFORMATTED when
we don't want that.
Member

NiLuJe commented Jun 4, 2020

@poire-z: Yep, I'm all for using actual fancy double quotes in French (never been a fan of the chevrons myself ;)).

Contributor Author

poire-z commented Jun 4, 2020

Well, I like the « » :) but ok, we'll have “ ” for 2nd level open-quote (so, we'll probably never see them :)

// Also, when "text-align-last: justify", Firefox does justify the last
// (or single) line.
if ( last ) { // Last line of paragraph, or single line paragraph
// https://drafts.csswg.org/css-text-3/#text-align-last-property
Member

Very interesting, thanks.
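
For readers without the CSS Text draft at hand, the rule behind the reviewed snippet can be sketched like this. The enums and `resolveLastLineAlign` are hypothetical names, not crengine's; the logic follows the text-align-last definition linked in the code comment above.

```cpp
enum class Align { Start, End, Left, Right, Center, Justify };
enum class AlignLast { Auto, Start, End, Left, Right, Center, Justify };

// Alignment to use for the last (or only) line of a paragraph.
Align resolveLastLineAlign(Align text_align, AlignLast text_align_last) {
    switch (text_align_last) {
        case AlignLast::Auto:
            // 'auto' follows text-align, except that a justified paragraph
            // gets a start-aligned last line.
            return text_align == Align::Justify ? Align::Start : text_align;
        case AlignLast::Start:   return Align::Start;
        case AlignLast::End:     return Align::End;
        case AlignLast::Left:    return Align::Left;
        case AlignLast::Right:   return Align::Right;
        case AlignLast::Center:  return Align::Center;
        case AlignLast::Justify: return Align::Justify;
    }
    return Align::Start; // not reached with valid enum values
}
```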

Member

@Frenzie Frenzie left a comment

Nice!

Handle parsing of '::before', '::after' (CSS3), ':before'
and ':after' (CSS2) in selectors.
Properly check if they should be generated or not, and if
yes, insert a new internal element in the DOM: pseudoElem.
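
Recognising the four spellings could look roughly like the sketch below. It is an illustration only: `PseudoElem` and `splitPseudoElement` are hypothetical names, and the real selector parser certainly works differently (this version only looks at the end of a selector string).

```cpp
#include <string>

enum class PseudoElem { None, Before, After };

// Splits a trailing '::before' / '::after' (CSS3) or ':before' / ':after'
// (CSS2) off a selector. On a match, `selector` is shortened to the part
// that targets the originating element.
PseudoElem splitPseudoElement(std::string& selector) {
    struct Suffix { const char* text; PseudoElem kind; };
    // Two-colon forms come first, so "::before" is not left with a stray ':'.
    static const Suffix suffixes[] = {
        {"::before", PseudoElem::Before}, {"::after", PseudoElem::After},
        {":before",  PseudoElem::Before}, {":after",  PseudoElem::After},
    };
    for (const Suffix& s : suffixes) {
        const std::string suffix(s.text);
        if (selector.size() >= suffix.size() &&
            selector.compare(selector.size() - suffix.size(),
                             suffix.size(), suffix) == 0) {
            selector.erase(selector.size() - suffix.size());
            return s.kind;
        }
    }
    return PseudoElem::None;
}
```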

Handle needed added CSS property: 'content:'. Parse original
values and store a pre-computed string in style->content,
ready to be used to get the final generated content for
a node. Supports string, attributes, open/close-quote.
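
As a rough picture of how a parsed 'content:' value turns into generated text, here is a small sketch. `ContentToken` and `generateContent` are hypothetical: the commit stores a pre-computed string in style->content, whose format is not described here, so a plain token list stands in for it.

```cpp
#include <map>
#include <string>
#include <vector>

// One parsed token of a 'content:' value (hypothetical representation).
struct ContentToken {
    enum Kind { String, Attr, OpenQuote, CloseQuote } kind;
    std::string value; // literal text for String, attribute name for Attr
};

// Builds the generated text for one node. 'content: none' is modelled here as
// an empty token list. Quote strings are passed in because they depend on the
// language and nesting level.
std::string generateContent(const std::vector<ContentToken>& tokens,
                            const std::map<std::string, std::string>& attrs,
                            const std::string& open_quote,
                            const std::string& close_quote) {
    std::string out;
    for (const ContentToken& t : tokens) {
        switch (t.kind) {
            case ContentToken::String:
                out += t.value;
                break;
            case ContentToken::Attr: {
                auto it = attrs.find(t.value);
                if (it != attrs.end())
                    out += it->second; // a missing attribute yields nothing
                break;
            }
            case ContentToken::OpenQuote:
                out += open_quote;
                break;
            case ContentToken::CloseQuote:
                out += close_quote;
                break;
        }
    }
    return out;
}
```

With this model, something like content: "[" attr(data-n) "] " on an element carrying data-n="12" would give "[12] ".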

Replaces <Q> specific handling in the code with:
  q::before { content: open-quote; }
  q::after  { content: close-quote; }
Get the right quote chars for each language,
and ensure nested quote levels (per lang_cfg).
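
The nesting logic can be sketched as below, assuming two pairs of quotes per language as described in this PR. `QuotePairs` and `QuoteNesting` are hypothetical names; the real lookup goes through TextLangMan and lang_cfg.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Two pairs of quote strings for one language (UTF-8).
struct QuotePairs {
    std::string open1, close1; // outermost level
    std::string open2, close2; // nested levels
};

// Tracks the quote depth while generated content is produced in document order.
class QuoteNesting {
public:
    explicit QuoteNesting(QuotePairs q) : q_(std::move(q)) {}

    // open-quote: first level uses the first pair, deeper levels the second.
    std::string open() { return depth_++ == 0 ? q_.open1 : q_.open2; }

    // close-quote: an unbalanced one emits nothing and leaves the depth at 0.
    std::string close() {
        if (depth_ == 0) return std::string();
        return --depth_ == 0 ? q_.close1 : q_.close2;
    }

private:
    QuotePairs q_;
    std::size_t depth_ = 0;
};
```

For French as agreed in this conversation, the pairs would be « » for the first level and “ ” for the second.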
One can use "-cr-hint: text-selection-inline",
"text-selection-block" and "text-selection-skip" to
target some elements and tweak how their text will
appear (or not) in user text selection.
Might be useful to exclude the content of ruby
annotations (<ruby><rt>) from text selection when
providing it to dict lookup or translation.
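
To illustrate the ruby use case, here is a sketch of what "text-selection-skip" amounts to when gathering selected text. `Node` and `collectSelectedText` are hypothetical stand-ins for the real DOM and selection code, and the inline and block hints are not modelled.

```cpp
#include <string>
#include <vector>

// Tiny stand-in for a DOM node: its own text, a flag set when the element
// matches "-cr-hint: text-selection-skip", and its children.
struct Node {
    std::string text;
    bool skip_in_selection;
    std::vector<Node> children;
};

// Appends the selectable text of `node` and its descendants to `out`,
// leaving out any subtree flagged as skipped.
void collectSelectedText(const Node& node, std::string& out) {
    if (node.skip_in_selection) return;
    out += node.text;
    for (const Node& child : node.children)
        collectSelectedText(child, out);
}
```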
Contributor Author

poire-z commented Sep 3, 2021

Some issue related to white-space: pre, but actually not at all (I think); mentioning it here as there were recent tweaks to white-space: pre in this PR.

<h1>PRE and newlines</h1>
<pre style="background-color: yellow">no newline char before and after</pre>
<pre style="background-color: yellow"> space char before and after </pre>
<pre style="background-color: yellow">  2 space chars before and after  </pre>
<pre style="background-color: yellow">   3 space chars before and after   </pre>

<pre style="background-color: yellow">
newline char before and after
</pre>

<pre style="background-color: yellow">

Blank line above and below

</pre>

<pre style="background-color: yellow">
<code>newline char before and after, in code</code>
</pre>
<pre style="background-color: yellow">
<table>newline char before and after, in table</table>
</pre>

<div style="background-color: yellow; white-space: pre">
newline char before and after, in a DIV white-space: pre
</div>

Firefox | KOReader:
image

We render the leading \n in PRE, while Firefox doesn't. But if in a DIV white-space:pre, we both render it !
I think I tracked the difference down to these:

https://html.spec.whatwg.org/multipage/grouping-content.html#the-pre-element

In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped.

https://html.spec.whatwg.org/multipage/syntax.html#element-restrictions

For historical reasons, certain elements have extra restrictions beyond even the restrictions given by their content model. [...]
A single newline may be placed immediately after the start tag of pre and textarea elements. This does not affect the processing of the element. The otherwise optional newline must be included if the element's contents themselves start with a newline (because otherwise the leading newline in the contents would be treated like the optional newline, and ignored).

So, I guess that for PRE and TEXTAREA, I should just remove a leading \n in its first text content ?
(What does This does not affect the processing of the element mean ? Just: "you should ignore it" ?)

Thought about fixing that in the text rendering code (where we already remove leading and collapsed spaces), but we'd need to go up to the element to check if it's a PRE - which we never do, we're fine with flags and CSS. Or we'd need a FLAG to say it comes from a PRE or TEXTAREA (but there is only 1bit left in our 32 bits slot for flags...).

Or I should implement this in the XML and HTML parsers and do it only for a domVersionRequested greater than a bumped DOM_VERSION_CURRENT (because removing one char would break highlights made in a PRE).
After all, this comes from the HTML specs, not from the CSS text and white-space rendering hints such as https://drafts.csswg.org/css-text/. Thoughts ?

Witnessed in some O'Reilly Rust book (spent some time looking for a nonexistent evil padding-top :)

image

Member

Frenzie commented Sep 3, 2021

(What does This does not affect the processing of the element mean ? Just: "you should ignore it" ?)

I think so, yes.

Thoughts ?

I can't judge the advantages and disadvantages nearly as well as you can. I would intuitively think the parser is the more logical place to do it and that it sounds a bit out of place in the text renderer, but I assume you mentioned it first for a reason.

Contributor Author

poire-z commented Sep 4, 2021

but I assume you mentioned it first for a reason.

Not really, I was just thinking out loud :) and that was my first thought.
I'll go with handling this in the XML parser, makes a smaller and cleaner change.
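
A sketch of what the parser-side handling could look like, under stated assumptions: `stripLeadingNewline` and `kDomVersionStripPreNewline` are hypothetical (the threshold value is made up), and the real change in the XML parser may be structured quite differently. It drops one leading newline from the first text of a PRE or TEXTAREA, and only for documents requesting a recent enough DOM version, so that existing highlights are not shifted.

```cpp
#include <string>

// Assumed threshold: DOM version from which the stripping applies. Older
// documents keep the newline so highlights made inside a PRE stay valid.
constexpr int kDomVersionStripPreNewline = 20210904; // hypothetical value

// Removes the single newline that may directly follow a <pre> or <textarea>
// start tag. `first_text` is the first text content of the element.
void stripLeadingNewline(const std::string& tag, int dom_version,
                         std::string& first_text) {
    if (tag != "pre" && tag != "textarea") return;
    if (dom_version < kDomVersionStripPreNewline) return;
    if (first_text.compare(0, 2, "\r\n") == 0)
        first_text.erase(0, 2);
    else if (!first_text.empty() && first_text[0] == '\n')
        first_text.erase(0, 1);
}
```

Only one newline is removed: a PRE whose content starts with a blank line keeps that blank line, as the HTML spec quote above requires.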

Contributor Author

poire-z commented Jul 28, 2023

Just a note/hint, for reference:

I was about halfway into implementing a -cr-hint: nowrap-before (and nowrap-after for consistency) that I could set on the element wrapping a footnote link to avoid this kind of result:
image

A bit tedious as we can have inner elements, with space characters outside, at, or inside the targeted element, so the need to handle/propagate the "nowrap" on each of them....

And when almost done, lightbulb moment... this works just as well (and possibly better than what I could have done with my kludge added into lvtextfm.cpp):
image
image

It works as well as what I've been getting with my -cr-hint: nowrap-before with my simple test cases (before I went with trickier ones), so I'm happily abandoning that -cr-hint: nowrap-before idea :)

(Just have to remember 2060 2060 2060 2060 2060 ...)

Contributor Author

poire-z commented Aug 10, 2023

(Just have to remember 2060 2060 2060 2060 2060 ...)

I guess I won't :/ I solved this exact same issue the exact same way 3 years ago..... #337 (comment)

Contributor Author

poire-z commented Aug 19, 2024

For info and reference (in case of probable future bad memory, and me going back to this issue):

Got some other bothering case with a publisher's bad job, where footnote link numbers were all made like this:
<a href=..>123</a>[SPACE]&nbsp;&raquo;
image
which got me tons of bothering closing quotes at start of line:
image

which is what should happen, per specs:
https://unicode.org/reports/tr14/#GL

In particular, when NO-BREAK SPACE follows SPACE, there is a break opportunity after the SPACE and the NO-BREAK SPACE will go as visible space onto the next line.

(except that we explicitly don't make that no-break-space visible at start of next line, which is usually for the best, as it would double the bothersomeness in my case).

Using a.apnfEpub2::after { content: '\2060'; } (suggested above and in our CSS ☰) would not help.

Possible solution I found:
a.apnfEpub2::after { content: '('; } (which, acting as an opening paren, prevents the break after a space following it).
image

It's ugly and big, so let's make it small and invisible:
a.apnfEpub2::after { content: '('; font-size: 1px; color:transparent} (or visibility:hidden)
image

(The blank space gets bigger, but it's more bearable than the closing quote at start of line.)

(Somehow, using a narrower ASCII ' or " does not work, even if https://unicode.org/reports/tr14/#QU says, being ambiguously opening/closing, they should act as both, so working as an opening parens. Maybe more complex rules, or some libunibreak stuff, dunno.)
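
The three cases above can be reproduced with a heavily simplified sketch of the UAX #14 pair rules. It is not libunibreak and not what crengine runs: `LineClass`, `classify` and `breakAllowedBefore` are hypothetical, only five classes are modelled, and everything else (including ») is lumped into "Other". As far as I read the rules, it may also explain the ASCII quotes: only OP gets the "SP*" treatment (LB14), while the no-break rule for QU applies to directly adjacent characters only.

```cpp
#include <cstddef>
#include <string>

enum class LineClass { SP, GL, WJ, OP, Other };

LineClass classify(char32_t c) {
    switch (c) {
        case 0x0020: return LineClass::SP; // SPACE
        case 0x00A0: return LineClass::GL; // NO-BREAK SPACE
        case 0x2060: return LineClass::WJ; // WORD JOINER
        case U'(': case U'[': case U'{': return LineClass::OP;
        default: return LineClass::Other;
    }
}

// True when a line break is allowed between text[i-1] and text[i].
bool breakAllowedBefore(const std::u32string& text, std::size_t i) {
    if (i == 0 || i >= text.size()) return false;
    const LineClass prev = classify(text[i - 1]);
    const LineClass cur = classify(text[i]);
    if (cur == LineClass::SP) return false;                           // LB7: x SP
    if (cur == LineClass::WJ || prev == LineClass::WJ) return false;  // LB11
    if (prev == LineClass::GL) return false;                          // LB12: GL x
    if (cur == LineClass::GL && prev != LineClass::SP) return false;  // LB12a
    std::size_t j = i;
    while (j > 0 && classify(text[j - 1]) == LineClass::SP) --j;      // skip SP*
    if (j > 0 && classify(text[j - 1]) == LineClass::OP) return false; // LB14
    return prev == LineClass::SP;                                     // LB18: SP /
}
```

With "12", SPACE, NO-BREAK SPACE, ». the break before the NO-BREAK SPACE is allowed; inserting U+2060 after "12" changes nothing, because the WORD JOINER only glues its direct neighbours; inserting "(" there forbids the break through LB14.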
