CSS: support for pseudo elements ::before & ::after #345

poire-z · 2020-06-04T16:07:39Z

See individual commit messages for details.

`GIF decoding: avoid crash on some images`

`Top progress bar: avoid re-computing when not needed`

`Top progress bar: allow external filling of marks`

Will allow frontend to feed crengine with TOC markers, so we can have the same markers in the top bar and in the bottom bar. See koreader/koreader#5848 (comment), #335.

`CSS/Text: properly inherit and handle text-align-last`

`getRenderedWidths(): fix handling of text-indent`

`Reorder some flags to make the sets clearer`

(chores)

`CSS: support more white-space named values`

Least fancy CSS value... but needed to properly size some table cells and floats (some :before pseudo elements are styled with float: left; white-space: pre-wrap)

Each colored segment is an element with white-space: nowrap:

References:
https://www.w3.org/TR/CSS2/text.html#white-space-prop
https://developer.mozilla.org/en-US/docs/Web/CSS/white-space
https://www.xul.fr/en/css/white-space.php

`Text: fix standalone BR not making an empty line (rework)`

Needed by the next commit, where pseudo element content is handled as generated content, which might not be PRE.

`CSS: support for pseudo elements ::before & ::after`

Just a bit fancier than white-space, but not by much :)
Handle content: with tokens: none, strings "blah", attributes attr(blah), open-quote/close-quote.
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/content
https://developer.mozilla.org/en-US/docs/Web/CSS/string
https://www.w3.org/TR/CSS2/syndata.html#parsing-errors
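To make the token handling above more concrete, here is a minimal C++ sketch of how an already-tokenized `content:` value might be turned into the generated text for one element. It is not crengine's code (which pre-computes a string in `style->content`): `TokenKind`, `ContentToken`, `resolveContent()` and the attribute-lookup callback are hypothetical names, and the quote characters are fixed placeholders rather than per-language ones. The close-quote branch follows the CSS 2.1 rule that a close-quote which would make the nesting depth negative renders nothing.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical token kinds for the subset of `content:` values discussed above.
enum class TokenKind { None, String, Attr, OpenQuote, CloseQuote };

struct ContentToken {
    TokenKind kind;
    std::string value;  // literal text for String, attribute name for Attr
};

// Builds the generated text for one element from its parsed `content:` tokens.
// getAttr returns the attribute value, or "" if the element lacks it.
// quoteDepth is the document-wide quote nesting level, updated in place.
std::string resolveContent(const std::vector<ContentToken>& tokens,
                           const std::function<std::string(const std::string&)>& getAttr,
                           int& quoteDepth) {
    const std::string openQ = "\u201C", closeQ = "\u201D";  // placeholder quotes
    std::string out;
    for (const ContentToken& t : tokens) {
        switch (t.kind) {
        case TokenKind::None:
            return "";  // `content: none`: nothing is generated
        case TokenKind::String:
            out += t.value;
            break;
        case TokenKind::Attr:
            out += getAttr(t.value);
            break;
        case TokenKind::OpenQuote:
            out += openQ;
            ++quoteDepth;
            break;
        case TokenKind::CloseQuote:
            if (quoteDepth > 0) {  // an unbalanced close-quote renders nothing
                --quoteDepth;
                out += closeQ;
            }
            break;
        }
    }
    return out;
}
```

For a paragraph number like the one below, tokens `"["`, `attr(data-num)`, `"] "` on an element with `data-num="12"` would yield `[12] ` (the attribute name is illustrative).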

Often used for cosmetics in books, but sometimes it can reveal some content that it retrieves from an element attribute, like this paragraph number:

Or some references:

(The pseudo element added text will not be part of text selection - Firefox does the same by default.)

`CSS: content: open-quote support via TextLangMan`

Each language can have its own two pairs of quotes to use with open/close-quote.
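A minimal sketch of picking a quote by language and nesting level. crengine keeps this data in its lang_cfg entries; the `std::map` below is a hypothetical stand-in, filled with the usual English quotes and the French « » / “ ” pairs this thread settles on. Levels deeper than the second reuse the inner pair, mirroring the CSS `quotes` rule that the last pair is repeated; the fallback to English for unknown languages is an assumption of this sketch.

```cpp
#include <map>
#include <string>
#include <utility>

// Two quote pairs per language: outer (level 0) and inner (level 1 and deeper).
struct QuotePairs {
    std::pair<std::string, std::string> outer;
    std::pair<std::string, std::string> inner;
};

// Illustrative table only (hypothetical, not crengine's lang_cfg data).
const std::map<std::string, QuotePairs>& quoteTable() {
    static const std::map<std::string, QuotePairs> table = {
        {"en", {{"\u201C", "\u201D"}, {"\u2018", "\u2019"}}},
        {"fr", {{"\u00AB", "\u00BB"}, {"\u201C", "\u201D"}}},
    };
    return table;
}

// Returns the open or close quote for a language at a given nesting level.
std::string quoteFor(const std::string& lang, int level, bool open) {
    auto it = quoteTable().find(lang);
    if (it == quoteTable().end()) it = quoteTable().find("en");  // assumed fallback
    const auto& pair = (level == 0) ? it->second.outer : it->second.inner;
    return open ? pair.first : pair.second;
}
```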

I'm not happy with French getting the same quotes in both pairs ! « » « »
@NiLuJe : what do you think, should we ignore HTML5 suggestion and use « » “ ” like Italian ?

`CSS/Text selection: adds a few "-cr-hint:" tweaks`

Might be helpful in some cases, see koreader/koreader#6223 (comment)


With some images, we would be writing outside the rev_buf array bounds. That is supposed to be driven by some other data and should not happen, so there might be a bug somewhere else, or we might have crap image data. Anyway, avoid this crash. koreader/koreader#6215

LVDocView::getSectionBounds(), which computes the marks shown in the top progress bar and isn't cheap, could be called (with KOReader) on each page turn. Have its cached result be trashed only when a re-rendering is actually done. Note that m_imageCache might be used by some frontends, and not by others.
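The caching pattern described here can be sketched as a small memoizing wrapper. `SectionBoundsCache` is a hypothetical class, not LVDocView's code: reads are cheap on every page turn, and only an explicit `invalidate()`, meant to be called on an actual re-render, forces the expensive computation to run again.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical wrapper: memoizes an expensive computation (here, the positions
// of the top progress bar marks) until it is explicitly invalidated.
class SectionBoundsCache {
public:
    explicit SectionBoundsCache(std::function<std::vector<int>()> compute)
        : compute_(std::move(compute)) {}

    // Cheap on every page turn: recomputes only if invalidated since the last call.
    const std::vector<int>& get() {
        if (!valid_) {
            bounds_ = compute_();
            valid_ = true;
        }
        return bounds_;
    }

    // To be called only when the document is actually re-rendered.
    void invalidate() { valid_ = false; }

private:
    std::function<std::vector<int>()> compute_;
    std::vector<int> bounds_;
    bool valid_ = false;
};
```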

crengine builds its top progress bar markers from the start of each DocFragment (each html file in an EPUB). This will allow KOReader to manage it and fill it with markers made from the TOC, similarly to its bottom bar.

Also properly measure table captions as they are just like erm_final nodes.

Only white-space 'normal' and 'pre' were supported; other values were ignored and handled as 'normal'. This adds support (possibly limited or approximated) for: 'nowrap', 'pre-line', 'pre-wrap' and 'break-spaces'. Fix pre & nowrap handling in text formatting and rendered width measuring.
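As a reference point for what each keyword should do, here is a minimal sketch encoding the behaviour table from CSS Text Level 3 (newlines, spaces and tabs, wrapping, end-of-line spaces). It only records the spec's intent, not crengine's possibly approximated handling; `whiteSpaceBehavior()`, `WhiteSpaceBehavior` and `EolSpaces` are hypothetical names, and `std::optional` requires C++17.

```cpp
#include <optional>
#include <string>

// What happens to spaces at the end of a line.
enum class EolSpaces { Remove, Preserve, Hang, Wrap };

struct WhiteSpaceBehavior {
    bool preserveNewlines;  // segment breaks force a line break
    bool preserveSpaces;    // runs of spaces/tabs are kept as-is
    bool wrap;              // lines may wrap at soft wrap opportunities
    EolSpaces eolSpaces;
};

// Maps a `white-space` keyword to its behaviour; unknown keywords give
// std::nullopt (the declaration would then be ignored).
std::optional<WhiteSpaceBehavior> whiteSpaceBehavior(const std::string& v) {
    using E = EolSpaces;
    if (v == "normal")       return WhiteSpaceBehavior{false, false, true,  E::Remove};
    if (v == "pre")          return WhiteSpaceBehavior{true,  true,  false, E::Preserve};
    if (v == "nowrap")       return WhiteSpaceBehavior{false, false, false, E::Remove};
    if (v == "pre-wrap")     return WhiteSpaceBehavior{true,  true,  true,  E::Hang};
    if (v == "break-spaces") return WhiteSpaceBehavior{true,  true,  true,  E::Wrap};
    if (v == "pre-line")     return WhiteSpaceBehavior{true,  false, true,  E::Remove};
    return std::nullopt;
}
```

The only difference between 'pre-wrap' and 'break-spaces' is the fate of trailing spaces: they hang in the former and can themselves wrap in the latter.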

Rework 89af063: we might want our added content to get space collapsing. We have to provide LTEXT_FLAG_PREFORMATTED when we don't want that.

NiLuJe · 2020-06-04T16:14:07Z

@poire-z: Yep, I'm all for using actual fancy double quotes in French (never been a fan of the chevrons myself ;)).

poire-z · 2020-06-04T16:25:33Z

Well, I like the « » :) but ok, we'll have “ ” for 2nd level open-quote (so, we'll probably never see them :)

Frenzie · 2020-06-04T16:58:11Z

crengine/src/lvtextfm.cpp

```diff
+        // Also, when "text-align-last: justify", Firefox does justify the last
+        // (or single) line.
+        if ( last ) { // Last line of paragraph, or single line paragraph
+            // https://drafts.csswg.org/css-text-3/#text-align-last-property
```


Very interesting, thanks.

Frenzie

Nice!

Handle parsing of '::before', '::after' (CSS3), ':before' and ':after' (CSS2) in selectors. Properly check whether they should be generated and, if so, insert a new internal element in the DOM: pseudoElem. Handle the newly needed CSS property 'content:'. Parse original values and store a pre-computed string in style->content, ready to be used to get the final generated content for a node. Supports strings, attributes, open/close-quote. Replaces <Q>-specific handling in the code with: q::before { content: open-quote; } q::after { content: close-quote; }
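A minimal sketch of the selector side, assuming the selector text has already been split into simple selectors: a hypothetical `splitPseudo()` that recognizes both the CSS3 double-colon and the legacy CSS2 single-colon forms. The double-colon forms must be tried first, otherwise `q::before` would match `:before` and leave `q:` as the base selector. Pseudo-element names are ASCII case-insensitive in CSS; this sketch compares case-sensitively for brevity.

```cpp
#include <string>

enum class Pseudo { None, Before, After };

// Hypothetical helper: splits "q::before" into base "q" and Pseudo::Before.
// Accepts both "::before"/"::after" (CSS3) and ":before"/":after" (CSS2).
Pseudo splitPseudo(const std::string& sel, std::string& base) {
    static const struct { const char* suffix; Pseudo kind; } forms[] = {
        {"::before", Pseudo::Before}, {"::after", Pseudo::After},  // longest first
        {":before", Pseudo::Before},  {":after", Pseudo::After},
    };
    for (const auto& f : forms) {
        const std::string suffix = f.suffix;
        if (sel.size() >= suffix.size() &&
            sel.compare(sel.size() - suffix.size(), suffix.size(), suffix) == 0) {
            base = sel.substr(0, sel.size() - suffix.size());
            return f.kind;
        }
    }
    base = sel;  // no pseudo element: the whole selector is the base
    return Pseudo::None;
}
```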

Get the right quote chars for each language, and ensure nested quote levels (per lang_cfg).

One can use "-cr-hint: text-selection-inline", "text-selection-block" and "text-selection-skip" to target some elements and tweak how their text will appear (or not) in user text selection. Might be useful to exclude the content of ruby annotations (<ruby><rt>) from text selection when providing it to dict lookup or translation.

poire-z · 2021-09-03T16:48:11Z

Some issue that looks related to white-space: pre, but actually isn't at all (I think); mentioning it here as there were recent tweaks to white-space: pre in this PR.

```html
<h1>PRE and newlines</h1>
<pre style="background-color: yellow">no newline char before and after</pre>
<pre style="background-color: yellow"> space char before and after </pre>
<pre style="background-color: yellow">  2 space chars before and after  </pre>
<pre style="background-color: yellow">   3 space chars before and after   </pre>

<pre style="background-color: yellow">
newline char before and after
</pre>

<pre style="background-color: yellow">

Blank line above and below

</pre>

<pre style="background-color: yellow">
<code>newline char before and after, in code</code>
</pre>
<pre style="background-color: yellow">
<table>newline char before and after, in table</table>
</pre>

<div style="background-color: yellow; white-space: pre">
newline char before and after, in a DIV white-space: pre
</div>
```

Firefox | KOReader:

We render the leading \n in PRE, while Firefox doesn't. But if in a DIV white-space:pre, we both render it !
I think I tracked the difference down to these:

https://html.spec.whatwg.org/multipage/grouping-content.html#the-pre-element

In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped.

https://html.spec.whatwg.org/multipage/syntax.html#element-restrictions

For historical reasons, certain elements have extra restrictions beyond even the restrictions given by their content model. [...]
A single newline may be placed immediately after the start tag of pre and textarea elements. This does not affect the processing of the element. The otherwise optional newline must be included if the element's contents themselves start with a newline (because otherwise the leading newline in the contents would be treated like the optional newline, and ignored).

So, I guess that for PRE and TEXTAREA, I should just remove a leading \n in its first text content ?
(What does "This does not affect the processing of the element" mean? Just: "you should ignore it"?)

Thought about fixing that in the text rendering code (where we already remove leading and collapsed spaces), but we'd need to go up to the element to check if it's a PRE, which we never do: we're fine with flags and CSS. Or we'd need a FLAG to say it comes from a PRE or TEXTAREA (but there is only 1 bit left in our 32-bit slot for flags...).

Or I should implement this in the XML and HTML parsers and do it only for a domVersionRequested greater than a bumped DOM_VERSION_CURRENT (because removing one char would break highlights made in a PRE).
After all, this comes from the HTML specs, not from the CSS text and white-space rendering hints such as https://drafts.csswg.org/css-text/. Thoughts ?

Witnessed in some O'Reilly Rust book (spent some time looking for a nonexistent evil padding-top :)

Frenzie · 2021-09-03T21:17:25Z

(What does "This does not affect the processing of the element" mean? Just: "you should ignore it"?)

I think so, yes.

Thoughts ?

I can't judge the advantages and disadvantages nearly as well as you can. I would intuitively think the parser is the more logical place to do it and that it sounds a bit out of place in the text renderer, but I assume you mentioned it first for a reason.

poire-z · 2021-09-04T06:28:28Z

but I assume you mentioned it first for a reason.

Not really, I was just thinking out loud :) and that was my first thought.
I'll go with handling this in the XML parser, makes a smaller and cleaner change.
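A minimal sketch of that parser-side approach, under stated assumptions: the hook name `stripLeadingNewline()`, and the idea that the parser calls it with the text immediately following a start tag, are hypothetical. `domVersionRequested` mirrors the name used above, while `stripSinceVersion` stands in for the bumped DOM version, so documents opened with an older DOM version keep their text offsets (and their highlights). Only a single newline is removed, as the HTML spec quoted above requires.

```cpp
#include <string>

// Hypothetical parser hook: `text` is the text node that immediately follows
// the start tag of `tag` (tag name lowercased, CRLF already normalized to LF,
// as both HTML and XML parsing do). Drops one leading newline after <pre> or
// <textarea>, but only for documents rendered with a recent enough DOM version.
void stripLeadingNewline(const std::string& tag, std::string& text,
                         int domVersionRequested, int stripSinceVersion) {
    if (domVersionRequested < stripSinceVersion) return;  // keep old offsets
    if (tag != "pre" && tag != "textarea") return;
    if (!text.empty() && text[0] == '\n') text.erase(0, 1);  // only one newline
}
```

With the test file above, this would make `<pre>\nnewline char before and after\n</pre>` render like Firefox, while the `<div style="white-space: pre">` case keeps its leading newline, since the rule comes from the HTML parsing specs and not from CSS.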

poire-z · 2023-07-28T14:57:41Z

Just a note/hint, for reference:

I was about halfway into implementing a -cr-hint: nowrap-before (and nowrap-after for consistency) that I could set on the element wrapping a footnote link to avoid this kind of result:

A bit tedious, as we can have inner elements, with space characters outside, at, or inside the targeted element, hence the need to handle/propagate the "nowrap" on each of them....

And when almost done, lightbulb moment... this works just as well (and possibly better than what I could have done with my kludge added into lvtextfm.cpp):

It works as well as what I've been getting with my -cr-hint: nowrap-before on my simple test cases (before I went with trickier ones), so I'm happily abandoning that -cr-hint: nowrap-before idea :)

(Just have to remember 2060 2060 2060 2060 2060 ...)

poire-z · 2023-08-10T15:00:21Z

(Just have to remember 2060 2060 2060 2060 2060 ...)

I guess I won't :/ I solved this exact same issue the exact same way 3 years ago..... #337 (comment)

poire-z · 2024-08-19T19:15:29Z

For info and reference (in case of probable future bad memory, and me going back to this issue):

Got another annoying case, with a publisher's bad job on footnote link numbers, which were all made like this:
<a href=..>123</a>[SPACE] »

which got me tons of annoying closing quotes at the start of lines:

which is what should happen, per specs:
https://unicode.org/reports/tr14/#GL

In particular, when NO-BREAK SPACE follows SPACE, there is a break opportunity after the SPACE and the NO-BREAK SPACE will go as visible space onto the next line.

(except that we explicitly don't make that no-break space visible at the start of the next line, which is usually for the best, as it would double the bothersomeness in my case).

Using a.apnfEpub2::after { content: '\2060'; } (suggested above and in our CSS ☰) would not help.

Possible solution I found:
a.apnfEpub2::after { content: '('; } (which, acting as an opening parenthesis, prevents the break after a space following it).

It's ugly and big, so let's make it small and invisible:
a.apnfEpub2::after { content: '('; font-size: 1px; color:transparent} (or visibility:hidden)

(The blank space gets bigger, but it's more bearable than the closing quote at start of line.)

(Somehow, using a narrower ASCII ' or " does not work, even though https://unicode.org/reports/tr14/#QU says that, being ambiguously opening/closing, they should act as both, and so work as an opening parenthesis. Maybe there are more complex rules, or some libunibreak specifics, dunno.)
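The behaviour described above can be reproduced with a deliberately reduced model of UAX #14. The sketch below is not libunibreak's API: it implements only rules LB7, LB12, LB12a, LB14, LB18, LB28, LB30 (simplified) and LB31, over a handful of classes, treats the footnote digits as AL instead of NU for simplicity, and ignores the refinements newer Unicode versions add for quotation marks. It shows the break after SPACE before NO-BREAK SPACE (LB12a does not apply after spaces, so LB18 wins), and why an OP character such as '(' suppresses it: LB14 forbids a break after OP followed by any number of spaces. In this reduced model an ASCII quote (class QU) gets no such protection, which is consistent with what was observed, though the full rule set may behave differently.

```cpp
#include <cstddef>
#include <vector>

// Reduced set of UAX #14 line-break classes used by this sketch.
enum class LB { AL, SP, GL, OP, QU };

// Is a line break allowed between cls[i-1] and cls[i]? (1 <= i < cls.size())
// Applies LB7, LB12, LB12a, LB14, LB18, LB28, LB30 and LB31 in that order;
// every other UAX #14 rule is ignored.
bool breakAllowedBefore(const std::vector<LB>& cls, std::size_t i) {
    const LB prev = cls[i - 1], cur = cls[i];
    if (cur == LB::SP) return false;                    // LB7:   × SP
    if (prev == LB::GL) return false;                   // LB12:  GL ×
    if (cur == LB::GL && prev != LB::SP) return false;  // LB12a: [^SP] × GL (BA/HY omitted)
    std::size_t j = i;                                  // LB14:  OP SP* ×
    while (j > 0 && cls[j - 1] == LB::SP) --j;
    if (j > 0 && cls[j - 1] == LB::OP) return false;
    if (prev == LB::SP) return true;                    // LB18:  SP ÷
    if (prev == LB::AL && cur == LB::AL) return false;  // LB28:  AL × AL
    if (prev == LB::AL && cur == LB::OP) return false;  // LB30:  AL × OP (simplified)
    return true;                                        // LB31:  break everywhere else
}
```

Usage: `123 SP NBSP »` maps to `{AL, SP, GL, QU}` and allows a break only before the GL; `123 ( SP NBSP »` maps to `{AL, OP, SP, GL, QU}` and allows none.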

poire-z added 8 commits June 4, 2020 16:44

Top progress bar: allow external filling of marks

43366af

CSS/Text: properly inherit and handle text-align-last

ff81068

getRenderedWidths(): fix handling of text-indent

865ad23

Reorder some flags to make the sets clearer

204ed9b

Text: fix standalone BR not making an empty line (rework)

c29fea4

Frenzie reviewed Jun 4, 2020

Frenzie approved these changes Jun 4, 2020

poire-z added 3 commits June 4, 2020 19:22

CSS: content: open-quote support via TextLangMan

07f428f

poire-z force-pushed the pseudo_before_after branch from d8ce6c9 to 71923f2 June 4, 2020 17:25

poire-z mentioned this pull request Jun 4, 2020

TextLangMan for text typography by language, use libunibreak #337

Merged

poire-z merged commit 90565ff into koreader:master Jun 5, 2020

poire-z deleted the pseudo_before_after branch June 5, 2020 05:58

This was referenced Jun 5, 2020

bump crengine: support for pseudo elements ::before/after koreader/koreader-base#1112

Merged

bump crengine: support for pseudo elements ::before/after koreader/koreader#6236

Merged

poire-z mentioned this pull request Jul 20, 2020

FB2: fix various issues #357

Merged

poire-z mentioned this pull request Aug 7, 2020

Linebreaking at dashes #364

Closed

poire-z mentioned this pull request Sep 6, 2021

CSS: parse/skip at-rules, support @media, @supports #454

Merged

poire-z mentioned this pull request Mar 10, 2024

Book style tweak: add more suggestions in "CSS ≡" koreader/koreader#11533

Merged
