[#712] Optional Content (aka PDF Layers) implementation #819

Open
wants to merge 7 commits into
base: open-dev-v1


@stechio stechio commented Mar 25, 2022

This is an implementation of the missing Optional Content / PDF Layers functionality (#712).

To introduce layer support, I tried to leverage the existing code base as much as possible:

  • on input, the assignment of HTML content to layers is defined via non-inheritable extension CSS properties (-fs-ocg-* for content belonging to optional content groups and -fs-ocm-* for content belonging to optional content memberships).
    NOTE: An alternative implementation could move these definitions out of ordinary rulesets and into extension at-rules (@-fs-ocg for optional content groups and @-fs-ocm for optional content memberships). This would give somewhat tidier stylesheets, at the cost of dedicated structures alongside the existing standard at-rules (such as @font-face and @import rules) in com.openhtmltopdf.css.sheet.Stylesheet.

  • on output, during page painting, the existing tagging calls (see PdfBoxFastOutputDevice.startStructure(..) and endStructure(..)) are intercepted to inject layers into the content stream, because their granularity is semantically compatible with that of layers. To avoid ghost layer fragments (not all tagging calls wrap actual content!), layer injection is applied lazily, only when content is actually painted.
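
The lazy injection described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the class LazyLayerWriter, its paint(..) method, and the list of strings standing in for the page content stream are all hypothetical, and only the startStructure/endStructure names echo the calls mentioned above. The idea is that a start call merely records a pending layer, and the BDC operator is written only when something is actually painted inside it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of lazy layer injection; not the PR's actual classes. */
final class LazyLayerWriter {
    /** One entry per startStructure(..) call; layerName is null for unlayered elements. */
    private static final class Scope {
        final String layerName;
        boolean opened; // true once the BDC operator has actually been written

        Scope(String layerName) {
            this.layerName = layerName;
        }
    }

    private final Deque<Scope> scopes = new ArrayDeque<>();
    // Stand-in for the page content stream (assumption: one operator per entry).
    private final List<String> stream = new ArrayList<>();

    /** A structure element starts: remember its layer, but write nothing yet. */
    void startStructure(String layerName) {
        scopes.push(new Scope(layerName));
    }

    /** Real content is painted: first open every pending layer, outermost first. */
    void paint(String operator) {
        Iterator<Scope> outermostFirst = scopes.descendingIterator();
        while (outermostFirst.hasNext()) {
            Scope scope = outermostFirst.next();
            if (scope.layerName != null && !scope.opened) {
                stream.add("/OC /" + scope.layerName + " BDC");
                scope.opened = true;
            }
        }
        stream.add(operator);
    }

    /** A structure element ends: close its layer only if it was ever opened. */
    void endStructure() {
        Scope scope = scopes.pop();
        if (scope.opened) {
            stream.add("EMC");
        }
    }

    List<String> stream() {
        return stream;
    }
}
```

In this sketch, a start/end pair with no paint call in between leaves no trace in the stream, which is the "ghost fragment" case the lazy approach is meant to avoid.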

Layer types:

  • simple layer (aka group):

    • identity (-fs-ocg-id), which is used for reference (as parent of other groups or member of memberships).

    • label (-fs-ocg-label), which maps to PDOptionalContentGroup.getName() (see Name entry in the Optional Content Group dictionary) and is displayed in the viewer's layer tree.

    • visibility (-fs-ocg-visibility={visible|hidden}), which maps to PDOptionalContentProperties.isGroupEnabled(..) (see BaseState, ON, OFF entries of D entry of the document's Optional Content Configuration dictionary).

    • parent (-fs-ocg-parent={%ocg-id%}), which maps to the Order entry of the D entry of the document's Optional Content Configuration dictionary, for nesting in the viewer's layer tree. Unfortunately, arbitrary nesting does not seem to be natively supported by the PDFBox version currently in use (2.0): adding a group via PDOptionalContentProperties.addGroup(..) automatically builds a flat list inside the Order entry instead.

  • compound layer (aka membership):

    • identity (-fs-ocm-id), which is used for internal reference.

    • visibility policy (-fs-ocm-visible={all-visible|all-hidden|any-visible|any-hidden}), which maps to PDOptionalContentMembershipDictionary.getVisibilityPolicy() (see P entry of Optional Content Membership dictionary).

    • members (-fs-ocm-ocgs={%ocg-id%...}), which map to PDOptionalContentMembershipDictionary.getOCGs() (see OCGs entry of Optional Content Membership dictionary).
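
To make the four membership policies concrete, here is a small sketch of how a viewer would evaluate them. It is illustrative only: the class MembershipVisibility and its methods are hypothetical, and the mapping of the CSS keywords to the PDF policy names (all-visible to AllOn, all-hidden to AllOff, any-visible to AnyOn, any-hidden to AnyOff) is my assumption about what this PR does. Treating an unknown group id as visible is likewise an assumption made to keep the sketch short.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of membership visibility evaluation; not part of the PR. */
final class MembershipVisibility {
    /** CSS keywords of -fs-ocm-visible (assumed to map to AllOn, AllOff, AnyOn, AnyOff). */
    enum Policy {
        ALL_VISIBLE, ALL_HIDDEN, ANY_VISIBLE, ANY_HIDDEN;

        /** Converts a CSS keyword such as "any-hidden" to the matching constant. */
        static Policy fromCss(String keyword) {
            return Policy.valueOf(keyword.toUpperCase(Locale.ROOT).replace('-', '_'));
        }
    }

    /**
     * Decides whether content in a membership is shown.
     *
     * @param memberIds    ids of the groups listed in the membership
     * @param groupVisible current visibility of each group, keyed by id
     */
    static boolean isVisible(Policy policy, List<String> memberIds, Map<String, Boolean> groupVisible) {
        if (memberIds.isEmpty()) {
            return true; // a membership without groups does not restrict visibility
        }
        // Assumption: a group id missing from the map counts as visible.
        long visibleCount = memberIds.stream()
                .filter(id -> groupVisible.getOrDefault(id, Boolean.TRUE))
                .count();
        long hiddenCount = memberIds.size() - visibleCount;
        switch (policy) {
            case ALL_VISIBLE: return hiddenCount == 0;
            case ALL_HIDDEN:  return visibleCount == 0;
            case ANY_VISIBLE: return visibleCount > 0;
            case ANY_HIDDEN:  return hiddenCount > 0;
            default: throw new AssertionError(policy);
        }
    }
}
```

For example, with two member groups that are both hidden, the all-hidden and any-hidden policies show the content, while all-visible and any-visible hide it.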

For the sake of consistency, each piece of content inherits the full layer hierarchy of its ancestor nodes. For example, given

<div class="ocg2">
    <p class="ocg1">This is a layered block inside another layered block (OCG 2/OCG 1).</p>
</div>

the paragraph element is rendered as follows inside the content stream (NOTE: layer resource names are assigned internally by the PDF library (PDFBox); for clarity, assume here that /oc2 maps to layer ocg2 and /oc1 maps to layer ocg1):

/OC /oc2 BDC
/OC /oc1 BDC
. . .
(This is a layered block inside another layered block \(OCG 2/OCG 1\)) Tj
. . .
EMC
EMC
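
The inheritance rule can also be sketched in a few lines of Java. Again, this is only an illustration under assumptions: LayerHierarchy and its Node type are hypothetical stand-ins for the real box/DOM structures, and the /oc1 and /oc2 resource names follow the convention assumed in the example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of layer inheritance from ancestor nodes; not the PR's code. */
final class LayerHierarchy {
    /** Minimal stand-in for an element: a parent link plus an optional layer resource name. */
    static final class Node {
        final Node parent;
        final String layerName; // null when the element declares no layer

        Node(Node parent, String layerName) {
            this.parent = parent;
            this.layerName = layerName;
        }
    }

    /** Collects the layers of a node and all its ancestors, outermost first. */
    static List<String> layerChain(Node node) {
        List<String> chain = new ArrayList<>();
        for (Node n = node; n != null; n = n.parent) {
            if (n.layerName != null) {
                chain.add(n.layerName);
            }
        }
        Collections.reverse(chain); // walked innermost-first, so flip the order
        return chain;
    }

    /** Wraps content operators in one BDC/EMC pair per layer in the chain. */
    static List<String> wrap(Node node, List<String> contentOperators) {
        List<String> chain = layerChain(node);
        List<String> out = new ArrayList<>();
        for (String name : chain) {
            out.add("/OC /" + name + " BDC");
        }
        out.addAll(contentOperators);
        for (int i = 0; i < chain.size(); i++) {
            out.add("EMC");
        }
        return out;
    }
}
```

Building a Node for the div with layer "oc2", a child Node for the paragraph with layer "oc1", and calling wrap(..) on the child yields the same nesting as the content stream above: the outer /oc2 sequence encloses the inner /oc1 sequence.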


stechio commented Mar 25, 2022

[PR commit: 59c4701725978aeabefeab0cc80723debdffe471]

Here is a demonstration of its use (see the generating code below):

  • Initial state (note that contents in layer "OCG 2" are hidden):

[Screenshots: initial_1, initial_2]

  • All layers visible (note that contents in layer "OCG 2" are displayed):

[Screenshots: allVisible_1, allVisible_2]

  • Membership visibility policy (note that, when layers "OCG 2" and "OCG 3" are hidden, the pink paragraph in case 9 is displayed):

[Screenshots: initial_1, ocm_2]

Users can, of course, toggle each layer interactively in their viewer.

Generated PDF:
712-ocg.pdf

Source HTML:
712-ocg.html

Generating code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.pdfbox.pdmodel.PDDocument;

import com.openhtmltopdf.pdfboxout.PdfBoxRenderer;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class LayerCase {
    public static void main(String[] args) throws Exception {
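        // The caller owns the PDDocument, so it can be saved after rendering.
        try (PDDocument document = new PDDocument()) {
            try (PdfBoxRenderer renderer = new PdfRendererBuilder()
                    .usePDDocument(document)
                    .testMode(true)
                    .withFile(new File("712-ocg.html"))
                    .buildPdfRenderer()) {
                // Render into the supplied document, leaving it open.
                renderer.createPDFWithoutClosing();
            }
            // Save the rendered document explicitly.
            try (OutputStream os = new FileOutputStream("712-ocg.pdf")) {
                document.save(os);
            }
        }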
        }
    }
}
