LibPDF: Remove incorrect page tree heuristic

add_page_tree_node_to_page_tree() had a heuristic where it would decide that a page tree node where number of items in the /Kids array was equal of the number of total children stored in /Count in that same node must be the final non-leaf level of the page tree. That seems sensible, but it is e.g. possible to have a page tree node that has one kid and one page tree node total below it. Such a node isn't useful, but it can exist, and it does exist e.g. in 0000037.pdf page 37: ``` 505 0 obj << /Type /Pages /Count 37 /Kids [ 503 0 R 504 0 R ] >> endobj 504 0 obj << /Type /Pages /Count 1 /Parent 505 0 R /Kids [ 467 0 R ] >> endobj 467 0 obj << /Type /Pages /Count 1 /Parent 504 0 R /Kids [ 464 0 R ] >> endobj ``` Object 504 is useless here and object 505 could just reference object 467 directly, but it doesn't. Furthermore, conditionally_parse_page_tree_node() already handles all this correctly, so we can just remove this special case. With this, we render page 37 of 0000037.pdf correctly (previously, we thought that the page didn't have any contents.) Add such a silly internal page tree node to colorspaces.pdf, so that we have an automated test for this slightly subtle behavior.
nico · Mar 3, 2025 · 570404e · 570404e
1 parent 05b4337
commit 570404e
Show file tree

Hide file tree

Showing 2 changed files with 7 additions and 17 deletions.
diff --git a/Tests/LibPDF/colorspaces.pdf b/Tests/LibPDF/colorspaces.pdf
diff --git a/Userland/Libraries/LibPDF/Document.cpp b/Userland/Libraries/LibPDF/Document.cpp
@@ -346,25 +346,15 @@ PDFErrorOr<void> Document::build_page_tree()
 PDFErrorOr<void> Document::add_page_tree_node_to_page_tree(NonnullRefPtr<DictObject> const& page_tree)
 {
     auto kids_array = TRY(page_tree->get_array(this, CommonNames::Kids));
-    auto page_count = page_tree->get(CommonNames::Count).value().get<int>();
 
-    if (static_cast<size_t>(page_count) != kids_array->elements().size()) {
-        // This page tree contains child page trees, so we recursively add
-        // these pages to the overall page tree
-
-        for (auto& value : *kids_array) {
-            auto reference_index = value.as_ref_index();
-            auto maybe_page_tree_node = TRY(m_parser->conditionally_parse_page_tree_node(reference_index));
-            if (maybe_page_tree_node) {
-                TRY(add_page_tree_node_to_page_tree(maybe_page_tree_node.release_nonnull()));
-            } else {
-                m_page_object_indices.append(reference_index);
-            }
+    for (auto& value : *kids_array) {
+        auto reference_index = value.as_ref_index();
+        auto maybe_page_tree_node = TRY(m_parser->conditionally_parse_page_tree_node(reference_index));
+        if (maybe_page_tree_node) {
+            TRY(add_page_tree_node_to_page_tree(maybe_page_tree_node.release_nonnull()));
+        } else {
+            m_page_object_indices.append(reference_index);
         }
-    } else {
-        // We know all of the kids are leaf nodes
-        for (auto& value : *kids_array)
-            m_page_object_indices.append(value.as_ref_index());
     }
 
     return {};