Skip to content

Commit

Permalink
LibPDF: Remove incorrect page tree heuristic
Browse files Browse the repository at this point in the history
add_page_tree_node_to_page_tree() had a heuristic where it would decide
that a page tree node where number of items in the /Kids array was
equal of the number of total children stored in /Count in that same
node must be the final non-leaf level of the page tree.

That seems sensible, but it is e.g. possible to have a page tree node
that has one kid and one page tree node total below it. Such a node
isn't useful, but it can exist, and it does exist e.g. in 0000037.pdf
page 37:

```
505 0 obj
<<
  /Type /Pages
  /Count 37
  /Kids [ 503 0 R 504 0 R ]
>>
endobj

504 0 obj
<<
  /Type /Pages
  /Count 1
  /Parent 505 0 R
  /Kids [ 467 0 R ]
>>
endobj

467 0 obj
<<
  /Type /Pages
  /Count 1
  /Parent 504 0 R
  /Kids [ 464 0 R ]
>>
endobj
```

Object 504 is useless here and object 505 could just reference object
467 directly, but it doesn't.

Furthermore, conditionally_parse_page_tree_node() already handles all this
correctly, so we can just remove this special case.

With this, we render page 37 of 0000037.pdf correctly (previously, we
thought that the page didn't have any contents.)

Add such a silly internal page tree node to colorspaces.pdf, so that
we have an automated test for this slightly subtle behavior.
  • Loading branch information
nico committed Mar 3, 2025
1 parent 05b4337 commit 570404e
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 17 deletions.
Binary file modified Tests/LibPDF/colorspaces.pdf
Binary file not shown.
24 changes: 7 additions & 17 deletions Userland/Libraries/LibPDF/Document.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -346,25 +346,15 @@ PDFErrorOr<void> Document::build_page_tree()
PDFErrorOr<void> Document::add_page_tree_node_to_page_tree(NonnullRefPtr<DictObject> const& page_tree)
{
auto kids_array = TRY(page_tree->get_array(this, CommonNames::Kids));
auto page_count = page_tree->get(CommonNames::Count).value().get<int>();

if (static_cast<size_t>(page_count) != kids_array->elements().size()) {
// This page tree contains child page trees, so we recursively add
// these pages to the overall page tree

for (auto& value : *kids_array) {
auto reference_index = value.as_ref_index();
auto maybe_page_tree_node = TRY(m_parser->conditionally_parse_page_tree_node(reference_index));
if (maybe_page_tree_node) {
TRY(add_page_tree_node_to_page_tree(maybe_page_tree_node.release_nonnull()));
} else {
m_page_object_indices.append(reference_index);
}
for (auto& value : *kids_array) {
auto reference_index = value.as_ref_index();
auto maybe_page_tree_node = TRY(m_parser->conditionally_parse_page_tree_node(reference_index));
if (maybe_page_tree_node) {
TRY(add_page_tree_node_to_page_tree(maybe_page_tree_node.release_nonnull()));
} else {
m_page_object_indices.append(reference_index);
}
} else {
// We know all of the kids are leaf nodes
for (auto& value : *kids_array)
m_page_object_indices.append(value.as_ref_index());
}

return {};
Expand Down

0 comments on commit 570404e

Please sign in to comment.