Fix rendering docs to PDF #737

SouthEndMusic · 2023-11-03T19:56:37Z

Fix $\TeX$ and figure issues for rendering docs to PDF.

The docs are now modified so that all figures are included using HTML syntax, preserving figure numbers and references. However, I could only get this to work by hardcoding the figure numbers, both in the captions and in the references.

SouthEndMusic · 2023-11-03T20:06:56Z

I used this script for the conversion (mostly written by ChatGPT):

```python
import fnmatch
import html
import os
import re


def process_qmd_file(qmd_file_path):
    # Read the content of the .qmd file
    with open(qmd_file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    # Match Markdown images: ![caption](path){attributes}
    # group 1 = caption, group 2 = path, group 4 = text inside the braces
    image_pattern = r'!\[(.*?)\]\((.*?)\)(\{([^}]*)\})?'

    # Number the figures and build a mapping of labels to figure numbers
    figure_counter = 1
    figure_mapping = {}  # label -> figure number

    def process_image(match):
        nonlocal figure_counter
        caption = match.group(1)
        image_path = match.group(2)
        attributes = match.group(4) or ''

        # The braces may hold more than the label (e.g. width=80%),
        # so extract only the "#label" token
        label_match = re.search(r'#([\w-]+)', attributes)
        label = label_match.group(1) if label_match else None
        if label:
            figure_mapping[label] = figure_counter

        # Build the alt attribute and the visible caption separately
        if caption:
            caption_text = f'Figure {figure_counter}: {caption}'
            alt_attr = f' alt="{html.escape(caption_text)}"'
            figcaption = f'<figcaption>{caption_text}</figcaption>'
        else:
            alt_attr = ''
            figcaption = ''

        figure_counter += 1

        # Only emit an id when a label exists (avoids id="None")
        id_attr = f' id="{label}"' if label else ''

        # Replace the image reference with an HTML figure including the caption
        return (
            f'<figure{id_attr} style="max-width: 100%;">'
            f'<img src="{image_path}"{alt_attr} style="max-width: 100%;">'
            f'{figcaption}</figure>'
        )

    content = re.sub(image_pattern, process_image, content)

    # Match figure cross-references such as @fig-my-label
    figure_pattern = r'@fig-([a-zA-Z0-9-]+)'

    def process_figure(match):
        label = f"fig-{match.group(1)}"
        figure_number = figure_mapping.get(label)
        if figure_number is None:
            return match.group(0)  # Label not found: keep the original text
        # Replace the reference with an HTML link to the figure
        return f'<a href="#{label}">Figure {figure_number}</a>'

    content = re.sub(figure_pattern, process_figure, content)

    # Write the updated content back to the .qmd file
    with open(qmd_file_path, 'w', encoding='utf-8') as file:
        file.write(content)


def find_files_by_extension(root_dir, file_extension):
    # Recursively collect all files under root_dir with the given extension
    matches = []
    for root, _dirnames, filenames in os.walk(root_dir):
        for filename in fnmatch.filter(filenames, f'*.{file_extension}'):
            matches.append(os.path.join(root, filename))
    return matches


if __name__ == "__main__":
    for qmd_file_path in find_files_by_extension("Ribasim/docs", "qmd"):
        print(qmd_file_path)
        process_qmd_file(qmd_file_path)
```
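For reference, a quick check of what `image_pattern` captures on a Quarto image line. The caption, file name and label below are hypothetical, made up only for illustration:

```python
import re

# Same image pattern as in the conversion script above
image_pattern = r'!\[(.*?)\]\((.*?)\)(\{([^}]*)\})?'

# Hypothetical Quarto figure with a cross-reference label and an extra attribute
line = '![Basin water balance](figures/basin.png){#fig-basin width=80%}'

match = re.search(image_pattern, line)
caption, path, attributes = match.group(1), match.group(2), match.group(4)
print(caption)     # Basin water balance
print(path)        # figures/basin.png
print(attributes)  # #fig-basin width=80%

# The braces can hold more than the label, so pull out only the "#..." token
label = re.search(r'#([\w-]+)', attributes).group(1)
print(label)       # fig-basin
```

Slicing off the first character (`attributes[1:]`) would instead give `fig-basin width=80%`, which is why extracting the `#` token is safer when other attributes are present.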

visr · 2023-11-03T20:19:20Z

@SouthEndMusic I looked around a bit in the quarto-cli issue tracker and found quarto-dev/quarto-cli#5537, which looks similar. Do you think that's the problem here: certain characters in URLs, as fixed in quarto-dev/quarto-cli@3a861a8?

It would be good to try a Quarto 1.4 pre-release from https://quarto.org/docs/download/. The only downside is that these pre-releases are not on conda-forge, so we cannot use them via pixi.

visr · 2023-11-03T20:29:35Z

By the way, I can confirm that `quarto render docs --to pdf` no longer throws an error, which is a big step forward. More work is probably needed to produce one single PDF, though; I get one PDF per part:

index.pdf
modflow-demo.pdf

And the images don't seem to work yet for me.

Perhaps we can also learn from the iMOD Documentation here; they use separate YAML files: https://github.com/Deltares/iMOD-Documentation/blob/main/docs/_quarto-manual.yml. I wonder if they ran into similar issues.


visr · 2023-11-03T20:41:19Z

Testing single pages with `quarto render modflow.qmd --to pdf` and `quarto render modflow-demo.qmd --to pdf` using the Quarto 1.4 pre-release seems to work well for both LaTeX and figures. This is on this branch plus `git revert ae305cf59af0fabc41910c7a1dec214a3ffbdfc3` to leave out ae305cf.

modflow.pdf
modflow-demo.pdf
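Rendering the pages one at a time, as in the test above, can be scripted. This is a minimal sketch, assuming `quarto` is on the PATH and the pages live under a `docs/` directory; `render_commands` and `render_all` are hypothetical helpers, not part of the repository:

```python
import subprocess
from pathlib import Path


def render_commands(docs_dir):
    """Build one `quarto render <page> --to pdf` command per .qmd page."""
    pages = sorted(Path(docs_dir).rglob("*.qmd"))
    return [["quarto", "render", str(page), "--to", "pdf"] for page in pages]


def render_all(docs_dir):
    """Render each page to its own PDF, stopping at the first failure."""
    for command in render_commands(docs_dir):
        print(" ".join(command))
        subprocess.run(command, check=True)
```

Calling `render_all("docs")` from the repository root would produce one PDF per page, which makes it easier to see which individual page breaks the PDF build.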

SouthEndMusic · 2023-11-06T05:40:26Z

@visr I wondered why my MWE does work with the Quarto logo, and the only explanation I could think of was indeed that some URLs are not recognised as such by Quarto. This could be tested by inspecting the $\TeX$ output of the MWE.

I also wondered whether there is syntax to tell Quarto explicitly that the source is a URL, and I found that the syntax here is slightly different from what we use, so that is also worth a check. Edit: ah no, that is for when the image itself is a link.
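To narrow down which figures are affected, one could list the image sources in a page and flag the remote URLs that contain characters with a special meaning in LaTeX. This is a minimal sketch under the assumption (not confirmed in this thread) that such characters are what trips up the PDF output; `flag_image_urls` and the chosen character set are hypothetical:

```python
import re
from urllib.parse import urlparse

# Same image pattern as the conversion script: ![caption](source){attributes}
IMAGE_PATTERN = re.compile(r'!\[(.*?)\]\((.*?)\)(\{([^}]*)\})?')

# Characters that LaTeX treats specially (assumed suspects, not confirmed)
LATEX_SPECIAL = set('%#&_~$^{}\\')


def flag_image_urls(qmd_text):
    """Return (source, is_url, suspicious_chars) for every image in the text."""
    results = []
    for match in IMAGE_PATTERN.finditer(qmd_text):
        source = match.group(2)
        is_url = urlparse(source).scheme in ("http", "https")
        # Only remote URLs are checked; local paths get an empty list
        suspicious = sorted(set(source) & LATEX_SPECIAL) if is_url else []
        results.append((source, is_url, suspicious))
    return results
```

Running it over the text of a `.qmd` page and comparing the flagged URLs with the figures that go missing in the PDF would show whether the hypothesis holds.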

visr

As discussed with @SouthEndMusic, we'll merge this with the HTML image workarounds removed. This issue is already fixed upstream in Quarto, so we prefer to just wait for the release of Quarto 1.4.

Fix $\TeX$ and figure issues for rendering docs to PDF. The docs are now modified so that all figures are included with HTML syntax in a way that figure numbers and references are preserved. However, I could only make this work in a way that hardcodes the figure numbers, both in the captions and the references.

---------

Co-authored-by: Martijn Visser

SouthEndMusic added 3 commits November 3, 2023 11:56

Some fixes for rendering docs to PDF

cc46261

Images with HTML in docs

ae305cf

last fix

e7dc467

SouthEndMusic requested a review from visr November 3, 2023 19:56

Revert "Images with HTML in docs"

4d500ef

This reverts commit ae305cf.

visr approved these changes Nov 6, 2023


visr merged commit 6bd31ba into main Nov 6, 2023
15 checks passed

visr deleted the tex_docs_changes branch November 6, 2023 12:05

visr mentioned this pull request Nov 6, 2023

Include docs PDF in release assets #666

