Fix rendering docs to PDF #737

Merged
merged 4 commits into from
Nov 6, 2023

SouthEndMusic
Collaborator

Fix $\TeX$ and figure issues for rendering docs to PDF.

The docs are now modified so that all figures are included with HTML syntax in a way that figure numbers and references are preserved. However, I could only make this work in a way that hardcodes the figure numbers, both in the captions and the references.

@SouthEndMusic SouthEndMusic requested a review from visr November 3, 2023 19:56
@SouthEndMusic
Collaborator Author

I used this script for the conversion (mostly written by ChatGPT):

import re
import os
import fnmatch

def process_qmd_file(qmd_file_path):
    # Read the content of the .qmd file
    with open(qmd_file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    # Define a regular expression pattern to match image references
    image_pattern = r'!\[(.*?)\]\((.*?)\)(\{([^}]*)\})?'

    # Find all image references and build a mapping of labels to figure numbers
    figure_counter = 1
    figure_mapping = {}  # Store the mapping of labels to figure numbers

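    def process_image(match):
        nonlocal figure_counter
        caption = match.group(1)
        image_path = match.group(2)
        attributes = match.group(4)

        # The attribute block may look like "#fig-label width=50%"; keep only the label
        label = None
        if attributes:
            for attribute in attributes.split():
                if attribute.startswith('#'):
                    label = attribute[1:]
                    break

        # If a label exists, add it to the mapping
        if label:
            figure_mapping[label] = figure_counter

        # Build the numbered caption once; it is reused as alt text and as figcaption
        caption_text = f'Figure {figure_counter}: {caption}' if caption else ''
        id_attribute = f' id="{label}"' if label else ''
        alt_attribute = f' alt="{caption_text}"' if caption_text else ''
        figcaption = f'<figcaption>{caption_text}</figcaption>' if caption_text else ''

        figure_counter += 1

        # Replace the image reference with an HTML figure including the caption
        html_image = f'<figure{id_attribute} style="max-width: 100%;"><img src="{image_path}"{alt_attribute} style="max-width: 100%;">{figcaption}</figure>'
        return html_image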

    content = re.sub(image_pattern, process_image, content)

    # Define a regular expression pattern to match figure references
    figure_pattern = r'@fig-([a-zA-Z0-9-]+)'

    # Find all figure references and process them
    def process_figure(match):
        label = f"fig-{match.groups()[0]}"
        figure_number = figure_mapping.get(label)
        if figure_number is not None:
            # Replace the figure reference with HTML style reference
            html_reference = f'<a href="#{label}">Figure {figure_number}</a>'
            return html_reference
        else:
            return match.group(0)  # If label not found, return original match

    content = re.sub(figure_pattern, process_figure, content)

    # Write the updated content back to the .qmd file
    with open(qmd_file_path, 'w', encoding='utf-8') as file:
        file.write(content)


def find_files_by_extension(root_dir, file_extension):
    matches = []
    for root, dirnames, filenames in os.walk(root_dir):
        for filename in fnmatch.filter(filenames, f'*.{file_extension}'):
            matches.append(os.path.join(root, filename))
    return matches

for qmd_file_path in find_files_by_extension("Ribasim/docs", "qmd"):
    print(qmd_file_path)
    process_qmd_file(qmd_file_path)
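
To make the effect of the script easier to see, here is a minimal sketch of the same idea that works on a string instead of rewriting files. The name `convert_figures`, the simplified patterns, and the sample text are assumptions for illustration only; they are not part of the PR. It shows how the figure numbers end up hardcoded in both the caption and the reference.

```python
import re

# Markdown image with an optional "{#fig-label ...}" attribute block
IMAGE_PATTERN = re.compile(r'!\[(.*?)\]\((.*?)\)(?:\{#([^}\s]+)[^}]*\})?')
# Quarto-style figure cross-reference such as "@fig-map"
REFERENCE_PATTERN = re.compile(r'@(fig-[a-zA-Z0-9-]+)')


def convert_figures(text):
    """Replace Markdown images and @fig- references with hardcoded HTML."""
    numbers = {}  # label -> figure number
    counter = 0

    def image_to_html(match):
        nonlocal counter
        counter += 1
        caption, path, label = match.groups()
        if label:
            numbers[label] = counter
        id_attribute = f' id="{label}"' if label else ''
        return (
            f'<figure{id_attribute}><img src="{path}" alt="{caption}">'
            f'<figcaption>Figure {counter}: {caption}</figcaption></figure>'
        )

    def reference_to_html(match):
        label = match.group(1)
        if label not in numbers:
            return match.group(0)  # unknown label: leave the reference untouched
        return f'<a href="#{label}">Figure {numbers[label]}</a>'

    # Images first, so that every label has a number before references are resolved
    text = IMAGE_PATTERN.sub(image_to_html, text)
    return REFERENCE_PATTERN.sub(reference_to_html, text)


sample = "See @fig-map.\n\n![Model map](map.png){#fig-map}\n"
print(convert_figures(sample))
```

Because the numbers are baked into the text, adding or reordering a figure later means running the conversion again from the original Markdown, which is the drawback mentioned in the PR description.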

@visr
Member

visr commented Nov 3, 2023

@SouthEndMusic I looked around a bit in the quarto-cli issue tracker and found quarto-dev/quarto-cli#5537, which looks similar. Do you think that's the problem, certain characters in URLs, as fixed in quarto-dev/quarto-cli@3a861a8?

Would be good to try with a Quarto 1.4 pre-release from https://quarto.org/docs/download/
Only downside is that these pre-releases are not in conda-forge so we cannot use them via pixi.
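
Since the fix depends on which Quarto is installed, a quick check of the version found on PATH can help when comparing results. The following is only a sketch: the helper names `parse_version` and `quarto_is_at_least` are made up for this example, and it assumes that `quarto --version` prints a plain version string such as `1.4.x`.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return (major, minor) from a version string such as '1.4.550'."""
    match = re.match(r'\s*(\d+)\.(\d+)', text)
    if match is None:
        raise ValueError(f'unrecognised version string: {text!r}')
    return int(match.group(1)), int(match.group(2))


def quarto_is_at_least(minimum=(1, 4)):
    """Check the Quarto found on PATH; return None when Quarto is missing."""
    executable = shutil.which('quarto')
    if executable is None:
        return None
    result = subprocess.run(
        [executable, '--version'], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout) >= minimum


if __name__ == '__main__':
    print(quarto_is_at_least())
```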

@visr
Member

visr commented Nov 3, 2023

By the way, I can confirm that quarto render docs --to pdf no longer throws an error, which is a big step forward. More work is probably needed to produce one single PDF, though. I get one PDF per part:

index.pdf
modflow-demo.pdf

And the images don't seem to work yet for me.

Perhaps we can also learn from the iMOD Documentation here, they have separate YAML files: https://github.com/Deltares/iMOD-Documentation/blob/main/docs/_quarto-manual.yml. Wonder if they ran into similar issues.

@visr
Member

visr commented Nov 3, 2023

Testing single pages with quarto render modflow.qmd --to pdf and quarto render modflow-demo.qmd --to pdf using the Quarto 1.4 pre-release seems to work well for both LaTeX and figures. This was on this branch, plus git revert ae305cf59af0fabc41910c7a1dec214a3ffbdfc3 to leave out ae305cf.

modflow.pdf
modflow-demo.pdf
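
The page-by-page test above could be repeated for every page with a small helper. This is a sketch under assumptions: `build_render_command` and `render_pages` are hypothetical names, and the only Quarto invocation used is the `quarto render <file> --to pdf` form quoted in the comment. By default it only prints the commands (dry run) rather than running them.

```python
from pathlib import Path
import subprocess


def build_render_command(qmd_path, output_format='pdf'):
    """Build the command line for rendering a single page."""
    return ['quarto', 'render', str(qmd_path), '--to', output_format]


def render_pages(docs_dir, dry_run=True):
    """Render every .qmd file under docs_dir; return the pages that failed."""
    failed = []
    for qmd_path in sorted(Path(docs_dir).rglob('*.qmd')):
        command = build_render_command(qmd_path)
        print(' '.join(command))
        if dry_run:
            continue
        # check=False: keep going so that all failing pages are collected
        if subprocess.run(command, check=False).returncode != 0:
            failed.append(qmd_path)
    return failed
```

Calling `render_pages("docs", dry_run=False)` would then give a list of the pages that still fail to render, which narrows down where the remaining image problems are.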

@SouthEndMusic
Collaborator Author

SouthEndMusic commented Nov 6, 2023

@visr I wondered why it does work with the Quarto logo in my MWE, and the only explanation I could think of was indeed that Quarto does not recognise some URLs as such. This could be tested by looking at the $\TeX$ version of the MWE.

I also wondered whether there is syntax to tell Quarto explicitly that the source is a URL, and I found that the syntax here is slightly different from what we use, so that is also worth a check. Edit: ah no, that is for when the image itself is a link.
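
One way to look at the $\TeX$ version of the MWE, as suggested above, is to list what ends up inside `\includegraphics`. The sketch below assumes the intermediate .tex file is available (for instance via Quarto's keep-tex option for PDF output) and that an image left as a raw http(s) URL is a sign that Quarto did not recognise it as remote. The function name `classify_graphics` and the sample input are invented for illustration.

```python
import re

# \includegraphics with an optional [options] block, capturing the target
INCLUDEGRAPHICS = re.compile(r'\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}')


def classify_graphics(tex_source):
    """Split \\includegraphics targets into remote URLs and local paths."""
    remote, local = [], []
    for target in INCLUDEGRAPHICS.findall(tex_source):
        if re.match(r'https?://', target):
            remote.append(target)
        else:
            local.append(target)
    return remote, local


sample = r'''
\includegraphics{https://example.org/logo.png}
\includegraphics[width=0.5\textwidth]{figures/map.png}
'''
remote, local = classify_graphics(sample)
print('remote:', remote)
print('local:', local)
```

Comparing the two lists for the MWE and for a failing page should show whether the failing images are the ones that were passed through as URLs.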

Member

@visr visr left a comment

As discussed with @SouthEndMusic, we'll merge this with the HTML image workarounds removed. The issue is already fixed upstream in Quarto, so we prefer to wait for the release of Quarto 1.4.

@visr visr merged commit 6bd31ba into main Nov 6, 2023
15 checks passed
@visr visr deleted the tex_docs_changes branch November 6, 2023 12:05
@visr visr mentioned this pull request Nov 6, 2023
3 tasks
visr added a commit that referenced this pull request Nov 13, 2023
Fix $\TeX$ and figure issues for rendering docs to PDF.

The docs are now modified so that all figures are included with HTML
syntax in a way that figure numbers and references are preserved.
However, I could only make this work in a way that hardcodes the figure
numbers, both in the captions and the references.

---------

Co-authored-by: Martijn Visser <[email protected]>