Merge: pyMuPDF (PR #2186)
rbeezer committed Jul 17, 2024
2 parents ba2ee89 + 7e2afd0 commit 3cd15de
Showing 15 changed files with 39 additions and 161 deletions.
1 change: 0 additions & 1 deletion doc/guide/appendices/cli-v1vsv2.xml
@@ -60,7 +60,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
7 changes: 6 additions & 1 deletion doc/guide/appendices/python.xml
@@ -85,7 +85,7 @@
</cd>
</p>

<p>With <c>lxml</c>, you have a collection of Python routines that interface with the same base libraries for <init>XSL</init> processing as the <c>xsltproc</c> executable. A second library is <c>requests</c> which moderates communications with online servers and is necessary to communicate with WeBWorK servers and with a YouTube server that provides thumbnail images for static versions of videos. The <c>pdfCropMargins</c> package provides a tool that will crop images during their production with the <c>pretext</c> script. Finally, <c>playwright</c> uses a Chromium headless browser to take static screenshots of interactive elements of your project.</p>
<p>With <c>lxml</c>, you have a collection of Python routines that interface with the same base libraries for <init>XSL</init> processing as the <c>xsltproc</c> executable. A second library is <c>requests</c> which moderates communications with online servers and is necessary to communicate with WeBWorK servers and with a YouTube server that provides thumbnail images for static versions of videos. The <c>pdfCropMargins</c> package provides a tool that will crop images during their production with the <c>pretext</c> script. The <c>pyMuPDF</c> library then converts the <init>PDF</init> that was cropped to <init>SVG</init> and <init>PNG</init> images. Finally, <c>playwright</c> uses a Chromium headless browser to take static screenshots of interactive elements of your project.</p>

<p>Note that right after you install <c>playwright</c> then you want to run<cd>
<cline> playwright install</cline>
@@ -117,6 +117,11 @@
<cell><c>playwright</c></cell>
<cell><p>Automatic screenshots of interactive elements</p></cell>
</row>
<row>
<cell><c>pyMuPDF</c></cell>
<cell><p>Convert images to SVG and PNG</p></cell>
</row>
</init>
</tabular>
</table>
</section>
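As a quick sanity check of the packages listed in the table above, a short script can report which of them are importable. This is only a sketch: the import names are assumptions (in particular, PyMuPDF has traditionally been imported as `fitz`), and `REQUIRED_MODULES` and `check_modules` are hypothetical names that are not part of PreTeXt.

```python
import importlib.util

# Assumed import names for the packages in the table; PyMuPDF is imported as "fitz".
REQUIRED_MODULES = ["lxml", "requests", "pdfCropMargins", "playwright", "fitz"]


def check_modules(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print("{:16s} {}".format(name, "ok" if found else "MISSING"))
```

A missing entry only says the module could not be located in the current Python environment; it does not say anything about versions.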
58 changes: 1 addition & 57 deletions doc/guide/appendices/windows-cli.xml
@@ -144,62 +144,6 @@
One thing to keep in mind: with MikTeX, you <em>must</em> run an initial update from the MikTeX pacakage manager before it will work correctly.
</p>
</li>

<li>
<title>pdf2svg</title>
<p>
If your book contains <latex /> images such as TikZ, the <pretext />-CLI will process those images using whatever version of <latex /> you installed,
and then it needs <c>pdf2svg</c> to convert the output to <init>SVG</init> format for use in <init>HTML</init>.
</p>

<p>
There is no Windows <q>installer</q> for thos program. Instead, it is available as a GitHub repository.
The easiest way to get it onto your computer is to use the newly-installed Git Bash terminal.
The first time you open Git Bash, your <em>working directory</em> might be the root <c>c:</c> drive.
Type <c>pwd</c> to find out what folder you're in.
It doesn't matter where you install <c>pdf2svg</c>, as long as you're aware of where it is.
If you want to change to a different folder, you can do so using the <c>cd</c> command.
</p>

<p>
An important note of caution: avoid using directories such as <c>Program Files</c> that have spaces in the name.
This can cause problems with some of the scripts used by the <pretext />-CLI.
For this example, we will choose to place the program in our own user directory.
In Git Bash, type <c>cd Users/Sean</c> (where you will replace <q>Sean</q> with your own user name, as long as it doesn't contain spaces).
</p>

<p>
Next, go to the <url href="https://github.com/jalios/pdf2svg-windows" visual="github.com/jailos/pdf2svg-windows">pdf2svg repository on GitHub</url>.
Click on the <q>Code</q> button, and then click on the clipboard icon to copy the <init>URL</init>, as shown below:
</p>

<image width="85%" source="pdf2svg-download.png">
<description>Screenshot of the GitHub page for the pdf2svg project, showing where to click to copy the project URL to the clipboard</description>
</image>

<p>
Next, in Git Bash, type <c>git clone</c> and then right-click to paste the <init>URL</init> you copied,
or use the keyboard shortcut <kbd>Shift</kbd>-<kbd>Insert</kbd>. Hit <kbd>Enter</kbd>.
</p>

<p>
You will now have the necessary software downloaded to your computer.
In my example, it is now available in <c>C:\Users\Sean\pdf2svg-windows</c>.
To complete the installation, we need to add the program to the Windows environment variables.
Hit the <kbd>Windows</kbd> key, and type <q>path</q>. Open the suggested program.
</p>

<p>
As shown below, click on the <q>Environment Variables...</q> button, then click on the line beginning with <q>Path</q>,
and click on the <q>Edit</q> button.
You can then click on <q>Browse</q>, and find the <c>pdf2svg-windows</c> folder you just downloaded.
</p>

<p>
You should now have a working installation of <c>pdf2svg</c>.
To confirm, type <c>which pdf2svg</c> in Git Bash. It should display the path to the <c>pdf2svg</c>program. (You may have to restart Git Bash first.)
</p>
</li>
</dl>
</p>

@@ -365,7 +309,7 @@
Finally, we are ready to install the <pretext />-CLI. This is perhaps the easiest step of the whole process.
From the Git Bash terminal, first type <c>which python</c> to confirm that Python has been successfully added to the PATH.
You should see the path to your Python program if things are working correctly.
If you don't, you may need to reinstall Python, or you can manually add it as we did for <c>pdf2svg</c>.
If you don't, you may need to reinstall Python, or you can manually add it.
</p>

<p>
39 changes: 0 additions & 39 deletions doc/guide/appendices/windows.xml
@@ -264,45 +264,6 @@
<p>Congratulations, you have successfully installed ImageMagick.</p>
</section>

<section xml:id="section-installing-pdf2svg">
<title>Installing <c>pdf2svg</c></title>
<introduction>
<p>The installation procedure uses <c>git</c>. Open Git Bash and change to your root directory:<cd>cd /c</cd>Clone the repository into <c>C:\pdf2svg</c>:<cd>git clone https://github.com/jalios/pdf2svg-windows.git pdf2svg</cd></p>
</introduction>
<subsection xml:id="subsection-change-path-environment-variable-pdf2svg">
<title>Change PATH environment variable</title>
<p>We need to add the <c>pdf2svg</c> program to the Windows PATH. This is similar to what is done above, in <xref ref="change-path-environment-variable-xsltproc" />.</p>
<list xml:id="steps-change-path-environment-variable-pdf2svg">
<title>Path Environment Variable for <c>pdf2svg</c></title>
<ol>
<li>
<p>Open the Start menu and start typing <q>Edit the system environment variables</q>. Select this option when it becomes visible.</p>
</li>
<li>
<p>Click the Environment Variables button near the bottom of the dialog.</p>
</li>
<li>
<p>In the bottom part of the dialog labeled <q>System environment variables</q>, look for a variable named <c>PATH</c>. You may need to scroll.</p>
</li>
<li>
<p>If you do find the <c>PATH</c> variable, select it and click the Edit... button.</p>
</li>
<li>
<p>You should see a dialog with two text fields. Your variable name should be <c>PATH</c>.</p>
</li>
<li>
<p>Place the cursor in the existing value and press the End key, so that the cursor moves to the back of the line. The <c>PATH</c> string is a <c>;</c>-delimited list of full path names, so append the string <c>C:\pdf2svg\dist-32bits;</c> or <c>C:\pdf2svg\dist-64bits;</c> (note the semicolon) to the existing value. </p>
</li>
<li>
<p>Click OK to save changes.</p>
</li>
</ol>
</list>
</subsection>
<conclusion>
<p>Congratulations, you have successfully installed <c>pdf2svg</c>.</p>
</conclusion>
</section>

<section xml:id="section-jing">
<title>Installing jing</title>
25 changes: 1 addition & 24 deletions doc/guide/author/author-faq.xml
@@ -59,30 +59,7 @@
<li>
<title>How do I install <c>pdf2svg</c>?</title>
<p>
<c>pdf2svg</c> is necessary for TikZ diagrams in HTML.
</p>

<p>
On Debian or Ubuntu Linux:
<c>sudo apt-get install pdf2svg</c>.
</p>

<p>
To install <c>pdf2svg</c> on a Mac, you will need to install MacPorts.
Read the directions carefully, since you will need to install Xcode
(available from the Mac App Store) first.
Make sure that the command line tools are installed by running Xcode.
After Xcode is installed read the directions to install Macports.
Once MacPorts is installed run the following command to install <c>pdf2svg</c>:
<c>sudo port install pdf2svg</c>.
Be patient as this will take a few minutes.
To get rid of any intermediate build files, run the command
<c>sudo port clean --all all</c>.
Again be patient.
</p>

<p>
If you have trouble with MacPorts, try HomeBrew.
As of July 2024, you no longer need <c>pdf2svg</c> to process latex-images; we now use the python <c>pyMuPDF</c> library instead.
</p>
</li>

1 change: 0 additions & 1 deletion doc/guide/author/processing.xml
@@ -571,7 +571,6 @@
latex="latex"
pdflatex="pdflatex"
xelatex="xelatex"
pdfsvg="pdf2svg"
asy="asy"
sage="sage"
pdfpng="convert"
2 changes: 1 addition & 1 deletion doc/guide/author/topics.xml
@@ -3236,7 +3236,7 @@ displayed line, and there are no <c>\\</c>s. Use <c>\amp</c> to mark the alignm

<p>As an instructor, you might want to recycle images from a text for a classroom presentation, a project handout, or an examination question. As an author, you can elect to make images files available through links in the HTML version, and it is easy and flexible to produce those links automatically.</p>

<p>First, it is your responsibility to manufacture the files. For making different formats, the <c>pretext</c> script can sometimes help (<xref ref="pretext-script" />). The Image Magick <c>convert</c> command is a quick way to make raster images in different formats, while the <c>pdf2svg</c> executable is good for converting vector graphics <init>PDF</init>s into <init>SVG</init>s. Also, to make this easy to specify, different versions of the same image must have identical paths and names, other than the suffixes. Finally, the case and spelling of the suffix in your <pretext/> source must match the filename (<eg /> <c>jpg</c> versus <c>JPEG</c>). OK, those are the ground rules.</p>
<p>First, it is your responsibility to manufacture the files. For making different formats, the <c>pretext</c> script can sometimes help (<xref ref="pretext-script" />). The Image Magick <c>convert</c> command is a quick way to make raster images in different formats, while the <c>pdf2svg</c> executable is good for converting vector graphics <init>PDF</init>s into <init>SVG</init>s (although now the <c>pretext</c> script uses the <c>pyMuPDF</c> library for these tasks instead). Also, to make this easy to specify, different versions of the same image must have identical paths and names, other than the suffixes. Finally, the case and spelling of the suffix in your <pretext/> source must match the filename (<eg /> <c>jpg</c> versus <c>JPEG</c>). OK, those are the ground rules.</p>

<p>For links for a single image, add the <attr>archive</attr> attribute to the <tag>image</tag> element, such as<cd>
<cline>&lt;image ... archive="pdf svg"&gt;</cline>
1 change: 0 additions & 1 deletion doc/guide/project.ptx
@@ -29,7 +29,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
1 change: 0 additions & 1 deletion examples/minimal/project.ptx
@@ -29,7 +29,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
1 change: 0 additions & 1 deletion examples/sample-article/project.ptx
@@ -41,7 +41,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
1 change: 0 additions & 1 deletion examples/showcase/project.ptx
@@ -38,7 +38,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
1 change: 0 additions & 1 deletion examples/webwork/sample-chapter/project.ptx
@@ -36,7 +36,6 @@
<latex>latex</latex>
<pdflatex>pdflatex</pdflatex>
<xelatex>xelatex</xelatex>
<pdfsvg>pdf2svg</pdfsvg>
<asy>asy</asy>
<sage>sage</sage>
<pdfpng>convert</pdfpng>
2 changes: 1 addition & 1 deletion pretext/README.md
@@ -27,7 +27,7 @@ Example: TikZ code for graphics images can be extracted and written
into "standalone" files with XSL, and then this script will continue
on to apply LaTeX to the files, creating a PDF, then optionally
convert these PDFS into other formats, e.g. creating SVG images
via the pdf2svg utility.
via the pyMuPDF library.

`pretext.cfg`
-------------
2 changes: 1 addition & 1 deletion pretext/pretext.cfg
@@ -50,13 +50,13 @@
# "pdfcrop" key is only useful for options
# 2022-06-24 Remove use of "pdfcrop" for options
# 2022-06-28 Remove pageres key, in favor of pyppeteer package
# 2024-06-30 Remove pdfsvg key, in favor of pyMuPDF library


[executables]
latex = latex
pdflatex = pdflatex
xelatex = xelatex
pdfsvg = pdf2svg
asy = asy
mermaid = mmdc
sage = sage
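The effect of dropping the `pdfsvg` key can be illustrated with Python's standard `configparser`. This is a minimal sketch under stated assumptions: `SAMPLE_CFG` simply mirrors the `[executables]` entries visible in this hunk, `executable_for` is a hypothetical helper, and nothing here is claimed to match how `pretext.py` actually reads its configuration.

```python
import configparser

# Mirrors the [executables] entries shown in the hunk above, after the change.
SAMPLE_CFG = """
[executables]
latex = latex
pdflatex = pdflatex
xelatex = xelatex
asy = asy
mermaid = mmdc
sage = sage
"""


def executable_for(config, key):
    """Return the configured command for key, or None when the key is absent."""
    return config.get("executables", key, fallback=None)


config = configparser.ConfigParser()
config.read_string(SAMPLE_CFG)

print(executable_for(config, "mermaid"))  # a key that is still present
print(executable_for(config, "pdfsvg"))   # the key removed by this commit
```

Any code path that still asks for `pdfsvg`, such as the non-default branch kept in `pretext.py`, would need to cope with the key being absent.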
58 changes: 28 additions & 30 deletions pretext/pretext.py
@@ -424,7 +424,7 @@ def sage_conversion(
     shutil.copy2(sageout, dest_dir)

 def latex_image_conversion(
-    xml_source, pub_file, stringparams, xmlid_root, dest_dir, outformat, method, pyMuPDF=False
+    xml_source, pub_file, stringparams, xmlid_root, dest_dir, outformat, method, pyMuPDF=True
 ):
     # stringparams is a dictionary, best for lxml parsing

@@ -577,21 +577,20 @@ def latex_image_conversion(
 if outformat == "svg" or outformat == "all":
     if pyMuPDF:
         # create svg using pymupdf:
+        log.info("converting {} to {}".format(latex_image_pdf, latex_image_svg))
         with fitz.Document(latex_image_pdf) as doc:
             svg = doc.load_page(0).get_svg_image()
         with open(latex_image_svg, "w") as f:
             f.write(svg)
-        shutil.copy2(latex_image_svg, dest_dir)
-    # classic way to produce svg, using pdf2svg:
-    latex_image_svg = "classic-" + latex_image_svg
-    pdfsvg_executable_cmd = get_executable_cmd("pdfsvg")
-    # TODO why this debug line? get_executable_cmd() outputs the same debug info
-    log.debug("pdfsvg executable: {}".format(pdfsvg_executable_cmd[0]))
-    svg_cmd = pdfsvg_executable_cmd + [latex_image_pdf, latex_image_svg]
-    log.info(
-        "converting {} to {}".format(latex_image_pdf, latex_image_svg)
-    )
-    subprocess.call(svg_cmd)
+    else:
+        pdfsvg_executable_cmd = get_executable_cmd("pdfsvg")
+        # TODO why this debug line? get_executable_cmd() outputs the same debug info
+        log.debug("pdfsvg executable: {}".format(pdfsvg_executable_cmd[0]))
+        svg_cmd = pdfsvg_executable_cmd + [latex_image_pdf, latex_image_svg]
+        log.info(
+            "converting {} to {} using {}".format(latex_image_pdf, latex_image_svg, svg_cmd)
+        )
+        subprocess.call(svg_cmd)
     if not os.path.exists(latex_image_svg):
         log.error(
             "There was a problem converting {} to svg and {} was not created".format(
@@ -602,27 +601,27 @@ def latex_image_conversion(
 if outformat == "png" or outformat == "all":
     if pyMuPDF:
         # create high-quality png using pymupdf:
+        log.info("converting {} to {}".format(latex_image_pdf, latex_image_png))
         with fitz.Document(latex_image_pdf) as doc:
             png = doc.load_page(0).get_pixmap(dpi=300, alpha=True)
             png.save(latex_image_png)
         shutil.copy2(latex_image_png, dest_dir)
-    # classic method: create high-quality png, presumes "convert" executable
-    latex_image_png = "classic-" + latex_image_png
-    pdfpng_executable_cmd = get_executable_cmd("pdfpng")
-    # TODO why this debug line? get_executable_cmd() outputs the same debug info
-    log.debug("pdfpng executable: {}".format(pdfpng_executable_cmd[0]))
-    png_cmd = pdfpng_executable_cmd + [
-        "-density",
-        "300",
-        latex_image_pdf,
-        "-quality",
-        "100",
-        latex_image_png,
-    ]
-    log.info(
-        "converting {} to {}".format(latex_image_pdf, latex_image_png)
-    )
-    subprocess.call(png_cmd)
+    else:
+        pdfpng_executable_cmd = get_executable_cmd("pdfpng")
+        # TODO why this debug line? get_executable_cmd() outputs the same debug info
+        log.debug("pdfpng executable: {}".format(pdfpng_executable_cmd[0]))
+        png_cmd = pdfpng_executable_cmd + [
+            "-density",
+            "300",
+            latex_image_pdf,
+            "-quality",
+            "100",
+            latex_image_png,
+        ]
+        log.info(
+            "converting {} to {} using command {}".format(latex_image_pdf, latex_image_png, png_cmd)
+        )
+        subprocess.call(png_cmd)
     if not os.path.exists(latex_image_png):
         log.error(
             "There was a problem converting {} to png and {} was not created".format(
@@ -663,7 +662,6 @@ def latex_image_conversion(
image_list = "\n " + "\n ".join(failed_images)
raise ValueError(msg + image_list)


#############################################
#
# Binary Source Files to Base 64 in XML Files
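
The PyMuPDF path that this commit makes the default can be tried in isolation with a small standalone script. The sketch below assumes PyMuPDF is installed and importable as `fitz`; `output_paths` and `convert_first_page` are hypothetical names for illustration and are not functions in `pretext.py`. It uses the same calls that appear in the diff (`fitz.Document`, `load_page`, `get_svg_image`, `get_pixmap`), but it is not a copy of the project's logic.

```python
import os


def output_paths(pdf_path):
    """Return the (svg, png) paths that sit next to pdf_path."""
    stem, _ = os.path.splitext(pdf_path)
    return stem + ".svg", stem + ".png"


def convert_first_page(pdf_path, dpi=300):
    """Write page 0 of pdf_path as an SVG and as a PNG with an alpha channel."""
    import fitz  # PyMuPDF; imported here so this module loads without it

    svg_path, png_path = output_paths(pdf_path)
    with fitz.Document(pdf_path) as doc:
        page = doc.load_page(0)
        # SVG is returned as a string, so it is written out as text.
        with open(svg_path, "w", encoding="utf-8") as f:
            f.write(page.get_svg_image())
        # The pixmap is rasterized at the requested resolution and saved as PNG.
        page.get_pixmap(dpi=dpi, alpha=True).save(png_path)
    return svg_path, png_path
```

Calling `convert_first_page("image.pdf")` on a cropped single-page PDF would leave `image.svg` and `image.png` beside it, with no external `pdf2svg` or `convert` executable involved.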