I am currently working on a new version of Stirling PDF with the option to run operations on the client. The older builds of pdfcpu-wasm I have are outdated and no longer work with current tooling. Specifically, they fail when the JavaScript runs in strict mode, which is a requirement set by Vite and therefore by Stirling PDF.
Follow the guide to compile pdfcpu-wasm (this only worked on Linux/WSL for me).
Guide: wcchoi/go-wasm-pdfcpu (First two codeblocks)
We'll need to navigate to the files we've just built: cd ./pdfcpu/cmd/pdfcpu/
and copy over all relevant wasm_exec files from our local Go installation: cp /usr/local/go/misc/wasm/* ./
This setup should already run (albeit in a very limited way) in Node, and you can check that it does by running: node wasm_exec_node.js pdfcpu.wasm
To check if it runs correctly in the browser as well we first need to modify some of the files we just copied. "wasm_exec.html" should be updated to fetch the correct wasm file:
WebAssembly.instantiateStreaming(fetch("pdfcpu.wasm"), go.importObject).then((result) => {
We also need a way to serve the files to the browser. For testing I used the static_server.js script (aolde/static_server.js), as in the previously mentioned guide, and added wasm to its MIME types so that it can serve .wasm files:
"wasm": "application/wasm",
After running node static_server.js
we are able to access pdfcpu at http://localhost:8888/wasm_exec.html in a browser.
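The wasm entry matters because WebAssembly.instantiateStreaming only accepts responses served with the application/wasm MIME type. As a rough sketch of the lookup a static server performs per request (the names MIME_TYPES and contentTypeFor are my own and hypothetical, not taken from static_server.js; only the wasm entry comes from this guide):

```javascript
// Hypothetical extension-to-MIME table. Only the "wasm" entry comes from the guide;
// the other entries are the usual types for the files we copied.
const MIME_TYPES = new Map([
    ["html", "text/html"],
    ["js", "text/javascript"],
    ["wasm", "application/wasm"],
]);

// Returns the Content-Type for a request path, falling back to a generic
// binary type when the extension is unknown or missing.
function contentTypeFor(path) {
    const dot = path.lastIndexOf(".");
    const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
    return MIME_TYPES.get(ext) ?? "application/octet-stream";
}
```

A server would send the returned value as the Content-Type header of the response. If the browser complains about an incorrect response MIME type when loading pdfcpu.wasm, this mapping is the first thing I would check.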
To run pdfcpu in the browser, and to stop Node from writing files to disk, we need an in-memory filesystem compatible with Node's fs module, such as memfs.
Just like wasm_exec_node.js is used to inject node-fs specific functions, we will create a new file for memfs.
wasm_exec_memfs.js
import memfs from 'https://cdn.jsdelivr.net/npm/[email protected]/+esm';
globalThis.fs = memfs.fs;
import "./wasm_exec.js";
After some modifications to wasm_exec.html we should be good to go. We'll remove <script src="wasm_exec.js"></script>
and replace it with an import statement at the beginning of the next script tag: import "./wasm_exec_memfs.js";
This imports our new wasm_exec with memfs in place. Note that import statements only work in module scripts, so that script tag needs type="module".
When testing this, you will see that the console output is now broken. To get the console working again, we'll return to the guide from the beginning: wcchoi/go-wasm-pdfcpu.
Memfs does not seem to forward STDOUT/STDERR of the wasm module. We will need to add this code to wasm_exec_memfs.js in order to do that:
const encoder = new TextEncoder("utf-8");
const decoder = new TextDecoder("utf-8");
let outputBuf = "";
// Keep the original memfs implementation around for writes to regular files.
globalThis.fs.writeSyncOriginal = globalThis.fs.writeSync;
globalThis.fs.writeSync = function(fd, buf) {
    if (fd === 1 || fd === 2) {
        // fd 1 is STDOUT, fd 2 is STDERR: collect the text and log complete lines.
        outputBuf += decoder.decode(buf);
        const nl = outputBuf.lastIndexOf("\n");
        if (nl != -1) {
            console.log(outputBuf.substr(0, nl));
            outputBuf = outputBuf.substr(nl + 1);
        }
        return buf.length;
    } else {
        return globalThis.fs.writeSyncOriginal(...arguments);
    }
};
// Same idea for the callback-based write.
globalThis.fs.writeOriginal = globalThis.fs.write;
globalThis.fs.write = function(fd, buf, offset, length, position, callback) {
    if (fd === 1 || fd === 2) {
        // Only whole-buffer writes without an explicit position are supported here.
        if (offset !== 0 || length !== buf.length || position !== null) {
            throw new Error("not implemented");
        }
        const n = this.writeSync(fd, buf);
        callback(null, n, buf);
    } else {
        return globalThis.fs.writeOriginal(...arguments);
    }
};
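To make the buffering easier to follow, here is the same idea isolated from the fs patching, as a minimal sketch. createLineBuffer and emit are hypothetical names of mine. One deliberate difference from the code above: I pass { stream: true } to the decoder so that a multi-byte character split across two writes is not garbled, which is my own assumption about what is desirable here.

```javascript
// Returns a write function that collects decoded chunks and calls `emit`
// with all complete lines received so far. Incomplete trailing text is kept
// until a later chunk delivers its newline.
function createLineBuffer(emit) {
    const decoder = new TextDecoder("utf-8");
    let pending = "";
    return function write(bytes) {
        // stream: true keeps partial multi-byte sequences for the next call.
        pending += decoder.decode(bytes, { stream: true });
        const nl = pending.lastIndexOf("\n");
        if (nl !== -1) {
            emit(pending.slice(0, nl));
            pending = pending.slice(nl + 1);
        }
        // Report the whole chunk as written, like writeSync does.
        return bytes.length;
    };
}
```

With something like createLineBuffer((text) => console.log(text)), each console.log call receives whole lines only, which is why the output later shows up line by line instead of in arbitrary fragments.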
Note
Hi, it's me from the future. This is probably fixed in a newer version of pdfcpu, so if you don't run into this issue, you can skip this part.
Error:
pdfcpu: config dir problem: permissions is numeric, got: 0xF0C3
Debugging Steps:
- Tried running the wasm in a different runtime (Wasmtime): not comparable, because Wasmtime needs wasm compiled for wasip1, not for js (see StackOverflow)
- Filed a bug report on GitHub
Two days later...
Disabling the config-dir should solve the issue. Thanks @henrixapp!
var ConfigPath string = "disable"
With this out of our way, let's continue!
To pass files into wasm we need to write them to memfs. Writing should be as simple as writing the file's buffer, but in a browser environment we don't have a Buffer implementation readily available, so let's bring one in.
Inside wasm_exec.html we will first import a buffer library.
import { Buffer } from 'https://cdn.jsdelivr.net/npm/[email protected]/+esm'
Then we will add arguments to our pdfcpu process:
const go = new Go();
go.argv = ['pdfcpu.wasm', 'validate', '/input.pdf'];
Give the user the option to select a file
<input type="file" id="fileInput" />
And lastly, write the selected file to memfs before executing the wasm binary.
window.run = async () => {
    console.clear();
    let buffer = await document.getElementById("fileInput").files[0].arrayBuffer();
    await globalThis.fs.promises.writeFile("/input.pdf", Buffer.from(buffer));
    await go.run(inst);
    inst = await WebAssembly.instantiate(mod, go.importObject); // reset instance
}
After that we are ready to run the command to validate a pdf-file.
We'll see the validation pass for the first time 🎉!
wasm_exec_memfs.js:17 validating(mode=relaxed) /input.pdf ...
wasm_exec_memfs.js:17 validation ok
Awesome, now let's try something that actually writes some data. Like wcchoi, let's try extracting the first page:
go.argv = ['pdfcpu.wasm', 'trim', '-pages', '1', '/input.pdf', '/output.pdf'];
Downloading the file (as well as some cleanup) is as simple as:
await globalThis.fs.promises.unlink("/input.pdf");
const result = await globalThis.fs.promises.readFile("/output.pdf");
const blob = new Blob([result], {type: "application/pdf"});
const objectUrl = URL.createObjectURL(blob);
window.open(objectUrl);
await globalThis.fs.promises.unlink("/output.pdf");
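The write, run, read, clean up sequence is the same for every pdfcpu command, so it could be wrapped in one helper. This is only a sketch under my own assumptions: runWithFiles is a hypothetical name, fs stands for any object with promises.writeFile, promises.readFile and promises.unlink (memfs's fs in this guide), and run is whatever function executes the wasm binary.

```javascript
// Writes the input into the in-memory filesystem, runs the given function,
// returns the bytes of the output file and removes both files afterwards.
async function runWithFiles(fs, inputPath, inputBytes, outputPath, run) {
    await fs.promises.writeFile(inputPath, inputBytes);
    try {
        await run();
        return await fs.promises.readFile(outputPath);
    } finally {
        // Clean up even if the run or the read failed. A missing file is
        // not an error at this point, so unlink failures are ignored.
        await fs.promises.unlink(inputPath).catch(() => {});
        await fs.promises.unlink(outputPath).catch(() => {});
    }
}
```

In the browser example this would be called roughly as runWithFiles(globalThis.fs, "/input.pdf", Buffer.from(buffer), "/output.pdf", () => go.run(inst)). The try/finally means a failed command does not leave a stale /input.pdf behind for the next run.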
We now have a fully functioning version of pdfcpu inside our browser!
Let's set up the same thing in Node.js.
Create a new file called wasm_exec_memfs_node.js and paste the contents from wasm_exec_memfs.js with one small change: memfs can now be imported directly after installing it via npm i memfs
import memfs from 'memfs';
Create another file called run_wasm_node.js and use the contents of wasm_exec.html's script tag as a base. Next we'll mix in changes from wasm_exec_node.js to load the wasm module correctly and use Node's built-in libraries where possible.
import "./wasm_exec_memfs_node.js";
import fs from "node:fs";
const go = new Go();
go.argv = ['pdfcpu.wasm', 'trim', '-pages', '1', '/input.pdf', '/output.pdf'];
WebAssembly.instantiate(fs.readFileSync("pdfcpu.wasm"), go.importObject).then(async (result) => {
    process.on("exit", (code) => { // Node.js exits if no event handler is pending
        if (code === 0 && !go.exited) {
            // deadlock, make Go print error and stack traces
            go._pendingEvent = { id: 0 };
            go._resume();
        }
    });
    // Copy the input from the real disk into memfs, where pdfcpu can see it.
    const buffer = fs.readFileSync('./input.pdf');
    await globalThis.fs.promises.writeFile("/input.pdf", buffer);
    await go.run(result.instance);
    await globalThis.fs.promises.unlink("/input.pdf");
    // Copy the result from memfs back to the real disk, then clean up.
    const pdfcpu_result = await globalThis.fs.promises.readFile("/output.pdf");
    fs.writeFileSync("./output.pdf", pdfcpu_result);
    await globalThis.fs.promises.unlink("/output.pdf");
}).catch((err) => {
    console.error(err);
});
Node.js will now read the file ./input.pdf and write the first page of that file to ./output.pdf when node ./run_wasm_node.js
is executed.
It should now finally be possible to integrate this into StirlingPDF-v2!