Create versioned releases of pandoc-wasm #10

Open
wants to merge 69 commits into
base: master

Conversation

johanneswilm

Hey @TerrorJack @tweag @amesgen

This is the kind of thing I was thinking of in #9. The changes are basically:

  • Include a patch to the pandoc sources and link to the original pandoc repository. No need to maintain a fork of the pandoc sources. Only the patch may need to be adjusted every now and then.

  • Publish versioned release files (for example, by using {pandoc-version}.{pandoc-wasm-version}, which would currently be 3.6.3.0).

Ideally, these releases would also be pushed to something like npm or some other repository for wasm files.

I understand a lot is going on in the wasm space and you expect changes on how this is done in the not too distant future. But meanwhile, these releases already serve a purpose so it would make sense to distribute them.

repository: haskell-wasm/pandoc
ref: wasm
repository: jgm/pandoc
ref: main

If you are going to put a version number on these builds, shouldn't they also be pinned to a specific commit (probably a release tag) for the underlying Pandoc version too?

Author

@alerque Yes, you are right. I am trying to figure out how best to do this. It should probably be easy to get it to build with the most recent pandoc version, and we probably need to add another segment to the number to show the version of the pandoc-wasm package. So 3.6.3.x instead of just 3.6.3. If you have a proposal for how to do the versioning in the simplest, most standards-compliant way, I'm all for it.


Just adding a segment will make it hard to parse because Pandoc follows PVP and has a variable number of segments. I think you're going to need a different separator that plays nice with distro versioning (I think + is the most robust option). The question is probably whether it makes more sense to version this project first and then append the relevant Pandoc version, or the other way around, e.g. pandoc-wasm-3.6.3+0.1 or pandoc-wasm-0.1+3.6.3. I think the latter probably makes more sense, but it depends on the expected release channels and usage workflows, I guess. I don't really have a handle on that.
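
As a rough illustration of why a `+` separator is easier to parse than an extra dotted segment, here is a minimal TypeScript sketch that splits the second form proposed above (`0.1+3.6.3`). The names `CombinedVersion`, `parseSegments` and `parseCombinedVersion` are hypothetical and not part of this PR.

```typescript
// Hypothetical helper, not part of the PR: splits
// "<pandoc-wasm version>+<pandoc version>" at the "+" separator.
interface CombinedVersion {
  wrapper: number[]; // pandoc-wasm's own version segments
  pandoc: number[];  // upstream pandoc segments (PVP, variable length)
}

// Parses a dotted numeric version such as "3.6.3" or "3.6.3.1".
function parseSegments(text: string): number[] {
  const parts = text.split(".");
  if (parts.some((p) => !/^\d+$/.test(p))) {
    throw new Error(`not a dotted numeric version: "${text}"`);
  }
  return parts.map(Number);
}

function parseCombinedVersion(text: string): CombinedVersion {
  const plus = text.indexOf("+");
  if (plus < 0) {
    throw new Error(`missing "+" separator in "${text}"`);
  }
  return {
    wrapper: parseSegments(text.slice(0, plus)),
    pandoc: parseSegments(text.slice(plus + 1)),
  };
}

const combined = parseCombinedVersion("0.1+3.6.3");
console.log(combined.wrapper, combined.pandoc);
```

Because the split happens at the `+`, the pandoc part can have three, four or more segments without any ambiguity about where the wrapper version ends.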

Author

@alerque OK, as someone working mainly with JS/TS in browsers, I would expect it to become available in npm. But I'm open to others needing it in other places. I like both of your versioning proposals, and unless there is a reason not to, I'd go for the second one.

Author

@alerque This now seems to be working with tagged versions, etc. The only thing remaining is to put it on npm. As I assume that @TerrorJack or @tweag or @alerque will want to have control over the npm repository after merging this PR (or writing something similar), I will not add that part.

Author

@alerque It looks like npm does not allow this versioning scheme. It only allows three-part semver. So I split the two version numbers apart. It's still really easy to update the pandoc version though.
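
To make the constraint concrete, here is a small TypeScript sketch that checks only the core `MAJOR.MINOR.PATCH` form and keeps the pandoc version as a separate field. It deliberately ignores semver pre-release and build suffixes, and the `ReleaseInfo` shape and the example version `0.1.0` are assumptions for illustration, not what this PR actually stores.

```typescript
// Core semver form only: MAJOR.MINOR.PATCH, numeric, no leading zeros.
// Pre-release and build suffixes are deliberately not handled in this sketch.
const CORE_SEMVER = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

function isCoreSemver(version: string): boolean {
  return CORE_SEMVER.test(version);
}

// Hypothetical record that keeps the two version numbers apart.
interface ReleaseInfo {
  packageVersion: string; // three-part version of the pandoc-wasm package
  pandocVersion: string;  // upstream pandoc version (PVP, any segment count)
}

function makeReleaseInfo(packageVersion: string, pandocVersion: string): ReleaseInfo {
  if (!isCoreSemver(packageVersion)) {
    throw new Error(`"${packageVersion}" is not a three-part version`);
  }
  return { packageVersion, pandocVersion };
}

console.log(isCoreSemver("3.6.3.0"));   // four segments
console.log(isCoreSemver("0.1+3.6.3")); // only two core segments before "+"
console.log(makeReleaseInfo("0.1.0", "3.6.3"));
```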

@johanneswilm
Author

@georgestagg It looks like your archived pandoc-wasm package is registered with npm. Could we maybe put this in its place?

@georgestagg

georgestagg commented Feb 11, 2025

Yes, you have my permission. Which npm username should I invite as a maintainer for the package?

@johanneswilm
Author

@georgestagg I'll only do it if none of those who really developed this package will do it. My username is johanneswilm. But it would be better if @alerque, @TerrorJack or someone else here would do it. I know very little about Haskell and Wasm.

Member

I see that you used haskell-wasm/pandoc#1 to create this, makes sense 👍

Now that I see the minimized diff again, I realize that the only remaining thing that is actually patched in the Haskell code here is the addition of the wasm_main export, such that the same .wasm file can be used both as a command module (e.g. locally via wasmtime) and as a reactor (for the web app). While this is quite neat, having separate .wasm files for these would have the following advantages:

  • Apart from the removal of -threaded (which could be upstreamed behind an arch(wasm32) conditional), no pandoc patching would be necessary; one could just build stock pandoc-cli (of course with an appropriate cabal.project for various dependencies).
  • In the separate build that exposes an FFI, one could actually use the full Wasm JSFFI which is more convenient.
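
For readers unfamiliar with the command/reactor distinction, the following TypeScript sketch classifies a module by its export names, based on the WASI convention that a command module exports `_start` while a reactor does not (and may export `_initialize`). The function name `classifyWasiModule` is hypothetical, and the export lists below are made-up examples rather than the actual exports of any pandoc build.

```typescript
type ModuleKind = "command" | "reactor";

// Classifies a WASI module from its export names. In a browser, the names
// could be obtained with WebAssembly.Module.exports(module).map((e) => e.name).
function classifyWasiModule(exportNames: readonly string[]): ModuleKind {
  // A command module has a `_start` entry point that runs once to completion.
  if (exportNames.indexOf("_start") >= 0) {
    return "command";
  }
  // Otherwise treat it as a reactor: it stays alive after the optional
  // `_initialize` call and the host invokes its other exports on demand.
  return "reactor";
}

// Made-up export lists, for illustration only.
console.log(classifyWasiModule(["memory", "_start"]));
console.log(classifyWasiModule(["memory", "_initialize", "wasm_main"]));
```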

@johanneswilm In your use case, do you use the .wasm file as a command or a reactor module?

Author

@amesgen That sounds promising. Yes, I took your patch in order to make the diff smaller.

My use case is this: I have created an open source word processor similar to Google Docs/Microsoft Word 365 Online, etc. for a specific niche market. I have written a number of export filters myself to common formats (like DOCX, ODT, HTML, EPUB, JATS, etc.). These are written in JS and all run in the end user's browser. I have even written an import filter for ODT files that works the same way.

Now I have a number of users wanting to import and export from other, more exotic formats. So I've created import/export filters in JS to the pandoc internal json format. And I then let the client send the pandoc json to a server where pandoc is run in server mode, converting the json to one of the other formats and sending it back to the client.
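
As a sketch of what that round trip might look like on the client side, the TypeScript below builds a tiny document in pandoc's JSON AST and wraps it in a request body with `text`, `from` and `to` fields, as pandoc-server expects. The helper names are hypothetical, only a small subset of the AST is typed, and the `pandoc-api-version` value is an assumption that has to match the pandoc build actually in use.

```typescript
// Tiny subset of pandoc's JSON AST, enough for one plain paragraph.
type Inline = { t: "Str"; c: string } | { t: "Space" };
type Block = { t: "Para"; c: Inline[] };

interface PandocDoc {
  "pandoc-api-version": number[]; // assumption: must match the pandoc build
  meta: Record<string, unknown>;
  blocks: Block[];
}

// Hypothetical helper: turns plain text into a Para of Str/Space inlines.
function paragraph(text: string): Block {
  const inlines: Inline[] = [];
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  words.forEach((word, i) => {
    if (i > 0) inlines.push({ t: "Space" });
    inlines.push({ t: "Str", c: word });
  });
  return { t: "Para", c: inlines };
}

// Hypothetical helper: request body for a JSON-to-<format> conversion.
function buildConvertRequest(doc: PandocDoc, to: string) {
  return { text: JSON.stringify(doc), from: "json", to };
}

const doc: PandocDoc = {
  "pandoc-api-version": [1, 23, 1],
  meta: {},
  blocks: [paragraph("Hello world")],
};
console.log(JSON.stringify(buildConvertRequest(doc, "latex")));
```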

This is a bit problematic, as these conversions take up processing power on the server and it's quite complex to deploy pandoc to a number of different architectures for which there are no pre-compiled binaries. There could even be security issues with sending various files back and forth.

So my idea is to use this instead: when the user clicks on export to or import from an "exotic" format for the first time, the browser downloads the pandoc.wasm file (and caches it for future use) and then does the conversion in the user's own browser, using the user's own processing power rather than the server's.
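
The "download on first use" part of this idea could be sketched roughly as below, in TypeScript. This only memoizes the download in memory for the current page; persistence across visits would be left to the browser's HTTP cache or similar. The names `makeLazyLoader` and `Fetcher` are hypothetical, and the fetch function is passed in so the sketch does not depend on any particular environment.

```typescript
// Fetches the raw bytes of a file. In a browser this could be, for example,
// (url) => fetch(url).then((r) => r.arrayBuffer()).
type Fetcher = (url: string) => Promise<ArrayBuffer>;

// Returns a loader that starts the download on the first call only and
// hands the same promise to every later caller.
function makeLazyLoader(url: string, fetchBytes: Fetcher): () => Promise<ArrayBuffer> {
  let pending: Promise<ArrayBuffer> | undefined;
  return () => {
    if (pending === undefined) {
      pending = fetchBytes(url).catch((err) => {
        pending = undefined; // allow a retry after a failed download
        throw err;
      });
    }
    return pending;
  };
}

// Usage with a stand-in fetcher that counts how often it is called.
let downloads = 0;
const loadPandoc = makeLazyLoader("pandoc.wasm", async () => {
  downloads += 1;
  return new ArrayBuffer(8);
});
loadPandoc();
loadPandoc();
console.log(downloads); // the stand-in fetcher ran once
```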

I assume you are referring to the terms "reactor" and "command" as they are defined here [1]. Given that I only want to execute the conversion based on one input and then to close down again (only caching the binary so it doesn't have to be downloaded again), I assume this corresponds to pandoc-cli more than pandoc-server.

I haven't yet tried whether it is actually possible. Based on the web demo it seems like it should though.

I don't know the who-is-who of the haskell-wasm world either. So I don't know which one of all of you can make any decisions here and who would be a good candidate to maintain an npm package of pandoc-wasm. I do maintain several open source packages, but those are written in languages that I use daily (like JS/TS or Python). So if one of you would want to step forward and do this, I'd be in favor.

[1] WebAssembly/WASI#13 (comment)

@johanneswilm johanneswilm requested a review from alerque March 24, 2025 22:19
@johanneswilm
Author

@TerrorJack I had to pin ghc-wasm-meta to a specific git commit as your recent changes there seem to have broken the pandoc wasm build. By pinning ghc-wasm-meta, I was able to build it with pandoc 3.6.4 (latest).
