Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move structured import/export name information into the import/export string #263

Merged
merged 2 commits into from
Oct 27, 2023

Conversation

lukewagner
Copy link
Member

This PR fleshes out the idea in #253 to move the structured information of imports/exports into the import/export name string, maintaining the same validation structure, but now inside a single quoted strong. The motivation for this change is to have a simple canonical string for all import/export names that contains all the structured information, without requiring host and toolchain implementations to define their own ad hoc string-serialization conventions. The rationale here is basically the same as the earlier design choice to put method/constructor/etc annotations inside the name strings, with the net effect of making the way externally-meaningful name information is represented in components somewhat more regular.

Some specific details:

  • This PR is intended to be backwards-compatible at the binary format level, assuming that the 5 new "implementation import" cases added by Allow imports to name implementations #222 haven't been implemented or used yet. This binary compatibility is achieved by allowing a now-"junk" byte that was previously used to discriminate (interface ...) imports.
  • The PR moves the existing kebab-name/label grammar to the "Imports and Exports" section so that the whole group can be defined in one block and also use a lexical grammar that's more explicit about whitespace and metacharacters. This ended up surfacing a spec-bug in which labels weren't being quoted.
  • As there are now a fair number of "<___name>"s in the grammar, it seemed odd to have <name> refer to the kebab-name case. Moreover, <name> wasn't really a "kebab-name" anyway, since it also contains method/constructor/static annotations (with embedded square brackets and periods). Thus, this PR proposes to rename this production to <plainname>, attempting to distinguish plain names from all the other kinds of names that carry more semantic information, but happy to hear other thoughts on how to call this.
  • Embedded URLs and registry names are deliminted by angle-brackets (not parens, as discussed in Inferring import kind from string syntax #253), mostly because (1) parens are allowed in URLs, (2) double-quotes require extra escaping in WAT, which seems like it'll be annoying, (3) other than double-quotes, angle-brackets are the RFC-recommended way to delimit URLs. The other implementation imports follow the same scheme as URLs just for symmetry.

Copying from this PR, an example showing the new import/export names is:

(component
  (import "custom-hook" (func (param string) (result string)))
  (import "wasi:http/handler" (instance
    (export "request" (type $request (sub resource)))
    (export "response" (type $response (sub resource)))
    (export "handle" (func (param (own $request)) (result (own $response))))
  ))
  (import "url=<https://mycdn.com/my-component.wasm>" (component ...))
  (import "relative-url=<./other-component.wasm>,integrity=<sha256-X9ArH3k...>" (component ...))
  (import "locked-dep=<my-registry:[email protected]>,integrity=<sha256-H8BRh8j...>" (component ...))
  (import "unlocked-dep=<my-registry:imagemagick@{>=1.0.0}>" (instance ...))
  (import "integrity=<sha256-Y3BsI4l...>" (component ...))
  ... impl
  (export "wasi:http/handler" (instance $http_handler_impl))
  (export "get-JSON" (func $get_json_impl (result string)))
)

Copy link
Collaborator

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me 👍

design/mvp/Binary.md Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Collaborator

To confirm, these two binary rules:

importdecl    ::= in:<importname> ed:<externdesc>         => (import in ed)
exportdecl    ::= en:<exportname> ed:<externdesc>         => (export en ed)

should change to refer to importname' and exportname', right?

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Oct 24, 2023
This commit removes the `wasmparser::ComponentExternName` type and
updates wasmparser to handle WebAssembly/component-model#263. Names are
now always stored as a strings and are not discriminated in the binary
format. This additionally refactors the internals of `wasmparser` to
handle parsed names differently and updates tools like `wit-component`
to handle the new API changes.

Test aren't passing yet but things are compiling at least.
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Oct 24, 2023
@alexcrichton
Copy link
Collaborator

One other piece of feedback from bytecodealliance/wasm-tools#1262, an implementation of this (and a few other PRs to the component model). Currently URLs are defined as "stuff that doesn't contain < or >", but integrity-metadata is defined with the w3c definition. It might make sense to do the same thing in both situations? Either require both are w3c-defined valid or both are something-nonempty-someone-else-parses-this.

I'll admit though I know nothing about integrity-metadata in w3c, if it's a simple format that's easy to parse then may as well "just" do that too.

@lukewagner
Copy link
Member Author

@alexcrichton That's a good question and I was wondering about the same options for the same reason. Looking at the EBNF of integrity-metadata (here) and recursing through all the definitions, it does seem like integrity-metadata is much much simpler than a URL, but a good deal more complicated than, say, an interfacename, so it's in a sortof grey area where I could go either way.

But another reason in favor of having the Component Model fully parse integrity-metadata is that, unlike URLs, which are pretty universal (ignoring the many awful edge cases), there are, I believe, a couple of different hash text formats out there. E.g., OCI uses something different. Thus, it seems quite possible that different parts of the ecosystem might end up adopting slightly-different hash formats (accidentally or intentionally) if the common validation rules baked into the core-est tooling don't fix one.

@alexcrichton
Copy link
Collaborator

Ok makes sense to me! I'll dig in and implement that

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Oct 27, 2023
This commit removes the `wasmparser::ComponentExternName` type and
updates wasmparser to handle WebAssembly/component-model#263. Names are
now always stored as a strings and are not discriminated in the binary
format. This additionally refactors the internals of `wasmparser` to
handle parsed names differently and updates tools like `wit-component`
to handle the new API changes.

Test aren't passing yet but things are compiling at least.
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Oct 27, 2023
Copy link
Contributor

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! I'm reviewing the implementation side of this currently as well.

@lukewagner
Copy link
Member Author

Awesome, thanks to you and Alex for the feedback. I'll merge now since there don't seem to be any open issues and we can file follow-up issues if needed as usual.

@lukewagner lukewagner merged commit 823beba into main Oct 27, 2023
2 checks passed
@lukewagner lukewagner deleted the stringify-external-names branch October 27, 2023 21:32
alexcrichton added a commit to bytecodealliance/wasm-tools that referenced this pull request Oct 28, 2023
#1262)

* Update parsing of component model names

This commit removes the `wasmparser::ComponentExternName` type and
updates wasmparser to handle WebAssembly/component-model#263. Names are
now always stored as a strings and are not discriminated in the binary
format. This additionally refactors the internals of `wasmparser` to
handle parsed names differently and updates tools like `wit-component`
to handle the new API changes.

Test aren't passing yet but things are compiling at least.

* Update `*.wat` parsing and some tests

Also update `wasm-encoder`'s API for the new AST.

* Update to allow import/export overlap

Handles WebAssembly/component-model#259

* Update wit-component test expectations

* Fix tests

* Update wasm-encoder to produce "old" binaries

In wasm-encoder produce the "old" encoding by default rather than the
new encoding. This can get removed once enough tooling supports the
"new" encoding.

* Implement parsing of new-style names

Specified in WebAssembly/component-model#263

* Add comments

* Update some TODO

* Disallow some kinds of strings in exports

* Update parsing of integrity metadata

* Review feedback
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Sep 3, 2024
Starting in WebAssembly/component-model#263 the component model binary
specification was updated in a technically breaking way to encode
binaries differently. This was intended to be rolled out in a manner
that minimized breakage however so bytecodealliance#1262
implemented validation where the prefix byte which changed was actually
ignored if it was 0 or 1. The encoder at the time still emitted 1 as the
prefix byte, however. The intention was that once the validator had
percolated enough the encoder would switch to using 0.

Unfortunately this change was basically forgotten about until now with
WebAssembly/component-model#391, but now's probably "enough time passed"
so the encoder is updated to emit a 0x00 leading byte in this situation
instead of the historical 0x01 byte. This will start the transition
period to eventually removing the validator's acceptance of the 0x01
byte.
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Sep 3, 2024
Starting in WebAssembly/component-model#263 the component model binary
specification was updated in a technically breaking way to encode
binaries differently. This was intended to be rolled out in a manner
that minimized breakage however so bytecodealliance#1262
implemented validation where the prefix byte which changed was actually
ignored if it was 0 or 1. The encoder at the time still emitted 1 as the
prefix byte, however. The intention was that once the validator had
percolated enough the encoder would switch to using 0.

Unfortunately this change was basically forgotten about until now with
WebAssembly/component-model#391, but now's probably "enough time passed"
so the encoder is updated to emit a 0x00 leading byte in this situation
instead of the historical 0x01 byte. This will start the transition
period to eventually removing the validator's acceptance of the 0x01
byte.
github-merge-queue bot pushed a commit to bytecodealliance/wasm-tools that referenced this pull request Sep 9, 2024
* Proceed to the next step in the "remove `interface`" transition

Starting in WebAssembly/component-model#263 the component model binary
specification was updated in a technically breaking way to encode
binaries differently. This was intended to be rolled out in a manner
that minimized breakage however so #1262
implemented validation where the prefix byte which changed was actually
ignored if it was 0 or 1. The encoder at the time still emitted 1 as the
prefix byte, however. The intention was that once the validator had
percolated enough the encoder would switch to using 0.

Unfortunately this change was basically forgotten about until now with
WebAssembly/component-model#391, but now's probably "enough time passed"
so the encoder is updated to emit a 0x00 leading byte in this situation
instead of the historical 0x01 byte. This will start the transition
period to eventually removing the validator's acceptance of the 0x01
byte.

* Update test expectation
sunfishcode added a commit to sunfishcode/cargo-component that referenced this pull request Nov 15, 2024
Update to the latest wit-bindgen, wasm-tools, warg, and
wasi-preview1-component-adapter-provider crates.

In particular, this updates to the wasm-encoder that always uses a
leading zero byte, following WebAssembly/component-model#263.

Fixes bytecodealliance#350.
calvinrp pushed a commit to bytecodealliance/cargo-component that referenced this pull request Nov 15, 2024
* Update to the latest wit-bindgen etc.

Update to the latest wit-bindgen, wasm-tools, warg, and
wasi-preview1-component-adapter-provider crates.

In particular, this updates to the wasm-encoder that always uses a
leading zero byte, following WebAssembly/component-model#263.

Fixes #350.

* Update the tests for the new wit-component API.

* rustfmt

* When assigning a world a new name, update `pkg.worlds` too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants