Implement emscripten libc environment #163
Codecov Report
@@ Coverage Diff @@
## master #163 +/- ##
==========================================
- Coverage 69.54% 69.51% -0.03%
==========================================
Files 43 43
Lines 5007 5009 +2
==========================================
Hits 3482 3482
- Misses 1231 1233 +2
Partials 294 294
@@ -0,0 +1,114 @@
package emlibc
Should this be internal? I imagine some embedded users would want to call GetEnv().
I was trying to decide how to handle that, so I put it in there as more of an afterthought.
I may be overthinking it, but should we be laying the groundwork for supporting multiple 'environments' (EM, WASI, and the other 15 competing specifications that are sure to come)?
Should they be built in, as I've started doing, or should they themselves be external wasm files that use a common libc-like API that we develop internally?
At the moment I'm implementing it as a ResolveFunc.
I was thinking about making the ReadModule signature variadic:
ReadModule(r io.Reader, resolvePath ...ResolveFunc) (*Module, error)
but I'm not sure that's the correct way to go about it. The idea is that someone could call
wasm.ReadModule(buf, EMLibc, WASI, FileImporter, WAPM)
and we would go through each resolver in turn to try to resolve the import.
Some of that can be figured out later, but I was questioning whether I was hooking in at the right place.
Hmmmmm, good point ... We should have a think about this.
I think the 'resolver' method is probably the right way to go. I'm not a fan of inventing our own internal API, as that's another abstraction layer to maintain and might affect performance depending on the implementation.
puts := func(proc *exec.Process, v int32) int32 {

	buf := []byte{}
	temp := make([]byte, 1)
Maybe try var temp [1]byte and reference it like temp[:]?
Yeah, I like that. I couldn't decide if it would be better if we had more access to the underlying []byte. I guess at some point during this process we may need to implement a couple of other methods on exec.Process for memory management; that may be where we can implement things like alloc/free etc. We could also have a method that returns a reader so we could use bufio.Readers for some of this.
that may be where we can implement things like alloc/free etc
Hmmmm. How do things like Rust/Go handle this? Does everyone who compiles to wasm ship their own malloc/free implementation?
That's been one of the most confusing things about this as I have been learning WASM. From what I can gather, it currently ships inside the glue code produced by compiling, so it's actually provided by the compiler 'runtime', like emscripten or LLVM.
Here is some of the code from the emscripten JS file that provides the environment to run the wasm:
var _free = Module["_free"] = function() {
return Module["asm"]["_free"].apply(null, arguments)
};
var _main = Module["_main"] = function() {
return Module["asm"]["_main"].apply(null, arguments)
};
var _malloc = Module["_malloc"] = function() {
return Module["asm"]["_malloc"].apply(null, arguments)
};
var _memcpy = Module["_memcpy"] = function() {
return Module["asm"]["_memcpy"].apply(null, arguments)
};
var _memset = Module["_memset"] = function() {
return Module["asm"]["_memset"].apply(null, arguments)
};
and then it's called from the wasm:
call $_printf
i32.const 4
call $_malloc
local.set 4
local.get 1
local.get 4
i32.store
local.get 1
It really seems like it should have been part of the MVP spec to me, since that seems pretty basic, but it appears that is how it's done.
But it's an area of WASM I'm still trying to learn.
I've learned the most about wasm from playing around with https://github.com/intel/wasm-micro-runtime
They support building with emscripten and clang. It's a pretty clean implementation, but targeted more at resource-constrained environments.
To support emscripten (and LLVM) they have a libc wrapper, and it is the most concise place I've found for figuring out which calls are needed to support the compiler runtimes.
In the following code you can see their "env" implementation.
It covers the basics for libc calls from emscripten or LLVM, and most of the code I've tried against it works without issue. You can clearly see malloc and free operating inside the memory buffer, so that's what I'm basing my info on.
FWICT, wasm is intended just to implement a very minimal 'CPU', and design decisions like memory allocation are to be handled by the calling code. The only memory management features in the wasm specification are these two opcodes:
- memory.grow - increase the size of the memory buffer (in 64 KiB pages).
- memory.size (originally named current_memory) - return the current size of the memory buffer (in pages).
In all the wasm I've seen, malloc/free are all implemented in wasm shipped by the application.
Do any other wasm interpreters provide a 'libc' layer like this, or is the libc layer always shipped with the application code?
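To make the two opcodes concrete, here is a toy Go model of their semantics. It is only a sketch: the Memory type and its fields are invented for illustration and are not wagon's implementation. It shows that the spec only deals in whole 64 KiB pages, which is why anything finer-grained (malloc/free) has to live somewhere else.

```go
package main

import "fmt"

const pageSize = 64 * 1024 // wasm linear memory is sized in 64 KiB pages

// Memory is a toy model of a linear memory (hypothetical type, not wagon's).
type Memory struct {
	data     []byte
	maxPages int
}

// Size models memory.size: the current size in pages.
func (m *Memory) Size() int { return len(m.data) / pageSize }

// Grow models memory.grow: it returns the previous size in pages, or -1 on failure.
func (m *Memory) Grow(delta int) int {
	prev := m.Size()
	if delta < 0 || prev+delta > m.maxPages {
		return -1
	}
	m.data = append(m.data, make([]byte, delta*pageSize)...)
	return prev
}

func main() {
	m := &Memory{maxPages: 2}
	fmt.Println(m.Grow(1), m.Size()) // grow from 0 pages to 1
	fmt.Println(m.Grow(5))           // exceeds the maximum, so it fails
}
```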
I agree with what you say about the spec.
Yes, wasmer appears to provide an 'emscripten compatibility layer', and wasm-micro-runtime does as well.
I think it is kind of a grey area at the moment, but the problem I see is that since the linear memory buffer is used by both the host and the wasm module, someone has to be authoritative.
I.e. if I call into a wasm module from the host and want to pass in a string, I allocate it in the buffer and send a pointer.
If the module replies with another string, it basically does the same.
So do we assume the linear memory is stateless, so that in each iteration the current 'owner' has full use of the buffer? If not, someone has to manage the memory. Since GC is planned post-MVP, I think the responsibility for memory management is best handled in the runtime host.
Again, these are my very, very unqualified opinions.
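A minimal sketch of the host-authoritative option, assuming the host owns a simple bump allocator over the linear memory. Host, Alloc, PutString, and ErrOutOfMemory are hypothetical names; nothing here is an existing wagon API, and a real allocator would also need free and alignment handling.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfMemory is a hypothetical error for an exhausted buffer.
var ErrOutOfMemory = errors.New("linear memory exhausted")

// Host owns allocation for a toy linear memory (hypothetical type).
type Host struct {
	mem  []byte
	next uint32 // next free offset; a bump allocator never frees
}

// Alloc reserves n bytes and returns the offset ("pointer") of the block.
func (h *Host) Alloc(n uint32) (uint32, error) {
	if uint64(h.next)+uint64(n) > uint64(len(h.mem)) {
		return 0, ErrOutOfMemory
	}
	ptr := h.next
	h.next += n
	return ptr, nil
}

// PutString copies s plus a NUL terminator into memory and returns its pointer,
// which is what the host would pass to the wasm function as an i32 argument.
func (h *Host) PutString(s string) (uint32, error) {
	ptr, err := h.Alloc(uint32(len(s)) + 1)
	if err != nil {
		return 0, err
	}
	copy(h.mem[ptr:], s)
	h.mem[ptr+uint32(len(s))] = 0
	return ptr, nil
}

func main() {
	// Start at offset 8 so that 0 can keep its usual meaning of NULL.
	h := &Host{mem: make([]byte, 64), next: 8}
	p, err := h.PutString("hello")
	fmt.Println(p, err, string(h.mem[p:p+5]))
}
```

If the module also ships its own malloc, the two allocators would have to agree on which regions each one owns, which is exactly the "someone has to be authoritative" problem.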
One thing to point out: if you build for a browser, then the libc code is in the .js glue file generated by emscripten, so the browser does not directly provide it.
But in the case of non-browser runtimes, it appears to me they have provided their own glue code natively to support the emscripten compiler.
That is why I mentioned doing the implementations as a library of .wasm files that could be imported rather than writing them natively in Go, but we would need to provide at least a minimal API to do that.
Finally found the wasmer code...I remembered seeing it but it took me a minute
but in the case of non browser runtimes it appears to me they have provided their own glue code natively to support the emscripten compiler.
that is why i mentioned doing the implementations as a library of .wasm files that could be imported rather than writing them natively in go...but we would need to provide at least a minimal api to do that
Let's do that (as in, let's do what wasmer and the other native runtimes are doing, and provide an identical API).
@@ -144,7 +144,8 @@ func (m *Module) ExecInitExpr(expr []byte) (interface{}, error) {
			if globalVar == nil {
				return nil, InvalidGlobalIndexError(index)
			}
			lastVal = globalVar.Type.Type
			return m.ExecInitExpr(globalVar.Init)
Could this give us an infinite loop?
Yes, it could; it happened to me while playing with it. We could try to detect it. Do we need this here? I couldn't get it to work before, and if you just return nil, nil without an error it panics in Module.populateLinearMemory(). If we really do want to return nil (as it did before), we can either specify an error to return if the stack is empty or deal with a possible nil in the error building.
The problem comes from reflect.TypeOf(val).Kind() when val is nil.
We could modify ExecInitExpr (or make a new unexported one) to take an argument that represents the current context of execution. When we recurse into evaluating the global init expression, we can call it with the global as an argument. That way, we can check before we enter an infinite loop by seeing whether we are already evaluating the same global.
But this is an NP problem, so we could probably leave it unaddressed.
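For the narrow case where an initializer only forwards to another global by index, the context argument could be as simple as a set of indices currently being evaluated. The sketch below works under that assumption only; the global type, evalGlobal, and errCycle are invented for illustration and do not mirror wagon's data structures or cover general init expressions.

```go
package main

import (
	"errors"
	"fmt"
)

// global is a toy model: either a constant or a reference to another global.
type global struct {
	ref   int   // index of the referenced global, or -1 for a constant
	value int64 // used only when ref is -1
}

var errCycle = errors.New("global initializer refers back to itself")

// evalGlobal resolves globals[i]; visiting holds the indices on the current
// recursion path and plays the role of the "context" argument.
func evalGlobal(globals []global, i int, visiting map[int]bool) (int64, error) {
	if i < 0 || i >= len(globals) {
		return 0, fmt.Errorf("invalid global index %d", i)
	}
	if visiting[i] {
		return 0, errCycle
	}
	g := globals[i]
	if g.ref < 0 {
		return g.value, nil
	}
	visiting[i] = true
	defer delete(visiting, i)
	return evalGlobal(globals, g.ref, visiting)
}

func main() {
	globals := []global{
		{ref: -1, value: 1024}, // constant
		{ref: 0},               // forwards to global 0
		{ref: 3},               // 2 -> 3 -> 2 forms a cycle
		{ref: 2},
	}
	fmt.Println(evalGlobal(globals, 1, map[int]bool{}))
	fmt.Println(evalGlobal(globals, 2, map[int]bool{}))
}
```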
@@ -95,6 +97,9 @@ func run(w io.Writer, fname string, verify bool) {
}

func importer(name string) (*wasm.Module, error) {
	if name == "env" {
Is the 'env' import reserved?
Could someone legitimately create a wasm file named 'env' and we break them?
I've not found it in the WASM spec. It looks to me like it's just a 'convention' used internally by emscripten, and possibly adopted by LLVM in their code (I'll research more).
That is why it has to be optional.
The above was temporary, for the POC.
Ah gotcha :) plz move it to a flag or something before we merge.
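One possible shape for that, sketched with the standard flag package. The flag name -emlibc, the newImporter helper, and the stand-in Module type are all assumptions made for illustration; the point is only that "env" falls through to the normal file lookup unless the user opts in.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// Module is a stand-in for wasm.Module in this sketch.
type Module struct{ Name string }

// useEmlibc is a hypothetical opt-in flag; it is off by default so that a
// user-supplied module named "env" keeps working.
var useEmlibc = flag.Bool("emlibc", false, `resolve the "env" import with the built-in emscripten libc`)

// newImporter (hypothetical) wraps the normal file-based lookup and only
// intercepts "env" when emlibc is enabled.
func newImporter(emlibc bool, fromFile func(string) (*Module, error)) func(string) (*Module, error) {
	return func(name string) (*Module, error) {
		if emlibc && name == "env" {
			return &Module{Name: "emlibc"}, nil
		}
		return fromFile(name)
	}
}

func main() {
	flag.Parse()
	fromFile := func(name string) (*Module, error) {
		return nil, errors.New("no file loader in this sketch: " + name)
	}
	m, err := newImporter(*useEmlibc, fromFile)("env")
	fmt.Println(m, err)
}
```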
Of course. Sorry, to be clear, this pull request was meant to start the conversation, not to be syntactically correct. I'll work on a better implementation now that we have nailed down a few things.
Ahhhh, my bad, I didn't realise :O
In that case LGTM.
As I've been working through this, I decided to try a different route to learn how the emscripten internals work, so I created a separate project at github.com/sampaioletti/wagoja. It basically uses goja to create a Node-like environment and ties it back to wagon so that the emscripten-generated scripts will work. I was able to get it working over the weekend with a basic example. It was a nightmare (: but it is working for the limited case shown in the example folder, and it definitely helped me understand what an emscripten libc implementation will need to look like. That repo relies on another branch in my sampaioletti/wagon fork called 'wagoja'. I made the changes required to make this work and started working on a few of the other things we've been discussing in there (like the module builder). I'm going to play with wagoja (sorry, I hate naming projects) and begin replacing it with functionality implemented in wagon. Feel free to poke around if you're interested. It's mostly hack work, but it took a lot of playing to get it to function correctly.
I've been playing with adding some of the libc functions that compiling with emscripten requires; here is a working POC for discussion. It's not pretty, but I was trying to see how difficult it would be.
I have a few questions:
I'm not sure I understand how the init_expr should work. To get the code to work I had to modify the getGlobal case:
https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/init_expr.go#L138
I have created a Global for __memory_base
https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/internal/emlibc/resolver.go#L61
and it is used from here.
https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/internal/emlibc/test/puts.wast#L4
Without the modification the init_expr stack is empty, so it returns nil, nil, which causes a panic:
https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/init_expr.go#L157
Also, a quick look at my internal/emlibc/resolver.go would be appreciated, just to see if I'm missing any important concepts. I'm relatively new to WASM so I'm having to learn as I go.
Thanks for the input.