
Performance compared to non-GC structs #51

Open · Jamesernator opened this issue Oct 12, 2024 · 2 comments
@Jamesernator

So over the last couple of days I wrote some JS to decode a binary format. To break things down, I decided to use a pattern of having individual functions return a `{ bytesRead: number, value: T }` object. When running this, I noticed that the majority (80%) of the runtime was spent in the garbage collector for these objects.
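Roughly, the pattern looked something like this (a simplified, illustrative sketch; the names and the specific format are not from the actual code):

```ts
// Illustrative sketch: each decode step returns a fresh
// { bytesRead, value } result object, which becomes garbage as soon
// as the caller destructures it.
interface DecodeResult<T> {
    bytesRead: number;
    value: T;
}

function decodeUint32(buffer: DataView, offset: number): DecodeResult<number> {
    return { bytesRead: 4, value: buffer.getUint32(offset, true) };
}

function decodeAll(buffer: DataView): number[] {
    const values: number[] = [];
    let offset = 0;
    while (offset + 4 <= buffer.byteLength) {
        const { bytesRead, value } = decodeUint32(buffer, offset);
        offset += bytesRead;
        values.push(value); // one short-lived result object per iteration
    }
    return values;
}
```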

I confirmed that these specific allocations were the cause by eliminating some of the loops that used these objects (GC time dropped to 40% of the runtime); however, I'm less happy with the resulting pattern the code uses to avoid allocations.
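One illustrative version of that kind of allocation-avoiding rewrite (not the exact code) reuses a single mutable scratch object instead of returning a fresh result per call:

```ts
// Illustrative allocation-avoiding rewrite: decode into a reused
// scratch object rather than returning a new one per call. Fewer
// allocations, but the data flow is harder to follow.
const scratch = { bytesRead: 0, value: 0 };

function decodeUint32Into(
    buffer: DataView,
    offset: number,
    out: { bytesRead: number; value: number },
): void {
    out.bytesRead = 4;
    out.value = buffer.getUint32(offset, true);
}

function decodeAllNoAlloc(buffer: DataView): number[] {
    const values: number[] = [];
    let offset = 0;
    while (offset + 4 <= buffer.byteLength) {
        decodeUint32Into(buffer, offset, scratch);
        offset += scratch.bytesRead;
        values.push(scratch.value);
    }
    return values;
}
```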

A question I have is: will engines be able to optimize structs to have performance at least comparable to non-GC'd structs (e.g. multi-value) in at least some cases (for example, in cases where such structs can be detected to be used linearly)?

@acutmore

I suspect if engines could optimise out the allocation for a struct being used to return multiple values that are immediately consumed by the caller, then they would also be able to do that for a plain object. Perhaps the fixed-layout aspects of structs make this optimisation slightly easier, but I would imagine the escape analysis would be similar.

@syg (Collaborator) commented Oct 18, 2024

I'm skeptical it's feasible to do the escape analysis needed for scalar replacement and stack allocation in anything lower than the optimizing JIT tier. At non-optimizing tiers, without type feedback, you'd have to do it optimistically and also compile in a slow path that can take over at runtime, which means the optimization doesn't always pay for itself, and that raises the question of how you tune it.

Concretely: your example involves a utility function returning object literals, which means that to stack-allocate the result, the utility function itself needs to be compiled differently (either to return the scalar components via registers, or to write them directly into stack slots in the caller). But to do that, we'd need to know that all possible callers of that utility function use the return object in a non-escaping way (which is hard/intractable for a VM to know, given JS's dynamism). If we don't know that, we'd need to build in a dynamic deopt path. On the caller side, every call of this function now needs to know that it can either do the stack-allocation thing or deopt and receive a heap object, so there's complexity on the caller side too.
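In source terms, the differently-compiled callee would look roughly like this (a hand-written stand-in for what the compiler would emit; in real codegen the components would travel in machine registers, not a tuple):

```ts
// Hand-written analogue of the "return scalar components" compilation
// described above. No result object is allocated; the tuple is only a
// notational stand-in for passing two values back in registers. A real
// engine would also need a deopt path that falls back to allocating
// the heap object whenever a caller lets the result escape.
function decodeUint32Scalars(
    buffer: DataView,
    offset: number,
): [bytesRead: number, value: number] {
    return [4, buffer.getUint32(offset, true)];
}
```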

At the optimizing tier, as @acutmore says, the fixed-layout aspect at least tells you up front how much stack space you need. But my hunch is that at that tier, if we've done enough analysis to be able to perform scalar replacement, the compiler can also treat non-fixed-layout objects as fixed-layout if they happen to be used locally as fixed-layout objects anyway.

Concretely: in the optimizing tier, we'd do escape analysis after inlining. So for your example with the utility function, the problem becomes more tractable: once that utility function is inlined, it's an intra-procedural optimization instead of an inter-procedural one.
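A source-level sketch of that, reusing the earlier illustrative example: once the callee is inlined, the object literal's allocation site and every use of it sit in one compilation unit, and scalar replacement can dissolve it into plain locals:

```ts
// Illustrative: what decodeAll looks like after inlining plus scalar
// replacement. The { bytesRead, value } literal never escapes the loop
// body, so it is rewritten into two locals and never allocated.
function decodeAllOptimized(buffer: DataView): number[] {
    const values: number[] = [];
    let offset = 0;
    while (offset + 4 <= buffer.byteLength) {
        // Inlined body of decodeUint32, object literal scalar-replaced:
        const bytesRead = 4;
        const value = buffer.getUint32(offset, true);
        offset += bytesRead;
        values.push(value);
    }
    return values;
}
```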

IOW, I don't think structs really help much with the escape analysis problem.
