Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: ensure successful dynamic import returns the same result #46662

Merged
merged 13 commits into from
Jul 26, 2023

Conversation

aduh95
Copy link
Contributor

@aduh95 aduh95 commented Feb 14, 2023

According to https://tc39.es/ecma262/#sec-HostLoadImportedModule, if import() is called multiple times with the same (referrer, specifier) pair, then it must fulfil return the same object (or reject).
https://tc39.es/proposal-import-attributes/#sec-HostLoadImportedModule takes it further by adding assertions attributes to the mix.

This PR adds a cache layer for dynamic imports to ensure Node.js meets the spec.
EDIT: I should clarify here that the cache layer is also populated by static imports

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Feb 14, 2023
@aduh95 aduh95 force-pushed the dynamic-import-complications branch from 8ec5944 to 9bf820c Compare April 16, 2023 10:08
Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two issues with this:

  1. It's unclear what problem this solves. Is there a bug report? A spec violation that can be demonstrated in a repro?
  2. We already have a module cache: ModuleMap. It feels wrong to create a second one. If the first one behaves incorrectly, it should be fixed or replaced. Having two seems likely to cause problems.

@devsnek
Copy link
Member

devsnek commented Apr 17, 2023

Node's module map should be a stricter version of the one specified in ecma262, specifically because we resolve the canonical path for a module, which means it isn't just stable within each file, but across the entire graph of files.

@aduh95 aduh95 mentioned this pull request Jun 2, 2023
4 tasks
@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

  1. Is there […] a spec violation that can be demonstrated in a repro?

Yes, see the test I've added

2. We already have a module cache: ModuleMap. It feels wrong to create a second one. If the first one behaves incorrectly, it should be fixed or replaced

Having two seems necessary, there are simply two different things, the first one is behaving correctly AFAICT.

@jkrems
Copy link
Contributor

jkrems commented Jun 2, 2023

According to https://tc39.es/ecma262/#sec-HostLoadImportedModule, if import() is called multiple times with the same (referrer, specifier) pair, then it must fulfil return the same object (or reject).

Isn't the same thing true for import('x') and import 'x' from the same referrer? Why do we need a secondary cache for dynamic import() specifically unless that invariant is broken right now?

@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

Isn't the same thing true for import('x') and import 'x' from the same referrer?

The same thing is true yes, this PR ensures that both static and dynamic imports return the same thing.

@GeoffreyBooth
Copy link
Member

Migrating #46826 (comment) over here:

const importVal1 = await import('module-name');

setTimeout(() => {
  const importVal2 = await import('module-name');

  console.log(importVal1 === importVal2);
}, 999);

The ECMAScript spec enforces that the above code always logs true

A simplified version of the above is this:

import('data:text/javascript,export default 333').then(async first => {
  const second = await import('data:text/javascript,export default 333')
  console.log(first === second)
})

In Node 20 for me, this logs true. If you replace the data URIs with ./file.mjs and create a file.mjs containing export default 333, it still returns true. So it would seem that we’re already spec compliant, at least for the common use case where there are no I/O issues.

I looked at the test in this PR, and it involves filesystem mutations. Please correct me if I’m misreading this, but it seems like what you’re testing is an initial import() succeeding, then deleting the imported file, and the second import currently fails because the file is missing but it should have succeeded because the initial success should have been cached? Is that the intent of the test, and the change that this PR is intending to make?

@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

Please correct me if I’m misreading this, but it seems like what you’re testing is an initial import() succeeding, then deleting the imported file, and the second import currently fails because the file is missing but it should have succeeded because the initial success should have been cached? Is that the intent of the test, and the change that this PR is intending to make?

That's correct.

@jkrems
Copy link
Contributor

jkrems commented Jun 2, 2023

The same thing is true yes, this PR ensures that both static and dynamic imports returns the same thing.

Why isn't the ModuleMap cache ensuring that? If it doesn't, we should delete it so we have one cache that works instead of one that works and one that doesn't. Is the problem that the ModuleMap doesn't contain entries for failed loads? If so, can we add that to the module map cache?

@GeoffreyBooth
Copy link
Member

Is the problem that the ModuleMap doesn’t contain entries for failed loads? If so, can we add that to the module map cache?

Per the spec we’re not supposed to caching failures, only successes. So if you tried to import a nonexistent file, then created the file and tried again to import it, the second attempt should succeed. https://tc39.es/ecma262/#sec-HostLoadImportedModule (emphasis added):

If this operation is called multiple times with the same (referrer, specifier) pair and it performs FinishLoadingImportedModule(referrer, specifier, payload, result) where result is a normal completion, then it must perform FinishLoadingImportedModule(referrer, specifier, payload, result) with the same result each time.

But I also feel like this should be achievable with our current ModuleMap cache without needing to create a new cache. Is the issue that we’re checking the filesystem for every import before looking to see if a module is already in our cache?

@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

But I also feel like this should be achievable with our current ModuleMap cache without needing to create a new cache. Is the issue that we’re checking the filesystem for every import before looking to see if a module is already in our cache?

Yes that's the issue, defaultResolve is making fs operation even if a previous resolution has succeeded in the same module. I don't think we have another option than caching all the successful resolve operations.

@GeoffreyBooth
Copy link
Member

Yes that’s the issue, defaultResolve is making fs operation even if a previous resolution has succeeded in the same module.

Where is defaultResolve making filesystem calls? I read through it and I’m not seeing them.

Perhaps it was doing so before #47824?

@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

Of course it's doing fs operations, it needs to in order to resolve symlinks and node_modules deps.

Here's a few examples, defaultResolve is calling moduleResolve here:

url = moduleResolve(
specifier,
parentURL,
conditions,
isMain ? preserveSymlinksMain : preserveSymlinks,
);

Which itself is calling packageResolve and finalizeResolution:

resolved = packageResolve(specifier, base, conditions);

return finalizeResolution(resolved, base, preserveSymlinks);

And here where they make the FS call:

const stats = internalModuleStat(toNamespacedPath(StringPrototypeEndsWith(path, '/') ?
StringPrototypeSlice(path, -1) : path));

const stat = internalModuleStat(toNamespacedPath(StringPrototypeSlice(packageJSONPath, 0,
packageJSONPath.length - 13)));

@GeoffreyBooth
Copy link
Member

Within the error path of one of these failed filesystem calls, could we look in ModuleMap to see if the requested module was cached? That way we would avoid both creating a new cache and slowing down resolution in the default case where the filesystem hasn’t changed under our feet.

I’m also not convinced that this is a real problem. What is the use case where import calls need to continue working correctly after files were deleted from disk? Even if that’s technically the correct behavior per spec, it’s such an extreme edge case that it doesn’t feel like something we should add a performance regression to achieve.

@bmeck
Copy link
Member

bmeck commented Jun 2, 2023 via email

@GeoffreyBooth
Copy link
Member

Beware. Read up on why the web does cache errors.

I think what we’re discussing is the opposite though, caching the success so that subsequent imports continue to succeed even if the referenced files have been deleted.

@aduh95
Copy link
Contributor Author

aduh95 commented Jun 2, 2023

I’m also not convinced that this is a real problem. What is the use case where import calls need to continue working correctly after files were deleted from disk?

I think that's off topic: if you don't like the spec, you should take it to TC39, but until then it seems pointless to discuss this in this PR.

Within the error path of one of these failed filesystem calls, could we look in ModuleMap to see if the requested module was cached?

I think it's still a problem if the resolution ends up on a different symlink (so we never end up on the error path yet we are breaking the spec).

Read up on why the web does cache errors

Caching the errors would simplify the implementation, it's certainly something we can think about.

@devsnek
Copy link
Member

devsnek commented Jun 2, 2023

I haven't read the code of this PR yet but I'm generally in favor of the behavior the spec requires here. We spent a great deal of time coming to the exact semantics we wanted to preserve and anyone who feels differently may want to review our prior discussions on this (you can check the dynamic import proposal repo and tc39 notes from that time) before pushing too hard.

@aduh95 aduh95 force-pushed the dynamic-import-complications branch 2 times, most recently from cc1b1b8 to 6fbabe9 Compare June 10, 2023 17:30
Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aduh95 and I discussed this. We came to the conclusion that as far as we were aware, our ESM loader implementation was caching the results of the entire module resolution and loading process but not the individual results of the resolve and load steps. This is a problem for the spec compliance issue that Antoine pointed out, where a resolution success needs to be cached even if subsequent attempts at resolving the same input fail; without a resolution cache, we have no way of skipping repeated attempts at resolving the same input and therefore adhering to the spec.

I suggested therefore creating two caches: a “resolve cache” and a “load cache,” to correspond with the two phases/hooks of module resolution and loading. This should both fix the spec compliance issue as well as improve performance whenever an application imports the same thing twice (which is probably rare, as all of the input would need to be identical including parent URL, and it’s uncommon to have two import statements in the same file for the same thing). This PR creates the new “resolve cache” and renames the former “module map” to be the “load cache.”

One thing that bothered me about the former module map, and is still problematic in this version, is that the values of the module map/load cache are sometimes promises, sometimes “module jobs,” etc rather than a one-to-one mapping of the output of the load hook. And in this PR, the value of the resolve cache is the same promise or module job as the value of the module map/load cache. I would think that the value of the resolve cache should be whatever gets returned by resolve: the absolute URL of the module and its assertions/attributes. Then that value would be the key for the load cache.

I see the sense in making the value of the resolve cache the same as the value of the load cache, to skip the second lookup and therefore save a tiny bit of performance. It just breaks the mental model of the purposes of these two caches, if the resolve cache isn’t really caching the results of the resolve step. And the load cache isn’t really caching the result of the load step either, as a module job is a much larger object than simply the module source that gets returned from load.

Is what we really want here a single big cache where module jobs (or the promises that will resolve into module jobs) can be looked up by two keys, either the input for resolve or the input for load? Is that the most performant data structure for what we’re trying to achieve here?

For me what makes the most sense mentally in terms of my ease in following the logic would be something like this:

  • A “resolve cache” where the keys are the serialized inputs to resolve and the values are the successful results of those resolve calls (and failures aren’t cached)
  • A “load cache” where the keys are the serialized inputs to load and the values are the successful results of those load calls (and failures aren’t cached)

And do we even need a data structure to hold the pending promises of the “module jobs,” the process of loading the module sources from disk or network? I suppose we probably do, to avoid parallel load calls trying to read the same file from disk unnecessarily, which would mean yet a third cache where the keys are the same as those of the load cache and the values are the promises, or a boolean to tell you that it’s still pending, or something.

Just writing this all out it makes me feel like creating all these extra data structures would be bad for performance, so I’m not trying to say that I’m advocating for this architecture; just that it feels like what would make the most sense in terms of me being able to follow the logic as a reader. Is there some version of refactoring the module map that we have, with creating the resolution cache, that comes a bit closer to this mental model?

Comment on lines 75 to 76
const promises = this.module.link(async (specifier, attributes) => {
const job = this.loader.getModuleJob(specifier, url, attributes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should convert assertions to attributes in its own PR, and do all the references at once, rather than piecemeal as part of changing the modified lines of this PR. The conversion PR can also address whatever tests need to change as a result of the new proposal.

lib/internal/modules/esm/module_map.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/module_map.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/module_map.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/module_map.js Outdated Show resolved Hide resolved
@@ -8,22 +8,22 @@ import { describe, it } from 'node:test';
describe('ESM: ensure initialization happens only once', { concurrency: true }, () => {
it(async () => {
const { code, stderr, stdout } = await spawnPromisified(execPath, [
'--experimental-import-meta-resolve',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added? I thought we removed this flag, or were going to do so immediately after landing off-thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was forgotten when the test was added, runmain.js use import.meta.resolve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so this is just an unrelated change? Should this be in its own PR?

*/
assert.strictEqual(resolveHookRunCount, 2);
assert.strictEqual(stdout.match(/resolve passthru/g)?.length, 4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a problem? Resolutions are happening twice as many times now as before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments above, it's related to import.meta.resolve being called twice in the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you didn’t change this test, just the assertions for it to pass. Doesn’t that mean that the changes in this PR break the previous test? Why should this test need to change based on cache refactoring?

Or was the test just buggy before because it was missing the required --experimental-import-meta-resolve flag, and now that it has the flag it should’ve had all along, the assertions need to change accordingly? If so, this feels like an unrelated fix that should be its own PR.

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
@GeoffreyBooth GeoffreyBooth self-requested a review June 11, 2023 01:47
@GeoffreyBooth GeoffreyBooth dismissed their stale review June 11, 2023 01:49

replaced with newer review

@GeoffreyBooth GeoffreyBooth added the module Issues and PRs related to the module subsystem. label Jun 11, 2023
@aduh95 aduh95 added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Jul 26, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jul 26, 2023
@nodejs-github-bot nodejs-github-bot merged commit 053511f into nodejs:main Jul 26, 2023
@nodejs-github-bot
Copy link
Collaborator

Landed in 053511f

@aduh95 aduh95 deleted the dynamic-import-complications branch July 26, 2023 21:28
Ceres6 pushed a commit to Ceres6/node that referenced this pull request Jul 27, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
rluvaton pushed a commit to rluvaton/node that referenced this pull request Jul 28, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
pluris pushed a commit to pluris/node that referenced this pull request Aug 6, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
pluris pushed a commit to pluris/node that referenced this pull request Aug 7, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this pull request Aug 14, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this pull request Aug 14, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
UlisesGascon pushed a commit to UlisesGascon/node that referenced this pull request Aug 14, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
RafaelGSS pushed a commit that referenced this pull request Aug 15, 2023
PR-URL: #46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
@UlisesGascon UlisesGascon mentioned this pull request Aug 15, 2023
RafaelGSS pushed a commit that referenced this pull request Aug 17, 2023
PR-URL: #46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
targos pushed a commit to targos/node that referenced this pull request Nov 11, 2023
PR-URL: nodejs#46662
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
@targos targos added the backported-to-v18.x PRs backported to the v18.x-staging branch. label Nov 23, 2023
targos pushed a commit that referenced this pull request Nov 23, 2023
PR-URL: #46662
Backport-PR-URL: #50669
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
sercher added a commit to sercher/graaljs that referenced this pull request Apr 25, 2024
PR-URL: nodejs/node#46662
Backport-PR-URL: nodejs/node#50669
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
sercher added a commit to sercher/graaljs that referenced this pull request Apr 25, 2024
PR-URL: nodejs/node#46662
Backport-PR-URL: nodejs/node#50669
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backported-to-v18.x PRs backported to the v18.x-staging branch. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. esm Issues and PRs related to the ECMAScript Modules implementation. lib / src Issues and PRs related to general changes in the lib or src directory. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants