-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 12053- CesiumJS should not use push.apply #12176
base: main
Are you sure you want to change the base?
Conversation
Thank you for the pull request, @Tim-Quattrochi! ✅ We can confirm we have a CLA on file for you. |
Thanks @Tim-Quattrochi! @javagl Are you available to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the "pattern" for changing push.apply
into a for-loop (mentioned in one inlined comment) :
One of the (faaar too many) quirks of JavaScript is: It is not possible to know what some piece of code is doing just by reading it. Never. Never ever. Period.
Many people might think that this code will print 42
:
function exampleFunction(exampleObject) {
exampleObject.exampleProperty = 42;
console.log(exampleObject.exampleProperty);
}
Maybe it does. Maybe it doesn't. It could also start playing a funny cat video and format your hard drive.
Specifcally, for the push.apply
case:
When calling push.apply
and passing in an array, then this array will be used directly.
When changing this to a for-loop, one has to be careful to also use the array directly (i.e. the minimize the "change in behavior" that could result from this refactoring).
Consider the following dummy example:
// A dummy class that has a "getSources" function that returns an array
function ExampleClass() {
this.sources = ["A", "B", "C"];
};
ExampleClass.prototype.getSources = function() {
console.log("getSources is called, and returns ", this.sources);
return this.sources;
}
// Use the getSources with push-apply
function exampleFunctionWithPushApply(exampleObject) {
const targets = [];
targets.push.apply(targets, exampleObject.getSources());
console.log("push.apply targets: ", targets);
}
// Use getSources with a for-loop in the wrong way
function exampleFunctionWithForLoopWrong(exampleObject) {
const targets = [];
for (let i=0; i<exampleObject.getSources().length; i++) {
targets.push(exampleObject.getSources()[i]);
}
console.log("for-loop A targets: ", targets);
}
// Use getSources with a for-loop in the right way
function exampleFunctionWithForLoopRight(exampleObject) {
const targets = [];
const sources = exampleObject.getSources();
for (let i=0; i<sources.length; i++) {
targets.push(sources[i]);
}
console.log("for-loop B targets: ", targets);
}
const exampleObject = new ExampleClass();
console.log("Running with push.apply");
exampleFunctionWithPushApply(exampleObject);
console.log("Running with for-loop (wrong)");
exampleFunctionWithForLoopWrong(exampleObject);
console.log("Running with for-loop (right)");
exampleFunctionWithForLoopRight(exampleObject);
It defines an object with some getSources
function that returns an array (and prints a message saying when getSources
is called).
This array is then used...
- once in
push.apply
- once in a for-loop (the wrong way)
- once in a for-loop (the right way)
The output when running this with node Example.js
will be
Running with push.apply
getSources is called, and returns [ 'A', 'B', 'C' ]
push.apply targets: [ 'A', 'B', 'C' ]
Running with for-loop (wrong)
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
getSources is called, and returns [ 'A', 'B', 'C' ]
for-loop A targets: [ 'A', 'B', 'C' ]
Running with for-loop (right)
getSources is called, and returns [ 'A', 'B', 'C' ]
for-loop B targets: [ 'A', 'B', 'C' ]
One can see that in the "wrong" case, getSources
is called 7 times. Specifically, in
for (let i=0; i<exampleObject.getSources().length; i++)
it is called for each loop iteration, once for the length check, and once for the access.
And since nobody knows what getSources
is doing, this might be arbitrarily expensive. Maybe in reality, it has to read these "sources" from the hard disc, or via some network request or whatnot. (Or it plays 7 cat videos, FWIW 🙂 )
Generally speaking: The arrays that had originally been passed to push.apply
as the last parameter should be "pulled out" into a local variable (as shown in exampleFunctionWithForLoopRight
).
One could even make a case to apply the pattern that is often used in the CesiumJS source code, for exactly this reason, namely, to even pull out the length
. So instead of
const sources = exampleObject.getSources();
for (let i=0; i<sources.length; i++) {
...
it could be
const sources = exampleObject.getSources();
const sourcesLength = sources.length;
for (let i=0; i<sourcesLength; i++) {
...
(but one has to be careful to not mix up any "lengths" there, when multiple arrays are involved in that context)
In some cases, it might be possible to pre-allocate space in the target array. Instead of
const sources = ...;
const n = sources.length;
const targets = [];
for (let i=0; i<n; i++) {
targets.push(doSomethingWith(sources[i]));
}
it could be
const sources = ...;
const n = sources.length;
const targets = Array(n);
for (let i=0; i<n; i++) {
targets[i] = doSomethingWith(sources[i]);
}
But... unless this is trivially possible, it may be a "premature optimization": It would be necessary to really carefully check that the size is known and fixed beforehand.
Not to distract from the main point here, but:
I have actually been changing this pattern… because:
Having said that, I agree that |
@jjhembd I agree that not storing the Reliable benchmarks can be difficult, and the JIT is just one very large black box. E.g. does it only store that EDIT: I already mentioned in another comment that we might consider updating the coding guide to no longer recommend the |
@javagl If we were to keep the current pattern, how would you ensure it stays DRY? My thought was to create a utility function or perhaps a method, but I wanted to get your opinion. Repeating the length variable and source for each conditional seems like an anti-pattern to me. |
From quickly scrolling over the changes, it seems like the inlined comments have been addressed. I'll do a (probably final) review pass early tomorrow (and just for completeness, try out the "artificial" model that I created to provoke the error). About storing the @jjhembd Do you think that new/modified code should consistently use the pattern without storing the length? The strong(est) case that I could make against storing the length is when there are "many" arrays and loops in one method, and one has to distinguish the names (i.e. when it cannot be It would be great if these considerations could be obsolete: I think that the |
@javagl Yes it was, sorry I '@' the wrong person. I was just referring to the fact that I have to repeat myself a lot. It was a general best practice question, not necessarily something I am introducing. For example, I have to create a variable to point at the source, create a target, create a variable to hold the length of the source. This can happen a lot in a method that has more conditions. An example is here: packages/engine/Source/Scene/Cesium3DTileBatchTable.js lines 375, 376, then again 382, 387. Would it be good practice to create a function to:
Again, this is just a general question and I am not suggesting Cesium do this.
I don't believe I refactored those after changes requested. I can have them changed and pushed to this PR before you do a review, and yes I will absolutely try out the artificial model that you created. |
I misunderstood that first. I thought that you referred to a function that can "always" replace such a for-loop, in the most generic form. But now I see that you only referred to the And indeed, such a function could make sense, and it could be pretty straightforward, like
This would turn a code block like the one that you linked to from this
into this
One could even pre-allocate the space in the target array, to avoid repeated allocations/resize operations. Now... I'm a bit torn:
But even at the risk of making most of the changes here obsolete, I'd vote for introducing such a function, because it could tremendously increase readability and maintanability (and performance), with little effort and bascially zero risk. I just threw together some ""benchmark"", to compare a few things here. (I'm not a "JavaScript performance testing expert", so this has to be taken with a huge grain of salt...). The benchmark creates arrays of different sizes, repeatedly puts the contents of these source arrays into target arrays, and prints the timing information. The approaches that are compared here are
(Some results are given below) The code: function runSingleTest(sourceSize, testFunction) {
const sourceArray = Array(sourceSize);
const targetArray = [];
const startMs = performance.now();
try {
testFunction(sourceArray, targetArray);
} catch (errror) {
return undefined;
}
const endMs = performance.now();
const durationMs = endMs - startMs;
return durationMs;
}
function runTest(runs, sourceSize, name, testFunction) {
let sumMs = 0.0;
for (r = 0; r < runs; r++) {
const ms = runSingleTest(sourceSize, testFunction);
if (ms === undefined) {
console.log(`sourceSize ${sourceSize}, name ${name}, ERROR`);
return;
}
sumMs += ms;
}
const averageMs = sumMs / runs;
console.log(`sourceSize ${sourceSize}, name ${name}, averageMs ${averageMs}`);
}
function usePushApply(sourceArray, targetArray) {
targetArray.push.apply(targetArray, sourceArray);
}
function useSpread(sourceArray, targetArray) {
targetArray.push(...sourceArray);
}
function useForEachLoop(sourceArray, targetArray) {
for (const element of sourceArray) {
targetArray.push(element);
}
}
function useForLoop(sourceArray, targetArray) {
for (let i = 0; i < sourceArray.length; i++) {
targetArray.push(sourceArray[i]);
}
}
function useForLoopPreallocated(sourceArray, targetArray) {
const t = targetArray.length;
targetArray.length += sourceArray.length;
for (let i = 0; i < sourceArray.length; i++) {
targetArray[t + i] = sourceArray[i];
}
}
function test() {
const runs = 100;
for (let n = 100; n <= 1000000; n *= 10) {
runTest(runs, n, "pushApply", usePushApply);
runTest(runs, n, "spread", useSpread);
runTest(runs, n, "forEach", useForEachLoop);
runTest(runs, n, "for", useForLoop);
runTest(runs, n, "forAlloc", useForLoopPreallocated);
}
}
try {
test();
} catch (error) {
console.log(error);
} Some results (a summary will be given below): On NodeJS:
On Chrome in jsfiddle
On Firefox in jsfiddle:
For small arrays, it does not seem to matter in terms of performance. For arrays of size >~150000, the For large arrays, the index-based The "index-based- tl;dr (too late for that, I know 🙂 ) I'd second your suggestion to introduce a function like this function pushAll(source, target) {
const t = target.length;
target.length = t + source.length;
for (let i=0; i<source.length; i++){
target[t + i] = source[i];
}
} That can be used in all places where the contents of one array has to be added to another (i.e. at least all places that are currently affected by this PR - and probably some more, where this already was done with a manual, inlined To be confirmed by the CesiumJS core squad. |
I could have emailed you about it, I was genuinely curious about your opinion on that. Not only for this project, but for my own development structure. I appreciate your thorough responses. |
As long as there's not an existing method which does the equivalent (it sounds like we want the |
I always feel like a complete noob when I have to do searches that lead to things like https://stackoverflow.com/questions/4156101/copy-array-items-into-another-array . But ... inside that usual stack overflow noise, there are some hints that seem to confirm that So unless someone disagrees, there will probably be some function for this. This would reside on the same level as We could now talk about details.... consider this as brainstorming:
|
I've only been half following this thread so if I missed the answer already then ignore me.
Is there a reason we can't replace arr1.push.apply(arr1, arr2);
// could just be
arr1 = arr1.concat(arr2); This probably wouldn't work in a helper function but I don't think it would even need the helper function if we went this route. With a very naive test this is faster than even the pre-allocated solution above, it just requires a slight rework to how we're using these variables and maybe converting some from
|
@jjspace I'm kind of surprised that Somehow related to the
But I didn't propose this, because it would raise several tricky follow-up questions. First of all: There certainly are several places where But as a general recommendation, the concern that I have with Imagine that there is some code like this (dummy/pseudocode):
(where So far, so good. It adds a bunch of sources to the Now,
We can talk about details of the inner part. And great care is necessary to not accidentally omit that last line. But even then: You simply never know whether at some point in time, some part of the code, for some reason did And this array will then never be updated. So my two cents:
@Tim-Quattrochi Sorry, this discussion may have detailed a bit 😁 What started as an "uncontroversial, pattern-based boilerplate code change" now touches some of the guts of the language and library. But I think that the potential improvement is worth these considerations. |
My biggest concern about always using |
It's not entirely clear how to proceed here. This is a PR that affects "many" files, and there has been a linting update in the meanime, which affects ... "all files", basically - meaning that there are a bunch of conflicts. There are instructions for updating such a diverged branch. But I think that it might be easier to...
But that's only a gut feeling right now. @Tim-Quattrochi if you instead want to update the branch with the linter changes and extend this PR, that should also be possible. |
Can you elaborate on what you mean by linter changes? |
CesiumJS uses prettier for automated code formatting. Before committing something, running Now, there are two options:
Option 1. requires a bit of twiddling with |
Hi @Tim-Quattrochi and @javagl, what is remaining in this PR? Is it only to merge in main and account for prettier? |
@ggetz The last thing here was that we considered adding an
For 1, dealing with prettier and conflicts can be a pain in the back, so I think that 2. should be easier. The decision is left to @Tim-Quattrochi for now (but if Tim does not have the bandwidth to do either, then I'd put 2. on my TODO list - I think that the suggestion by Tim (to introduce such a function to foster DRY) is a good one, and shouldn't be sooo much effort), |
Hi sorry, I need to lint and possibly a helper. I put it on draft because I was dealing with hurricane Milton. |
Understandable! Hope you are safe! @javagl If you do have the bandwidth to help out here, that would be appreciated. |
@Tim-Quattrochi We can set up a call whenever it suits you. (EDIT: Unless we can sort out the questions via comments here, that's also fine...) |
Description
Change
push.apply
to use for loops to avoid exceeding the maximum call stack.Issue number and link
#12053
Testing plan
Author checklist
CONTRIBUTORS.md
CHANGES.md
with a short summary of my change