Moving source files between folders in large repo (~8GB) causes Javascript heap out of memory error #2880

Closed
VivekMChawla opened this issue May 21, 2024 · 12 comments
Labels
bug Issue or pull request that identifies or fixes a bug

@VivekMChawla

Originally posted by @ntaylorcertinia in #2682 (reply in thread)

Summary

We were testing the new Source Mobility (BETA) feature with one of our larger repos and got a FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory.

The repo we used is ~8GB in size and has enough metadata that it requires breaking the push into multiple stages to completely deploy all source in the project. The push in question, when run without Source Mobility enabled, was pushing ~1700 files.

Steps Followed:

  1. Installed the nightly CLI build
  2. Set SF_BETA_TRACK_FILE_MOVES=true
  3. Merged develop (our main source branch) into my local project.
    • This merge moved several types of metadata files into new folders (Apex Classes, LWCs, Custom Labels, and more)
  4. Attempted to push source

NOTE: The new source decomposition feature was not used during this test.

Expected Result

The source push succeeds within 5-10 minutes.

Actual Result

The CLI seemed to hang for 10-15 minutes, then terminated with FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

We ran the command again with --max-old-space-size=8000 to give it a bit more memory and see if it could run. It ran for about an hour and then threw the same error.

Additional Tests

  • Manually moved a single labels file from one package directory to another and tried the push again to see if the issue stemmed from the size of the push or the size of the repo. Unfortunately, we got the same error.
  • Tested with a smaller repo (fewer files). This worked quite nicely so it seems that primarily it is just large repos that don't work currently.
    • Note that the push had a noticeable delay before it started, which also happened if a lot of files have changed.
    • It just looks like it is hanging as there is no visible feedback to the user.
    • It would be good to put a message like "Scanning changes" here to make it clear to the user something is happening.

Additional Information

<--- Last few GCs --->

[34733:0x140008000]   569555 ms: Scavenge 3989.5 (4132.9) -> 3986.5 (4132.9) MB, 29.42 / 0.00 ms  (average mu = 0.761, current mu = 0.643) allocation failure; 
[34733:0x140008000]   569760 ms: Scavenge 3990.8 (4132.9) -> 3988.8 (4132.9) MB, 39.04 / 0.00 ms  (average mu = 0.761, current mu = 0.643) allocation failure; 
[34733:0x140008000]   570021 ms: Scavenge 3994.0 (4132.9) -> 3991.3 (4148.9) MB, 71.96 / 0.00 ms  (average mu = 0.761, current mu = 0.643) allocation failure; 

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x104e5d0ac node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 2: 0x104fe2ff4 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 3: 0x1051b753c v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 4: 0x1051b6018 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 5: 0x1051ac830 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 6: 0x1051ad090 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 7: 0x1051919c4 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 8: 0x1051889cc v8::internal::FactoryBase<v8::internal::Factory>::NewConsString(v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::String>, int, bool, v8::internal::AllocationType) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
 9: 0x105188498 v8::internal::FactoryBase<v8::internal::Factory>::NewConsString(v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::String>, v8::internal::AllocationType) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
10: 0x1050054ac v8::String::Concat(v8::Isolate*, v8::Local<v8::String>, v8::Local<v8::String>) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
11: 0x104d87428 node::UVException(v8::Isolate*, int, char const*, char const*, char const*, char const*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
12: 0x104e62d34 node::fs::FSReqAfterScope::Reject(uv_fs_s*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
13: 0x104e642ec node::fs::AfterScanDir(uv_fs_s*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
14: 0x104e4ca9c node::MakeLibuvRequestCallback<uv_fs_s, void (*)(uv_fs_s*)>::Wrapper(uv_fs_s*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
15: 0x10582a6f8 uv__work_done [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
16: 0x10582e148 uv__async_io [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
17: 0x105840220 uv__io_poll [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
18: 0x10582e70c uv_run [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
19: 0x104d816f0 node::SpinEventLoopInternal(node::Environment*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
20: 0x104e9c918 node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
21: 0x104e9c698 node::NodeMainInstance::Run() [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
22: 0x104e1bd1c node::Start(int, char**) [/Users/nicholastaylor/.nvm/versions/node/v20.12.0/bin/node]
23: 0x1920a20e0 start [/usr/lib/dyld]
[1]    34733 abort      sfdx project deploy start -c
@VivekMChawla VivekMChawla added the investigating We're actively investigating this issue label May 21, 2024

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.


Hello @VivekMChawla 👋 It looks like you didn't include the full Salesforce CLI version information in your issue.
Please provide the output of version --verbose --json for the CLI you're using (sf or sfdx).

A few more things to check:

  • Make sure you've provided detailed steps to reproduce your issue.
    • A repository that clearly demonstrates the bug is ideal.
  • Make sure you've installed the latest version of Salesforce CLI. (docs)
    • Better yet, try the rc or nightly versions. (docs)
  • Try running the doctor command to diagnose common issues.
  • Search GitHub for existing related issues.

Thank you!

@github-actions github-actions bot added more information required Issue requires more information or a response from the customer and removed investigating We're actively investigating this issue labels May 21, 2024
@ntaylorcertinia

Did a bit of digging into what seems to be consuming the resources here and I suspect that it is the git.walk() found here https://github.com/forcedotcom/source-tracking/blob/bde6cfc2104b22bb8209a93a9dc7aa86af58e9f4/src/shared/localShadowRepo.ts#L399

This is recursively searching the entire repo including all of the non-package related code that is used for tooling, etc. If this could be pared down to only scan the package directories I suspect that would result in a significant reduction in time and heap usage.
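
To illustrate the idea, here is a minimal sketch of a path filter that a walker could consult so it only descends into package directories. The names `normalize` and `shouldVisit` are hypothetical and are not part of source-tracking or isomorphic-git; the sketch assumes paths are relative to the repo root and that the package directories come from the project config.

```typescript
// Hypothetical helpers for illustration only; not the source-tracking implementation.

// Normalize to forward slashes, drop a leading "./" and any trailing "/".
function normalize(p: string): string {
  const cleaned = p.replace(/\\/g, "/").replace(/^\.\//, "").replace(/\/+$/, "");
  return cleaned === "." ? "" : cleaned;
}

// Returns true when `path` is worth visiting: it is the repo root, it lies
// inside a package directory, or it is an ancestor of a package directory
// (so a recursive walk can still reach nested package directories).
function shouldVisit(path: string, packageDirs: string[]): boolean {
  const p = normalize(path);
  if (p === "") return true; // repo root is an ancestor of everything
  return packageDirs.some((dir) => {
    const d = normalize(dir);
    return p === d || p.startsWith(d + "/") || d.startsWith(p + "/");
  });
}
```

With `packageDirs = ["force-app"]`, a path such as `tools/scripts/build.js` would be skipped, so tooling folders would never be read or hashed. Comparing against `d + "/"` rather than a bare prefix keeps a sibling like `force-app-extra` from matching.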

@mshanemc mshanemc added the bug Issue or pull request that identifies or fixes a bug label May 23, 2024

git2gus bot commented May 23, 2024

This issue has been linked to a new work item: W-15840593

@cristiand391 cristiand391 removed the more information required Issue requires more information or a response from the customer label May 30, 2024
@mshanemc
Contributor

@ntaylorcertinia Can you try this with the new nightly release? I think it's fixed based on local test performance but I don't have your repo to confirm it.

@ntaylorcertinia

@mshanemc I gave it a simple test moving one file and it looks promising: no heap issue, and only a few seconds' hang before it starts pushing. The feature itself seems to work as well.

Will give it more of a full test tomorrow

@ntaylorcertinia

@mshanemc I gave it more of a test today. The heap problem seems fixed, but I am still running into some issues:

  1. Move 1 file and push - hangs for a few seconds, then pushes successfully
  2. Move 1600 files and push - hangs for ~2 minutes, then pushes successfully (this is where the lack of any logging in the terminal is most noticeable, as it appears to do nothing until the 2 minutes are up)
  3. Move 1 file, make a change in that file, and push - this has the same hang for a few seconds at the start. The push then begins and gets to 2/2 files pushed (the 1 moved/changed class file and its .xml file), but then hangs there forever and never finishes. I tried this on 3 different orgs and all showed the same result

@mshanemc
Contributor

  1. It's kinda tricky to emit terminal output from inside the library to the CLI plugin (because this same code might be running as part of VSCode extensions, in --json mode, etc). Not saying it can't be done, but just harder than it sounds. There's definitely logs happening in there, though.

  2. Let me check that out. If the file contents don't match, there shouldn't be anything else new happening: all this new logic should exit and move on to the deploy.
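
One common way around the terminal-output problem in the first point is for the library to report progress through a callback and let each host decide how to show it. The sketch below is only an assumption about how that could look; `ProgressReporter` and `scanChanges` are hypothetical names, not the Salesforce CLI or source-tracking API.

```typescript
// Hypothetical sketch: the library reports progress, the host chooses how to render it.
type ProgressListener = (message: string) => void;

class ProgressReporter {
  private listeners: ProgressListener[] = [];

  // Register a listener; returns a function that unsubscribes it.
  onProgress(listener: ProgressListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver a message to every registered listener (none registered means no output).
  report(message: string): void {
    for (const listener of this.listeners) listener(message);
  }
}

// Library side: never writes to the terminal directly.
// The body is a placeholder; it only reports progress and returns the path count.
function scanChanges(reporter: ProgressReporter, paths: string[]): number {
  reporter.report(`Scanning ${paths.length} changed files for moves`);
  // ... the real move detection would run here ...
  reporter.report("Scan complete");
  return paths.length;
}
```

A CLI plugin could subscribe and write each message to stderr, a `--json` run could simply not subscribe, and an editor extension could route the same messages to its own status UI.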

@ntaylorcertinia

I spent more time on scenario 3 today to make sure Friday's result wasn't an org issue, and it is definitely a weird one. I left the deploy that wasn't finishing running async over the weekend and it never finished. I cancelled it this morning and now can't push to that org at all, so it is functionally broken. I tried a few more combinations to narrow down what might be happening. I didn't get any further, but it might help to list exactly what I am doing:

  1. Change a unit test file adding System.debug('Test');, and push. This works and the file pushes successfully.
  2. Move the unit test file from its current folder (force-app/test/unit/classes) to another folder (force-app/test/unit-other/classes) and push. This works and, as expected, doesn't actually push anything to the org, though I do get "Warning: GlobalValueSet, Start_End_Time__gvs, returned from org, but not found in the local project" in the output along with a progress bar, which I think is related to "Deploying project to a new scratch org generate warnings for GlobalValueSet and fail to track them" #2870
  3. Remove the debug statement from the file and move it back to its original folder. The deploy never completes, and the org can no longer be pushed to, even if I cancel the deploy from the org. This also happens if the file is moved to a folder other than the original one

Turning off the file move tracking and following the same steps

  1. Change a unit test file adding System.debug('Test');, and push. This works and the file pushes successfully.
  2. Move the unit test file from its current folder (called force-app/test/unit/classes) to another folder (force-app/test/unit-other/classes) and push. Deletes the file from the org as expected with the new feature turned off.
  3. Push again, puts the file back in the org
  4. Remove the debug and push again. Finishes successfully

@mshanemc
Contributor

@ntaylorcertinia nightly is ready for you to try the "move, then edit" scenario.

Thanks for trying that...the original design for this on the Discussion (comparing hashes, etc) wasn't going to solve that, and expecting people to run deploy preview in between the move and the edit isn't a good experience.
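
To make the limitation concrete, here is a minimal sketch of hash-based move matching, written under assumptions: the function and type names are hypothetical, hashes are assumed to be precomputed content hashes, and this is not the fix that shipped (the thread does not describe it). A file that is moved and then edited has a different hash at its new path, so it shows up as an unrelated delete plus add.

```typescript
// Hypothetical illustration of matching moves by content hash; not the shipped fix.
interface FileEntry {
  path: string;
  hash: string; // assumed to be a precomputed content hash
}

interface MoveResult {
  moves: { from: string; to: string }[];
  unmatchedDeletes: string[];
  unmatchedAdds: string[];
}

function matchMovesByHash(deleted: FileEntry[], added: FileEntry[]): MoveResult {
  // Index deleted paths by hash; several files may share identical content.
  const deletedByHash = new Map<string, string[]>();
  for (const d of deleted) {
    const paths = deletedByHash.get(d.hash) || [];
    paths.push(d.path);
    deletedByHash.set(d.hash, paths);
  }

  const moves: { from: string; to: string }[] = [];
  const unmatchedAdds: string[] = [];
  for (const a of added) {
    const candidates = deletedByHash.get(a.hash);
    if (candidates && candidates.length > 0) {
      // Naive pairing: take the first deleted path with the same content.
      moves.push({ from: candidates.shift() as string, to: a.path });
    } else {
      unmatchedAdds.push(a.path);
    }
  }

  // Whatever is left in the index was deleted without a matching add.
  const unmatchedDeletes: string[] = [];
  deletedByHash.forEach((paths) => unmatchedDeletes.push(...paths));
  return { moves, unmatchedDeletes, unmatchedAdds };
}
```

For a pure move, the deleted and added entries share a hash and pair up as one move. For "move, then edit", the added entry's hash differs, so `moves` is empty and the file lands in both `unmatchedDeletes` and `unmatchedAdds`, which is the case hash comparison alone cannot resolve.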

@ntaylorcertinia

Was on PTO last week so just got around to giving this a test, move and edit seems fixed now. And the performance seems a lot better than it was before as well which is great 🎉

@jshackell-sfdc
Collaborator

This issue was fixed in version 2.46.6 (June 19, 2024).
