sf data is unable to retrieve huge jobs #2799

Closed
mnunezdm opened this issue Apr 2, 2024 · 2 comments
Labels
investigating: We're actively investigating this issue
validated: Version information for this issue has been validated

Comments


mnunezdm commented Apr 2, 2024

Summary

When trying to retrieve A LOT of records using bulk data, the CLI just dies. I think the reason is that it relies entirely on loading the data into memory. In cases like this, one solution could be to rely on the filesystem to store the retrieved data.
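
A minimal sketch of that filesystem approach, assuming jsforce's bulk query record stream; the connection details and SOQL here are placeholders, not the CLI's actual code:

```typescript
import * as fs from 'node:fs';
import { Connection } from 'jsforce';

// Placeholder credentials; replace with your own org's values.
const conn = new Connection({
  instanceUrl: 'https://yourInstance.my.salesforce.com',
  accessToken: '<access token>',
});

// Pipe the bulk query results straight to disk instead of accumulating all
// records in memory, so memory use stays flat regardless of row count.
conn.bulk
  .query('SELECT Id, Subject FROM Case')
  .stream() // readable stream of the result CSV
  .pipe(fs.createWriteStream('./cases.csv'));
```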

Steps To Reproduce

Create a bulk job that retrieves millions of records (5M+) and a lot of fields
Try to download it using sf data query resume

Expected result

Retrieve all the data

Actual result

The process just dies

```
❯ sf data query resume --bulk-query-id 750Jz00000K1nZRIAZ --target-org rep-prod-plat --result-format csv > cases.csv
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 unpipe listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(Use node --trace-warnings ... to show where the warning was created)
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 finish listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
```
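
For context (my reading of the warning, not the CLI source): each .pipe() call registers unpipe/error/close/finish/drain listeners on the destination stream, so piping more than ten result pages into one shared PassThrough trips Node's default listener limit. A minimal reproduction of just that mechanism:

```typescript
import { PassThrough, Readable } from 'node:stream';

// One shared sink, like the [PassThrough] named in the warning above.
const sink = new PassThrough();
sink.resume(); // keep data flowing so the pipes don't stall

// The 11th pipe into the same destination exceeds Node's default limit of
// 10 listeners per event and prints MaxListenersExceededWarning.
for (let i = 0; i < 11; i++) {
  Readable.from([`page ${i}\n`]).pipe(sink, { end: false });
}
```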

System Information

zsh

```json
{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.35.6",
  "nodeVersion": "node-v20.11.1",
  "osVersion": "Darwin 23.4.0",
  "rootPath": "/Users/miguelnunezdiaz-montes/.local/share/sf/client/2.35.6-3a4a215",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.0.13 (core)",
    "@oclif/plugin-commands 3.2.2 (core)",
    "@oclif/plugin-help 6.0.20 (core)",
    "@oclif/plugin-not-found 3.1.1 (core)",
    "@oclif/plugin-plugins 5.0.1 (core)",
    "@oclif/plugin-search 1.0.20 (core)",
    "@oclif/plugin-update 4.2.2 (core)",
    "@oclif/plugin-version 2.0.16 (core)",
    "@oclif/plugin-warn-if-update-available 3.0.15 (core)",
    "@oclif/plugin-which 3.1.7 (core)",
    "@salesforce/cli 2.35.6 (core)",
    "apex 3.1.0 (core)",
    "auth 3.5.0 (core)",
    "data 3.2.2 (core)",
    "deploy-retrieve 3.4.0 (core)",
    "info 3.1.0 (core)",
    "limits 3.2.0 (core)",
    "marketplace 1.1.0 (core)",
    "org 3.6.0 (core)",
    "packaging 2.2.0 (core)",
    "schema 3.2.0 (core)",
    "settings 2.1.0 (core)",
    "sobject 1.2.0 (core)",
    "source 3.2.0 (core)",
    "telemetry 3.1.17 (core)",
    "templates 56.1.0 (core)",
    "trust 3.4.0 (core)",
    "user 3.4.0 (core)"
  ]
}
```

Additional information

Doing various tests, I have found that when requesting big chunks, after the query has finished processing, sometimes my org just stops sending the result; you can see this because the amount of data downloaded simply freezes. This depends on the number of rows and fields you are retrieving; for us it is normally 1 million, with which we haven't found any problems.

Just some numbers: retrieving records 1M at a time, the process takes only 1-2 minutes.
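
For reference, a sketch of that chunked retrieval against the Bulk API 2.0 query results endpoint (maxRecords, the locator parameter, and the Sforce-Locator header come from the Salesforce REST docs; the instance URL, token, and job id are placeholders):

```typescript
import { createWriteStream } from 'node:fs';

// Placeholders; fill in your own instance URL, access token, and job id.
const INSTANCE = 'https://yourInstance.my.salesforce.com';
const TOKEN = '<access token>';
const JOB_ID = '<bulk query job id>';

// Fetch the results ~1M rows at a time and append each chunk to disk, so
// only one chunk is ever held in memory.
async function downloadResults(path: string): Promise<void> {
  const out = createWriteStream(path);
  let locator: string | undefined;
  do {
    const url = new URL(`${INSTANCE}/services/data/v60.0/jobs/query/${JOB_ID}/results`);
    url.searchParams.set('maxRecords', '1000000');
    if (locator) url.searchParams.set('locator', locator);

    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${TOKEN}`, Accept: 'text/csv' },
    });
    // Note: each page may repeat the CSV header row; drop it on pages after
    // the first if you need one contiguous CSV.
    out.write(await res.text());

    // Sforce-Locator is the literal string "null" on the last page.
    const next = res.headers.get('Sforce-Locator');
    locator = next && next !== 'null' ? next : undefined;
  } while (locator);
  out.end();
}
```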

@mnunezdm mnunezdm added the investigating We're actively investigating this issue label Apr 2, 2024

github-actions bot commented Apr 2, 2024

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

@github-actions github-actions bot added the validated Version information for this issue has been validated label Apr 2, 2024
@cristiand391 (Member)

Hey @mnunezdm, you are right: data commands using the bulk API keep records in memory while paginating.
Will close as dup of #1995

We fixed this in the library (jsforce v3), and Allan proposed a new command for it, but unfortunately there hasn't been any update from our side:
salesforcecli/plugin-data#527

Our new PM @VivekMChawla will be back next week, I'll forward this to him so we can prioritize the work.

@cristiand391 closed this as not planned Apr 3, 2024