sf data is unable to retrieve huge jobs #2799

Closed
mnunezdm opened this issue Apr 2, 2024 · 2 comments
Labels
investigating: We're actively investigating this issue
validated: Version information for this issue has been validated

Comments


mnunezdm commented Apr 2, 2024

Summary

When trying to retrieve A LOT of records using bulk data, the CLI just dies. I think the reason is that it relies entirely on loading the data into memory. In cases like this, one solution could be to rely on the filesystem to store the retrieved data.
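
A minimal sketch of that filesystem approach, assuming jsforce's bulk query record stream; the connection details and SOQL here are placeholders, not the CLI's actual code:

```typescript
import * as fs from 'node:fs';
import { Connection } from 'jsforce';

// Placeholder credentials; replace with your own org's values.
const conn = new Connection({
  instanceUrl: 'https://yourInstance.my.salesforce.com',
  accessToken: '<access token>',
});

// Pipe the bulk query results straight to disk instead of accumulating all
// records in memory, so memory use stays flat regardless of row count.
conn.bulk
  .query('SELECT Id, Subject FROM Case')
  .stream() // readable stream of the result CSV
  .pipe(fs.createWriteStream('./cases.csv'));
```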

Steps To Reproduce

Create a bulk job that retrieves millions of records (5M+) and a lot of fields
Try to download it using sf data query resume

Expected result

Retrieve all the data

Actual result

The process just dies

```
❯ sf data query resume --bulk-query-id 750Jz00000K1nZRIAZ --target-org rep-prod-plat --result-format csv > cases.csv
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 unpipe listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(Use node --trace-warnings ... to show where the warning was created)
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 finish listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
(node:37340) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
```
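
For context (my reading of the warning, not the CLI source): each .pipe() call registers unpipe/error/close/finish/drain listeners on the destination stream, so piping more than ten result pages into one shared PassThrough trips Node's default listener limit. A minimal reproduction of just that mechanism:

```typescript
import { PassThrough, Readable } from 'node:stream';

// One shared sink, like the [PassThrough] named in the warning above.
const sink = new PassThrough();
sink.resume(); // keep data flowing so the pipes don't stall

// The 11th pipe into the same destination exceeds Node's default limit of
// 10 listeners per event and prints MaxListenersExceededWarning.
for (let i = 0; i < 11; i++) {
  Readable.from([`page ${i}\n`]).pipe(sink, { end: false });
}
```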

System Information

zsh

```json
{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.35.6",
  "nodeVersion": "node-v20.11.1",
  "osVersion": "Darwin 23.4.0",
  "rootPath": "/Users/miguelnunezdiaz-montes/.local/share/sf/client/2.35.6-3a4a215",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.0.13 (core)",
    "@oclif/plugin-commands 3.2.2 (core)",
    "@oclif/plugin-help 6.0.20 (core)",
    "@oclif/plugin-not-found 3.1.1 (core)",
    "@oclif/plugin-plugins 5.0.1 (core)",
    "@oclif/plugin-search 1.0.20 (core)",
    "@oclif/plugin-update 4.2.2 (core)",
    "@oclif/plugin-version 2.0.16 (core)",
    "@oclif/plugin-warn-if-update-available 3.0.15 (core)",
    "@oclif/plugin-which 3.1.7 (core)",
    "@salesforce/cli 2.35.6 (core)",
    "apex 3.1.0 (core)",
    "auth 3.5.0 (core)",
    "data 3.2.2 (core)",
    "deploy-retrieve 3.4.0 (core)",
    "info 3.1.0 (core)",
    "limits 3.2.0 (core)",
    "marketplace 1.1.0 (core)",
    "org 3.6.0 (core)",
    "packaging 2.2.0 (core)",
    "schema 3.2.0 (core)",
    "settings 2.1.0 (core)",
    "sobject 1.2.0 (core)",
    "source 3.2.0 (core)",
    "telemetry 3.1.17 (core)",
    "templates 56.1.0 (core)",
    "trust 3.4.0 (core)",
    "user 3.4.0 (core)"
  ]
}
```

Additional information

Doing various tests, I have found that when requesting big chunks, after the query has finished processing, sometimes my org just stops sending the result; you can see this because the amount of data downloaded simply freezes. This depends on the number of rows and fields you are retrieving; for us it is normally 1 million, with which we haven't found any problems.

Just some numbers: retrieving records 1M at a time, the process takes only 1-2 minutes.
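
For reference, a sketch of that chunked retrieval against the Bulk API 2.0 query results endpoint (maxRecords, the locator parameter, and the Sforce-Locator header come from the Salesforce REST docs; the instance URL, token, and job id are placeholders):

```typescript
import { createWriteStream } from 'node:fs';

// Placeholders; fill in your own instance URL, access token, and job id.
const INSTANCE = 'https://yourInstance.my.salesforce.com';
const TOKEN = '<access token>';
const JOB_ID = '<bulk query job id>';

// Fetch the results ~1M rows at a time and append each chunk to disk, so
// only one chunk is ever held in memory.
async function downloadResults(path: string): Promise<void> {
  const out = createWriteStream(path);
  let locator: string | undefined;
  do {
    const url = new URL(`${INSTANCE}/services/data/v60.0/jobs/query/${JOB_ID}/results`);
    url.searchParams.set('maxRecords', '1000000');
    if (locator) url.searchParams.set('locator', locator);

    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${TOKEN}`, Accept: 'text/csv' },
    });
    // Note: each page may repeat the CSV header row; drop it on pages after
    // the first if you need one contiguous CSV.
    out.write(await res.text());

    // Sforce-Locator is the literal string "null" on the last page.
    const next = res.headers.get('Sforce-Locator');
    locator = next && next !== 'null' ? next : undefined;
  } while (locator);
  out.end();
}
```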

@mnunezdm mnunezdm added the investigating We're actively investigating this issue label Apr 2, 2024

github-actions bot commented Apr 2, 2024

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

@github-actions github-actions bot added the validated Version information for this issue has been validated label Apr 2, 2024
@cristiand391 (Member)

Hey @mnunezdm, you are right: data commands using the bulk API keep records in memory while paginating.
Will close as dup of #1995

We fixed this in the library (jsforce v3), and Allan proposed a new command for it, but unfortunately there hasn't been any update from our side:
salesforcecli/plugin-data#527

Our new PM @VivekMChawla will be back next week, I'll forward this to him so we can prioritize the work.

@cristiand391 closed this as not planned Apr 3, 2024