sf data export bulk fails in JSON for large extracts #3138

Closed
dbragaSFDC opened this issue Dec 5, 2024 · 8 comments
Labels: bug, validated

Comments

@dbragaSFDC

### Summary
sf data export bulk fails with JSON output for large extracts; the same export works fine with CSV output.

### Steps To Reproduce

  1. Use v2.68.6, authorized to a full sandbox.
  2. Run sf data export bulk --query --output-file <FILENAME.JSON> --result-format json, where the output file does not exist yet.
  3. The query must return enough records to be split into multiple result sets (in my case, 5.5 million records split into 25 batches of about 250k records each).
  4. The command ends with the message "Error (EEXIST): EEXIST: file already exists, open '<FILENAME.JSON>'".
  5. Opening the file afterwards, it looks like only the first result set was saved. My impression is that when it fetches the next result set using the Sforce-Locator header, it tries to start a new write to the same file.
  6. Running the same command with the CSV result format works fine; the result is one large file with 5.5 million records.
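
Step 5 refers to how Bulk API 2.0 pages through query results: each results response carries a Sforce-Locator header, and the client passes that value back to request the next result set until the header reads "null". A minimal sketch of that loop follows. The `Page` type and the `fetchPage` callback are hypothetical stand-ins for the HTTP call, not the CLI's actual code.

```typescript
// Hypothetical shape of one results response: the CSV body plus the
// value of the Sforce-Locator response header.
type Page = { body: string; locator: string | null };

// Hypothetical callback that performs the HTTP request for one result set.
// An undefined locator means "first page".
type FetchPage = (locator?: string) => Promise<Page>;

// Collects every result set by following the locator until the API
// signals the end with the literal string "null" (or no header at all).
async function collectAllPages(fetchPage: FetchPage): Promise<string[]> {
  const pages: string[] = [];
  let locator: string | undefined;
  do {
    const page = await fetchPage(locator);
    pages.push(page.body);
    locator = page.locator !== null && page.locator !== 'null' ? page.locator : undefined;
  } while (locator !== undefined);
  return pages;
}
```

Whatever consumes these pages has to write all of them to the same output file, which is where the JSON path in this report goes wrong.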

### Expected result

JSON file with 5.5 million records should be created.

### Actual result

JSON file with 250 thousand records was created.

### Additional information

Screenshot 2024-12-05 at 2 23 24 PM

### System Information

Using zsh inside VSCode.

{
  "architecture": "darwin-arm64",
  "cliVersion": "@salesforce/cli/2.68.6",
  "nodeVersion": "node-v20.17.0",
  "osVersion": "Darwin 23.6.0",
  "rootPath": "/opt/homebrew/lib/node_modules/@salesforce/cli",
  "shell": "zsh",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.2.10 (core)",
    "@oclif/plugin-commands 4.1.10 (core)",
    "@oclif/plugin-help 6.2.18 (core)",
    "@oclif/plugin-not-found 3.2.28 (core)",
    "@oclif/plugin-plugins 5.4.17 (core)",
    "@oclif/plugin-search 1.2.16 (core)",
    "@oclif/plugin-update 4.6.13 (core)",
    "@oclif/plugin-version 2.2.16 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.23 (core)",
    "@oclif/plugin-which 3.2.19 (core)",
    "@salesforce/cli 2.68.6 (core)",
    "apex 3.6.3 (core)",
    "api 1.3.2 (core)",
    "auth 3.6.75 (core)",
    "data 3.11.4 (core)",
    "deploy-retrieve 3.15.13 (core)",
    "info 3.4.21 (core)",
    "limits 3.3.40 (core)",
    "marketplace 1.3.6 (core)",
    "org 5.2.4 (core)",
    "packaging 2.9.3 (core)",
    "schema 3.3.42 (core)",
    "settings 2.4.6 (core)",
    "sobject 1.4.46 (core)",
    "telemetry 3.6.23 (core)",
    "templates 56.3.30 (core)",
    "trust 3.7.43 (core)",
    "user 3.6.3 (core)",
    "@salesforce/sfdx-scanner 4.6.0 (user) published 72 days ago (Tue Sep 24 2024) (latest is 4.7.0)",
    "sfdx-git-delta 5.49.0 (user) published 47 days ago (Sat Oct 19 2024) (latest is 5.49.4)"
  ]
}
@dbragaSFDC added the investigating label Dec 5, 2024
@github-actions bot added the validated label Dec 5, 2024

github-actions bot commented Dec 5, 2024

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

@cristiand391
Member

hey, sorry about that, it's likely this line (trying to remember why I added it on purpose):
https://github.com/salesforcecli/plugin-data/blob/ef2ce822936e7e1627d23e730290303f67db880b/src/bulkUtils.ts#L131C1-L134C10

> Opening the file afterwards, looks like only the first result set was saved. My impression is that when it fetches the next result set using Sforce-Locator header, it tries to start a new write in the same file.

correct, we fetch the first batch (API returns it as a CSV), parse it and write records as JSON:
https://github.com/salesforcecli/plugin-data/blob/ef2ce822936e7e1627d23e730290303f67db880b/src/bulkUtils.ts#L144

for CSV we do append only, no check if file exists:
https://github.com/salesforcecli/plugin-data/blob/ef2ce822936e7e1627d23e730290303f67db880b/src/bulkUtils.ts#L169
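
To illustrate the difference described above, here is a minimal sketch and not the plugin's code. It assumes the JSON path opens the output file in exclusive-create mode for each batch (Node's `wx` flag), which is one way to get "EEXIST: file already exists, open" on the second batch. The helper names `writeBatchExclusive` and `writeBatchesAsOneArray` are hypothetical; the second shows one possible way to stream every batch into a single JSON array.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

type SObjectRecord = Record<string, string>;

// Hypothetical helper: 'wx' creates the file and fails with EEXIST
// if it already exists, so a second batch can never be written.
function writeBatchExclusive(file: string, records: SObjectRecord[]): void {
  fs.writeFileSync(file, JSON.stringify(records), { flag: 'wx' });
}

// Hypothetical alternative: open the file once and write all batches
// into one JSON array, record by record.
function writeBatchesAsOneArray(file: string, batches: Iterable<SObjectRecord[]>): number {
  const fd = fs.openSync(file, 'w');
  let count = 0;
  try {
    fs.writeSync(fd, '[');
    for (const batch of batches) {
      for (const record of batch) {
        fs.writeSync(fd, (count === 0 ? '' : ',') + JSON.stringify(record));
        count++;
      }
    }
    fs.writeSync(fd, ']');
  } finally {
    fs.closeSync(fd);
  }
  return count;
}

// Runs both approaches against two small made-up batches in a temp directory.
function demo(): { secondWriteError: string | undefined; total: number; combinedPath: string } {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bulk-json-'));
  const batches: SObjectRecord[][] = [[{ Id: '001A' }, { Id: '001B' }], [{ Id: '001C' }]];

  let secondWriteError: string | undefined;
  try {
    for (const batch of batches) writeBatchExclusive(path.join(dir, 'exclusive.json'), batch);
  } catch (err) {
    secondWriteError = (err as { code?: string }).code;
  }

  const combinedPath = path.join(dir, 'combined.json');
  const total = writeBatchesAsOneArray(combinedPath, batches);
  return { secondWriteError, total, combinedPath };
}
```

Calling `demo()` reports the error code raised by the second exclusive write and the number of records written to the combined file. Writing record by record keeps memory use bounded by one batch, which matters at the 5.5 million record scale reported here.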

@dbragaSFDC
Author

All good @cristiand391, I'm working around that using csvtojson for now... I'm glad it can take big files alright.

@mdonnalley added the bug label and removed the investigating label Dec 5, 2024

git2gus bot commented Dec 5, 2024

This issue has been linked to a new work item: W-17380350

@cristiand391
Member

the fix for this is available in our nightly channel if you want to give it a try:
https://developer.salesforce.com/docs/atlas.en-us.sfdx_setup.meta/sfdx_setup/sfdx_setup_install_cli_rc.htm

it will be promoted to RC on Wed., Dec. 11 and to stable on Wed., Dec. 18.

Thanks for the heads up!

@dbragaSFDC
Author

Thanks @cristiand391 I will check it out by the end of the week and provide feedback!

@dbragaSFDC
Author

@cristiand391 I used rc 2.70.7 and was able to extract in JSON format using data export bulk.

Screenshot 2024-12-16 at 3 56 16 PM

I will follow up on Dec 18 to close this issue, thanks for the quick fix!

@jshackell-sfdc
Collaborator

This issue has been addressed in version 2.70.7 (December 18, 2024). Thx!
