

onProgress() callback for BlockBlobClient.uploadData() in the browser isn't granular #32404

Open
1 of 6 tasks
au5ton opened this issue Jan 2, 2025 · 6 comments
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request This issue requires a new behavior in the product in order be resolved. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)


@au5ton

au5ton commented Jan 2, 2025

  • Package Name: @azure/storage-blob
  • Package Version: 12.26.0
  • Operating system: Windows 23H2
  • nodejs
    • version:
  • browser
    • name/version: Edge/131.0.2903.112
  • typescript
    • version:
  • Is the bug related to documentation in

Describe the bug

In the browser, the onProgress() callback of BlockBlobClient.uploadData() is not granular: it only fires after each block finishes uploading. This makes it impractical for progress bars and progress reporting unless blocks are made very small, which is inefficient.
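To put numbers on this: if one cumulative update arrives per completed block, a progress bar advances in jumps of roughly blockSize / size of the total. The sketch below uses a hypothetical `coarseProgressUpdates` helper (not an SDK function) to list the `loadedBytes` values such a caller would see, assuming exactly one update per block. With 30,000,000-byte blocks, a 100,000,000-byte file yields only four updates, and a 1,000,000,000-byte file yields 34.

```typescript
// Hypothetical helper: the cumulative `loadedBytes` values a caller would
// observe if progress is reported only once per completed block.
function coarseProgressUpdates(size: number, blockSize: number): number[] {
  const updates: number[] = [];
  for (let loaded = 0; loaded < size; ) {
    // The last block may be shorter than blockSize.
    loaded = Math.min(loaded + blockSize, size);
    updates.push(loaded);
  }
  return updates;
}

// Convert each update into a whole-number percentage for a progress bar.
// Multiplying before dividing keeps the arithmetic exact for these sizes.
function toPercentages(updates: number[], size: number): number[] {
  return updates.map((loaded) => Math.floor((loaded * 100) / size));
}

const size = 100_000_000; // a 100,000,000-byte file, for illustration
const updates = coarseProgressUpdates(size, 30_000_000);
console.log(updates.join(", ")); // 30000000, 60000000, 90000000, 100000000
console.log(toPercentages(updates, size).join(", ")); // 30, 60, 90, 100
```

Between those four updates the bar sits still, however long each 30 MB request takes.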

To Reproduce
Steps to reproduce the behavior:

  1. Acquire a sufficiently large test file: https://ash-speed.hetzner.com/1GB.bin
  2. Acquire a SAS URL for a browser to upload a blob directly to Azure Blob Storage
  3. Upload the blob in blocks:
const SAS_URL = '(...)';
const input = document.querySelector(`input[type=file]`);
const { BlockBlobClient, AnonymousCredential } = await import('@azure/storage-blob');
const blobClient = new BlockBlobClient(SAS_URL, new AnonymousCredential());
const res = await blobClient.uploadData(input.files.item(0), {
  maxSingleShotSize: 30_000_000, // ~30 megabytes
  blockSize: 30_000_000,
  concurrency: 1,
  onProgress(progress) {
    // only reports in chunks of 30_000_000 bytes, no intermediate progress
    console.log('Progress:', progress);
  }
});
  4. Observe that the onProgress callback does not report progress at a useful granularity.

Expected behavior
I expected upload progress to be reported more frequently, at sub-block granularity.

Screenshots
N/A

Additional context

From what I've researched, I am aware of a couple of different factors that contribute to this being the case:

I (naively) think this could be mostly resolved with a few steps:

  1. Export createXhrHttpClient() in @azure/core-rest-pipeline to allow browser consumers to import it again. Maybe there's a history with this being deprecated/removed?
  2. Implement block-level tracking in BlockBlobClient.uploadSeekableInternal() and as a result in BlockBlobClient.uploadData() (see snippets to stub code above)
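Step 2 amounts to keeping one progress counter per block and reporting their sum. Here is a minimal sketch of that bookkeeping. It assumes each block's callback delivers a cumulative `loadedBytes` value, as `TransferProgressEvent` does. `BlockProgressTracker` is a hypothetical helper, not part of the SDK.

```typescript
// Shape of the object passed to the caller's callback (mirrors `loadedBytes`).
type BlockTotal = { loadedBytes: number };

// Hypothetical helper, not part of @azure/storage-blob: remembers the bytes
// sent for each block index and reports the sum across all blocks.
class BlockProgressTracker {
  private readonly perBlock: number[];
  private readonly onProgress?: (e: BlockTotal) => void;

  constructor(numBlocks: number, onProgress?: (e: BlockTotal) => void) {
    this.perBlock = new Array<number>(numBlocks).fill(0);
    this.onProgress = onProgress;
  }

  // Call from an individual block's progress callback. Each block's
  // loadedBytes is cumulative, so overwriting (not adding) avoids double counting.
  update(blockIndex: number, loadedBytes: number): void {
    this.perBlock[blockIndex] = loadedBytes;
    this.onProgress?.({ loadedBytes: this.total() });
  }

  total(): number {
    return this.perBlock.reduce((sum, n) => sum + n, 0);
  }
}

// Two blocks streaming concurrently, with their progress events interleaved.
const reports: number[] = [];
const tracker = new BlockProgressTracker(2, (e) => reports.push(e.loadedBytes));
tracker.update(0, 10);
tracker.update(1, 5);
tracker.update(0, 30); // block 0 done
tracker.update(1, 20); // block 1 done
console.log(reports.join(", ")); // 10, 15, 35, 50
```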

What is the team's stance on this? Thanks.

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files) labels Jan 2, 2025

github-actions bot commented Jan 2, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

@jeremymeng
Member

related: #32247

@xirzec
Member

xirzec commented Jan 10, 2025

@au5ton we do have some code for stream bodies specifically to enable progress with fetch:

I wonder if passing the progress callback through to stageBlock, as you suggest, would address this, though with block upload I think there are two subtly different scenarios:

  1. using onProgress to track how many blocks have successfully made their way to the service
  2. tracking the individual progress of streaming each block to the service.

Perhaps there could be a separate callback to this operation (e.g. blockOnProgress) that is passed through to the underlying HttpClient?
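The two scenarios above could be kept apart by giving each one its own callback. Here is a minimal sketch. `blockOnProgress` is the name floated in this comment. `onBlockStaged`, `BlockUploadCallbacks`, and `simulateBlockUpload` are hypothetical names used only for illustration, and none of them is an existing @azure/storage-blob API. The upload is simulated in fixed-size chunks rather than sent over the network.

```typescript
// Hypothetical option shape, for illustration only.
interface BlockUploadCallbacks {
  // Scenario 1: fires once a block has been staged with the service.
  onBlockStaged?: (blockIndex: number, blocksDone: number) => void;
  // Scenario 2: fires as the bytes of an individual block are streamed.
  blockOnProgress?: (blockIndex: number, loadedBytes: number) => void;
}

// Simulates streaming each block in chunkSize pieces, firing both callbacks.
function simulateBlockUpload(
  blockSizes: number[],
  chunkSize: number,
  callbacks: BlockUploadCallbacks,
): void {
  let blocksDone = 0;
  blockSizes.forEach((blockSize, i) => {
    for (let sent = 0; sent < blockSize; ) {
      sent = Math.min(sent + chunkSize, blockSize);
      callbacks.blockOnProgress?.(i, sent); // cumulative bytes for block i
    }
    blocksDone += 1;
    callbacks.onBlockStaged?.(i, blocksDone);
  });
}

const log: string[] = [];
simulateBlockUpload([25, 10], 10, {
  blockOnProgress: (i, loaded) => log.push(`block ${i}: ${loaded} bytes`),
  onBlockStaged: (i, done) => log.push(`block ${i} staged (${done} done)`),
});
console.log(log.join("\n"));
```

A caller who only wants a block count can ignore `blockOnProgress`. A progress bar can ignore `onBlockStaged`.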

@au5ton
Author

au5ton commented Jan 10, 2025

@xirzec

> we do have some code for stream bodies specifically to enable progress with fetch:

Thanks for clarifying this!

> Perhaps there could be a separate callback to this operation (e.g. blockOnProgress) that is passed through to the underlying HttpClient?

I'm not suggesting that the exported API surface of BlockBlobClient.uploadData() be changed, just that it reports progress in the same format and scale, but more often. From a consumer's perspective, it seems unintuitive that progress is reported exclusively in whole-block increments, because the API doesn't suggest that, and progress is more granular in a Node.js context.

Currently, transfer progress is calculated over the entire file...

let transferProgress: number = 0;

... and incremented and reported at the conclusion of each block.

// Update progress after block is successfully uploaded to server, in case of block trying
// TODO: Hook with convenience layer progress event in finer level
transferProgress += contentLength;
if (options.onProgress) {
  options.onProgress!({
    loadedBytes: transferProgress,
  });
}

It might be possible to replace this so that each block is tracked separately and progress is reported whenever stageBlock(..., { onProgress: () => {} }) reports something.

Maybe something like this? If this approach is acceptable, I can make a PR.

diff --git a/sdk/storage/storage-blob/src/Clients.ts b/sdk/storage/storage-blob/src/Clients.ts
index 7e5dcee0a7..e95d71185e 100644
--- a/sdk/storage/storage-blob/src/Clients.ts
+++ b/sdk/storage/storage-blob/src/Clients.ts
@@ -4270,28 +4270,45 @@ export class BlockBlobClient extends BlobClient {

         const blockList: string[] = [];
         const blockIDPrefix = randomUUID();
-        let transferProgress: number = 0;
+        // Stores the amount of bytes progressed in each block
+        let transferProgressPerBlock: number[] = [];

         const batch = new Batch(options.concurrency);
         for (let i = 0; i < numBlocks; i++) {
+          // Initialize at 0
+          transferProgressPerBlock[i] = 0;
+          // Calculate block parameters
+          const start = blockSize * i;
+          const end = i === numBlocks - 1 ? size : start + blockSize;
+          const contentLength = end - start;
+          // Queue the block upload
           batch.addOperation(async (): Promise<any> => {
             const blockID = generateBlockID(blockIDPrefix, i);
-            const start = blockSize * i;
-            const end = i === numBlocks - 1 ? size : start + blockSize;
-            const contentLength = end - start;
             blockList.push(blockID);
             await this.stageBlock(blockID, bodyFactory(start, contentLength), contentLength, {
               abortSignal: options.abortSignal,
               conditions: options.conditions,
               encryptionScope: options.encryptionScope,
               tracingOptions: updatedOptions.tracingOptions,
+              onProgress(progress) {
+                // Record the progress in this block by index. Will overwrite if block is retried.
+                transferProgressPerBlock[i] = progress.loadedBytes;
+                // Report progress externally
+                if (options.onProgress) {
+                  options.onProgress!({
+                    // Report the sum of the array
+                    loadedBytes: transferProgressPerBlock.reduce((sum, a) => sum + a, 0),
+                  });
+                }
+              },
             });
             // Update progress after block is successfully uploaded to server, in case of block trying
+            // In case of inconsistencies in `onProgress` report, write the final value ourselves
+            transferProgressPerBlock[i] = contentLength;
-            // TODO: Hook with convenience layer progress event in finer level
-            transferProgress += contentLength;
             if (options.onProgress) {
               options.onProgress!({
-                loadedBytes: transferProgress,
+                // Report the sum of the array
+                loadedBytes: transferProgressPerBlock.reduce((sum, a) => sum + a, 0),
               });
             }
           });
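The diff overwrites each block's entry by index, which can have a side effect. If a retried block's callback restarts from zero, the summed total can briefly move backwards. That retry behavior is an assumption here, not something confirmed in this thread. Here is a minimal sketch of one way to guard against it. It uses a hypothetical `monotonic` wrapper that forwards only increasing totals, and the per-block sum mirrors the diff's `reduce`.

```typescript
type ProgressReport = { loadedBytes: number };

// Hypothetical wrapper: forwards a report only when the total increases, so
// a block that restarts from zero cannot make a progress bar move backwards.
function monotonic(
  onProgress: (e: ProgressReport) => void,
): (e: ProgressReport) => void {
  let highest = 0;
  return (e) => {
    if (e.loadedBytes > highest) {
      highest = e.loadedBytes;
      onProgress(e);
    }
  };
}

// Per-block bookkeeping as in the diff: overwrite by index, report the sum.
const perBlock = [0, 0];
const seen: number[] = [];
const report = monotonic((e) => seen.push(e.loadedBytes));
const update = (i: number, loaded: number): void => {
  perBlock[i] = loaded;
  report({ loadedBytes: perBlock.reduce((sum, a) => sum + a, 0) });
};

update(0, 20);
update(1, 10);
update(1, 0); // block 1 is retried and (by assumption) restarts from zero
update(1, 15);
update(1, 30);
console.log(seen.join(", ")); // 20, 30, 35, 50
```

The trade-off is that the bar pauses during a retry instead of rewinding. It hides the retried bytes rather than reporting them.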

@xirzec xirzec added feature-request This issue requires a new behavior in the product in order be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 21, 2025
@xirzec
Member

xirzec commented Jan 21, 2025

Oh I see what you mean now. I believe this would be a good enhancement. Would you consider opening a PR with your changes plus some tests?

@au5ton
Author

au5ton commented Jan 21, 2025

@xirzec Yes, I'd be happy to open a PR. It can be found here: #32642


4 participants