Add support to Azure blob sink to configure Content-Encoding and Content-Type
#21795
Comments
Hi @heshanperera-alert, thanks for creating this issue.
Can you help me understand the following: is the Vector request accepted or rejected?
Hello @pront, no, the request does not fail; it's successful. Azure doesn't return an error when downloading either. The file downloads successfully, but when I try to extract it with gunzip, I get an error.
Is this a valid gzip-compressed file, or is your blob raw bytes? For the latter, you have to set this https://vector.dev/docs/reference/configuration/sinks/azure_blob/#compression to
@pront I believe it's a valid gzip. If I remove the gzip header from the Azure portal by editing the blob file, everything works fine: I can decompress the file after downloading it. I don't want to set the compression to none, since we are going to send TBs of data and would like to keep the size to a minimum with compression.
I see, thank you for sharing these details. Internally we set the BlobContentEncoding which ultimately determines the value of the
Unfortunately, I don't have an Azure environment that I can use to test this myself, but I am not convinced that removing the content encoding header is the right thing to do. I wonder if it's an issue with Azure or with the crate version we are using.
@pront do you have any workaround in mind to get around this in the short run? It's unrealistic to remove the header from each and every file on Azure Blob, as we have millions of files out there and there's no command to do that from the Azure CLI either.
The note on the
https://vector.dev/docs/reference/configuration/sinks/aws_s3/#compression The same thing may apply to Azure Blob Storage. That is: if you download via the browser or some SDKs, the file will be transparently decompressed when downloading.
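To illustrate the transparent-decompression behavior described here, the following is a minimal Python sketch. It does not talk to Azure or Vector: `upload` and `download` are hypothetical stand-ins for a sink that labels a gzip body with `Content-Encoding: gzip` and for a client that honors that header. Under those assumptions, the file saved to disk is already plain text, so a second gunzip pass fails.

```python
import gzip


def upload(payload: bytes) -> dict:
    """Hypothetical sink: gzip the payload and label it with Content-Encoding."""
    return {"headers": {"Content-Encoding": "gzip"}, "body": gzip.compress(payload)}


def download(blob: dict) -> bytes:
    """Hypothetical client that honors Content-Encoding, as browsers and some SDKs do."""
    body = blob["body"]
    if blob["headers"].get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)  # transparent decompression on download
    return body


blob = upload(b'{"message": "hello"}\n')
saved = download(blob)  # what ends up on the local disk

try:
    gzip.decompress(saved)  # equivalent of running gunzip on the saved file
    second_pass = "ok"
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    second_pass = "not gzip"

print(saved, second_pass)
```

If this is what is happening, the "corrupted file" error would come from decompressing twice rather than from a bad upload.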
@jszwedko interesting. The good thing about the S3 sink is that it has the ability to override the content-encoding header. The Azure blob sink doesn't have that capability.
Based on a quick internet search (see this), Jesse is right. Azure decompresses automatically. Did you inspect the contents of the downloaded file on your host? Let us know; if so, we can close this issue. (Note that gzip files start with the magic bytes
@pront shouldn't the blob sink have the same capability as the S3 sink, so that we could override the header?
Are you referring to these?
We can add these to the
What I am trying to understand is whether we have a Vector bug or not. If I am reading the above correctly, the downloaded blob is already decompressed but has a
Oh yeah, sorry, I haven't answered your question regarding the magic bytes. The file does not start with 0x1f, 0x8b.
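For anyone who wants to repeat this check, here is a small Python sketch that tests for the 0x1f, 0x8b prefix and only decompresses when it is present. The helper names `looks_like_gzip` and `read_payload` are made up for illustration, and the sample bytes stand in for a downloaded blob.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def read_payload(data: bytes) -> bytes:
    """Decompress only when the data is actually gzip; otherwise return it unchanged."""
    return gzip.decompress(data) if looks_like_gzip(data) else data


raw = b"plain log line\n"     # stands in for a blob that was decompressed on download
packed = gzip.compress(raw)   # stands in for a blob that is still compressed

print(looks_like_gzip(raw), looks_like_gzip(packed))
print(read_payload(raw) == read_payload(packed))
```

A reader written this way tolerates both cases, which may help as a stopgap while the header cannot be overridden from the sink configuration.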
@pront do you know when the feature to override the headers can be added to the blob sink?
Unfortunately this is not on our radar, there's on open feature request for this. If you are motivated, you are welcome to submit a PR and we will review it.
Thank you for confirming. You can also inspect the contents to see if they match what you published, as one more verification step.
A note for the community
Problem
When using the Azure blob sink to upload log files in gzip format, Vector adds a 'content-encoding' header. When we try to download and extract the gzip file, we run into a file-corrupted error. However, when we manually remove the content-encoding header from the file and then download it, everything works as expected. There doesn't seem to be a way to remove this header from the configuration. What should we do? The following are the file properties on the Azure portal.
Configuration
No response
Version
0.37.1
Debug Output
Example Data
No response
Additional Context
No response
References
No response