Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for loading UTF16 encoded files #648

Closed
wants to merge 2 commits into from

Conversation

souleb
Copy link
Member

@souleb souleb commented Sep 11, 2023

Add support for loading UTF16 encoded files if they have a correct BOM (Byte-Order-Mark https://en.wikipedia.org/wiki/Byte_order_mark) at the beginning of the file. Falls back on UTF8 encoding, if no understandable BOM is present.

This is dependent on go-yaml support, utf8, utf16LE and utf16BE. see: https://github.com/kubernetes-sigs/kustomize/blob/7c36ed21b3587e7124326743a888db5374e86d93/kyaml/internal/forked/github.com/go-yaml/yaml/readerc.go#L181C4-L181C4

@@ -49,6 +49,14 @@ func TestScanManifests(t *testing.T) {
base: "./testdata/nokustomization/panic",
wantErr: true,
},
{
name: "utf-16LE encoding",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give this test a more descriptive name, please? What does it actually test? Something like "Files with UTF-16LE encoding should be included".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus: This tests passes just fine without the changes to kustomize_generator.go.

@souleb
Copy link
Member Author

souleb commented Sep 11, 2023

I have tested this with an utf16-LE encoded yaml without a prepended BOM. Windows programs sometimes do that...

Well it fails with failed to decode Kubernetes YAML from testdata/nokustomization/utf16le/secret.yaml: MalformedYAMLError: yaml: control characters are not allowed. This is to be expected as we prepend utf8 BOM when it is missing.

This is not perfect but it is still better than the actual behaviour where it silently drop those files, because it provides an actionable error.

Note that it is a breaking change because running workload containing utf16-LE without BOM will start erroring.

If they have a correct BOM (Byte-Order-Mark https://en.wikipedia.org/wiki/Byte_order_mark) at the beginning
of the file, support non utf8 files. Falls back on UTF8 encoding, if no understandable BOM is present.

Signed-off-by: Soule BA <[email protected]>
@makkes
Copy link
Member

makkes commented Sep 11, 2023

I have tested this with an utf16-LE encoded yaml without a prepended BOM. Windows programs sometimes do that...

Well it fails with failed to decode Kubernetes YAML from testdata/nokustomization/utf16le/secret.yaml: MalformedYAMLError: yaml: control characters are not allowed. This is to be expected as we prepend utf8 BOM when it is missing.

I manually reverted your changes to kustomize_generator.go and ran the new test case and it passed. What am I missing?

@souleb
Copy link
Member Author

souleb commented Sep 11, 2023

I have tested this with an utf16-LE encoded yaml without a prepended BOM. Windows programs sometimes do that...
Well it fails with failed to decode Kubernetes YAML from testdata/nokustomization/utf16le/secret.yaml: MalformedYAMLError: yaml: control characters are not allowed. This is to be expected as we prepend utf8 BOM when it is missing.

I manually reverted your changes to kustomize_generator.go and ran the new test case and it passed. What am I missing?

Yes you are right, but in practice we have seen that it drops silently those files. I have to dig deeper. Let me put this on draft.

@souleb souleb marked this pull request as draft September 11, 2023 13:13
@souleb souleb closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants