Issue summary: I cannot use the Wrangler or any other XML plugin for an (a priori) simple use case, which consists of importing nested/repeated XML data (data with repeated columns, i.e. JSON arrays) into an arbitrary sink.
Steps to reproduce:
Create a pipeline GCS->Wrangler->Whatever sink (with the input path in GCS set as a runtime variable).
Use the following sample to create the output schema (with the xml-to-json transform) and run the pipeline with this file.
Observe that the pipeline is successful.
Change the source to a new file:
Why this PR? Because there is no general way to know whether an XML document contains repeated columns, so every element should be expected to be repeated.
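To illustrate the point, here is a minimal sketch of why the inferred schema can change from one file to the next. It is written in Python with the standard library only and is not the CDAP or Wrangler implementation; the function names `to_json_naive` and `to_json_always_array` and the sample `<order>` documents are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET


def to_json_naive(elem):
    """Convert an element; a child tag becomes a list only if it repeats."""
    if len(elem) == 0:
        return elem.text
    out = {}
    for child in elem:
        value = to_json_naive(child)
        if child.tag in out:
            # Second occurrence: promote the existing value to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


def to_json_always_array(elem):
    """Convert an element; every child tag is always emitted as a list."""
    if len(elem) == 0:
        return elem.text
    out = {}
    for child in elem:
        out.setdefault(child.tag, []).append(to_json_always_array(child))
    return out


# Hypothetical inputs: same structure, but the second file repeats <item>.
ONE_ITEM = "<order><item>a</item></order>"
TWO_ITEMS = "<order><item>a</item><item>b</item></order>"

naive_one = to_json_naive(ET.fromstring(ONE_ITEM))
naive_two = to_json_naive(ET.fromstring(TWO_ITEMS))
stable_one = to_json_always_array(ET.fromstring(ONE_ITEM))
stable_two = to_json_always_array(ET.fromstring(TWO_ITEMS))

print(naive_one, naive_two)    # "item" is a string in one, a list in the other
print(stable_one, stable_two)  # "item" is a list in both
```

With the naive conversion, a schema derived from the first file declares `item` as a string, and a later file with two `<item>` elements no longer matches it. Treating every element as repeated keeps the shape identical across files, at the cost of wrapping single values in one-element arrays.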
Why I think it's a good idea to do that in the standard CDAP code: