Issue summary: I cannot use Wrangler (or any other XML plugin) for an a priori simple use case: importing nested XML data that contains repeated elements (i.e. JSON arrays after conversion) into any sink.
Steps to reproduce:
1. Create a pipeline GCS -> Wrangler -> any sink (with the GCS input path set as a runtime argument).
2. Use the following sample file to create the output schema (via the xml-to-json transform) and run the pipeline with it.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MyRoot>
  <SomeField>
    <Total>65.95</Total>
    <Total>3.98</Total>
    <Total TotalType="FinalTotal">65.95</Total>
  </SomeField>
  <Timer>
    <StartTimestamp>2022-10-03T11:01:48</StartTimestamp>
  </Timer>
</MyRoot>
3. Observe that the pipeline runs successfully.
4. Point the source at a new file that contains only a single <Total> element:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MyRoot>
  <SomeField>
    <Total>65.95</Total>
  </SomeField>
  <Timer>
    <StartTimestamp>2022-10-03T11:01:48</StartTimestamp>
  </Timer>
</MyRoot>
5. Observe that the pipeline fails with the "Unable to decode array 'body_MyRoot_SomeField'" error.
Why this PR? Because there is no general way to know in advance whether an XML document will contain repeated elements, so every element has to be treated as potentially repeated.
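To make the shape change concrete, here is a minimal sketch of why the two files convert differently, calling org.json's XML.toJSONObject directly (I am assuming Wrangler's xml-to-json transform behaves like this library; whether it is the exact code path is an assumption on my part):

import org.json.JSONObject;
import org.json.XML;

public class RepeatedVsSingle {
  public static void main(String[] args) {
    String repeated = "<SomeField><Total>65.95</Total><Total>3.98</Total></SomeField>";
    String single = "<SomeField><Total>65.95</Total></SomeField>";

    JSONObject a = XML.toJSONObject(repeated);
    JSONObject b = XML.toJSONObject(single);

    // Repeated <Total> elements come out as a JSON array, roughly:
    //   {"SomeField":{"Total":[65.95,3.98]}}
    System.out.println(a);

    // A single <Total> collapses to a plain value, not a one-element array, roughly:
    //   {"SomeField":{"Total":65.95}}
    System.out.println(b);
  }
}

So a schema inferred from the first file declares an array, while the second file delivers a scalar for the same field, which is exactly what triggers the "Unable to decode array" failure.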
Why I think this belongs in the standard CDAP code:
1. Correct me if I'm wrong, but RecordConvertor.java is meant to convert the input runtime data to match the output schema; it is not meant to validate the input against the output schema.
2. An array is a "higher level" data type: it is always filled with elements that themselves have a type (or it is empty, in which case there is no issue in the first place). Wrapping a single value with Collections.singletonList(object) is therefore the array equivalent of Double.parseDouble(value), which this code already does for scalars, i.e. we simply cast the input to match the output schema. A sketch of the idea follows.
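For illustration only, a minimal sketch of the coercion I have in mind; the helper name and surrounding structure are mine, not the actual RecordConvertor code:

import java.util.Collections;
import java.util.List;

// Hypothetical helper: when the output schema declares an ARRAY but the runtime
// value is a single element (the single-<Total> case above), wrap it instead of
// throwing "Unable to decode array ...".
final class ArrayCoercion {
  private ArrayCoercion() { }

  @SuppressWarnings("unchecked")
  static List<Object> asList(Object object) {
    if (object instanceof List) {
      // Already repeated: pass it through unchanged.
      return (List<Object>) object;
    }
    // Single element: "cast" it to a one-element array, the array analogue of
    // Double.parseDouble(value) for scalar coercions.
    return Collections.singletonList(object);
  }
}

Applied where the convertor currently rejects non-array values for array fields, this would make both sample files above flow through the same pipeline without changing behaviour for inputs that already arrive as arrays.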