Allow CSV output when columns are consistent even if types aren't #4781

philrz · 2023-09-25T23:21:17Z

Repro is with Zed commit 80dbcda.

Consider this community user's CSV output example, which motivated the changes in #4773:

$ zq -version
Version: v1.9.0-24-g80dbcda5

$ echo '{a:1,b:null}{a:1,b:2}' | zq -f csv -
a,b
1,
CSV output requires uniform records but multiple types encountered (consider 'fuse')

Or here's another for a non-null case:

$ echo '{"a":1} {"a":"hi"}' | zq -f csv -
a
1
CSV output requires uniform records but multiple types encountered (consider 'fuse')

As the error messages indicate, adding fuse does work around the problem in both cases. It takes a first pass through the data to construct a single merged record type and then coerces all the input values to that type. However, since CSV effectively lacks real data typing, what truly matters in the record type is that the field names are consistent: once the field names in the header row are output, there's no way to deal with additional fields encountered later. Types are a different story. In the first example above, a null value is output the same way in CSV (i.e., "nothing" between the comma delimiters) regardless of whether it had a particular type in Zed. Similarly, if we run fuse on the second example, the union type is indeed established to indicate the field could hold an integer or string:

$ echo '{"a":1} {"a":"hi"}' | zq -z 'fuse | count() by typeof(this)' -
{typeof:<{a:(int64,string)}>,count:2(uint64)}

but when those values are output on the second pass, they're still just printed in the same column of the CSV output as a number or a string.

$ echo '{"a":1} {"a":"hi"}' | zq -f csv 'fuse' -
a
1
hi

In other words, the union type existed only briefly to satisfy a current constraint of the CSV writer; it didn't enhance the output in any way.
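To make the point about types concrete, here is a minimal Go sketch of rendering one field value as CSV cell text. It is not the csvio.Writer code; the helper name formatCell and the modeling of Zed values as Go `any` (with nil standing in for a null of any type) are assumptions for illustration. Whatever the value's type, the result is just a string, so the type information never reaches the CSV output.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatCell renders one field value as CSV cell text. This is a
// hypothetical helper, not Zed's actual code: values are modeled as Go
// `any`, with nil standing in for a Zed null of any type.
func formatCell(v any) string {
	switch x := v.(type) {
	case nil:
		// A null renders as an empty cell whatever its declared type.
		return ""
	case int64:
		return strconv.FormatInt(x, 10)
	case string:
		return x
	default:
		// Fall back to Go's default formatting for other types.
		return fmt.Sprint(x)
	}
}

func main() {
	// An int64 and a string land in the same column as plain text.
	fmt.Println(formatCell(int64(1)))
	fmt.Println(formatCell("hi"))
	fmt.Printf("%q\n", formatCell(nil))
}
```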

In a discussion with @nwt he agreed that we could probably relax this constraint of the CSV writer. This would reduce the number of times that new users encounter the (consider 'fuse') message and have to take a detour to learn about why it's needed and how to use it.



philrz · 2023-11-22T18:11:26Z

Verified in Zed commit 79aa231.

Repeating the two examples shown above, we can now output as CSV without requiring fuse.

$ zq -version
Version: v1.11.1-7-g79aa231a

$ echo '{a:1,b:null}{a:1,b:2}' | zq -f csv -
a,b
1,
1,2

$ echo '{"a":1} {"a":"hi"}' | zq -f csv -
a
1
hi

Thanks @mattnibs!

philrz added the community label Sep 25, 2023

philrz assigned mattnibs Nov 7, 2023

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

caba357

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs mentioned this issue Nov 17, 2023

CSV Writer: Allow different types in output #4889

Merged

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

ff1829b

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

7051f59

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

9269e8e

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

2923261

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs added a commit that referenced this issue Nov 17, 2023

CSV Writer: Allow different types in output

dc95dd8

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781

mattnibs closed this as completed in #4889 Nov 22, 2023

mattnibs added a commit that referenced this issue Nov 22, 2023

CSV Writer: Allow different types in output (#4889)

6d8abb5

Adjust the csvio.Writer so that it can handle records with the same field names but different types. Closes #4781
