Describe the bug
I have a schema:
message test {
  optional group a {
    optional group foo (MAP) {
      repeated group key_value {
        required binary key (STRING);
        optional binary value (STRING);
      }
    }
  }
}
The problem is that when I write the data into a file, there is no error and everything seems OK. But when I use parquet-tools to cat the parquet file, it gives this error:
java.lang.IllegalArgumentException: [a, foo, key_value, key] required binary key (STRING) is not in the store: [[a, foo, key_value, value] optional binary value (STRING)] 1
at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:272)
at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:246)
at org.apache.parquet.tools.command.DumpCommand.dump(DumpCommand.java:195)
at org.apache.parquet.tools.command.DumpCommand.execute(DumpCommand.java:148)
at org.apache.parquet.tools.Main.main(Main.java:223)
java.lang.IllegalArgumentException: [a, foo, key_value, key] required binary key (STRING) is not in the store: [[a, foo, key_value, value] optional binary value (STRING)] 1
Unit test to reproduce
Described as above.
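For completeness, here is a minimal sketch of the kind of writer code that triggers it. This is only a sketch: it assumes the fraugster/parquet-go API (parquetschema.ParseSchemaDefinition, goparquet.NewFileWriter with WithSchemaDefinition, FileWriter.AddData), and the file name and row values are made up.

package main

import (
	"os"

	goparquet "github.com/fraugster/parquet-go"
	"github.com/fraugster/parquet-go/parquetschema"
)

func main() {
	// parse the schema from the issue
	sd, err := parquetschema.ParseSchemaDefinition(`message test {
  optional group a {
    optional group foo (MAP) {
      repeated group key_value {
        required binary key (STRING);
        optional binary value (STRING);
      }
    }
  }
}`)
	if err != nil {
		panic(err)
	}

	f, err := os.Create("test.parquet")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fw := goparquet.NewFileWriter(f, goparquet.WithSchemaDefinition(sd))
	// one row with a single map entry; binary columns take []byte
	if err := fw.AddData(map[string]interface{}{
		"a": map[string]interface{}{
			"foo": map[string]interface{}{
				"key_value": []map[string]interface{}{
					{"key": []byte("k1"), "value": []byte("v1")},
				},
			},
		},
	}); err != nil {
		panic(err)
	}
	if err := fw.Close(); err != nil {
		panic(err)
	}
}

Writing succeeds without error; the corruption only shows up when the file is read back.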
I guess the root cause is in schema.go:
func recursiveFix(col *Column, colPath ColumnPath, maxR, maxD uint16, alloc *allocTracker) {
	// ...
	col.maxR = maxR
	col.maxD = maxD
	// at line 684: append internally always updates the underlying array
	col.path = append(colPath, col.name)
	if col.data != nil {
		col.data.reset(col.rep, col.maxR, col.maxD)
		return
	}
	for i := range col.children {
		// so no matter how many children there are, colPath always ends up as the
		// last child's path, due to the bug at line 684
		recursiveFix(col.children[i], col.path, maxR, maxD, alloc)
	}
}
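The aliasing is easy to reproduce outside parquet-go: when the parent slice has spare capacity, append writes into the parent's backing array, so two children derived from the same parent fight over the same slot. A standalone illustration (not parquet-go code):

package main

import "fmt"

func main() {
	// parent has spare capacity, like a path slice reused across children
	parent := make([]string, 1, 4)
	parent[0] = "a"

	child1 := append(parent, "key")   // writes "key" into parent's backing array
	child2 := append(parent, "value") // overwrites the same slot with "value"

	fmt.Println(child1, child2) // prints: [a value] [a value] -- child1 was clobbered
}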
So the quick fix should be to copy the parent path first, so each child appends onto its own backing array:

// copy the parent path first
col.path = append([]string(nil), colPath...)
col.path = append(col.path, col.name)
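Applying the same copy to the standalone illustration above removes the clobbering:

package main

import "fmt"

func main() {
	parent := make([]string, 1, 4)
	parent[0] = "a"

	// copy the parent into a fresh backing array before appending the child name
	child1 := append(append([]string(nil), parent...), "key")
	child2 := append(append([]string(nil), parent...), "value")

	fmt.Println(child1, child2) // prints: [a key] [a value] -- no more shared slot
}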
parquet-go specific details
What version are you using?
0.12.0
Can this be reproduced in earlier versions?
Not sure.
Misc Details
Are you using AWS Athena, Google BigQuery, presto...? No, just a normal parquet file.
Any other relevant details... how big are the files / rowgroups you're trying to read/write? A very small file.
Does this behavior exist in other implementations? (link to spec/implementation please)
Do you have memory stats to share?
Can you provide a stacktrace?
Can you upload a test file?