It would be neat to have richer support for struct tags for auto-generated schema definitions. I added this feature to a branch off my forked repo and am happy to put up a PR if you guys think this is a good idea! I added documentation on what this would look like (I just copied the updates I made to the README on my branch).
Object Schema Definitions
The sub-package parquetschema/autoschema supports auto-generating schema
definitions for a provided object's type using reflection and struct tags. The
generated schema is meant to be compatible with the reflection-based
marshalling/unmarshalling in the floor sub-package.
Supported Parquet Types
| Parquet Type | Go Types | Note |
| --- | --- | --- |
| BOOLEAN | bool | |
| INT32 | int{8,16,32}, uint{,8,16,32} | |
| INT64 | int{,64}, uint64 | |
| INT96 | [12]byte | Must specify type=INT96 in the parquet struct tag. |
Pointers are automatically mapped to optional fields. Unsupported Go types
include funcs, interfaces, unsafe pointers, unsigned int pointers, and complex
numbers.
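The pointer and unsupported-type rules above can be sketched with the standard reflect package. This is only an illustration: fieldSupport is a hypothetical helper, not a function from the proposed package, and it assumes that "unsigned int pointers" refers to Go's uintptr type. It checks only the kinds named above.

```go
package main

import (
	"fmt"
	"reflect"
)

// fieldSupport is a hypothetical helper (not part of the library). It reports
// whether a value's Go type passes the unsupported-type check described above,
// and whether the resulting field would be optional (the type is a pointer).
func fieldSupport(v interface{}) (supported, optional bool) {
	t := reflect.TypeOf(v)
	if t == nil {
		// A nil interface value carries no type information.
		return false, false
	}
	if t.Kind() == reflect.Ptr {
		// Pointers map to optional fields; inspect the pointed-to type.
		optional = true
		t = t.Elem()
	}
	switch t.Kind() {
	case reflect.Func, reflect.Interface, reflect.UnsafePointer,
		reflect.Uintptr, // assumption: "unsigned int pointers" means uintptr
		reflect.Complex64, reflect.Complex128:
		return false, optional
	}
	return true, optional
}

func main() {
	var n int32
	fmt.Println(fieldSupport(n))             // plain value: required field
	fmt.Println(fieldSupport(&n))            // pointer: optional field
	fmt.Println(fieldSupport(complex(1, 2))) // complex numbers are unsupported
}
```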
Default Type Mappings
By default, Go types are mapped to Parquet types and in some cases logical
types as well. More specific mappings can be achieved by the use of struct
tags (see below).
| Go Type | Default Parquet Type | Default Logical Type |
| --- | --- | --- |
| bool | BOOLEAN | |
| int{,8,16,32,64} | INT{64,32,32,32,64} | INTEGER({64,8,16,32,64}, true) |
| uint{,8,16,32,64} | INT{32,32,32,32,64} | INTEGER({32,8,16,32,64}, false) |
| string | BYTE_ARRAY | STRING |
| []byte | BYTE_ARRAY | |
| [N]byte | FIXED_LEN_BYTE_ARRAY | |
| time.Time | INT64 | TIMESTAMP(NANOS, true) |
| goparquet.Time | INT64 | TIME(NANOS, true) |
| map | group | MAP |
| slice, array | group | LIST |
| struct | group | |
Struct Tags
Automatic schema definition generation supports the use of the parquet struct
tag for further schema specification beyond the default mappings. Tag fields
have the format key=value and are comma separated. The tags do not support
converted types as these are now deprecated by Parquet. Since converted types
are still required to support backward compatibility, they are automatically
set based on a field's logical type.
| Tag Field | Type | Values | Notes |
| --- | --- | --- | --- |
| name | string | ANY | Defaults to the lower-case struct field name. |
| type | string | INT96 | Unless using a [12]byte field for INT96, this does not ever need to be specified. |
| logicaltype | string | STRING, ENUM, DECIMAL, DATE, TIME, TIMESTAMP, JSON, BSON, UUID | Maps and non-byte slices and arrays are always mapped to MAP and LIST logical types, respectively. |
| timeunit | string | MILLIS, MICROS, NANOS | Only used when the logical type is TIME or TIMESTAMP, defaults to NANOS. |
| isadjustedtoutc | bool | ANY | Only used when the logical type is TIME or TIMESTAMP, defaults to true. |
| scale | int32 | N >= 0 | Only used when the logical type is DECIMAL, defaults to 0. |
| precision | int32 | N >= 0 | Only used when the logical type is DECIMAL, required. |
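As a sketch of how the tag fields and their defaults fit together, the following parses an unprefixed tag string. The tagOptions type and parseTag function are hypothetical names made up for this example; the actual implementation on the branch may be structured quite differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// tagOptions is a hypothetical container for the values of one parquet tag.
type tagOptions struct {
	Name, Type, LogicalType, TimeUnit string
	IsAdjustedToUTC                   bool
	Scale, Precision                  int32
}

// parseTag parses comma-separated key=value fields and applies the defaults
// from the table above. fieldName is the Go struct field name.
func parseTag(fieldName, tag string) (tagOptions, error) {
	opts := tagOptions{
		Name:            strings.ToLower(fieldName), // default name
		TimeUnit:        "NANOS",                    // default time unit
		IsAdjustedToUTC: true,                       // default UTC adjustment
	}
	if tag == "" {
		return opts, nil
	}
	hasPrecision := false
	for _, part := range strings.Split(tag, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 {
			return opts, fmt.Errorf("malformed tag field %q", part)
		}
		key, value := kv[0], kv[1]
		switch key {
		case "name":
			opts.Name = value
		case "type":
			opts.Type = value
		case "logicaltype":
			opts.LogicalType = value
		case "timeunit":
			if value != "MILLIS" && value != "MICROS" && value != "NANOS" {
				return opts, fmt.Errorf("invalid timeunit %q", value)
			}
			opts.TimeUnit = value
		case "isadjustedtoutc":
			b, err := strconv.ParseBool(value)
			if err != nil {
				return opts, fmt.Errorf("invalid isadjustedtoutc %q", value)
			}
			opts.IsAdjustedToUTC = b
		case "scale", "precision":
			n, err := strconv.ParseInt(value, 10, 32)
			if err != nil || n < 0 {
				return opts, fmt.Errorf("invalid %s %q", key, value)
			}
			if key == "scale" {
				opts.Scale = int32(n)
			} else {
				opts.Precision = int32(n)
				hasPrecision = true
			}
		default:
			return opts, fmt.Errorf("unknown tag field %q", key)
		}
	}
	if opts.LogicalType == "DECIMAL" && !hasPrecision {
		return opts, errors.New("precision is required for DECIMAL")
	}
	return opts, nil
}

func main() {
	opts, err := parseTag("CreatedAt", "logicaltype=TIMESTAMP,timeunit=MILLIS")
	fmt.Printf("%+v %v\n", opts, err)
	_, err = parseTag("Price", "logicaltype=DECIMAL,scale=2")
	fmt.Println(err) // precision is missing, so an error is returned
}
```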
A tag field that applies to the key or value type of a map must be prefixed
with key. or value., respectively, and a tag field that applies to the element
type of a slice or array must be prefixed with element. It is invalid to
prefix name since it can only apply to the field itself.
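The prefix rule can be illustrated with a small sketch that groups tag fields by the part of the field they target. The event struct, its tag values, and the splitByTarget helper are all hypothetical and exist only for this example. Nested prefixes (for instance a slice of slices) are not handled here.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// event is a hypothetical struct; the tag values are illustrative only.
type event struct {
	Labels map[string]string `parquet:"name=labels,key.logicaltype=ENUM,value.logicaltype=JSON"`
}

// splitByTarget groups key=value tag fields by what they apply to: "" for the
// field itself, or "key", "value", or "element" for prefixed tag fields.
func splitByTarget(tag string) (map[string]map[string]string, error) {
	out := map[string]map[string]string{}
	for _, part := range strings.Split(tag, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed tag field %q", part)
		}
		target, key := "", kv[0]
		for _, prefix := range []string{"key", "value", "element"} {
			if strings.HasPrefix(kv[0], prefix+".") {
				target, key = prefix, strings.TrimPrefix(kv[0], prefix+".")
				break
			}
		}
		if target != "" && key == "name" {
			// name only applies to the field itself, so it cannot be prefixed.
			return nil, fmt.Errorf("tag field name cannot be prefixed with %q", target+".")
		}
		if out[target] == nil {
			out[target] = map[string]string{}
		}
		out[target][key] = kv[1]
	}
	return out, nil
}

func main() {
	// Read the parquet tag from the struct field via reflection.
	tag := reflect.TypeOf(event{}).Field(0).Tag.Get("parquet")
	groups, err := splitByTarget(tag)
	fmt.Println(groups, err)
}
```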
Object Schema Example
The above struct is equivalent to the following schema definition: