Important fields in `Schema`:

```rust
pub struct Schema {
    column_schemas: Vec<ColumnSchema>,
    name_to_index: HashMap<String, usize>,
    arrow_schema: Arc<ArrowSchema>,
    /// Index of the timestamp key column.
    ///
    /// Timestamp key column is the column holds the timestamp and forms part of
    /// the primary key. None means there is no timestamp key column.
    timestamp_index: Option<usize>,
    /// Version of the schema.
    ///
    /// Initial value is zero. The version should bump after altering schema.
    version: u32,
}
```
From this `Schema` definition, there is no easy way to know the semantic type of a field. Only the `TIME INDEX` is recorded, and it is recorded twice, because `ColumnSchema` also carries this information. `Schema` is also missing `ColumnId`.

This works for querying, where a column might be temporary (e.g., an intermediate compute result) and has no `ColumnId` or `SemanticType`. But if that is what `Schema` is for, it contains too much unnecessary information: `ArrowSchema` is enough for the entire query processing phase.

The problem is that we don't have a clear separation of which schema should be used in which phase, so we have to convert back and forth between them.

We may need to convert `Schema` to `ArrowSchema` for queries by simply discarding the unnecessary information. We should not need to support converting `ArrowSchema` back to `Schema`, which seems to be a wrong requirement at present.
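To illustrate the one-way conversion, here is a minimal sketch. All types in it (`StorageSchema`, `StorageColumn`, `QuerySchema`, `QueryField`, and this `SemanticType` enum) are hypothetical stand-ins invented for the example, not the actual definitions in the codebase or in Arrow. It only shows the shape of the idea: the storage-side schema can be lowered to a query-side schema by dropping metadata, and no reverse conversion is offered.

```rust
/// Hypothetical semantic role of a column (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SemanticType {
    Tag,
    Field,
    Timestamp,
}

/// Hypothetical storage-side column: carries an id and a semantic type.
#[derive(Debug, Clone)]
pub struct StorageColumn {
    pub name: String,
    /// Placeholder for a real data type; a plain string keeps the sketch small.
    pub data_type: String,
    pub column_id: u32,
    pub semantic_type: SemanticType,
}

/// Hypothetical storage-side schema with versioning.
#[derive(Debug, Clone)]
pub struct StorageSchema {
    pub columns: Vec<StorageColumn>,
    pub version: u32,
}

/// Stand-in for a query-side field: name and type only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QueryField {
    pub name: String,
    pub data_type: String,
}

/// Stand-in for a query-side schema (the role `ArrowSchema` plays in the text).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QuerySchema {
    pub fields: Vec<QueryField>,
}

/// One-way conversion: column ids, semantic types, and the version are discarded.
/// There is deliberately no `From<&QuerySchema> for StorageSchema`, because the
/// discarded information cannot be recovered from the query-side schema.
impl From<&StorageSchema> for QuerySchema {
    fn from(schema: &StorageSchema) -> Self {
        let fields = schema
            .columns
            .iter()
            .map(|c| QueryField {
                name: c.name.clone(),
                data_type: c.data_type.clone(),
            })
            .collect();
        QuerySchema { fields }
    }
}
```

With this shape, query code would call `QuerySchema::from(&storage_schema)` once at the boundary and never need to reconstruct ids or semantic types for temporary columns.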
From my perspective, `RegionMetadata` is closer to what `Schema` should be. We may consider merging them into one:
```rust
pub struct RegionMetadata {
    /// Columns in the region. Has the same order as columns
    /// in [schema](RegionMetadata::schema).
    pub column_metadatas: Vec<ColumnMetadata>,
    /// Maintains an ordered list of primary keys
    pub primary_key: Vec<ColumnId>,
    /// Immutable and unique id of a region.
    pub region_id: RegionId,
    /// Current version of the region schema.
    ///
    /// The version starts from 0. Altering the schema bumps the version.
    pub schema_version: u64,
}
```
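As a rough sketch of why a `RegionMetadata`-like structure answers the questions `Schema` cannot, the example below uses simplified, hypothetical types (`SimpleRegionMetadata`, `SimpleColumnMetadata`, and a three-variant `SemanticType`). The field layout of the real `ColumnMetadata` is an assumption here; the point is only that once each column carries its id and semantic type, lookups such as "what is the semantic type of this field" or "which column is the time index" become direct.

```rust
/// Hypothetical semantic role of a column (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SemanticType {
    Tag,
    Field,
    Timestamp,
}

/// Simplified column id for the sketch.
pub type ColumnId = u32;

/// Hypothetical per-column metadata: id, name, and semantic type together.
#[derive(Debug, Clone)]
pub struct SimpleColumnMetadata {
    pub column_id: ColumnId,
    pub name: String,
    pub semantic_type: SemanticType,
}

/// Simplified stand-in for a merged schema/metadata structure.
#[derive(Debug, Clone)]
pub struct SimpleRegionMetadata {
    pub column_metadatas: Vec<SimpleColumnMetadata>,
    /// Ordered list of primary key column ids.
    pub primary_key: Vec<ColumnId>,
    pub schema_version: u64,
}

impl SimpleRegionMetadata {
    /// Semantic type of the column with the given name, if it exists.
    pub fn semantic_type_of(&self, name: &str) -> Option<SemanticType> {
        self.column_metadatas
            .iter()
            .find(|c| c.name == name)
            .map(|c| c.semantic_type)
    }

    /// The time index column, derived from the semantic type instead of
    /// being stored a second time as a separate index.
    pub fn time_index(&self) -> Option<&SimpleColumnMetadata> {
        self.column_metadatas
            .iter()
            .find(|c| c.semantic_type == SemanticType::Timestamp)
    }

    /// Names of the primary key columns, in primary key order.
    pub fn primary_key_names(&self) -> Vec<&str> {
        self.primary_key
            .iter()
            .filter_map(|id| self.column_metadatas.iter().find(|c| c.column_id == *id))
            .map(|c| c.name.as_str())
            .collect()
    }
}
```

Deriving the time index from the per-column semantic type, as done here, is one way to avoid recording it twice; a real implementation might cache the index for speed.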
Tasks
Replace `Schema` with `RegionMetadata` and simplify `RegionMetadata` accordingly.
Implementation challenges
No response
What type of enhancement is this?
Tech debt reduction