diff --git a/Project.toml b/Project.toml index 731db97..fe534c8 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,7 @@ name = "Legolas" uuid = "741b9549-f6ed-4911-9fbf-4a1c0c97f0cd" authors = ["Beacon Biosignals, Inc."] -version = "0.5.12" +version = "0.5.13" [deps] Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45" diff --git a/docs/src/index.md b/docs/src/index.md index 9aebe7b..b539754 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -20,7 +20,7 @@ Legolas.name Legolas.version Legolas.identifier Legolas.parent -Legolas.required_fields +Legolas.declared_fields Legolas.declaration Legolas.record_type Legolas.schema_version_from_record diff --git a/docs/src/schema-concepts.md b/docs/src/schema-concepts.md index 7ce7ef8..1a0bef8 100644 --- a/docs/src/schema-concepts.md +++ b/docs/src/schema-concepts.md @@ -27,11 +27,13 @@ While it is fairly established practice to [semantically version source code](ht **Do not introduce a change to an existing schema version that might cause existing compliant data to become non-compliant; instead, incorporate the intended change in a new schema version whose version number is one greater than the previous version number.** -For example, a schema author must introduce a new schema version for any of the following changes: +A schema author must introduce a new schema version if any of the following changes are introduced: -- A new type-restricted required field is added to the schema. -- An existing required field's type restriction is tightened. -- An existing required field is renamed. +- A new type-constrained and/or value-constrained field is declared. In other words, for the introduction of a new declared field to be non-breaking, the new field's type constraint must be `::Any` and it may not feature a value-constraining or value-transforming assignment expression. +- An existing declared field's type or value constraints are tightened. +- An existing declared field is renamed. 
+ +If any of the above breaking changes are made to an existing schema version, instead of introducing a new schema version, subtle downstream breakage may occur. For example, if a new type/value-constrained field is declared, previously compliant tables containing a field with the same name might accidentally become non-compliant if existing values violate the new constraints. Similarly, downstream schema version extensions may have already declared a field with the same name, but with constraints that are incompatible with the new constraints. One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@version(FooV1, ...)` and `@version(FooV2, ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema version and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments. diff --git a/docs/src/upgrade.md b/docs/src/upgrade.md index c137b61..9b57dee 100644 --- a/docs/src/upgrade.md +++ b/docs/src/upgrade.md @@ -8,15 +8,15 @@ See [here](https://github.com/beacon-biosignals/Legolas.jl/pull/54) for a compre * In Legolas v0.4, every `Legolas.Row` field's type was available as a type parameter of `Legolas.Row`; for example, the type of a field `y` specified as `y::Real` in a `Legolas.@row` declaration would be surfaced like `Legolas.Row{..., NamedTuple{(...,:y,...),Tuple{...,typeof(y),...}}`. In Legolas v0.5, the schema version author controls which fields have their types surfaced as type parameters in Legolas-generated record types via the `field::(<:F)` syntax in [`@version`](@ref). * Additionally, to include type parameters associated to fields in a parent schema, they must be re-declared in the child schema. 
For example, the package LegolasFlux declares a `ModelV1` version with a field `weights::(<:Union{Missing,Weights})`. LegolasFlux includes an [example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L78-L80) with a schema extension `DigitsRowV1` which extends `ModelV1`. This `@version` call must re-declare the field `weights` to be parametric in order for the `DigitsRowV1` struct to also have a type parameter for this field. -* In Legolas v0.4, `@row`-generated `Legolas.Row` constructors accepted and propagated any non-schema-required fields provided by the caller. In Legolas v0.5, `@version`-generated record type constructors will discard any non-schema-required fields provided by the caller. When upgrading code that formerly "implicitly extended" a given schema version by propagating non-required fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as required fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as required fields directly to the schema (likely declared as `::Union{Missing,T}` to allow them to be missing). - +* In Legolas v0.4, `@row`-generated `Legolas.Row` constructors accepted and propagated any non-schema-declared fields provided by the caller. In Legolas v0.5, `@version`-generated record type constructors will discard any non-schema-declared fields provided by the caller. 
When upgrading code that formerly "implicitly extended" a given schema version by propagating non-declared fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as declared fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as declared fields directly to the schema (likely declared as `::Union{Missing,T}` to allow them to be missing). +* Before Legolas v0.5, the documented guidance for schema authors surrounding new fields' impact on schema version breakage was misleading, implying that adding a new declared field to an existing schema version is non-breaking if the field's type allows for `Missing` values. This is incorrect. For clarity, *adding a new declared field to an existing schema version is a breaking change unless the field's type and value are both completely unconstrained in the declaration*, i.e. the field's type constraint must be `::Any` and the declaration may not feature a value-constraining or value-transforming assignment expression. ## Deserializing old tables with Legolas v0.5 Generally, tables serialized with earlier versions of Legolas can be de-serialized with Legolas v0.5, making it only a "code-breaking" change, rather than a "data-breaking" change. However, it is strongly suggested to have reference tests with checked in (pre-Legolas v0.5) serialized tables which are deserialized and verified during the tests, in order to be sure. -Additionally, serialized Arrow tables containing nested Legolas-v0.4-defined `Legolas.Row` values (i.e. a table that contains a row that has a field that is, itself, a `Legolas.Row` value, or contains such values) require special handling to deserialize under Legolas v0.5, if you wish users to be able to deserialize them with `Legolas.read` using the Legolas-v0.5-ready version of your package. 
Note that these tables are still deserializable as plain Arrow tables regardless, so it may not be worthwhile to provide a bespoke deprecation/compatibility pathway in the Legolas-v0.5-ready version package unless your use case merits it (i.e. the impact surface would be high for your package's users). +Additionally, serialized Arrow tables containing nested Legolas-v0.4-defined `Legolas.Row` values (i.e. a table that contains a row that has a field that is, itself, a `Legolas.Row` value, or contains such values) require special handling to deserialize under Legolas v0.5, if you wish users to be able to deserialize them with `Legolas.read` using the Legolas-v0.5-ready version of your package. Note that these tables are still deserializable as plain Arrow tables regardless, so it may not be worthwhile to provide a bespoke deprecation/compatibility pathway in the Legolas-v0.5-ready version package unless your use case merits it (i.e. the impact surface would be high for your package's users). -If you would like to provide such a pathway, though: +If you would like to provide such a pathway, though: -Recall that under Legolas v0.4, `@row`-generated `Legolas.Row` constructors may accept and propagate arbitrary non-schema-required fields, whereas Legolas v0.5's `@version`-generated record types may only contain schema-required fields. Therefore, one must decide what to do with any non-required fields present in serialized `Legolas.Row` values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding `@version` declaration. For example, [this LegolasFlux example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L64-L84) uses a function `compat_config` to handle old `Legolas.Row` values, but does not add any handling for non-required fields, which will be discarded if present. 
If one did not want non-required fields to be discarded, these fields could be handled by throwing an error or warning, or defining a schema version extension that captured them, or defining a new version of the relevant schema to capture them (e.g. adding a field like `extras::Union{Missing, NamedTuple}`). +Recall that under Legolas v0.4, `@row`-generated `Legolas.Row` constructors may accept and propagate arbitrary non-schema-declared fields, whereas Legolas v0.5's `@version`-generated record types may only contain schema-declared fields. Therefore, one must decide what to do with any non-declared fields present in serialized `Legolas.Row` values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding `@version` declaration. For example, [this LegolasFlux example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L64-L84) uses a function `compat_config` to handle old `Legolas.Row` values, but does not add any handling for non-declared fields, which will be discarded if present. If one did not want non-declared fields to be discarded, these fields could be handled by throwing an error or warning, or defining a schema version extension that captured them, or defining a new version of the relevant schema to capture them (e.g. adding a field like `extras::Union{Missing, NamedTuple}`). diff --git a/examples/tour.jl b/examples/tour.jl index 548a147..ac0327f 100644 --- a/examples/tour.jl +++ b/examples/tour.jl @@ -22,10 +22,9 @@ using Legolas: @schema, @version, complies_with, find_violation, find_violations @schema "example.foo" Foo # The above schema declaration provides the necessary scaffolding to start declaring -# new *versions* of the `example.foo` schema. Schema version declarations specify the -# set of required fields that a given table (or row) must contain in order to comply -# with that schema version. 
Let's use the `@version` macro to declare an initial -# version of the `example.foo` schema with some required fields: +# new *versions* of the `example.foo` schema. Let's use the `@version` macro to declare +# an initial version of the `example.foo` schema, and in particular, declare the fields +# that a given table (or row) must contain to comply with this new schema version: @version FooV1 begin a::Real b::String @@ -40,7 +39,7 @@ end # special types that match it. For example, our `@version` declaration above generated: # # - `FooV1`: A special subtype of `Tables.AbstractRow` whose fields match the corresponding -# schema version's declared required fields. +# schema version's declared fields. # - `FooV1SchemaVersion`: An alias for `Legolas.SchemaVersion` that matches the corresponding # schema version. @@ -63,11 +62,11 @@ end # `example.foo@1`. # For example, all of the following `Tables.Schema`s comply with `example.foo@1`: -for s in [Tables.Schema((:a, :b, :c, :d), (Real, String, Any, AbstractVector)), # All required fields must be present... +for s in [Tables.Schema((:a, :b, :c, :d), (Real, String, Any, AbstractVector)), # All fields declared by the schema version must be present... Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Vector)), # ...and have subtypes that match the schema's declared type constraints. Tables.Schema((:b, :a, :d, :c), (String, Int, Vector, Float64)), # Fields do not have to be in any particular order, as long as they are present. - Tables.Schema((:a, :b, :d), (Int, String, Vector)), # Fields whose declared type constraints are `>:Missing` may be elided entirely. - Tables.Schema((:a, :x, :b, :y, :d), (Int, Any, String, Any, Vector))] # Non-required fields may also be present. + Tables.Schema((:a, :b, :d), (Int, String, Vector)), # If a declared field is elided, it is implicitly interpreted as present, but `Missing`. 
+ Tables.Schema((:a, :x, :b, :y, :d), (Int, Any, String, Any, Vector))] # Non-declared fields may additionally be present. # if `complies_with` finds a violation, it returns `false`; returns `true` otherwise @test complies_with(s, FooV1SchemaVersion()) @@ -85,13 +84,13 @@ end # ...while the below `Tables.Schema`s do not: -s = Tables.Schema((:a, :c, :d), (Int, Float64, Vector)) # The required non-`>:Missing` field `b::String` is not present. +s = Tables.Schema((:a, :c, :d), (Int, Float64, Vector)) # The declared field `b::String` is missing. @test !complies_with(s, FooV1SchemaVersion()) @test_throws ArgumentError validate(s, FooV1SchemaVersion()) @test isequal(find_violation(s, FooV1SchemaVersion()), :b => missing) @test isequal(find_violations(s, FooV1SchemaVersion()), [:b => missing]) -s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of required field `d::AbstractVector` is not `<:AbstractVector`. +s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of declared field `d` does not match its declared type constraint (`AbstractVector`). @test !complies_with(s, FooV1SchemaVersion()) @test_throws ArgumentError validate(s, FooV1SchemaVersion()) @test isequal(find_violation(s, FooV1SchemaVersion()), :d => Any) @@ -99,9 +98,9 @@ s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of r # The expectations that characterize Legolas' particular notion of "schematic compliance" - requiring the # presence of pre-specified declared fields, assuming non-present fields to be implicitly `missing`, and allowing -# the presence of non-required fields - were chosen such that the question "Does the table `t` comply with the Legolas +# the presence of non-declared fields - were chosen such that the question "Does the table `t` comply with the Legolas # schema version `s`?" 
is roughly equivalent to "Can a logical view be trivially constructed atop table `t` that contains -# only the required fields declared by `s`?". The ability to cleanly ask this question enables a weak notion of "subtyping" +# only the fields declared by `s`?". The ability to cleanly ask this question enables a weak notion of "subtyping" # (see https://en.wikipedia.org/wiki/Duck_typing, https://en.wikipedia.org/wiki/Liskov_substitution_principle) that is # core to Legolas' mechanisms for defining, extending, and versioning interfaces to tabular data. @@ -110,7 +109,7 @@ s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of r ##### # As mentioned in this tour's introduction, `FooV1` is a subtype of `Tables.AbstractRow` whose fields are guaranteed to -# match all the fields required by `example.foo@1`. We refer to such Legolas-generated types as "record types" (see +# match all the fields declared by `example.foo@1`. We refer to such Legolas-generated types as "record types" (see # https://en.wikipedia.org/wiki/Record_(computer_science)). These record types are direct subtypes of # `Legolas.AbstractRecord`, which is, itself, a subtype of `Tables.AbstractRow`: @test FooV1 <: Legolas.AbstractRecord <: Tables.AbstractRow @@ -123,18 +122,18 @@ fields = (a=1.0, b="hi", c=π, d=[1, 2, 3]) # This may seem like a fairly trivial constructor in the preceding example, but it has some properties # that can be quite convenient in practice. 
Specifically, row values provided to `FooV1` may: # -# - ...contain the associated schema version's required fields in any order -# - ...elide required fields, in which case the constructor will assume them to be `missing` -# - ...contain any other fields in addition to the required fields; such additional fields are simply ignored +# - ...contain the associated schema version's declared fields in any order +# - ...elide declared fields, in which case the constructor will assume them to be `missing` +# - ...contain any other fields in addition to the declared fields; such additional fields are simply ignored # by the constructor and are not propagated through to the resulting record. # # Demonstrating a few of these properties: -# Providing the additional non-required field `x` in the input, which is simply ignored: +# Providing the additional non-declared field `x` in the input, which is simply ignored: fields_with_x = (; fields..., x="x") @test NamedTuple(FooV1(fields_with_x)) == fields -# Eliding the required field `c`, which is assigned `missing` in the output: +# Eliding the declared field `c`, which is assigned `missing` in the output: foo = FooV1(; a=1.0, b="hi", d=[1, 2, 3]) @test isequal(NamedTuple(foo), (a=1.0, b="hi", c=missing, d=[1, 2, 3])) @@ -218,8 +217,8 @@ fields = (x=1, y=1) ##### Extending Existing Schema Versions ##### -# New schema versions can inherit other schema version's required fields. Here, we declare `example.baz@1` -# as an "extension" of `example.bar@1`: +# New schema versions can inherit other schema versions' declared fields as their own. Here, we declare +# `example.baz@1` as an "extension" of `example.bar@1`: @schema "example.baz" Baz @version BazV1 > BarV1 begin @@ -228,16 +227,16 @@ fields = (x=1, y=1) k::Int64 = ismissing(k) ? 
length(z) : k end -# Notice how the child's `@version` declaration may reference the parent's required fields (but need not reference -# every single one), may tighten the constraints of the parent's required fields, and may introduce new required -# fields atop the parent's required fields. +# Notice how the child's `@version` declaration may reference the parent's declared fields (but need not reference +# every single one), may tighten the constraints of the parent's declared fields, and may introduce new declared +# fields atop the declared fields inherited from the parent. # For a given Legolas schema version extension to be valid, all `Tables.Schema`s that comply with the child -# must comply with the parent, but the reverse need not be true. We can check a schema version's required fields -# and their type constraints via `Legolas.required_fields`. Based on these outputs, it is a worthwhile exercise +# must comply with the parent, but the reverse need not be true. We can check a schema version's declared fields +# and their type constraints via `Legolas.declared_fields`. Based on these outputs, it is a worthwhile exercise # to confirm for yourself that `BazV1SchemaVersion` is a valid extension of `BarV1SchemaVersion` under the aforementioned rule: -@test Legolas.required_fields(BarV1SchemaVersion()) == (x=Union{Missing,Int8}, y=String, z=String) -@test Legolas.required_fields(BazV1SchemaVersion()) == (x=Int8, y=String, z=String, k=Int64) +@test Legolas.declared_fields(BarV1SchemaVersion()) == (x=Union{Missing,Int8}, y=String, z=String) +@test Legolas.declared_fields(BazV1SchemaVersion()) == (x=Int8, y=String, z=String, k=Int64) # As a counterexample, the following is invalid, because the declaration of `x::Any` would allow for `x` # values that are disallowed by the parent schema version `example.bar@1`: @@ -263,7 +262,7 @@ end # One last note on syntax: You might ask "Why use the greater-than symbol as the inheritance operator instead of `<:`?" 
# There are a few reasons. The primary reason is purely historical: earlier versions of Legolas did not as rigorously -# demand/enforce subtyping relationships between parent and child schemas' required fields, and so the `<:` operator +# demand/enforce subtyping relationships between parent and child schemas' declared fields, and so the `<:` operator # was considered to be a bit too misleading. A secondary reason in favor of `>` was that it implied the actual order # of application of constraints (i.e. the parent's are applied before the child's). Lastly, `>` aligns well with the # property that child schema versions have a greater number of constraints than their parents. @@ -294,10 +293,10 @@ fields = (a=1.0, b="b", c=3, d=[1,2,3]) # https://beacon-biosignals.github.io/Legolas.jl/stable/schema-concepts/#Schema-Versioning:-You-Break-It,-You-Bump-It-1 ##### -##### Parameterizing Required Field Types +##### Parameterizing Declared Field Types ##### -# Sometimes, it's useful to surface a required field's type as a type parameter of the generated record type. To +# Sometimes, it's useful to surface a declared field's type as a type parameter of the generated record type. To # achieve this, the `@version` macro supports use of the `<:` operator to mark fields whose types should be exposed # as parameters. 
For example: @@ -316,8 +315,8 @@ end @test typeof(ParamV1{Int,Float32}(a=1, b=2.0, c="3", d=1)) === ParamV1{Int,Float32} # Note that extension schema versions do not implicitly "inherit" their parent's type parameters; if you'd -# like to parameterize the type of a parent's required field in the child schema version, you should explicitly -# include the field in the child's required field list: +# like to parameterize the type of a parent's declared field in the child schema version, you should explicitly +# include the field in the child's declared field list: @schema "example.child-param" ChildParam @@ -415,7 +414,7 @@ end @test complies_with(Tables.Schema((:id,), (UUID,)), PortableV1SchemaVersion()) @test complies_with(Tables.Schema((:id,), (UInt128,)), PortableV1SchemaVersion()) -# How is this possible? Well, when Legolas checks whether a given field `f::T` matches a required field `f::F`, it doesn't +# How is this possible? Well, when Legolas checks whether a given field `f::T` matches a declared field `f::F`, it doesn't # directly check that `T <: F`; instead, it checks that `T <: Legolas.accepted_field_type(sv, F)` where `sv` is the relevant # `SchemaVersion`. The fallback definition of `Legolas.accepted_field_type(::SchemaVersion, F::Type)` is simply `F`, but there # are a few other default overloads to support common Base types that can cause portability issues: diff --git a/src/schemas.jl b/src/schemas.jl index e446e80..4935634 100644 --- a/src/schemas.jl +++ b/src/schemas.jl @@ -166,29 +166,31 @@ written via [`Legolas.write`](@ref). identifier(sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv)) """ - Legolas.required_fields(sv::Legolas.SchemaVersion) + Legolas.declared_fields(sv::Legolas.SchemaVersion) Return a `NamedTuple{...,Tuple{Vararg{DataType}}` whose fields take the form: - = + = -If `sv` has a parent, the returned fields will include `required_fields(parent(sv))`. 
+If `sv` has a parent, the returned fields will include `declared_fields(parent(sv))`. """ -required_fields(sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv)) +declared_fields(sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv)) + +@deprecate required_fields(sv) declared_fields(sv) false """ Legolas.declaration(sv::Legolas.SchemaVersion) Return a `Pair{String,Vector{NamedTuple}}` of the form - schema_version_identifier::String => required_field_infos::Vector{Legolas.RequiredFieldInfo} + schema_version_identifier::String => declared_field_infos::Vector{Legolas.DeclaredFieldInfo} -where `RequiredFieldInfo` has the fields: +where `DeclaredFieldInfo` has the fields: -- `name::Symbol`: the required field's name -- `type::Union{Symbol,Expr}`: the required field's declared type constraint -- `parameterize::Bool`: whether or not the required field is exposed as a parameter -- `statement::Expr`: the required field's full assignment statement (as processed by `@version`, not necessarily as written) +- `name::Symbol`: the declared field's name +- `type::Union{Symbol,Expr}`: the declared field's declared type constraint +- `parameterize::Bool`: whether or not the declared field is exposed as a parameter +- `statement::Expr`: the declared field's full assignment statement (as processed by `@version`, not necessarily as written) Note that `declaration` is primarily intended to be used for interactive discovery purposes, and does not include the contents of `declaration(parent(sv))`. @@ -264,7 +266,7 @@ accepted_field_type(sv::SchemaVersion, ::Type{Union{T,Missing}}) where {T} = Uni """ Legolas.find_violation(ts::Tables.Schema, sv::Legolas.SchemaVersion) -For required field `f::F` of `sv`: +For each field `f::F` declared by `sv`: - Define `A = Legolas.accepted_field_type(sv, F)` - If `f::T` is present in `ts`, ensure that `T <: A` or else immediately return `f::Symbol => T::DataType`. 
@@ -307,13 +309,13 @@ function validate(ts::Tables.Schema, sv::SchemaVersion) if ismissing(violation) push!(field_err, field) else - expected = getfield(required_fields(sv), field) + expected = getfield(declared_fields(sv), field) push!(type_err, (field, expected, violation)) end end err_msg = "Tables.Schema violates Legolas schema `$(string(name(sv), "@", version(sv)))`:\n" for err in field_err - err_msg *= " - Could not find required field: `$err`\n" + err_msg *= " - Could not find declared field: `$err`\n" end for (field, expected, violation) in type_err err_msg *= " - Incorrect type: `$field` expected `<:$expected`, found `$violation`\n" @@ -409,24 +411,27 @@ function Base.showerror(io::IO, e::SchemaVersionDeclarationError) for a particular schema via a prior `@schema` declaration and `n` is a non-negative integer literal. - - `@version` declarations must list at least one required field, - and must not list duplicate fields within the same declaration. + - `@version` declarations must declare at least one field, and must not + declare duplicate fields within the same declaration. - New versions of a given schema may only be declared within the same module that declared the schema. 
""") end -struct RequiredFieldInfo +struct DeclaredFieldInfo name::Symbol type::Union{Symbol,Expr} parameterize::Bool statement::Expr end -Base.:(==)(a::RequiredFieldInfo, b::RequiredFieldInfo) = all(getfield(a, i) == getfield(b, i) for i in 1:fieldcount(RequiredFieldInfo)) +# We maintain an alias to the deprecated name for this type, xref https://github.com/beacon-biosignals/Legolas.jl/pull/100 +Base.@deprecate_binding RequiredFieldInfo DeclaredFieldInfo + +Base.:(==)(a::DeclaredFieldInfo, b::DeclaredFieldInfo) = all(getfield(a, i) == getfield(b, i) for i in 1:fieldcount(DeclaredFieldInfo)) -function _parse_required_field_info!(f) +function _parse_declared_field_info!(f) f isa Symbol && (f = Expr(:(::), f, :Any)) f.head == :(::) && (f = Expr(:(=), f, f.args[1])) f.head == :(=) && f.args[1] isa Symbol && (f.args[1] = Expr(:(::), f.args[1], :Any)) @@ -437,7 +442,7 @@ function _parse_required_field_info!(f) type = type.args[1] parameterize = true end - return RequiredFieldInfo(f.args[1].args[1], type, parameterize, f) + return DeclaredFieldInfo(f.args[1].args[1], type, parameterize, f) end function _has_valid_child_field_types(child_fields::NamedTuple, parent_fields::NamedTuple) @@ -461,17 +466,17 @@ end function _generate_schema_version_definitions(schema_version::SchemaVersion, parent, declared_field_names_types, schema_version_declaration) identifier_string = string(name(schema_version), '@', version(schema_version)) - required_field_names_types = declared_field_names_types + declared_field_names_types = declared_field_names_types if !isnothing(parent) identifier_string = string(identifier_string, '>', Legolas.identifier(parent)) - required_field_names_types = merge(Legolas.required_fields(parent), required_field_names_types) + declared_field_names_types = merge(Legolas.declared_fields(parent), declared_field_names_types) end quoted_schema_version_type = Base.Meta.quot(typeof(schema_version)) return quote @inline $Legolas.declared(::$quoted_schema_version_type) = 
true @inline $Legolas.identifier(::$quoted_schema_version_type) = $identifier_string @inline $Legolas.parent(::$quoted_schema_version_type) = $(Base.Meta.quot(parent)) - $Legolas.required_fields(::$quoted_schema_version_type) = $required_field_names_types + $Legolas.declared_fields(::$quoted_schema_version_type) = $declared_field_names_types $Legolas.declaration(::$quoted_schema_version_type) = $(Base.Meta.quot(schema_version_declaration)) end end @@ -482,7 +487,7 @@ function _generate_validation_definitions(schema_version::SchemaVersion) statements = Expr[] violations = gensym() fail_fast || push!(statements, :($violations = Pair{Symbol,Union{Type,Missing}}[])) - for (fname, ftype) in pairs(required_fields(schema_version)) + for (fname, ftype) in pairs(declared_fields(schema_version)) fname = Base.Meta.quot(fname) found = :($fname => result) handle_found = fail_fast ? :(return $found) : :(push!($violations, $found)) @@ -509,7 +514,7 @@ end _schema_version_from_record_type(::Nothing) = nothing -# Note also that this function's implementation is allowed to "observe" `Legolas.required_fields(parent)` +# Note also that this function's implementation is allowed to "observe" `Legolas.declared_fields(parent)` # (if a parent exists), but is NOT allowed to "observe" `Legolas.declaration(parent)`, since the latter # includes the parent's declared field RHS statements. 
We cannot interpolate/incorporate these statements # in the child's record type definition because they may reference bindings from the parent's `@version` @@ -520,7 +525,7 @@ function _generate_record_type_definitions(schema_version::SchemaVersion, record schema_version_type_alias_definition = :(const $T = $(Base.Meta.quot(typeof(schema_version)))) # generate building blocks for record type definitions - record_fields = required_fields(schema_version) + record_fields = declared_fields(schema_version) _, declared_field_infos = declaration(schema_version) declared_field_infos = Dict(f.name => f for f in declared_field_infos) type_param_defs = Expr[] @@ -592,7 +597,7 @@ function _generate_record_type_definitions(schema_version::SchemaVersion, record if !isnothing(parent) p = gensym() P = Base.Meta.quot(record_type(parent)) - parent_record_field_names = keys(required_fields(parent)) + parent_record_field_names = keys(declared_fields(parent)) parent_record_application = quote $p = $P(; $(parent_record_field_names...)) $((:($n = $p.$n) for n in parent_record_field_names)...) 
@@ -681,14 +686,14 @@ end """ @version RecordType begin - required_field_expression_1 - required_field_expression_2 + declared_field_expression_1 + declared_field_expression_2 ⋮ end @version RecordType > ParentRecordType begin - required_field_expression_1 - required_field_expression_2 + declared_field_expression_1 + declared_field_expression_2 ⋮ end @@ -699,27 +704,27 @@ Given a prior `@schema` declaration of the form: ...the `n`th version of `example.name` can be declared in the same module via a `@version` declaration of the form: @version NameV\$(n) begin - required_field_expression_1 - required_field_expression_2 + declared_field_expression_1 + declared_field_expression_2 ⋮ end ...which generates types definitions for the `NameV\$(n)` type (a `Legolas.AbstractRecord` subtype) and `NameV\$(n)SchemaVersion` type (an alias of `typeof(SchemaVersion("example.name", n))`), as well as the necessary definitions to overload relevant Legolas methods with specialized behaviors in accordance with -the declared required fields. +the declared fields. If the declared schema version has a parent, it should be specified via the optional `> ParentRecordType` clause. `ParentRecordType` should refer directly to an existing Legolas-generated record type. -Each `required_field_expression` specifies a required field of the declared schema version, and is an -expression of the form `field::F = rhs` where: +Each `declared_field_expression` declares a field of the schema version, and is an expression of the form +`field::F = rhs` where: - `field` is the corresponding field's name - `::F` denotes the field's type constraint (if elided, defaults to `::Any`). - `rhs` is the expression which produces `field::F` (if elided, defaults to `field`). 
-Accounting for all of the aforementioned allowed elisions, valid `required_field_expression`s include: +Accounting for all of the aforementioned allowed elisions, valid `declared_field_expression`s include: - `field::F = rhs` - `field::F` (interpreted as `field::F = field`) @@ -750,8 +755,8 @@ This macro will throw a `Legolas.SchemaVersionDeclarationError` if: - The provided `RecordType` does not follow the `\$(Prefix)V\$(n)` format, where `Prefix` was previously associated with a given schema by a prior `@schema` declaration. -- There are no required field expressions, duplicate required fields are declared, a given - required field expression is invalid. +- There are no declared field expressions, duplicate fields are declared, or a given declared + field expression is invalid. - (if a parent is specified) The `@version` declaration does not comply with its parent's `@version` declaration, or the parent hasn't yet been declared at all. @@ -760,7 +765,7 @@ Note that this macro expects to be evaluated within top-level scope. For more details and examples, please see `Legolas.jl/examples/tour.jl` and the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation. 
""" -macro version(record_type, required_fields_block) +macro version(record_type, declared_fields_block) # parse `record_type` if record_type isa Symbol parent_record_type = nothing @@ -776,31 +781,31 @@ macro version(record_type, required_fields_block) schema_prefix, schema_version_integer = x quoted_schema_prefix = Base.Meta.quot(schema_prefix) - # parse `required_fields_block` - required_field_statements = Any[] - if required_fields_block isa Expr && required_fields_block.head == :block && !isempty(required_fields_block.args) - required_field_statements = [f for f in required_fields_block.args if !(f isa LineNumberNode)] + # parse `declared_fields_block` + declared_field_statements = Any[] + if declared_fields_block isa Expr && declared_fields_block.head == :block && !isempty(declared_fields_block.args) + declared_field_statements = [f for f in declared_fields_block.args if !(f isa LineNumberNode)] end - isempty(required_field_statements) && return :(throw(SchemaVersionDeclarationError("malformed or missing declaration of required fields"))) - required_field_infos = RequiredFieldInfo[] - for stmt in required_field_statements + isempty(declared_field_statements) && return :(throw(SchemaVersionDeclarationError("malformed or missing field declaration(s)"))) + declared_field_infos = DeclaredFieldInfo[] + for stmt in declared_field_statements original_stmt = Base.Meta.quot(deepcopy(stmt)) try - push!(required_field_infos, _parse_required_field_info!(stmt)) + push!(declared_field_infos, _parse_declared_field_info!(stmt)) catch return :(throw(SchemaVersionDeclarationError("malformed `@version` field expression: ", $original_stmt))) end end - if !allunique(f.name for f in required_field_infos) - msg = string("cannot have duplicate field names in `@version` declaration; received: ", [f.name for f in required_field_infos]) + if !allunique(f.name for f in declared_field_infos) + msg = string("cannot have duplicate field names in `@version` declaration; received: ", 
[f.name for f in declared_field_infos]) return :(throw(SchemaVersionDeclarationError($msg))) end - invalid_field_names = filter!(fname -> startswith(string(fname), '_'), [f.name for f in required_field_infos]) + invalid_field_names = filter!(fname -> startswith(string(fname), '_'), [f.name for f in declared_field_infos]) if !isempty(invalid_field_names) msg = string("cannot have field name which start with an underscore in `@version` declaration: ", invalid_field_names) return :(throw(SchemaVersionDeclarationError($msg))) end - declared_field_names_types = Expr(:tuple, (:($(f.name) = $(esc(f.type))) for f in required_field_infos)...) + declared_field_names_types = Expr(:tuple, (:($(f.name) = $(esc(f.type))) for f in declared_field_infos)...) return quote if !isdefined((@__MODULE__), :__legolas_schema_name_from_prefix__) @@ -816,13 +821,13 @@ macro version(record_type, required_fields_block) if !isnothing(parent) declared_identifier = string(declared_identifier, '>', $Legolas.name(parent), '@', $Legolas.version(parent)) end - schema_version_declaration = declared_identifier => $(Base.Meta.quot(required_field_infos)) + schema_version_declaration = declared_identifier => $(Base.Meta.quot(declared_field_infos)) if $Legolas.declared(schema_version) && $Legolas.declaration(schema_version) != schema_version_declaration throw(SchemaVersionDeclarationError("invalid redeclaration of existing schema version; all `@version` redeclarations must exactly match previous declarations")) elseif parent isa $Legolas.SchemaVersion && $Legolas.name(parent) == schema_name throw(SchemaVersionDeclarationError("cannot extend from another version of the same schema")) - elseif parent isa $Legolas.SchemaVersion && !($Legolas._has_valid_child_field_types($declared_field_names_types, $Legolas.required_fields(parent))) + elseif parent isa $Legolas.SchemaVersion && !($Legolas._has_valid_child_field_types($declared_field_names_types, $Legolas.declared_fields(parent))) 
throw(SchemaVersionDeclarationError("declared field types violate parent's field types")) else Base.@__doc__($(Base.Meta.quot(record_type))) diff --git a/test/runtests.jl b/test/runtests.jl index 0719442..a3bc7a7 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -1,6 +1,6 @@ using Compat: current_exceptions using Legolas, Test, DataFrames, Arrow, UUIDs -using Legolas: SchemaVersion, @schema, @version, SchemaVersionDeclarationError, RequiredFieldInfo +using Legolas: SchemaVersion, @schema, @version, SchemaVersionDeclarationError, DeclaredFieldInfo @test_throws SchemaVersionDeclarationError("no prior `@schema` declaration found in current module") @version(TestV1, begin x end) @@ -284,8 +284,8 @@ end @testset "`Legolas.@version` and associated utilities for declared `Legolas.SchemaVersion`s" begin @testset "Legolas.SchemaVersionDeclarationError" begin - @test_throws SchemaVersionDeclarationError("malformed or missing declaration of required fields") eval(:(@version(NewV1, $(Expr(:block, LineNumberNode(1, :test)))))) - @test_throws SchemaVersionDeclarationError("malformed or missing declaration of required fields") @version(ChildV2, begin end) + @test_throws SchemaVersionDeclarationError("malformed or missing field declaration(s)") eval(:(@version(NewV1, $(Expr(:block, LineNumberNode(1, :test)))))) + @test_throws SchemaVersionDeclarationError("malformed or missing field declaration(s)") @version(ChildV2, begin end) @test_throws SchemaVersionDeclarationError("missing prior `@schema` declaration for `Unknown` in current module") @version(UnknownV1 > ChildV1, begin x end) @test_throws SchemaVersionDeclarationError("provided record type symbol is malformed: Child") @version(Child, begin x end) @test_throws SchemaVersionDeclarationError("provided record type symbol is malformed: Childv2") @version(Childv2, begin x end) @@ -329,11 +329,16 @@ end @test Legolas.identifier(GrandchildV1SchemaVersion()) == "test.grandchild@1>test.child@1>test.parent@1" end - @testset 
"Legolas.required_fields" begin - @test_throws Legolas.UnknownSchemaVersionError(undeclared) Legolas.required_fields(undeclared) - @test Legolas.required_fields(ParentV1SchemaVersion()) == (x=Vector, y=AbstractString) - @test Legolas.required_fields(ChildV1SchemaVersion()) == (x=Vector, y=AbstractString, z=Any) - @test Legolas.required_fields(GrandchildV1SchemaVersion()) == (x=Vector, y=String, z=Any, a=Int32) + @testset "Legolas.declared_fields" begin + @test_throws Legolas.UnknownSchemaVersionError(undeclared) Legolas.declared_fields(undeclared) + @test Legolas.declared_fields(ParentV1SchemaVersion()) == (x=Vector, y=AbstractString) + @test Legolas.declared_fields(ChildV1SchemaVersion()) == (x=Vector, y=AbstractString, z=Any) + @test Legolas.declared_fields(GrandchildV1SchemaVersion()) == (x=Vector, y=String, z=Any, a=Int32) + + # xref https://github.com/beacon-biosignals/Legolas.jl/pull/100 + @test Legolas.declared_fields(ParentV1SchemaVersion()) == (@test_deprecated Legolas.required_fields(ParentV1SchemaVersion())) + @test Legolas.declared_fields(ChildV1SchemaVersion()) == (@test_deprecated Legolas.required_fields(ChildV1SchemaVersion())) + @test Legolas.declared_fields(GrandchildV1SchemaVersion()) == (@test_deprecated Legolas.required_fields(GrandchildV1SchemaVersion())) end @testset "Legolas.find_violation + Legolas.complies_with + Legolas.validate" begin @@ -351,7 +356,7 @@ end (ParentV1SchemaVersion(), "test.parent@1")) msg = """ Tables.Schema violates Legolas schema `$id`: - - Could not find required field: `x` + - Could not find declared field: `x` Provided Tables.Schema: :a Int32 :y String @@ -369,8 +374,8 @@ end (ParentV1SchemaVersion(), "test.parent@1")) msg = """ Tables.Schema violates Legolas schema `$id`: - - Could not find required field: `x` - - Could not find required field: `y` + - Could not find declared field: `x` + - Could not find declared field: `y` Provided Tables.Schema: :a Int32 :z Any""" @@ -385,7 +390,7 @@ end let s = 
GrandchildV1SchemaVersion() msg = """ Tables.Schema violates Legolas schema `test.grandchild@1`: - - Could not find required field: `x` + - Could not find declared field: `x` - Incorrect type: `y` expected `<:String`, found `Bool` Provided Tables.Schema: :y Bool @@ -419,11 +424,12 @@ end @testset "Legolas.declaration" begin @test_throws Legolas.UnknownSchemaVersionError(undeclared) Legolas.declaration(undeclared) - @test Legolas.declaration(ParentV1SchemaVersion()) == ("test.parent@1" => [RequiredFieldInfo(:x, :Vector, false, :(x::Vector = x)), - RequiredFieldInfo(:y, :AbstractString, false, :(y::AbstractString = y))]) - @test Legolas.declaration(ChildV1SchemaVersion()) == ("test.child@1>test.parent@1" => [RequiredFieldInfo(:z, :Any, false, :(z::Any = z))]) - @test Legolas.declaration(GrandchildV1SchemaVersion()) == ("test.grandchild@1>test.child@1" => [RequiredFieldInfo(:a, :Int32, false, :(a::Int32 = round(Int32, a))), - RequiredFieldInfo(:y, :String, false, :(y::String = string(y[1:2])))]) + @test Legolas.declaration(ParentV1SchemaVersion()) == ("test.parent@1" => [DeclaredFieldInfo(:x, :Vector, false, :(x::Vector = x)), + DeclaredFieldInfo(:y, :AbstractString, false, :(y::AbstractString = y))]) + @test Legolas.declaration(ChildV1SchemaVersion()) == ("test.child@1>test.parent@1" => [DeclaredFieldInfo(:z, :Any, false, :(z::Any = z))]) + @test Legolas.declaration(GrandchildV1SchemaVersion()) == ("test.grandchild@1>test.child@1" => [DeclaredFieldInfo(:a, :Int32, false, :(a::Int32 = round(Int32, a))), + DeclaredFieldInfo(:y, :String, false, :(y::String = string(y[1:2])))]) + @test Legolas.DeclaredFieldInfo === Legolas.RequiredFieldInfo # xref https://github.com/beacon-biosignals/Legolas.jl/pull/100 end @testset "Legolas.record_type" begin