replace the usage of the term "required field" with "declared field" and clarify breaking changes for schema versions (#100)

* Expand on 'required' and 'breaking' additions in versioning

* Update schema-concepts.md

* required field -> declared field

* Update Project.toml

* Update src/schemas.jl

Co-authored-by: Phillip Alday <[email protected]>

jrevels and palday authored Jul 18, 2023
1 parent 3e86ec0 commit b0d44cc
Showing 7 changed files with 126 additions and 114 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Legolas"
uuid = "741b9549-f6ed-4911-9fbf-4a1c0c97f0cd"
authors = ["Beacon Biosignals, Inc."]
-version = "0.5.12"
+version = "0.5.13"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -20,7 +20,7 @@ Legolas.name
Legolas.version
Legolas.identifier
Legolas.parent
-Legolas.required_fields
+Legolas.declared_fields
Legolas.declaration
Legolas.record_type
Legolas.schema_version_from_record
10 changes: 6 additions & 4 deletions docs/src/schema-concepts.md
@@ -27,11 +27,13 @@ While it is fairly established practice to [semantically version source code](ht

**Do not introduce a change to an existing schema version that might cause existing compliant data to become non-compliant; instead, incorporate the intended change in a new schema version whose version number is one greater than the previous version number.**

-For example, a schema author must introduce a new schema version for any of the following changes:
+A schema author must introduce a new schema version if any of the following changes are introduced:

-- A new type-restricted required field is added to the schema.
-- An existing required field's type restriction is tightened.
-- An existing required field is renamed.
+- A new type-constrained and/or value-constrained field is declared. In other words, for the introduction of a new declared field to be non-breaking, the new field's type constraint must be `::Any` and it may not feature a value-constraining or value-transforming assignment expression.
+- An existing declared field's type or value constraints are tightened.
+- An existing declared field is renamed.
+
+If any of the above breaking changes are made to an existing schema version, instead of introducing a new schema version, subtle downstream breakage may occur. For example, if a new type/value-constrained field is declared, previously compliant tables containing a field with the same name might accidentally become non-compliant if existing values violate the new constraints. Similarly, downstream schema version extensions may have already declared a field with the same name, but with constraints that are incompatible with the new constraints.

One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@version(FooV1, ...)` and `@version(FooV2, ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema version and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments.
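
As an illustration of the rule above (not part of this commit), here is a minimal sketch of how declaring a new type-constrained field can make previously compliant data non-compliant. The schema name `example.foo`, the fields `a` and `b`, and the table schema `existing` are all hypothetical. The constrained field is placed in a separate `FooV2` only so that both declarations can be checked side by side against the same table schema; it assumes `Legolas.complies_with` accepts a `Tables.Schema` and a schema version, as in Legolas v0.5.

```julia
using Legolas, Tables
using Legolas: @schema, @version

# Hypothetical schema used only for this illustration.
@schema "example.foo" Foo

# Original declaration: a single constrained field.
@version FooV1 begin
    a::Real
end

# Same fields plus a new type-constrained field `b`.
@version FooV2 begin
    a::Real
    b::String
end

# A pre-existing table that happens to carry an integer column named `b`.
# Non-declared columns are permitted, so `b` is irrelevant to FooV1.
existing = Tables.Schema((:a, :b), (Int, Int))

complies_v1 = Legolas.complies_with(existing, FooV1SchemaVersion())
complies_v2 = Legolas.complies_with(existing, FooV2SchemaVersion())
```

Had `b::String` been added to `FooV1` in place, the table described by `existing` would have silently changed from compliant to non-compliant, which is the breakage the guidance is meant to prevent.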

10 changes: 5 additions & 5 deletions docs/src/upgrade.md
@@ -8,15 +8,15 @@ See [here](https://github.com/beacon-biosignals/Legolas.jl/pull/54) for a compre

* In Legolas v0.4, every `Legolas.Row` field's type was available as a type parameter of `Legolas.Row`; for example, the type of a field `y` specified as `y::Real` in a `Legolas.@row` declaration would be surfaced like `Legolas.Row{..., NamedTuple{(...,:y,...),Tuple{...,typeof(y),...}}`. In Legolas v0.5, the schema version author controls which fields have their types surfaced as type parameters in Legolas-generated record types via the `field::(<:F)` syntax in [`@version`](@ref).
* Additionally, to include type parameters associated to fields in a parent schema, they must be re-declared in the child schema. For example, the package LegolasFlux declares a `ModelV1` version with a field `weights::(<:Union{Missing,Weights})`. LegolasFlux includes an [example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L78-L80) with a schema extension `DigitsRowV1` which extends `ModelV1`. This `@version` call must re-declare the field `weights` to be parametric in order for the `DigitsRowV1` struct to also have a type parameter for this field.
-* In Legolas v0.4, `@row`-generated `Legolas.Row` constructors accepted and propagated any non-schema-required fields provided by the caller. In Legolas v0.5, `@version`-generated record type constructors will discard any non-schema-required fields provided by the caller. When upgrading code that formerly "implicitly extended" a given schema version by propagating non-required fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as required fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as required fields directly to the schema (likely declared as `::Union{Missing,T}` to allow them to be missing).
-
+* In Legolas v0.4, `@row`-generated `Legolas.Row` constructors accepted and propagated any non-schema-declared fields provided by the caller. In Legolas v0.5, `@version`-generated record type constructors will discard any non-schema-declared fields provided by the caller. When upgrading code that formerly "implicitly extended" a given schema version by propagating non-declared fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as declared fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as declared fields directly to the schema (likely declared as `::Union{Missing,T}` to allow them to be missing).
+* Before Legolas v0.5, the documented guidance for schema authors surrounding new fields' impact on schema version breakage was misleading, implying that adding a new declared field to an existing schema version is non-breaking if the field's type allowed for `Missing` values. This is incorrect. For clarity, *adding a new declared field to an existing schema version is a breaking change unless the field's type and value are both completely unconstrained in the declaration*, i.e. the field's type constraint must be `::Any` and may not feature a value-constraining or value-transforming assignment expression.

## Deserializing old tables with Legolas v0.5

Generally, tables serialized with earlier versions of Legolas can be de-serialized with Legolas v0.5, making it only a "code-breaking" change, rather than a "data-breaking" change. However, it is strongly suggested to have reference tests with checked in (pre-Legolas v0.5) serialized tables which are deserialized and verified during the tests, in order to be sure.

Additionally, serialized Arrow tables containing nested Legolas-v0.4-defined `Legolas.Row` values (i.e. a table that contains a row that has a field that is, itself, a `Legolas.Row` value, or contains such values) require special handling to deserialize under Legolas v0.5, if you wish users to be able to deserialize them with `Legolas.read` using the Legolas-v0.5-ready version of your package. Note that these tables are still deserializable as plain Arrow tables regardless, so it may not be worthwhile to provide a bespoke deprecation/compatibility pathway in the Legolas-v0.5-ready version package unless your use case merits it (i.e. the impact surface would be high for your package's users).

If you would like to provide such a pathway, though:

-Recall that under Legolas v0.4, `@row`-generated `Legolas.Row` constructors may accept and propagate arbitrary non-schema-required fields, whereas Legolas v0.5's `@version`-generated record types may only contain schema-required fields. Therefore, one must decide what to do with any non-required fields present in serialized `Legolas.Row` values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding `@version` declaration. For example, [this LegolasFlux example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L64-L84) uses a function `compat_config` to handle old `Legolas.Row` values, but does not add any handling for non-required fields, which will be discarded if present. If one did not want non-required fields to be discarded, these fields could be handled by throwing an error or warning, or defining a schema version extension that captured them, or defining a new version of the relevant schema to capture them (e.g. adding a field like `extras::Union{Missing, NamedTuple}`).
+Recall that under Legolas v0.4, `@row`-generated `Legolas.Row` constructors may accept and propagate arbitrary non-schema-declared fields, whereas Legolas v0.5's `@version`-generated record types may only contain schema-declared fields. Therefore, one must decide what to do with any non-declared fields present in serialized `Legolas.Row` values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding `@version` declaration. For example, [this LegolasFlux example](https://github.com/beacon-biosignals/LegolasFlux.jl/blob/53c677848c6b65e5158ef2d43dd5f7eab174892e/examples/digits.jl#L64-L84) uses a function `compat_config` to handle old `Legolas.Row` values, but does not add any handling for non-declared fields, which will be discarded if present. If one did not want non-declared fields to be discarded, these fields could be handled by throwing an error or warning, or defining a schema version extension that captured them, or defining a new version of the relevant schema to capture them (e.g. adding a field like `extras::Union{Missing, NamedTuple}`).
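
The upgrade notes above describe v0.5 record constructors discarding non-declared fields, and suggest a schema version extension as one way to keep them. The following is a rough sketch of that behavior, not taken from this commit. The schema names `example.old` and `example.noted`, the fields `id` and `note`, and the input `row` are hypothetical, and the sketch assumes that the generated record types can be constructed from a `NamedTuple` row.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical base schema that declares only `id`.
@schema "example.old" Old
@version OldV1 begin
    id::Int
end

# Hypothetical extension that explicitly declares the formerly
# "implicitly propagated" field so it is no longer discarded.
@schema "example.noted" Noted
@version NotedV1 > OldV1 begin
    note::Union{Missing,String}
end

# A row carrying a field that OldV1 does not declare.
row = (; id=1, note="kept only by the extension")

plain = OldV1(row)    # `note` is not declared by OldV1
noted = NotedV1(row)  # `note` is declared by NotedV1
```

Under these assumptions, `plain` should expose only `id`, while `noted` retains `note` because the extension declares it.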

2 comments on commit b0d44cc

@jrevels

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/87677

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.13 -m "<description of version>" b0d44cc019aad6d9daf38bc39ac2c6c9170306b1
git push origin v0.5.13
