Skip to content

Commit

Permalink
Print all schema violations in validate's error message (#85)
Browse files Browse the repository at this point in the history
* return all violations

* wip more tests

* update message format

* add test

* clean-up

* bump patch version

* Update test/runtests.jl

* update comments

* switch test to one that won't differ in error string between julia versions

* Update src/schemas.jl

Co-authored-by: Glenn Moynihan <[email protected]>

* review

* update docstrings

* Update src/schemas.jl

Co-authored-by: Glenn Moynihan <[email protected]>

* update tests for reword

* update tests for new error format

* clean up messages/tests

* refactor violation code to share

* remove extraneous return

* Apply suggestions from code review

Co-authored-by: Curtis Vogt <[email protected]>

* improve error message

* prepare for future deprecation of find_violation

* spacing parity

* fix docstring

* remove test that duplicates test coverage and is tricky across julia versions

* Use `find_violations` in `find_violation` (#87)

* Use `find_violations` in `find_violation`

* Add better guidance in deprecation

* Update documentation

* Drop fail_fast

* update tour example description

* revert refactor

* add back desired changes

* remove deprecation warning

* update docstrings

* cleanup

* Update src/schemas.jl

Co-authored-by: Curtis Vogt <[email protected]>

* Update src/schemas.jl

* Update src/schemas.jl

* add to api documentation in docs

---------

Co-authored-by: Glenn Moynihan <[email protected]>
Co-authored-by: Curtis Vogt <[email protected]>
Co-authored-by: Jarrett Revels <[email protected]>
  • Loading branch information
4 people authored Apr 20, 2023
1 parent 39bb280 commit 321b19c
Show file tree
Hide file tree
Showing 5 changed files with 128 additions and 36 deletions.
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name = "Legolas"
uuid = "741b9549-f6ed-4911-9fbf-4a1c0c97f0cd"
authors = ["Beacon Biosignals, Inc."]
version = "0.5.8"
version = "0.5.9"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
Expand Down
1 change: 1 addition & 0 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ Legolas.record_type
Legolas.schema_version_from_record
Legolas.declared
Legolas.find_violation
Legolas.find_violations
Legolas.complies_with
Legolas.validate
Legolas.accepted_field_type
Expand Down
27 changes: 20 additions & 7 deletions examples/tour.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

using Legolas, Arrow, Tables, Test, UUIDs

using Legolas: @schema, @version, complies_with, find_violation, validate
using Legolas: @schema, @version, complies_with, find_violation, find_violations, validate

#####
##### Introduction
Expand Down Expand Up @@ -58,7 +58,7 @@ end
##### `Tables.Schema` Compliance/Validation
#####

# We can use `complies_with`, `validate`, and `find_violation` to check whether a given
# We can use `complies_with`, `validate`, and `find_violations` to check whether a given
# `Tables.Schema` (ref https://tables.juliadata.org/stable/#Tables.Schema) complies with
# `example.foo@1`.

Expand All @@ -71,12 +71,15 @@ for s in [Tables.Schema((:a, :b, :c, :d), (Real, String, Any, AbstractVector)),
# if `complies_with` finds a violation, it returns `false`; returns `true` otherwise
@test complies_with(s, FooV1SchemaVersion())

# if `validate` finds a violation, it throws an error indicating the violation;
# if `validate` finds any violations, it throws an error indicating the violations;
# returns `nothing` otherwise
@test validate(s, FooV1SchemaVersion()) isa Nothing

# if `find_violation` finds a violation, it returns a tuple indicating the relevant
# field and its violation; returns `nothing` otherwise
# `find_violations` returns a vector of all violations, if any, where each violation is a tuple indicating the relevant
# field and its violations
@test isempty(find_violations(s, FooV1SchemaVersion()))

# `find_violation` immediately returns the first violation found, returns `nothing` otherwise
@test isnothing(find_violation(s, FooV1SchemaVersion()))
end

Expand All @@ -86,11 +89,13 @@ s = Tables.Schema((:a, :c, :d), (Int, Float64, Vector)) # The required non-`>:Mi
@test !complies_with(s, FooV1SchemaVersion())
@test_throws ArgumentError validate(s, FooV1SchemaVersion())
@test isequal(find_violation(s, FooV1SchemaVersion()), :b => missing)
@test isequal(find_violations(s, FooV1SchemaVersion()), [:b => missing])

s = Tables.Schema((:a, :b, :c, :d), (Int, String, Float64, Any)) # The type of required field `d::AbstractVector` is not `<:AbstractVector`.
@test !complies_with(s, FooV1SchemaVersion())
@test_throws ArgumentError validate(s, FooV1SchemaVersion())
@test isequal(find_violation(s, FooV1SchemaVersion()), :d => Any)
@test isequal(find_violations(s, FooV1SchemaVersion()), [:d => Any])

# The expectations that characterize Legolas' particular notion of "schematic compliance" - requiring the
# presence of pre-specified declared fields, assuming non-present fields to be implicitly `missing`, and allowing
Expand Down Expand Up @@ -120,7 +125,7 @@ fields = (a=1.0, b="hi", c=π, d=[1, 2, 3])
#
# - ...contain the associated schema version's required fields in any order
# - ...elide required fields, in which case the constructor will assume them to be `missing`
# - ...contain any other fields in addition to the required fields; such additional fields are simply ignored
# - ...contain any other fields in addition to the required fields; such additional fields are simply ignored
# by the constructor and are not propagated through to the resulting record.
#
# Demonstrating a few of these properties:
Expand Down Expand Up @@ -367,7 +372,15 @@ msg = """
@test_throws ArgumentError(msg) Legolas.read(Arrow.tobuffer(table))
invalid = [Tables.rowmerge(row; k=string(row.k)) for row in table]
invalid_but_has_metadata = Arrow.tobuffer(invalid; metadata=("legolas_schema_qualified" => Legolas.identifier(BazV1SchemaVersion()),))
@test_throws ArgumentError("field `k` has unexpected type; expected <:Int64, found String") Legolas.read(invalid_but_has_metadata)
msg2 = """
Tables.Schema violates Legolas schema `example.baz@1`:
- Incorrect type: `k` expected `<:Int64`, found `String`
Provided Tables.Schema:
:x Int8
:y String
:z String
:k String"""
@test_throws ArgumentError(msg2) Legolas.read(invalid_but_has_metadata)

# A note about one additional benefit of `Legolas.read`/`Legolas.write`: Unlike their Arrow.jl counterparts,
# these functions are relatively agnostic to the types of provided path arguments. Generally, as long as a
Expand Down
84 changes: 62 additions & 22 deletions src/schemas.jl
Original file line number Diff line number Diff line change
Expand Up @@ -262,35 +262,62 @@ For required field `f::F` of `sv`:
Otherwise, return `nothing`.
See also: [`Legolas.validate`](@ref), [`Legolas.complies_with`](@ref)
To return all violations instead of just the first, use [`Legolas.find_violations`](@ref).
See also: [`Legolas.validate`](@ref), [`Legolas.complies_with`](@ref), [`Legolas.find_violations`](@ref).
"""
find_violation(::Tables.Schema, sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv))

function _find_violation end
"""
Legolas.find_violations(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return a `Vector{Pair{Symbol,Union{Type,Missing}}}` of all of `ts`'s violations with respect to `sv`.
This function's notion of "violation" is defined by [`Legolas.find_violation`](@ref), which immediately returns the first violation found; prefer to use that function instead of `find_violations` in situations where you only need to detect *any* violation instead of *all* violations.
See also: [`Legolas.validate`](@ref), [`Legolas.complies_with`](@ref), [`Legolas.find_violation`](@ref).
"""
find_violations(::Tables.Schema, sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv))

"""
Legolas.validate(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Throws a descriptive `ArgumentError` if `!isnothing(find_violation(ts, sv))`,
otherwise return `nothing`.
Throws a descriptive `ArgumentError` if any violations are found, else return `nothing`.
See also: [`Legolas.find_violation`](@ref), [`Legolas.complies_with`](@ref)
See also: [`Legolas.find_violation`](@ref), [`Legolas.find_violations`](@ref), [`Legolas.find_violation`](@ref), [`Legolas.complies_with`](@ref)
"""
function validate(ts::Tables.Schema, sv::SchemaVersion)
result = find_violation(ts, sv)
isnothing(result) && return nothing
field, violation = result
ismissing(violation) && throw(ArgumentError("could not find expected field `$field` in $ts"))
expected = getfield(required_fields(sv), field)
throw(ArgumentError("field `$field` has unexpected type; expected <:$expected, found $violation"))
results = find_violations(ts, sv)
isempty(results) && return nothing

field_err = Symbol[]
type_err = Tuple{Symbol,Type,Type}[]
for result in results
field, violation = result
if ismissing(violation)
push!(field_err, field)
else
expected = getfield(required_fields(sv), field)
push!(type_err, (field, expected, violation))
end
end
err_msg = "Tables.Schema violates Legolas schema `$(string(name(sv), "@", version(sv)))`:\n"
for err in field_err
err_msg *= " - Could not find required field: `$err`\n"
end
for (field, expected, violation) in type_err
err_msg *= " - Incorrect type: `$field` expected `<:$expected`, found `$violation`\n"
end
err_msg *= "Provided $ts"
throw(ArgumentError(err_msg))
end

"""
Legolas.complies_with(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return `isnothing(find_violation(ts, sv))`.
See also: [`Legolas.find_violation`](@ref), [`Legolas.validate`](@ref)
See also: [`Legolas.find_violation`](@ref), [`Legolas.find_violations`](@ref), [`Legolas.validate`](@ref)
"""
complies_with(ts::Tables.Schema, sv::SchemaVersion) = isnothing(find_violation(ts, sv))

Expand Down Expand Up @@ -440,19 +467,32 @@ function _generate_schema_version_definitions(schema_version::SchemaVersion, par
end

function _generate_validation_definitions(schema_version::SchemaVersion)
field_violation_check_statements = Expr[]
for (fname, ftype) in pairs(required_fields(schema_version))
fname = Base.Meta.quot(fname)
push!(field_violation_check_statements, quote
S = $Legolas.accepted_field_type(sv, $ftype)
result = $Legolas._check_for_expected_field(ts, $fname, S)
isnothing(result) || return $fname => result
end)
# When `fail_fast == true`, return first violation found rather than all violations
_violation_check = (; fail_fast::Bool) -> begin
statements = Expr[]
violations = gensym()
fail_fast || push!(statements, :($violations = Pair{Symbol,Union{Type,Missing}}[]))
for (fname, ftype) in pairs(required_fields(schema_version))
fname = Base.Meta.quot(fname)
found = :($fname => result)
handle_found = fail_fast ? :(return $found) : :(push!($violations, $found))
push!(statements, quote
S = $Legolas.accepted_field_type(sv, $ftype)
result = $Legolas._check_for_expected_field(ts, $fname, S)
isnothing(result) || $handle_found
end)
end
push!(statements, :(return $(fail_fast ? :nothing : violations)))
return statements
end
return quote
function $(Legolas).find_violation(ts::$(Tables).Schema, sv::$(Base.Meta.quot(typeof(schema_version))))
$(field_violation_check_statements...)
return nothing
$(_violation_check(; fail_fast=true)...)
end

# Multiple violation reporting
function $(Legolas).find_violations(ts::$(Tables).Schema, sv::$(Base.Meta.quot(typeof(schema_version))))
$(_violation_check(; fail_fast=false)...)
end
end
end
Expand Down
50 changes: 44 additions & 6 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -315,24 +315,62 @@ end
# functions work as expected w.r.t. schema extension in particular.

t = Tables.Schema((:a, :y, :z), (Int32, String, Any))
for s in (GrandchildV1SchemaVersion(), ChildV1SchemaVersion(), ParentV1SchemaVersion())
@test_throws ArgumentError("could not find expected field `x` in $t") Legolas.validate(t, s)
for (s, id) in ((GrandchildV1SchemaVersion(), "test.grandchild@1"),
(ChildV1SchemaVersion(), "test.child@1"),
(ParentV1SchemaVersion(), "test.parent@1"))
msg = """
Tables.Schema violates Legolas schema `$id`:
- Could not find required field: `x`
Provided Tables.Schema:
:a Int32
:y String
:z Any"""
@test_throws ArgumentError(msg) Legolas.validate(t, s)
@test !Legolas.complies_with(t, s)
@test isequal(Legolas.find_violation(t, s), :x => missing)
@test isequal(Legolas.find_violations(t, s), [:x => missing])
end

t = Tables.Schema((:x, :a, :y), (ComplexF64, Int32, String))
for s in (GrandchildV1SchemaVersion(), ChildV1SchemaVersion(), ParentV1SchemaVersion())
@test_throws ArgumentError("field `x` has unexpected type; expected <:$(Vector), found $(Complex{Float64})") Legolas.validate(t, s)
# Multiple missing field violations
t = Tables.Schema((:a, :z), (Int32, Any))
for (s, id) in ((GrandchildV1SchemaVersion(), "test.grandchild@1"),
(ChildV1SchemaVersion(), "test.child@1"),
(ParentV1SchemaVersion(), "test.parent@1"))
msg = """
Tables.Schema violates Legolas schema `$id`:
- Could not find required field: `x`
- Could not find required field: `y`
Provided Tables.Schema:
:a Int32
:z Any"""
@test_throws ArgumentError(msg) Legolas.validate(t, s)
@test !Legolas.complies_with(t, s)
@test isequal(Legolas.find_violation(t, s), :x => ComplexF64)
@test isequal(Legolas.find_violation(t, s), :x => missing)
@test isequal(Legolas.find_violations(t, s), [:x => missing, :y => missing])
end

# Multiple violations, both missing fields and type error
t = Tables.Schema((:y, :a), (Bool, Int32))
let s = GrandchildV1SchemaVersion()
msg = """
Tables.Schema violates Legolas schema `test.grandchild@1`:
- Could not find required field: `x`
- Incorrect type: `y` expected `<:String`, found `Bool`
Provided Tables.Schema:
:y Bool
:a Int32"""
@test_throws ArgumentError(msg) Legolas.validate(t, s)
@test !Legolas.complies_with(t, s)
@test isequal(Legolas.find_violation(t, s), :x => missing)
@test isequal(Legolas.find_violations(t, s), [:x => missing, :y => Bool])
end

t = Tables.Schema((:x, :a, :y), (Vector, Int32, String))
for s in (GrandchildV1SchemaVersion(), ChildV1SchemaVersion(), ParentV1SchemaVersion())
@test isnothing(Legolas.validate(t, s))
@test Legolas.complies_with(t, s)
@test isnothing(Legolas.find_violation(t, s))
@test isempty(Legolas.find_violations(t, s))
end

for T in (UUID, UInt128), S in (Symbol, String)
Expand Down

2 comments on commit 321b19c

@hannahilea
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuliaRegistrator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registration pull request created: JuliaRegistries/General/82002

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.5.9 -m "<description of version>" 321b19c88c5c7e07c616278f2aaf876495533667
git push origin v0.5.9

Please sign in to comment.