Implementing order functions where missings is the smallest value #144

Merged
merged 16 commits into from
Apr 6, 2024
Changes from 10 commits
2 changes: 2 additions & 0 deletions README.md
@@ -28,6 +28,8 @@ This package provides additional functionality for working with `missing` values
- `Missings.replace` to wrap a collection in a (possibly indexable) iterator replacing `missing` with another value
- `Missings.fail` to wrap a collection in a (possibly indexable) iterator throwing an error if `missing` is encountered
- `skipmissings` to loop through a collection of iterators excluding indices where any iterators are `missing`
- `missingsmallest(f)` to create a partial order function that treats `missing` as the smallest value and otherwise behaves like `f`
- `missingsmallest`: the standard `isless` function modified to treat `missing` as the smallest value rather than the largest one
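
For context, a small Base-only illustration of the default behavior these two entries change: `isless` treats `missing` as the largest value, so a plain `sort` puts missing values last and a reversed sort puts them first. The inline comparison `missing_first_lt` is a hypothetical stand-in written for this sketch, not the package's implementation.

```julia
v = [missing, 10, missing, 1, 2]

# Default ordering: `isless` considers `missing` greater than everything else.
default_sorted = sort(v)
default_reversed = sort(v, rev=true)

# Hypothetical hand-written comparison that puts `missing` first instead.
missing_first_lt(x, y) = !ismissing(y) && (ismissing(x) || isless(x, y))
missing_first = sort(v, lt=missing_first_lt)
missing_last_when_reversed = sort(v, lt=missing_first_lt, rev=true)
```

With the default ordering, reversing the sort moves the missing values to the front, which is the situation the new functions are meant to avoid.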

## Contributing and Questions

88 changes: 86 additions & 2 deletions src/Missings.jl
@@ -2,7 +2,7 @@ module Missings

export allowmissing, disallowmissing, ismissing, missing, missings,
Missing, MissingException, levels, coalesce, passmissing, nonmissingtype,
-skipmissings, emptymissing
+skipmissings, emptymissing, missingsmallest

using Base: ismissing, missing, Missing, MissingException

@@ -514,4 +514,88 @@ julia> emptymissing(first)([1], 2)
"""
emptymissing(f) = (x, args...; kwargs...) -> isempty(x) ? missing : f(x, args...; kwargs...)

-end # module
# Only for internal use. Allows dispatch over anonymous functions.
struct MissingSmallest{T}
    lt::T
end

"""
    missingsmallest(f)

Return a function of two arguments `x` and `y` that tests whether `x` is less
than `y` such that `missing` is always less than the other argument. In other
words, return a modified version of the partial order function `f` such that
`missing` is the smallest possible value, and all other non-`missing` values are
compared according to `f`.

The behavior of the standard `isless` function modified to treat `missing` as
the smallest value can be obtained by calling the 2-argument `missingsmallest(x,
y)` function. This is equivalent to `missingsmallest(isless)(x, y)`.

# Examples
```
julia> lengthmissing = passmissing(length);

julia> isshorter = missingsmallest((s1, s2) -> isless(lengthmissing(s1), lengthmissing(s2)));

julia> isshorter("short", "longstring")
true

julia> isshorter("longstring", "short")
false

julia> isshorter("", missing) # even an empty string is not shorter than `missing`
false
```
"""
missingsmallest(f) = MissingSmallest(f)

"""
    missingsmallest(x, y)

The standard partial order `isless` modified so that `missing` is always the
smallest possible value:
- If neither argument is `missing`, the function behaves exactly as `isless`.
- If `y` is `missing`, the result is `false` regardless of the value of `x`
  (including when `x` is also `missing`).
- If `x` is `missing` and `y` is not, the result is `true`.

See also the 1-argument method which takes a partial ordering function (like
`isless`) and modifies it to treat `missing` as explained above. These functions
can be used together with sorting functions so that missing values are sorted
first. This is useful in particular so that when sorting in reverse order
missing values appear at the end.

# Examples
```
julia> v = [missing, 10, missing, 1, 2];

julia> sort(v, lt=missingsmallest)
5-element Vector{Union{Missing, Int64}}:
missing
missing
1
2
10

julia> sort(v, lt=missingsmallest, rev=true)
5-element Vector{Union{Missing, Int64}}:
10
2
1
missing
missing

julia> missingsmallest(missing, Inf)
true

julia> missingsmallest(-Inf, missing)
false

julia> missingsmallest(missing, missing)
false
```
"""
missingsmallest(x, y) = missingsmallest(isless)(x, y)

(ms::MissingSmallest)(x, y) = ismissing(y) ? false : ismissing(x) ? true : ms.lt(x, y)

end # module
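
The callable-struct pattern in this file can be tried outside the package with a minimal standalone sketch. `MissingFirst` is a hypothetical name chosen here to avoid clashing with the package's own `MissingSmallest`; the logic mirrors the one-line callable definition in the diff. Because the wrapped comparison is only reached when neither argument is `missing`, this sketch calls `length` directly rather than going through `passmissing`.

```julia
# Hypothetical stand-in for the PR's `MissingSmallest`; not the package's code.
struct MissingFirst{F}
    lt::F
end

# Test `y` first so that (missing, missing) compares as `false`.
(mf::MissingFirst)(x, y) = ismissing(y) ? false : ismissing(x) ? true : mf.lt(x, y)

# Wrap a comparison that orders strings by length instead of lexicographically.
isshorter = MissingFirst((s1, s2) -> isless(length(s1), length(s2)))

words = ["longstring", missing, "short", ""]
by_length = sort(words, lt=isshorter)
by_length_rev = sort(words, lt=isshorter, rev=true)
```

Wrapping the comparison in a struct rather than a closure gives the result a concrete, named type, which is what the source comment means by allowing dispatch over anonymous functions.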
27 changes: 27 additions & 0 deletions test/runtests.jl
@@ -257,4 +257,31 @@ struct CubeRooter end
        @test emptymissing(fun)(3, 1, c=2) == (1, 2)
    end

    @testset "missingsmallest" begin
        @test missingsmallest(missing, Inf) == true
        @test missingsmallest(-Inf, missing) == false
        @test missingsmallest(missing, missing) == false
        @test missingsmallest(3, 4) == true
        @test missingsmallest(-Inf, Inf) == true

        @test missingsmallest("a", "b") == true
        @test missingsmallest("short", missing) == false
        @test missingsmallest(missing, "") == true

        @test missingsmallest((1, 2), (3, 4)) == true
        @test missingsmallest((3, 4), (1, 2)) == false
        @test missingsmallest(missing, (1e3, 1e4)) == true

        # Compare strings by length, not lexicographically
        lengthmissing = passmissing(length)
        isshorter = missingsmallest((s1, s2) -> isless(lengthmissing(s1), lengthmissing(s2)))
        @test isshorter("short", "longstring") == true
        @test isshorter("longstring", "short") == false
        @test isshorter(missing, "short") == true
        @test isshorter("", missing) == false

        @test_throws MethodError missingsmallest(isless)(isless)
        @test missingsmallest !== missingsmallest(isless)
    end

end
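
The `missingsmallest(missing, missing) == false` assertion reflects a property that sorting generally relies on: a "less than" function should never report a value as less than itself, and should never hold in both directions for the same pair. A rough Base-only check of those two properties is sketched below; `missingfirst` is a hypothetical helper that copies the PR's comparison logic so the snippet runs without the package.

```julia
# Hypothetical helper mirroring the comparison logic added in this PR.
missingfirst(x, y) = ismissing(y) ? false : ismissing(x) ? true : isless(x, y)

samples = [missing, -Inf, 0.0, 1.0, Inf]

# No value should compare as less than itself.
irreflexive = all(!missingfirst(x, x) for x in samples)

# `x < y` and `y < x` should never both hold.
asymmetric = all(!(missingfirst(x, y) && missingfirst(y, x)) for x in samples, y in samples)
```

Checking `y` before `x` in the comparison is what keeps both properties intact when both arguments are `missing`.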