The package is registered in the General registry, so it can be installed at the REPL with ] add GroupedArrays.
GroupedArray returns an AbstractArray of integers, one per element, identifying the group each element belongs to (or missing for elements whose value is missing).
using GroupedArrays
p = repeat(["a", "b", missing], outer = 2)
g = GroupedArray(p)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# missing
# 1
# 2
# missing
Use the keyword argument coalesce = true to treat missing values as a distinct group of their own:
using GroupedArrays
p = repeat(["a", "b", missing], outer = 2)
g = GroupedArray(p; coalesce = true)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# 3
# 1
# 2
# 3
GroupedArray can also be used to compute groups across multiple vectors:
p1 = repeat(["a", "b"], outer = 3)
p2 = repeat(["d", "e"], inner = 3)
g = GroupedArray(p1, p2)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# 1
# 3
# 4
# 3
A GroupedArray is similar to a PooledArray, except that the pool is simply the set of integers from 1 to n, where n is the number of groups (missing is encoded as 0). This allows for faster lookup in settings where the group value itself is not meaningful.
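As an illustration of why consecutive integer codes are convenient, the sketch below accumulates per-group sums in a plain vector indexed by group code. It relies only on the AbstractArray interface shown above (elements are integers or missing); the labels and vals data are made up for the example, and the number of groups is recovered from the codes rather than from any package-specific accessor.

```julia
using GroupedArrays

# Hypothetical data: one label and one value per observation.
labels = ["a", "b", missing, "a", "b", missing]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

g = GroupedArray(labels)

# Codes run from 1 to n, so the largest non-missing code is the number of groups.
n = maximum(skipmissing(g))

# A dense vector indexed by group code replaces a dictionary lookup.
sums = zeros(n)
for (code, v) in zip(g, vals)
    ismissing(code) && continue  # skip observations with a missing label
    sums[code] += v
end

sums  # [5.0, 7.0]: group 1 ("a") and group 2 ("b")
```

Because the codes are dense, the accumulator needs no hashing; the trade-off is that the original labels are not kept, which is fine when only group membership matters.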
The algorithm used to construct a GroupedArray is taken from DataFrames.jl.