Use `PooledVector` for "categorical" columns like "bus type code" (`buses.ide`)? #9
Labels
idea
needs some investigation before we decide
Bus type code is either 1, 2, 3 or 4. Currently this is parsed into a `Vector{Int64}`. But this could potentially be more efficient in a couple of ways:

1. The values could be stored as `Int8` rather than `Int64` (technically I suppose we only need 2 bits, but `Int8` is probably the smallest type that is actually practical to give users).
2. Rather than a `Vector` (with N `T=Int64` integer values), the column could be stored as a `PooledVector` (with N `UInt8` reference values and a 4-element `UInt8 => T` mapping to the actual values).

Both options could be combined (e.g. pool and have `T=Int8`).

Option 2 doesn't really sound worth it on storage efficiency alone, but it could be worth it (i.e. provide practical performance improvements to users) depending on how the pooled columns (e.g. "bus type code") are going to be used, because certain operations (joins, mapping, ...) can be very efficient on `PooledVector`s (as they can work with the 4 pooled values, rather than all N entries).
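To make the pooling idea concrete, here is a hand-rolled sketch in plain Julia of what a pooled column stores and why a mapping only has to touch the pooled values. It is not how `PooledVector` (presumably from PooledArrays.jl) is implemented; the names `codes`, `pool`, `refs` and `f` are made up for illustration.

```julia
# Hypothetical column of bus type codes (values 1 to 4), stored as Int8.
codes = Int8[1, 2, 2, 3, 1, 4, 2, 1]

# Pooled representation: a small pool of distinct values,
# plus one UInt8 reference per row that indexes into the pool.
pool = sort(unique(codes))
refs = UInt8[findfirst(==(c), pool) for c in codes]

# Reading the column back is just an indexing operation.
@assert pool[refs] == codes

# Mapping a function over the column only needs to evaluate it on the pool
# (4 calls here, rather than one call per row) ...
f(code) = code == 4            # hypothetical predicate, for illustration only
mapped_pool = map(f, pool)

# ... and the per-row result is recovered through the same refs.
@assert mapped_pool[refs] == map(f, codes)
```

The saving only matters if `f` is expensive or N is large, which is why how these columns get used is the deciding question.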
So we should probably do 1 (i.e. change to `Int8`s) and then investigate how these columns will be used to decide about 2 (pooling).
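As a rough illustration of what option 1 buys on storage, the sketch below converts a stand-in column to `Int8` and compares element storage. The column here is randomly generated and hypothetical; the real one would come from the parsed buses data.

```julia
# Hypothetical stand-in for a parsed column of N bus type codes.
codes64 = rand(Int64(1):Int64(4), 10_000)   # Vector{Int64}, as currently parsed
codes8 = Int8.(codes64)                     # option 1: same values, 1 byte each

# The values still compare equal across the two integer types.
@assert codes8 == codes64

# Element storage shrinks by a factor of 8 (8 bytes vs 1 byte per entry).
@assert sizeof(codes64) == 8 * sizeof(codes8)
```

One thing to watch with `Int8` is arithmetic: `Int8` values wrap around on overflow, although that should not matter for codes that are only compared and never summed.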