- Add `byrow(allequal)` as a special case of `byrow(isequal)`. (This feature needs at least Julia 1.8.)
- Fix a bug in stat routines - some corner cases
- Fix unnecessary allocations in stat routines
- Fix a performance issue in `sort` due to the recent change in `Threads.@threads`.
- Fix the allocation problem in computing `var` and `std` in the fast path of `gatherby`.
- Fix an issue with Julia-latest.
- Now we exploit multithreading when gathering observations for huge data sets.
- Fix a problem that was causing tests to fail in Julia 1.9
- Fix an issue with `eltype` and the output of `eachcol`. Now `eltype(::Type{<:DatasetColumns})` properly returns `AbstractDatasetColumn` instead of `AbstractVector`.
- Fix a problem with `nonmissingtype` with `Union{}` output.
- Fix an issue that was causing the join functions to sort already-sorted data sets, issue #108
- Remove precompilation for Julia 1.9 - it causes an enormous amount of allocation during precompiling and loading
- Now `IMD` throws an error when accessing a grouped data set whose parent has been modified.
- Functions `searchsorted`, `searchsortedfirst`, and `searchsortedlast` now work with `DatasetColumn`
- Fix a bug in `byrow(nunique)`
- Fix a bug which caused `stable=true` to be ignored in `gatherby`, issue #100
- Add docstrings for `groupby!`, `groupby`, and `gatherby`.
- Fix an issue with `QuickSortAlg` in future versions of Julia
- Empty the rows of a `SubDataset` without columns
- Fix a bug which causes `modify/combine` to throw errors on columns with `Vector{Vector}` type
- Users can use `resize!` to resize a data set
- Fix function signature for some stat functions
- Update to `PrettyTables` version 2
- Fix a bug in `byrow` for writing values of type `BigInt`
- Update for `Julia` `VERSION >= v"1.9.0-DEV.1635"`
- Fix a bug in `modify` which raised an error while showing an error!
- Fix a bug in `sort` which caused `Bool` to be treated as a vector with length 1
- `topk` and `topkperm` use `isless` by default for comparing values.
- Fix a bug in `show` which caused the format of a column to be ignored when calculating the max width.
- Better `show` for `GroupBy/GatherBy` in Jupyter
- `hcat!` keeps the format of the second data set.
- Fix an issue in show with HTML MIME, issue #91
- Now `Jupyter` shows very wide data sets much faster, issue #82
- Add precompilation for Julia > 1.8
- The `topk` and `topkperm` functions support two extra arguments: `lt` and `by`, which by default are set to `<` and `identity`, respectively
- `topkperm` is a new function for outputting the indices of the top (bottom) k values, issue #67.
- `topk` now supports any `DataType`, see issue #67.
- `filter`, `filter!`, `delete`, and `delete!` have a new keyword argument for controlling how missing values should be interpreted, issue #69
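
To illustrate what selecting the top k values under a custom `lt` and `by` means, here is a minimal sketch that uses only Base's `partialsort` and `partialsortperm`. The helper names `naive_topk` and `naive_topkperm` are hypothetical and are not the package's `topk`/`topkperm`; the sketch does not handle missing values and makes no claim about how the package implements these functions.

```julia
# Hypothetical helpers built on Base only; they are NOT the package's
# `topk`/`topkperm`. `lt` and `by` play the same role as in Base sorting.
naive_topk(x, k; lt = isless, by = identity) =
    collect(partialsort(x, 1:min(k, length(x)); lt = lt, by = by, rev = true))

naive_topkperm(x, k; lt = isless, by = identity) =
    collect(partialsortperm(x, 1:min(k, length(x)); lt = lt, by = by, rev = true))

x = [3, -7, 1, 5, -2]

top2      = naive_topk(x, 2)            # two largest values
top2_abs  = naive_topk(x, 2; by = abs)  # two largest by absolute value
top2_perm = naive_topkperm(x, 2)        # positions of the two largest values
```

Passing `by = abs` changes which elements count as "largest" without changing the values that are returned, which is the usual reason to expose `by` separately from `lt`.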
- `topk` now works on `DatasetColumn`/`SubDatasetColumn`.

- Stats functions throw `ArgumentError` when an empty vector is passed to them.

- The `topk` and `topkperm` functions are multithreading ready, i.e. users can pass `threads = true` to these functions.
  - Now we use binary search for large values of k. This improves the performance of the functions in worst-case scenarios.

- `row_join!` allocates less when `mapformats=true` and thus performs better. This directly affects `filewriter` performance in `DLMReader`.
- A new functionality has been added to `byrow` for passing a Tuple of column indices. `byrow(ds, fun, cols)` calls `fun.(ds[:, cols[1]], ds[:, cols[2]], ...)` when `cols` is an NTuple of column indices.
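
The broadcasting pattern described above can be sketched without the package. In this sketch a plain NamedTuple of vectors stands in for a data set, and `byrow_tuple_sketch` and `row_fun` are hypothetical names introduced only for illustration; it shows the shape of the call `fun.(col1, col2, ...)`, not the package's implementation.

```julia
# A NamedTuple of equal-length vectors stands in for a data set here
# (an assumption made to keep the sketch Base-only).
store = (x = [1, 2, 3], y = [10, 20, 30], z = [0.5, 0.5, 0.5])

# Broadcast `fun` over the selected columns, one argument per column,
# mirroring fun.(ds[:, cols[1]], ds[:, cols[2]], ...).
byrow_tuple_sketch(s, fun, cols::NTuple{N, Symbol}) where {N} =
    fun.(map(c -> getproperty(s, c), cols)...)

# A row-wise function that takes one value from each selected column.
row_fun(a, b, c) = (a + b) * c

result = byrow_tuple_sketch(store, row_fun, (:x, :y, :z))
```

Because each column becomes a separate positional argument, `fun` must accept exactly as many arguments as there are entries in `cols`.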
- Fix type ambiguity in `filter/!`
- Two new functions: `delete` and `delete!`. They should be compared to `filter` and `filter!`, respectively - issue #63
- Add `DLMReader` to `sysimage` in `IMD.create_sysimage`.
- Fix mistakes in `byrow(argmin)` and `byrow(argmax)` - pull #62
- `byrow(ds, t::DataType, col)` converts values of `col` to `t`.
- Fix an issue in `flatten/!` - columns with type `Any`.
- Fix an issue with `IMD.create_sysimage` - issue #59
- Improve `eachgroup`
- Drop support of `UInt16` in `Characters` - `Characters` now only supports length
- Users can now choose whether the observation ids for the left data set and/or the right data set are part of the output data set.
- Add a new function `eachgroup`. It allows iteration over each group of a grouped data set.
- `op` is a new keyword argument for the `update/!` functions which allows passing a user-defined function to control how the values of the main data set should be updated by the values from the transaction data set. (issue #55)
- Support for the `mapformats` keyword argument in `flatten/!`. Now users can flatten a data set based on the formatted values. (issue #57)
- Support for the `threads` keyword argument in `flatten/!`.
- The `combine` function now works correctly when a view of a data set is passed
- For the join functions, the `makeunique` argument is now passed correctly to the internal functions.
- `update` and `update!` have the same `mode` option by default.
- Fix the problem with preserving the format of `SubDataset` in `flatten/!`
- Fix the problem that caused `flatten!` to produce a copy of the data when an empty data set was passed to it.
- Fix the bug in `flatten!` related to flattening the first column.
- Fix the bug in `flatten` that caused a segmentation fault for views of data sets.
- Faster `flatten/!`
- The `outerjoin` function accepts the `source` keyword argument.
- All join functions support the `obs_id` option. This allows outputting observation ids for the matched pairs.
  - All join functions support `obs_id_name` for assigning column names for `obs_id`.
- The `leftjoin/!`, `innerjoin`, and `outerjoin` functions support the `multiple_match` option. This indicates the rows in the left data set that have been repeated in the output data set due to multiple matches in the right data set.
  - All join functions support `multiple_match_name` for assigning the column name for `multiple_match`.
- The `compare` function is updated to support more complex comparisons (issue #53).
  - [BREAKING] the `on` keyword argument in previous versions is equivalent to the `cols` keyword argument in version 0.7.0+.
  - The `compare` function can compare two data sets with different numbers of rows.
  - Users can pass key columns to `compare`, via the `on` keyword argument, for matching observations before comparing.
  - A few keyword arguments are added to `compare` to support the new functionality.
- The `maximum` and `minimum` functions now work properly with `String` columns.