- Improve `sort` keyword argument in `groupby` (#2812).

  In the `groupby` function the `sort` keyword argument now allows three values:
  - `nothing` (the default) leaves the order of groups undefined and allows `groupby` to pick the fastest available grouping algorithm;
  - `true` sorts groups by key columns;
  - `false` creates groups in the order of their appearance in the parent data frame.

  In previous versions, the `sort` keyword argument allowed only `Bool` values, and `false` (which was the default) corresponded to the new behavior when `nothing` is passed. Therefore the only user-visible change affecting existing code is when `sort=false` is passed explicitly. The order of groups was undefined in that case, but in practice groups were already created in their order of appearance, except when grouping columns implemented the `DataAPI.refpool` API (notably `PooledArray` and `CategoricalArray`) or when they contained only integers in a small range. (#2812)
- the `unstack` function receives new keyword argument `fill` (with `missing` default) that is used to fill combinations of rows and columns that were not encountered. This feature makes it possible to distinguish between missings in the value column and row/column combinations that are simply absent, and to easily fill non-existing combinations with zeros when counting. (#2828)
- Allow adding new columns to a `SubDataFrame` created with `:` as column selector (#2794).

  If `sdf` is a `SubDataFrame` created with `:` as a column selector, then `insertcols!`, `setindex!`, and broadcasted assignment allow for creation of new columns, automatically filling filtered-out rows with `missing` values.
- Allow replacing existing columns in a `SubDataFrame` with `!` as row selector in assignment and broadcasted assignment (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied.
- Allow `SubDataFrame` to be passed as an argument to `select!` and `transform!` (also on a `GroupedDataFrame` created from a `SubDataFrame`) (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied. When new columns are created, filtered-out rows are automatically filled with `missing` values. If the `SubDataFrame` was not created with `:` as column selector, the operation must produce the same column names as stored in the source `SubDataFrame`, or an error is thrown.
- `Tables.materializer`, when passed the following types or their subtypes: `AbstractDataFrame`, `DataFrameRows`, `DataFrameColumns`, returns `DataFrame`. (#2839)
- the `insertcols!` function receives new keyword argument `after` (with `false` default) that specifies whether columns should be inserted after or before `col`. (#2829)
- add `leftjoin!`, which performs a left join of two data frame objects by updating the left data frame with the joined columns from the right data frame. (#2843)
- the `DataFrame` constructor, when column names are passed to it as a second argument, now determines whether the passed vector of column names is valid based on its contents and not its element type (#2859)
- the `DataFrame` constructor, when a matrix is passed to it as a first argument, now allows the `copycols` keyword argument (#2859)
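The following sketch exercises several of the features listed above: `groupby` with `sort`, `unstack` with `fill`, `insertcols!` with `after`, `leftjoin!`, and adding a column to a `SubDataFrame` created with `:`. It assumes DataFrames.jl 1.2 or later is installed. The data frames and column names are illustrative only.

```julia
using DataFrames

df = DataFrame(key = [3, 1, 3, 2, 1], val = 1:5)

# sort=true: groups are ordered by the values of the key column
gd_sorted = groupby(df, :key, sort=true)
@assert [k.key for k in keys(gd_sorted)] == [1, 2, 3]

# sort=false: groups follow the order of first appearance in `df`
gd_seen = groupby(df, :key, sort=false)
@assert [k.key for k in keys(gd_seen)] == [3, 1, 2]

# unstack with `fill`: the (id=2, var="b") combination never occurs, so it gets 0
long = DataFrame(id = [1, 1, 2], var = ["a", "b", "a"], value = [10, 20, 30])
wide = unstack(long, :id, :var, :value, fill=0)
@assert wide.b == [20, 0]

# insertcols! with after=true places the new column right after :a
df2 = DataFrame(a = 1:2, c = 5:6)
insertcols!(df2, :a, :b => [3, 4], after=true)
@assert names(df2) == ["a", "b", "c"]

# leftjoin! adds the columns of `df_right` to `df_left` in place
df_left = DataFrame(id = [1, 2, 3], x = ["p", "q", "r"])
df_right = DataFrame(id = [1, 3], y = [10.0, 30.0])
leftjoin!(df_left, df_right, on = :id)
@assert names(df_left) == ["id", "x", "y"]
@assert isequal(df_left.y, [10.0, missing, 30.0])

# new column via a SubDataFrame created with `:`; filtered-out rows get `missing`
parent_df = DataFrame(a = 1:4)
sdf = view(parent_df, parent_df.a .> 2, :)
insertcols!(sdf, :b => ["x", "y"])
@assert isequal(parent_df.b, [missing, missing, "x", "y"])
```
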
- fix a problem with `unstack` on an empty data frame (#2842)
- fix a bug in `crossjoin` if the first argument is a `SubDataFrame` and `makeunique=true` (#2826)
- Add workaround for a `deleteat!` bug in Julia Base in the `delete!` function (#2820)
- add option `matchmissing=:notequal` in joins:
  - in `leftjoin`, `semijoin` and `antijoin`, missings are dropped in the right data frame but preserved in the left;
  - in `rightjoin`, missings are dropped in the left data frame but preserved in the right;
  - in `innerjoin`, missings are dropped in both data frames;
  - in `outerjoin`, this value of the keyword argument is not supported (#2724)
- correctly handle selectors of the form `:col => AsTable` and `:col => cols` by expanding a single column into multiple columns (#2780)
- if `subset!` is passed a `GroupedDataFrame`, the grouping in the passed object gets updated to reflect rows removed from the parent data frame (#2809)
- fix bug in how `groupby` handles grouping of float columns; now `-0.0` is treated as not an integer when deciding which grouping algorithm should be used (#2791)
- fix bug in how `issorted` handles custom orderings and improve performance of sorting when complex custom orderings are passed (#2746)
- fix bug in `combine`, `select`, `select!`, `transform`, and `transform!` that incorrectly disallowed matrices of `Pair`s in `GroupedDataFrame` processing (#2782)
- fix location of summary in `text/html` output (#2801)
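A minimal sketch of `matchmissing=:notequal`, assuming DataFrames.jl 1.1 or later. The two toy data frames are hypothetical. The example asserts only row counts and values, not row order, because the order of rows in join output may vary.

```julia
using DataFrames

df_left = DataFrame(id = [1, missing, 3], x = ["a", "b", "c"])
df_right = DataFrame(id = [1, missing], y = [10, 20])

# default matchmissing=:error: a `missing` key raises an error
threw = try
    innerjoin(df_left, df_right, on = :id)
    false
catch
    true
end
@assert threw

# leftjoin: missing keys never match; every row of df_left is kept
lj = leftjoin(df_left, df_right, on = :id, matchmissing = :notequal)
@assert nrow(lj) == 3
@assert count(!ismissing, lj.y) == 1

# innerjoin: rows with missing keys are dropped from both data frames
ij = innerjoin(df_left, df_right, on = :id, matchmissing = :notequal)
@assert nrow(ij) == 1
@assert ij.y == [10]
```
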
- `SubDataFrame`, `filter!`, `unique!`, `getindex`, `delete!`, `leftjoin`, `rightjoin`, and `outerjoin` are now more efficient if rows selected in internal operations form a continuous block (#2727, #2769)
- `hcat` of a data frame with a vector is now deprecated to allow consistent handling of horizontal concatenation of data frames with Tables.jl tables in the future (#2777)
- `text/plain` rendering of columns containing complex numbers is now improved (#2756)
- in `text/html` display of a data frame, full type information is shown when hovering the mouse over the shortened type (#2774)
- fix performance issue when the aggregation function produces multiple rows in split-apply-combine (#2749)
- `completecases` is now optimized and only processes columns that can contain missing values; additionally it is now type stable and always returns a `BitVector` (#2726)
- fix performance bottleneck when displaying wide tables (#2750)
- make sure `subset` checks that the passed condition function returns a vector of values (in the 1.0 release, returning a scalar `true`, `false`, or `missing` was also allowed, which was unintended and error prone) (#2744)
- fix performance issue of `groupby` when using multi-threading (#2736)
- fix performance issue of `groupby` when using `PooledVector` (#2733)
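The `subset` change above can be illustrated with a short sketch, assuming DataFrames.jl 1.1 or later. A condition must produce a vector, which `ByRow` does, and a condition returning a scalar is rejected. The exact exception type is not asserted.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])

# ByRow applies the predicate element-wise, so the condition yields a vector
kept = subset(df, :x => ByRow(>(1)))
@assert kept.x == [2, 3]

# a condition function returning a scalar is now an error
threw = try
    subset(df, :x => x -> true)
    false
catch
    true
end
@assert threw
```
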
- No breaking changes are planned for v1.0 release
- DataFrames.jl now checks that passed columns are 1-based as this is a current design assumption (#2594)
- `mapcols!` makes sure not to create columns that are `AbstractRange`s, consistently with other methods that add columns to a `DataFrame` (#2594)
- `transform` and `transform!` always copy columns when a column renaming transformation is passed. If similar issues are identified after the 1.0 release (i.e. a copy of the data is not made in scenarios where it normally should be), they will be considered bugs and fixed as non-breaking changes (#2721)
- `firstindex`, `lastindex`, `size`, `ndims`, and `axes` are now consistently defined and documented in the manual for `AbstractDataFrame`, `DataFrameRow`, `DataFrameRows`, `DataFrameColumns`, `GroupedDataFrame`, `GroupKeys`, and `GroupKey` (#2573)
- add `subset` and `subset!` functions that allow subsetting rows (#2496)
- `names` now allows passing a predicate as a column selector (#2417)
- `vcat` now allows a `source` keyword argument that specifies an additional column, added in the last position of the resulting data frame, that identifies the source data frame of each row. (#2649)
- `GroupKey` and `DataFrameRow` now consistently behave like `NamedTuple` in comparisons and implement `hash`, `==`, `isequal`, `<`, and `isless` (#2669)
- since Julia 1.7, broadcasting assignment on a `DataFrame` column selected as a property (e.g. `df.col .= 1`) is allowed when the column does not exist, in which case it allocates a fresh column (#2655)
- `delete!` now correctly handles the case when columns of a data frame are aliased (#2690)
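A small sketch of the `vcat` `source` keyword, `names` with a predicate (which receives each column name as a string), and ordering of `DataFrameRow`s, assuming DataFrames.jl 1.0 or later. The column name `:origin` and the toy data are illustrative choices.

```julia
using DataFrames

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])

# `source` appends a last column holding the position of the originating data frame
combined = vcat(df1, df2, source = :origin)
@assert names(combined) == ["a", "origin"]
@assert combined.origin == [1, 1, 2]

# `names` with a predicate: the function is applied to column names as strings
wide = DataFrame(x1 = [1], x2 = [2], y = [3])
@assert names(wide, n -> startswith(n, "x")) == ["x1", "x2"]

# DataFrameRows with the same column names are ordered by their values
@assert isless(df1[1, :], df1[2, :])
```
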
- in `leftjoin`, `rightjoin`, and `outerjoin` the `indicator` keyword argument is deprecated in favor of the `source` keyword argument; `indicator` will be removed in the 2.0 release (#2649)
- Using broadcasting assignment on a `SubDataFrame` column selected as a property (e.g. `sdf.col .= 1`) is deprecated; it will be disallowed in the future. (#2655)
- Broadcasting assignment to an existing column of a `DataFrame` selected as a property (e.g. `df.col .= 1`) currently works in place; this in-place behavior is deprecated, and a fresh column will be allocated in the future (#2655)
- all deprecations present in the 0.22 release now throw an error (#2554); in particular, the `convert` methods and `map` on `GroupedDataFrame` that were deprecated in the 0.22.6 release now throw an error (#2679)
- `innerjoin`, `leftjoin`, `rightjoin`, `outerjoin`, `semijoin`, and `antijoin` are now much faster; they check whether the passed data frames are sorted by the `on` columns and take into account whether the shorter data frame being joined has unique values in the `on` columns. These aspects of the input data frames might affect the order of rows produced in the output (#2612, #2622)
- the `DataFrame` constructor, `copy`, `getindex`, `select`, `select!`, `transform`, `transform!`, `combine`, `sort`, and join functions now use multiple threads in selected operations (#2647, #2588, #2574, #2664)
- `convert` methods from `AbstractDataFrame`, `DataFrameRow` and `GroupKey` to `Array`, `Matrix`, `Vector` and `Tuple`, as well as from `AbstractDict` to `DataFrame`, are now deprecated: use the corresponding constructors instead. The only conversions that are retained are `convert(::Type{NamedTuple}, dfr::DataFrameRow)`, `convert(::Type{NamedTuple}, key::GroupKey)`, and `convert(::Type{DataFrame}, sdf::SubDataFrame)`; the deprecated methods will be removed in the 1.0 release
- as a bug fix, the `eltype` of the vector returned by `eachrow` is now `DataFrameRow` (#2662)
- applying `map` to `GroupedDataFrame` is now deprecated. It will be an error in the 1.0 release. (#2662)
- the `copycols` keyword argument is now respected when building a `DataFrame` from `Tables.CopiedColumns` (#2656)
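As a hedged illustration of "use the corresponding constructors instead", the sketch below replaces the deprecated `convert` calls with constructor calls. It assumes DataFrames.jl 1.0 or later and uses made-up data.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

m = Matrix(df)                       # instead of convert(Matrix, df)
@assert m == [1 3; 2 4]

v = Vector(df[1, :])                 # instead of convert(Vector, df[1, :])
@assert v == [1, 3]

d = DataFrame(Dict("a" => [1, 2]))   # instead of convert(DataFrame, dict)
@assert d.a == [1, 2]

# eachrow reports DataFrameRow as its element type
@assert eltype(eachrow(df)) <: DataFrameRow
```
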
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made more flexible; in particular, it is now allowed to return multiple columns from a transformation function (#2461 and #2481)
- CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays` to use it (#2404). In the same vein, the `categorical` and `categorical!` functions have been deprecated in favor of `transform(df, cols .=> categorical .=> cols)` and similar syntaxes (#2394).
- `stack` now creates a `PooledVector{String}` variable column rather than a `CategoricalVector{String}` column by default; pass `variable_eltype=CategoricalValue{String}` to get the previous behavior (#2391)
- `isless` for `DataFrameRow`s now checks column names (#2292)
- `DataFrameColumns` is no longer a subtype of `AbstractVector` (#2291)
- `nunique` is no longer reported by `describe` by default (#2339)
- stop reordering columns of the parent in `transform` and `transform!`; always generate columns that were specified to be computed, even for a `GroupedDataFrame` with zero rows (#2324)
- improve the rule for automatically generated column names in `combine`/`select(!)`/`transform(!)` with composed functions (#2274)
- `:nmissing` in `describe` now produces `0` if the column does not allow missing values; earlier `nothing` was produced in this case (#2360)
- fast aggregation functions for `GroupedDataFrame` now correctly choose the fast path only when it is safe; this resolves inconsistencies with the results the same functions produce when not using the fast path (#2357)
- joins now return a `PooledVector`, not a `CategoricalVector`, in the indicator column (#2505)
- `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries (#2392)
- in `describe` the specification of custom aggregation is now `function => name`; the old `name => function` order is now deprecated (#2401)
- in joins, passing `NaN` or real or imaginary `-0.0` in an `on` column now throws an error; passing `missing` throws an error unless the `matchmissing=:equal` keyword argument is passed (#2504)
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments, `allowmissing` and `allowduplicates` (#2494)
- PrettyTables.jl is now the default back-end to print DataFrames to text/plain; the print option `splitcols` was removed and the output format was changed (#2429)
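Two of the changes above are illustrated in the following sketch. It shows a transformation returning multiple columns, expanded with `AsTable`, and the default `String` element type of the `stack` variable column. It assumes DataFrames.jl 0.22 or later. The column names `sq` and `cube` are hypothetical.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])

# a ByRow function returning NamedTuples, expanded into separate columns via AsTable
res = select(df, :x => ByRow(v -> (sq = v^2, cube = v^3)) => AsTable)
@assert names(res) == ["sq", "cube"]
@assert res.cube == [1, 8, 27]

# stack: the `variable` column holds String values by default
long = stack(DataFrame(id = [1], a = [10], b = [20]), [:a, :b])
@assert eltype(long.variable) == String
@assert long.variable == ["a", "b"]
```
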
- add `filter` for `GroupedDataFrame` (#2279)
- add `empty` and `empty!` functions for `DataFrame` that remove all rows from it but keep the columns (#2262)
- make the `indicator` keyword argument in joins allow passing a string (#2284, #2296)
- add new functions to the `GroupKey` API to make it more consistent with `DataFrameRow` (#2308)
- allow column renaming in joins (#2313 and #2398)
- add `rownumber` to `DataFrameRow` (#2356)
- allow passing a column name to specify the position where new columns should be inserted in `insertcols!` (#2365)
- allow `GroupedDataFrame`s to be indexed using a dictionary, which can use `Symbol` or string keys and does not depend on the order of keys. (#2281)
- add `isapprox` method to check for approximate equality between two data frames (#2373)
- add `columnindex` for `DataFrameRow` (#2380)
- `names` now accepts `Type` as a column selector (#2400)
- `select`, `select!`, `transform`, `transform!` and `combine` now allow the `renamecols` keyword argument, which makes it possible to avoid adding the transformation function name as a suffix in automatically generated column names (#2397)
- `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument which, if set to `true`, makes them return a `SubDataFrame` view into the passed data frame.
- add `only` method for `AbstractDataFrame` (#2449)
- passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine` with `ByRow` is now accepted (#2476)
- add `permutedims` method for `AbstractDataFrame` (#2447)
- add support for `Cols` from DataAPI.jl (#2495)
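The sketch below touches several of the additions listed above: `filter` on a `GroupedDataFrame`, the `view` keyword, `rownumber`, `names` with a `Type`, `Cols`, and `empty!`. It assumes a current DataFrames.jl release (0.22 or later) and uses toy data.

```julia
using DataFrames

df = DataFrame(g = [1, 1, 2], x = [10.0, 20.0, 30.0])

# filter on a GroupedDataFrame keeps whole groups; here groups with more than one row
big = filter(sdf -> nrow(sdf) > 1, groupby(df, :g))
@assert length(big) == 1

# view=true returns a SubDataFrame instead of a copy
@assert filter(:x => >(15), df, view=true) isa SubDataFrame
@assert sort(df, :x, rev=true, view=true) isa SubDataFrame

# rownumber, names with a Type selector, and Cols (which keeps the given order)
@assert rownumber(df[2, :]) == 2
@assert names(df, Float64) == ["x"]
@assert names(select(df, Cols(:x, :g))) == ["x", "g"]

# empty! removes all rows but keeps the columns
empty!(df)
@assert nrow(df) == 0 && names(df) == ["g", "x"]
```
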
- `DataFrame!` is now deprecated (#2338)
- several non-standard `DataFrame` constructors are now deprecated (#2464)
- all old deprecations now throw an error (#2350)
- Tables.jl version 1.2 is now required.
- DataAPI.jl version 1.4 is now required. It implies that `All(args...)` is deprecated and `Cols(args...)` is recommended instead. `All()` is still supported.
- Documentation is now available also in Dark mode (#2315)
- add rich display support for Markdown cell entries in HTML and LaTeX (#2346)
- limit the maximal display width the output can use in `text/plain` before being truncated (in the `textwidth` sense, excluding `…`) to `32` per column by default, and fix a corner case in which no columns were printed because they were too wide (#2403)
- Common methods are now precompiled to improve responsiveness the first time a method is called in a Julia session. Precompilation takes up to 30 seconds after installing the package (#2456).