Pull 1.14.10 into master #5814

Merged: 5 commits, Dec 8, 2023
1 change: 1 addition & 0 deletions NAMESPACE
@@ -158,6 +158,7 @@ S3method(as.IDate, Date)
S3method(as.IDate, POSIXct)
S3method(as.IDate, default)
S3method(as.IDate, numeric)
S3method(as.IDate, IDate)
S3method(as.ITime, character)
S3method(as.ITime, default)
S3method(as.ITime, POSIXct)
19 changes: 16 additions & 3 deletions NEWS.md
@@ -1,6 +1,6 @@
**If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**

# data.table [v1.14.99](https://github.com/Rdatatable/data.table/milestone/20) (in development)
# data.table [v1.14.99](https://github.com/Rdatatable/data.table/milestone/29) (in development)

## BREAKING CHANGE

@@ -610,6 +610,19 @@
15. Thanks to @ssh352, Václav Tlapák, Cole Miller, András Svraka and Toby Dylan Hocking for reporting and bisecting a significant performance regression in dev. This was fixed before release thanks to a PR by Jan Gorecki, [#5463](https://github.com/Rdatatable/data.table/pull/5463).


# data.table [v1.14.10](https://github.com/Rdatatable/data.table/milestone/20?closed=1) (8 Dec 2023)

## NOTES

1. From now on, the maintainer of the package for CRAN releases is Tyson Barrett (@tysonstanley), [#5710](https://github.com/Rdatatable/data.table/issues/5710).

2. Updated internal code for breaking change of `is.atomic(NULL)` in R-devel, [#5691](https://github.com/Rdatatable/data.table/pull/5691). Thanks to Martin Maechler for the patch.

3. Fix multiple tests concerning coercion to missing complex numbers, [#5695](https://github.com/Rdatatable/data.table/issues/5695) and [#5748](https://github.com/Rdatatable/data.table/issues/5748). Thanks to @MichaelChirico and @ben-schwen for the patches.

4. Fix multiple format warnings (e.g., -Wformat), [#5712](https://github.com/Rdatatable/data.table/pull/5712), [#5781](https://github.com/Rdatatable/data.table/pull/5781), [#5800](https://github.com/Rdatatable/data.table/pull/5800), [#5786](https://github.com/Rdatatable/data.table/pull/5786). Thanks to @MichaelChirico and @jangorecki for the patches.
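
As a rough illustration of note 2 above: the R 4.4.0 release notes state that `is.atomic(NULL)` now returns `FALSE`, where earlier versions returned `TRUE`. The sketch below shows one version-independent way to keep the old behaviour. The helper name `is_atomic_or_null` is hypothetical and is not claimed to be what the data.table patch actually does.

```r
# Hypothetical helper (an assumption for illustration, not data.table code):
# treats NULL as atomic regardless of what is.atomic(NULL) returns
# in the running version of R.
is_atomic_or_null = function(x) is.null(x) || is.atomic(x)

res_null = is_atomic_or_null(NULL)      # NULL is accepted explicitly
res_int  = is_atomic_or_null(1:3)       # integer vectors are atomic
res_list = is_atomic_or_null(list(1))   # lists are not atomic
```

Code that relied on `is.atomic(x)` being `TRUE` for `NULL` could call a helper like this instead, so the result does not depend on the R version.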


# data.table [v1.14.8](https://github.com/Rdatatable/data.table/milestone/28?closed=1) (17 Feb 2023)

## NOTES
@@ -736,7 +749,7 @@

## NOTES

1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://h2o.ai/blog/behind-the-scenes-of-cran/).
1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://h2o.ai/blog/2016/behind-the-scenes-of-cran/).

2. `as.Date.IDate` is no longer exported as a function to solve a new error in R-devel `S3 method lookup found 'as.Date.IDate' on search path`, [#4777](https://github.com/Rdatatable/data.table/issues/4777). The S3 method is still exported; i.e. `as.Date(x)` will still invoke the `as.Date.IDate` method when `x` is class `IDate`. The function had been exported, in addition to exporting the method, to solve a compatibility issue with `zoo` (and `xts` which uses `zoo`) because `zoo` exports `as.Date` which masks `base::as.Date`. Happily, since zoo 1.8-1 (Jan 2018) made a change to its `as.IDate`, the workaround is no longer needed.
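
The relaxed test pattern described in note 1 above can be checked with base R alone. This is a minimal sketch using the two error messages and the pattern quoted in that note. The variable names are illustrative, and data.table's own test harness is not shown.

```r
# The two error messages quoted in the note above.
old_msg = "only defined on a data frame with all numeric variables"
new_msg = "only defined on a data frame with all numeric-alike variables"

# The relaxed pattern matches both the old and the new wording.
pattern = "data.*frame.*numeric"
relaxed_matches = grepl(pattern, c(old_msg, new_msg))

# A strict check for the full old message no longer matches the new message.
strict_match = grepl(old_msg, new_msg, fixed = TRUE)
```

Matching only the stable part of a message keeps the test robust to small wording changes upstream.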

@@ -1008,7 +1021,7 @@ has a better chance of working on Mac.
* `colClasses` now supports `'complex'`, `'raw'`, `'Date'`, `'POSIXct'`, and user-defined classes (so long as an `as.` method exists), [#491](https://github.com/Rdatatable/data.table/issues/491) [#1634](https://github.com/Rdatatable/data.table/issues/1634) [#2610](https://github.com/Rdatatable/data.table/issues/2610). Any error during coercion results in a warning and the column is left as the default type (probably `"character"`). Thanks to @hughparsonage for the PR.
* `stringsAsFactors=0.10` will factorize any character column containing under `0.10*nrow` unique strings, [#2025](https://github.com/Rdatatable/data.table/issues/2025). Thanks to @hughparsonage for the PR.
* `colClasses=list(numeric=20:30, numeric="ID")` will apply the `numeric` type to column numbers `20:30` as before and now also column name `"ID"`; i.e. all duplicate class names are now respected rather than only the first. This need may arise when specifying some columns by name and others by number, as in this example. Thanks to @hughparsonage for the PR.
* gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (https://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. Thanks to @leeper at [`rio`](https://github.com/leeper/rio) for the inspiration and @MichaelChirico for implementing.
* gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (https://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. Thanks to @leeper at [`rio`](https://github.com/gesistsa/rio) for the inspiration and @MichaelChirico for implementing.
* `select` can now be used to specify types for just the columns selected, [#1426](https://github.com/Rdatatable/data.table/issues/1426). Just like `colClasses` it can be a named vector of `colname=type` pairs, or a named `list` of `type=col(s)` pairs. For example:

```R
2 changes: 1 addition & 1 deletion README.md
@@ -5,7 +5,7 @@
[![CRAN status](https://badges.cranchecks.info/flavor/release/data.table.svg)](https://cran.r-project.org/web/checks/check_results_data.table.html)
[![R-CMD-check](https://github.com/Rdatatable/data.table/workflows/R-CMD-check/badge.svg)](https://github.com/Rdatatable/data.table/actions)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/kayjdh5qtgymhoxr/branch/master?svg=true)](https://ci.appveyor.com/project/Rdatatable/data-table)
[![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://codecov.io/github/Rdatatable/data.table?branch=master)
[![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://app.codecov.io/github/Rdatatable/data.table?branch=master)
[![GitLab CI build status](https://gitlab.com/Rdatatable/data.table/badges/master/pipeline.svg)](https://gitlab.com/Rdatatable/data.table/-/pipelines)
[![downloads](https://cranlogs.r-pkg.org/badges/data.table)](https://www.rdocumentation.org/trends)
[![CRAN usage](https://jangorecki.gitlab.io/rdeps/data.table/CRAN_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps)
4 changes: 4 additions & 0 deletions vignettes/datatable-faq.Rmd
@@ -29,6 +29,7 @@ knitr::opts_chunk$set(
tidy = FALSE,
cache = FALSE,
collapse = TRUE)
.old.th = setDTthreads(1)
```

The first section, Beginner FAQs, is intended to be read in order, from start to finish. It's just written in a FAQ style to be digested more easily. It isn't really the most frequently asked questions. A better measure for that is looking on Stack Overflow.
@@ -615,3 +616,6 @@ Sure. You're more likely to get a faster answer from the Issues page or Stack Ov

Please see [this answer](https://stackoverflow.com/a/10529888/403310).

```{r, echo=FALSE}
setDTthreads(.old.th)
```
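
The vignette hunks in this pull request all follow the same save-and-restore pattern: `setDTthreads(1)` in the setup chunk, with its return value stored, and a hidden final chunk that passes the stored value back. A minimal standalone sketch of that pattern follows. It assumes data.table is installed, and the variable names are illustrative rather than taken from the vignettes.

```r
library(data.table)

# setDTthreads() returns the previous thread setting, so it can be restored later.
old_threads = setDTthreads(1)

# While this setting is active, data.table reports a single thread.
threads_during = getDTthreads()

# ... vignette-style code would run here ...

# Restore the setting that was in place before.
setDTthreads(old_threads)
threads_after = getDTthreads()
```

Storing the previous value, rather than hard-coding a number at the end, leaves the session as it was found, whatever the machine's default thread count.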
4 changes: 4 additions & 0 deletions vignettes/datatable-intro.Rmd
@@ -18,6 +18,7 @@ knitr::opts_chunk$set(
cache = FALSE,
collapse = TRUE
)
.old.th = setDTthreads(1)
```

This vignette introduces the `data.table` syntax, its general form, how to *subset* rows, *select and compute* on columns, and perform aggregations *by group*. Familiarity with `data.frame` data structure from base R is useful, but not essential to follow this vignette.
@@ -651,3 +652,6 @@ We will see how to *add/update/delete* columns *by reference* and how to combine

***

```{r, echo=FALSE}
setDTthreads(.old.th)
```
6 changes: 6 additions & 0 deletions vignettes/datatable-keys-fast-subset.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
tidy = FALSE,
cache = FALSE,
collapse = TRUE)
.old.th = setDTthreads(1)
```

This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.
@@ -494,3 +495,8 @@ In this vignette, we have learnt another method to subset rows in `i` by keying
* combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.

Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. However, it may not always be desirable to set a key and physically reorder the *data.table*. In the next vignette, we will address this using a *new* feature -- *secondary indexes*.


```{r, echo=FALSE}
setDTthreads(.old.th)
```
5 changes: 5 additions & 0 deletions vignettes/datatable-reference-semantics.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
tidy = FALSE,
cache = FALSE,
collapse = TRUE)
.old.th = setDTthreads(1)
```
This vignette discusses *data.table*'s reference semantics, which allow you to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the *"Introduction to data.table"* vignette first.

@@ -348,6 +349,10 @@ However we could improve this functionality further by *shallow* copying instead

* We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.

```{r, echo=FALSE}
setDTthreads(.old.th)
```

#

So far we have seen a whole lot in `j`, and how to combine it with `by`, and a little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to perform *blazing fast subsets* by *keying data.tables*.
5 changes: 5 additions & 0 deletions vignettes/datatable-reshape.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
tidy = FALSE,
cache = FALSE,
collapse = TRUE)
.old.th = setDTthreads(1)
```

This vignette discusses the default usage of reshaping functions `melt` (wide to long) and `dcast` (long to wide) for *data.tables* as well as the **new extended functionalities** of melting and casting on *multiple columns* available from `v1.9.6`.
@@ -314,6 +315,10 @@ DT.c2

You can also provide *multiple functions* to `fun.aggregate` in `dcast` for *data.tables*. Check the examples in `?dcast`, which illustrate this functionality.

```{r, echo=FALSE}
setDTthreads(.old.th)
```

#

***
5 changes: 5 additions & 0 deletions vignettes/datatable-sd-usage.Rmd
@@ -25,6 +25,7 @@ knitr::opts_chunk$set(
out.width = '100%',
dpi = 144
)
.old.th = setDTthreads(1)
```

This vignette will explain the most common ways to use the `.SD` variable in your `data.table` analyses. It is an adaptation of [this answer](https://stackoverflow.com/a/47406952/3576984) given on StackOverflow.
@@ -254,3 +255,7 @@ abline(v = overall_coef, lty = 2L, col = 'red')
While there is indeed a fair amount of heterogeneity, there's a distinct concentration around the observed overall value.

The above is just a short introduction of the power of `.SD` in facilitating beautiful, efficient code in `data.table`!

```{r, echo=FALSE}
setDTthreads(.old.th)
```
6 changes: 6 additions & 0 deletions vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
tidy = FALSE,
cache = FALSE,
collapse = TRUE)
.old.th = setDTthreads(1)
```

This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first.
@@ -325,3 +326,8 @@ In recent version we extended auto indexing to expressions involving more than o
We will extend fast *subsets* using keys and secondary indices to *joins* in the next vignette, *"Joins and rolling joins"*.

***

```{r, echo=FALSE}
setDTthreads(.old.th)
```
