Merge branch 'master' into c_api_maintain
MichaelChirico authored Jul 11, 2024
2 parents e577e3b + 5a091b1 commit 2006179
Showing 14 changed files with 89 additions and 86 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/R-CMD-check-occasional.yaml
@@ -1,6 +1,6 @@
on:
schedule:
- cron: '17 13 9 * *' # 9th of month at 13:17 UTC
- cron: '17 13 12 * *' # 12th of month at 13:17 UTC

# A more complete suite of checks to run monthly; each PR/merge need not pass all these, but they should pass before CRAN release
name: R-CMD-check-occasional
@@ -15,7 +15,7 @@ jobs:
fail-fast: false
matrix:
os: [macOS-latest, windows-latest, ubuntu-latest]
r: ['devel', 'release', '3.2', '3.3', '3.4', '3.5', '3.6', '4.0', '4.1', '4.2', '4.3']
r: ['devel', 'release', '3.3', '3.4', '3.5', '3.6', '4.0', '4.1', '4.2', '4.3']
locale: ['en_US.utf8', 'zh_CN.utf8', 'lv_LV.utf8'] # Chinese for translations, Latvian for collate order (#3502)
exclude:
# only run non-English locale CI on Ubuntu
@@ -28,8 +28,6 @@ jobs:
- os: windows-latest
locale: 'lv_LV.utf8'
# macOS/arm64 only available for R>=4.1.0
- os: macOS-latest
r: '3.2'
- os: macOS-latest
r: '3.3'
- os: macOS-latest
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: data.table
Version: 1.15.99
Title: Extension of `data.frame`
Depends: R (>= 3.2.0)
Depends: R (>= 3.3.0)
Imports: methods
Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils, xts, zoo (>= 1.8-1), yaml, knitr, markdown
Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
@@ -12,11 +12,11 @@ VignetteBuilder: knitr
Encoding: UTF-8
ByteCompile: TRUE
Authors@R: c(
person("Tyson","Barrett", role=c("aut","cre"), email="[email protected]"),
person("Tyson","Barrett", role=c("aut","cre"), email="[email protected]", comment = c(ORCID="0000-0002-2137-1391")),
person("Matt","Dowle", role="aut", email="[email protected]"),
person("Arun","Srinivasan", role="aut", email="[email protected]"),
person("Jan","Gorecki", role="aut"),
person("Michael","Chirico", role="aut"),
person("Michael","Chirico", role="aut", comment = c(ORCID="0000-0003-0787-087X")),
person("Toby","Hocking", role="aut", comment = c(ORCID="0000-0002-3146-0865")),
person("Benjamin","Schwendinger",role="aut"),
person("Pasha","Stetsenko", role="ctb"),
14 changes: 9 additions & 5 deletions NEWS.md
@@ -2,10 +2,6 @@

# data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30) (in development)

## BREAKING CHANGES

1. Usage of comma-separated character strings representing multiple columns in `data.table()`'s `key=` argument and `[`'s `by=`/`keyby=` arguments is deprecated, [#4357](https://github.com/Rdatatable/data.table/issues/4357). While sometimes convenient, ultimately it introduces inconsistency in implementation that is not worth the benefit to maintain. NB: this hard deprecation is temporary in the development version. Before release, it will soften into the normal data.table deprecation cycle starting from introducing the new behavior with an option, then changing the default for the option with a warning, then upgrading the warning to an error before finally removing the option and the error.

## NEW FEATURES

1. `print.data.table()` shows empty (`NULL`) list column entries as `[NULL]` for emphasis. Previously they would just print nothing (same as for empty string). Part of [#4198](https://github.com/Rdatatable/data.table/issues/4198). Thanks @sritchie73 for the proposal and fix.
@@ -64,13 +60,15 @@

9. In `DT[,j,by]`, `by` retains its attributes (e.g. class) when `j` is GForce optimized, [#5567](https://github.com/Rdatatable/data.table/issues/5567). Thanks to @danwwilson for the report, and @ben-schwen for the PR.

10. `dt[,,by=año]` (i.e., using a column name containing a non-ASCII character in `by` as a plain symbol) no longer errors with "object 'año' not found", #4708. Thanks @pfv07 for the report, and Michael Chirico for the fix.

## NOTES

1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.

2. The documentation for the `fill` argument in `rbind()` and `rbindlist()` now notes the expected behaviour for missing `list` columns when `fill=TRUE`, namely to use `NULL` (not `NA`), [#4198](https://github.com/Rdatatable/data.table/pull/4198). Thanks @sritchie73 for the proposal and fix.

3. data.table now depends on R 3.2.0 (2015) instead of 3.1.0 (2014). 1.17.0 will likely move to R 3.3.0 (2016). Recent versions of R have good features that we would gradually like to incorporate, and we see next to no usage of these very old versions of R.
3. data.table now depends on R 3.3.0 (2016) instead of 3.1.0 (2014). Recent versions of R have good features that we would gradually like to incorporate, and we see next to no usage of these very old versions of R. We originally attempted to bump only to R 3.2.0 in this release, but {knitr} requiring 3.3.0 and `R CMD check` lacking an `--ignore-vignettes` option until 3.3.0 essentially forced our hands.

4. Erroneous assignment calls in `[` with a trailing comma (e.g. ``DT[, `:=`(a = 1, b = 2,)]``) get a friendlier error since this situation is common during refactoring and easy to miss visually. Thanks @MichaelChirico for the fix.

@@ -106,6 +104,12 @@
18. `integer64` columns print well even if {bit64} has not yet been loaded, [#6224](https://github.com/Rdatatable/data.table/issues/6224). Thanks @renkun-ken for the report and @MichaelChirico for the fix.
19. `fwrite()` header names are no longer quoted automatically when `na` argument is given, [#2964](https://github.com/Rdatatable/data.table/issues/2964). Thanks @jangorecki for the report and @joshhwuu for the fix.
20. Removed a warning about the now totally-obsolete option `datatable.CJ.names`, as discussed in previous releases.
21. Refactored some non-API calls to R macros for S4 objects [#6180](https://github.com/Rdatatable/data.table/issues/6180). There should be no user-visible change. Thanks to various R users & R core for pushing to have a clearer definition of "API" for R, and thanks @MichaelChirico for implementing here.
## TRANSLATIONS
1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
12 changes: 6 additions & 6 deletions R/data.table.R
@@ -53,9 +53,7 @@ data.table = function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, str
ans = as.data.table.list(x, keep.rownames=keep.rownames, check.names=check.names, .named=nd$.named) # see comments inside as.data.table.list re copies
if (!is.null(key)) {
if (!is.character(key)) stopf("key argument of data.table() must be character")
if (length(key)==1L) {
if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
}
if (length(key)==1L) key = cols_from_csv(key)
setkeyv(ans,key)
} else {
# retain key of cbind(DT1, DT2, DT3) where DT2 is keyed but not DT1. cbind calls data.table().
@@ -797,7 +795,8 @@ replace_dot_alias = function(e) {

if (mode(bysub) == "character") {
if (any(grepl(",", bysub, fixed = TRUE))) {
stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "by=")
if (length(bysub) > 1L) stopf("'by' is a character vector length %d but one or more items include a comma. Either pass a vector of column names (which can contain spaces, but no commas), or pass a vector length 1 containing comma separated column names. See ?data.table for other possibilities.", length(bysub))
bysub = cols_from_csv(bysub)
}
bysub = gsub("^`(.*)`$", "\\1", bysub) # see test 138
nzidx = nzchar(bysub)
@@ -1764,7 +1763,7 @@ replace_dot_alias = function(e) {
}
else {
# adding argument to ghead/gtail if none is supplied to g-optimized head/tail
if (length(jsub) == 2L && jsub[[1L]] %chin% c("head", "tail")) jsub[["n"]] = 6L
if (length(jsub) == 2L && jsub %iscall% c("head", "tail")) jsub[["n"]] = 6L
jsub = .gforce_jsub(jsub, names_x)
}
if (verbose) catf("GForce optimized j to '%s' (see ?GForce)\n", deparse(jsub, width.cutoff=200L, nlines=1L))
@@ -2717,8 +2716,9 @@ chmatch = function(x, table, nomatch=NA_integer_)
chmatchdup = function(x, table, nomatch=NA_integer_)
.Call(Cchmatchdup, x, table, as.integer(nomatch[1L]))

# Force as.character as part of #4708
"%chin%" = function(x, table)
.Call(Cchin, x, table) # TO DO if table has 'ul' then match to that
.Call(Cchin, as.character(x), table) # TO DO if table has 'ul' then match to that
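
The change above coerces the left-hand side of `%chin%` with `as.character()`. A minimal sketch of what that coercion does when the left-hand side is a symbol, such as the head of a `j` call. `chin_sketch` is a hypothetical base-R stand-in built on `match()`; it is not data.table's C implementation.

```r
# Hypothetical stand-in for %chin%, using base match(); for illustration only.
chin_sketch = function(x, table) match(as.character(x), table, nomatch = 0L) > 0L

jsub = quote(head(x))
first = jsub[[1L]]   # the function part of the call: a symbol, not a string

is.name(first)                              # TRUE
as.character(first)                         # "head"
chin_sketch(first, c("head", "tail"))       # TRUE
chin_sketch(quote(sum), c("head", "tail"))  # FALSE
```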

chorder = function(x) {
o = forderv(x, sort=TRUE, retGrp=FALSE)
8 changes: 2 additions & 6 deletions R/fread.R
@@ -78,9 +78,6 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
if (w <- startsWithAny(file, c("https://", "ftps://", "http://", "ftp://", "file://"))) { # avoid grepl() for #2531
# nocov start
tmpFile = tempfile(fileext = paste0(".",tools::file_ext(file)), tmpdir=tmpdir) # retain .gz extension in temp filename so it knows to be decompressed further below
if (w<=2L && base::getRversion()<"3.2.2") { # https: or ftps: can be read by default by download.file() since 3.2.2
stopf("URL requires download.file functionalities from R >=3.2.2. You can still manually download the file and fread the downloaded file.")
}
method = if (w==5L) "internal" # force 'auto' when file: to ensure we don't use an invalid option (e.g. wget), #1668
else getOption("download.file.method", default="auto") # http: or ftp:
# In text mode on Windows-only, R doubles up \r to make \r\r\n line endings. mode="wb" avoids that. See ?connections:"CRLF"
@@ -340,9 +337,8 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
if (!is.null(key) && data.table) {
if (!is.character(key))
stopf("key argument of data.table() must be a character vector naming columns (NB: col.names are applied before this)")
if (length(key) == 1L) {
if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
}
if (length(key) == 1L)
key = cols_from_csv(key)
setkeyv(ans, key)
}
if (yaml) setattr(ans, 'yaml_metadata', yaml_header)
4 changes: 0 additions & 4 deletions R/onLoad.R
@@ -92,10 +92,6 @@
eval(parse(text=paste0("options(",i,"=",opts[i],")")))
}

# default TRUE from v1.12.0, FALSE before. Now ineffectual. Remove this warning after 1.15.0.
if (!is.null(getOption("datatable.CJ.names")))
warningf("Option 'datatable.CJ.names' no longer has any effect, as promised for 4 years. It is now ignored. Manually name `...` entries as needed if you still prefer the old behavior.")

# Test R behaviour that changed in v3.1 and is now depended on
x = 1L:3L
y = list(x)
13 changes: 9 additions & 4 deletions R/utils.R
@@ -21,9 +21,6 @@ nan_is_na = function(x) {
stopf("Argument 'nan' must be NA or NaN")
}

if (!exists('startsWith', 'package:base', inherits=FALSE)) { # R 3.3.0; Apr 2016
startsWith = function(x, stub) substr(x, 1L, nchar(stub))==stub
}
# endsWith no longer used from #5097 so no need to backport; prevent usage to avoid dev delay until GLCI's R 3.1.0 test
endsWith = function(...) stop("Internal error: use endsWithAny instead of base::endsWith", call.=FALSE)

@@ -114,6 +111,10 @@ brackify = function(x, quote=FALSE) {
sprintf('[%s]', toString(x))
}

# convenience for specifying columns in some cases, e.g. by= and key=
# caller should ensure length(x) == 1 & handle accordingly.
cols_from_csv = function(x) strsplit(x, ',', fixed=TRUE)[[1L]]
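
A short illustration of how the `cols_from_csv()` helper behaves on a length-1 input. The definition is copied from the diff above; the column names are made up for the example. Because the split is on a literal comma (`fixed=TRUE`), surrounding whitespace is not trimmed.

```r
# Definition as it appears in the diff; input names below are illustrative.
cols_from_csv = function(x) strsplit(x, ',', fixed=TRUE)[[1L]]

cols_from_csv("id,date,grp")  # "id" "date" "grp"
cols_from_csv("id")           # "id" (no comma: returned unchanged)
cols_from_csv("id, date")     # "id" " date" (the leading space is kept)
```

As the comment in the source says, callers are expected to check `length(x) == 1` first, since only the first element of `strsplit()`'s result is returned.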

# patterns done via NSE in melt.data.table and .SDcols in `[.data.table`
# was called do_patterns() before PR#4731
eval_with_cols = function(orig_call, all_cols) {
@@ -148,7 +149,11 @@ is_utc = function(tz) {
}

# very nice idea from Michael to avoid expression repetition (risk) in internal code, #4226
"%iscall%" = function(e, f) { is.call(e) && e[[1L]] %chin% f }
`%iscall%` = function(e, f) {
if (!is.call(e)) return(FALSE)
if (is.name(e1 <- e[[1L]])) return(e1 %chin% f)
e1 %iscall% '::' && e1[[3L]] %chin% f
}
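
The rewritten `%iscall%` also matches namespace-qualified calls such as `utils::head(x)`, where the first element of the call is itself a call to `::` rather than a plain symbol. A sketch of the same logic in base R, assuming `%in%` with an explicit `as.character()` as a substitute for `%chin%`; `%iscall_sketch%` is a hypothetical name used to keep it separate from the package's operator.

```r
# Base-R sketch of the recursive check; %in% stands in for data.table's %chin%.
`%iscall_sketch%` = function(e, f) {
  if (!is.call(e)) return(FALSE)
  # Plain call like head(x): compare the function symbol against f
  if (is.name(e1 <- e[[1L]])) return(as.character(e1) %in% f)
  # Qualified call like utils::head(x): e1 is `::`(utils, head), so check its 3rd element
  e1 %iscall_sketch% '::' && as.character(e1[[3L]]) %in% f
}

quote(head(x))        %iscall_sketch% c("head", "tail")  # TRUE
quote(utils::head(x)) %iscall_sketch% c("head", "tail")  # TRUE
quote(sum(x))         %iscall_sketch% c("head", "tail")  # FALSE
quote(x)              %iscall_sketch% c("head", "tail")  # FALSE (not a call)
```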

# nocov start #593 always return a data.table
edit.data.table = function(name, ...) {
7 changes: 7 additions & 0 deletions inst/tests/S4.Rraw
@@ -102,3 +102,10 @@ if (TZnotUTC) {
fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date",NA,NA)),
ans, output=ans_print)
}

# S4 object in grouping output requiring growVector
# coverage test towards refactoring for #6180
DT = data.table(a = rep(1:2, c(1, 100)))
# Set the S4 bit on a simple object
DT[, b := asS4(seq_len(.N))]
test(6, DT[, b, by=a, verbose=TRUE][, isS4(b)], output="dogroups: growing")
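
The new test relies on `asS4()` to set the S4 bit on an ordinary integer vector. A minimal base-R sketch of that step on its own, outside data.table:

```r
x = seq_len(3L)
isS4(x)      # FALSE: a plain integer vector

y = asS4(x)  # sets the S4 bit without defining a formal class
isS4(y)      # TRUE
```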
10 changes: 0 additions & 10 deletions inst/tests/nafill.Rraw
@@ -21,16 +21,6 @@ for (s in sugg) {
if (!loaded) cat("\n**** Suggested package",s,"is not installed or has dependencies missing. Tests using it will be skipped.\n\n")
}

# Ensure an operation uses C-locale sorting (#3502). For test set-ups/comparisons that use base operations, which are
# susceptible to locale-specific sorting issues, but shouldn't be needed for data.table code, which always uses C sorting.
# TODO(R>=3.3.0): use order(method="radix") as a way to avoid needing this helper
with_c_collate = function(expr) {
old = Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", old))
Sys.setlocale("LC_COLLATE", "C")
expr
}
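
The TODO above suggests `order(method="radix")` as a replacement for this helper once R >= 3.3.0 is required. A small sketch of why the two are interchangeable for character vectors: radix ordering always uses C-locale collation, whatever the session's `LC_COLLATE`. The sample vector is illustrative; `with_c_collate` is copied from the removed lines.

```r
# Helper as removed in this diff, kept here only for comparison
with_c_collate = function(expr) {
  old = Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old))
  Sys.setlocale("LC_COLLATE", "C")
  expr  # evaluated lazily, i.e. after the locale switch
}

x = c("b", "A", "a", "B")

via_radix  = x[order(x, method = "radix")]  # C-locale order, no locale switch needed
via_helper = with_c_collate(x[order(x)])    # same order via temporary LC_COLLATE="C"

via_radix                         # "A" "B" "a" "b"
identical(via_radix, via_helper)  # TRUE
```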

x = 1:10
x[c(1:2, 5:6, 9:10)] = NA
test(1.01, nafill(x, "locf"), INT(NA,NA,3,4,4,4,7,8,8,8))