Merge branch 'master' into c_api_maintain

Rdatatable · Jul 11, 2024 · 2006179 · 2006179
2 parents e577e3b + 5a091b1
commit 2006179
Show file tree

Hide file tree

Showing 14 changed files with 89 additions and 86 deletions.
diff --git a/.github/workflows/R-CMD-check-occasional.yaml b/.github/workflows/R-CMD-check-occasional.yaml
@@ -1,6 +1,6 @@
 on:
   schedule:
-   - cron: '17 13 9 * *' # 9th of month at 13:17 UTC
+   - cron: '17 13 12 * *' # 12th of month at 13:17 UTC
 
 # A more complete suite of checks to run monthly; each PR/merge need not pass all these, but they should pass before CRAN release
 name: R-CMD-check-occasional
@@ -15,7 +15,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [macOS-latest, windows-latest, ubuntu-latest]
-        r: ['devel', 'release', '3.2', '3.3', '3.4', '3.5', '3.6', '4.0', '4.1', '4.2', '4.3']
+        r: ['devel', 'release', '3.3', '3.4', '3.5', '3.6', '4.0', '4.1', '4.2', '4.3']
         locale: ['en_US.utf8', 'zh_CN.utf8', 'lv_LV.utf8'] # Chinese for translations, Latvian for collate order (#3502)
         exclude:
           # only run non-English locale CI on Ubuntu
@@ -28,8 +28,6 @@ jobs:
           - os: windows-latest
             locale: 'lv_LV.utf8'
           # macOS/arm64 only available for R>=4.1.0
-          - os: macOS-latest
-            r: '3.2'
           - os: macOS-latest
             r: '3.3'
           - os: macOS-latest

diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: data.table
 Version: 1.15.99
 Title: Extension of `data.frame`
-Depends: R (>= 3.2.0)
+Depends: R (>= 3.3.0)
 Imports: methods
 Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils, xts, zoo (>= 1.8-1), yaml, knitr, markdown
 Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
@@ -12,11 +12,11 @@ VignetteBuilder: knitr
 Encoding: UTF-8
 ByteCompile: TRUE
 Authors@R: c(
-  person("Tyson","Barrett",        role=c("aut","cre"), email="[email protected]"),
+  person("Tyson","Barrett",        role=c("aut","cre"), email="[email protected]", comment = c(ORCID="0000-0002-2137-1391")),
   person("Matt","Dowle",           role="aut",          email="[email protected]"),
   person("Arun","Srinivasan",      role="aut",          email="[email protected]"),
   person("Jan","Gorecki",          role="aut"),
-  person("Michael","Chirico",      role="aut"),
+  person("Michael","Chirico",      role="aut", comment = c(ORCID="0000-0003-0787-087X")),
   person("Toby","Hocking",         role="aut", comment = c(ORCID="0000-0002-3146-0865")),
   person("Benjamin","Schwendinger",role="aut"),
   person("Pasha","Stetsenko",      role="ctb"),

diff --git a/NEWS.md b/NEWS.md
@@ -2,10 +2,6 @@
 
 # data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30)  (in development)
 
-## BREAKING CHANGES
-
-1. Usage of comma-separated character strings representing multiple columns in `data.table()`'s `key=` argument and `[`'s `by=`/`keyby=` arguments is deprecated, [#4357](https://github.com/Rdatatable/data.table/issues/4357). While sometimes convenient, ultimately it introduces inconsistency in implementation that is not worth the benefit to maintain. NB: this hard deprecation is temporary in the development version. Before release, it will soften into the normal data.table deprecation cycle starting from introducing the new behavior with an option, then changing the default for the option with a warning, then upgrading the warning to an error before finally removing the option and the error.
-
 ## NEW FEATURES
 
 1. `print.data.table()` shows empty (`NULL`) list column entries as `[NULL]` for emphasis. Previously they would just print nothing (same as for empty string). Part of [#4198](https://github.com/Rdatatable/data.table/issues/4198). Thanks @sritchie73 for the proposal and fix.
@@ -64,13 +60,15 @@
 
 9. In `DT[,j,by]`, `by` retains its attributes (e.g. class) when `j` is GForce optimized, [#5567](https://github.com/Rdatatable/data.table/issues/5567). Thanks to @danwwilson for the report, and @ben-schwen for the PR.
 
+10. `dt[,,by=año]` (i.e., using a column name containing a non-ASCII character in `by` as a plain symbol) no longer errors with "object 'año' not found", #4708. Thanks @pfv07 for the report, and Michael Chirico for the fix.
+
 ## NOTES
 
 1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
 
 2. The documentation for the `fill` argument in `rbind()` and `rbindlist()` now notes the expected behaviour for missing `list` columns when `fill=TRUE`, namely to use `NULL` (not `NA`), [#4198](https://github.com/Rdatatable/data.table/pull/4198). Thanks @sritchie73 for the proposal and fix.
 
-3. data.table now depends on R 3.2.0 (2015) instead of 3.1.0 (2014). 1.17.0 will likely move to R 3.3.0 (2016). Recent versions of R have good features that we would gradually like to incorporate, and we see next to no usage of these very old versions of R.
+3. data.table now depends on R 3.3.0 (2016) instead of 3.1.0 (2014). Recent versions of R have good features that we would gradually like to incorporate, and we see next to no usage of these very old versions of R. We originally attempted to bump only to R 3.2.0 in this release, but {knitr} requiring 3.3.0 and `R CMD check` lacking an `--ignore-vignettes` option until 3.3.0 essentially forced our hands.
 
 4. Erroneous assignment calls in `[` with a trailing comma (e.g. ``DT[, `:=`(a = 1, b = 2,)]``) get a friendlier error since this situation is common during refactoring and easy to miss visually. Thanks @MichaelChirico for the fix.
 
@@ -106,6 +104,12 @@
 
 18. `integer64` columns print well even if {bit64} has not yet been loaded, [#6224](https://github.com/Rdatatable/data.table/issues/6224). Thanks @renkun-ken for the report and @MichaelChirico for the fix.
 
+19. `fwrite()` header names are no longer quoted automatically when `na` argument is given, [#2964](https://github.com/Rdatatable/data.table/issues/2964). Thanks @jangorecki for the report and @joshhwuu for the fix.
+
+20. Removed a warning about the now totally-obsolete option `datatable.CJ.names`, as discussed in previous releases.
+
+21. Refactored some non-API calls to R macros for S4 objects (#6180)[https://github.com/Rdatatable/data.table/issues/6180]. There should be no user-visible change. Thanks to various R users & R core for pushing to have a clearer definition of "API" for R, and thanks @MichaelChirico for implementing here.
+
 ## TRANSLATIONS
 
 1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.

diff --git a/R/data.table.R b/R/data.table.R
@@ -53,9 +53,7 @@ data.table = function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, str
   ans = as.data.table.list(x, keep.rownames=keep.rownames, check.names=check.names, .named=nd$.named)  # see comments inside as.data.table.list re copies
   if (!is.null(key)) {
     if (!is.character(key)) stopf("key argument of data.table() must be character")
-    if (length(key)==1L) {
-      if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
-    }
+    if (length(key)==1L) key = cols_from_csv(key)
     setkeyv(ans,key)
   } else {
     # retain key of cbind(DT1, DT2, DT3) where DT2 is keyed but not DT1. cbind calls data.table().
@@ -797,7 +795,8 @@ replace_dot_alias = function(e) {
 
         if (mode(bysub) == "character") {
           if (any(grepl(",", bysub, fixed = TRUE))) {
-            stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "by=")
+            if (length(bysub) > 1L) stopf("'by' is a character vector length %d but one or more items include a comma. Either pass a vector of column names (which can contain spaces, but no commas), or pass a vector length 1 containing comma separated column names. See ?data.table for other possibilities.", length(bysub))
+            bysub = cols_from_csv(bysub)
           }
           bysub = gsub("^`(.*)`$", "\\1", bysub) # see test 138
           nzidx = nzchar(bysub)
@@ -1764,7 +1763,7 @@ replace_dot_alias = function(e) {
             }
           else {
             # adding argument to ghead/gtail if none is supplied to g-optimized head/tail
-            if (length(jsub) == 2L && jsub[[1L]] %chin% c("head", "tail")) jsub[["n"]] = 6L
+            if (length(jsub) == 2L && jsub %iscall% c("head", "tail")) jsub[["n"]] = 6L
             jsub = .gforce_jsub(jsub, names_x)
           }
           if (verbose) catf("GForce optimized j to '%s' (see ?GForce)\n", deparse(jsub, width.cutoff=200L, nlines=1L))
@@ -2717,8 +2716,9 @@ chmatch = function(x, table, nomatch=NA_integer_)
 chmatchdup = function(x, table, nomatch=NA_integer_)
   .Call(Cchmatchdup, x, table, as.integer(nomatch[1L]))
 
+# Force as.character as part of #4708
 "%chin%" = function(x, table)
-  .Call(Cchin, x, table)  # TO DO  if table has 'ul' then match to that
+  .Call(Cchin, as.character(x), table)  # TO DO  if table has 'ul' then match to that
 
 chorder = function(x) {
   o = forderv(x, sort=TRUE, retGrp=FALSE)

diff --git a/R/fread.R b/R/fread.R
@@ -78,9 +78,6 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
     if (w <- startsWithAny(file, c("https://", "ftps://", "http://", "ftp://", "file://"))) {  # avoid grepl() for #2531
       # nocov start
       tmpFile = tempfile(fileext = paste0(".",tools::file_ext(file)), tmpdir=tmpdir)  # retain .gz extension in temp filename so it knows to be decompressed further below
-      if (w<=2L && base::getRversion()<"3.2.2") { # https: or ftps: can be read by default by download.file() since 3.2.2
-        stopf("URL requires download.file functionalities from R >=3.2.2. You can still manually download the file and fread the downloaded file.")
-      }
       method = if (w==5L) "internal"  # force 'auto' when file: to ensure we don't use an invalid option (e.g. wget), #1668
                else getOption("download.file.method", default="auto")  # http: or ftp:
       # In text mode on Windows-only, R doubles up \r to make \r\r\n line endings. mode="wb" avoids that. See ?connections:"CRLF"
@@ -340,9 +337,8 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
   if (!is.null(key) && data.table) {
     if (!is.character(key))
       stopf("key argument of data.table() must be a character vector naming columns (NB: col.names are applied before this)")
-    if (length(key) == 1L) {
-      if (key != strsplit(key,split=",")[[1L]]) stopf("Usage of comma-separated literals in %s is deprecated, please split such entries yourself before passing to data.table", "key=")
-    }
+    if (length(key) == 1L)
+      key = cols_from_csv(key)
     setkeyv(ans, key)
   }
   if (yaml) setattr(ans, 'yaml_metadata', yaml_header)

diff --git a/R/onLoad.R b/R/onLoad.R
@@ -92,10 +92,6 @@
     eval(parse(text=paste0("options(",i,"=",opts[i],")")))
   }
 
-  # default TRUE from v1.12.0, FALSE before. Now ineffectual. Remove this warning after 1.15.0.
-  if (!is.null(getOption("datatable.CJ.names")))
-    warningf("Option 'datatable.CJ.names' no longer has any effect, as promised for 4 years. It is now ignored. Manually name `...` entries as needed if you still prefer the old behavior.")
-
   # Test R behaviour that changed in v3.1 and is now depended on
   x = 1L:3L
   y = list(x)

diff --git a/R/utils.R b/R/utils.R
@@ -21,9 +21,6 @@ nan_is_na = function(x) {
   stopf("Argument 'nan' must be NA or NaN")
 }
 
-if (!exists('startsWith', 'package:base', inherits=FALSE)) {  # R 3.3.0; Apr 2016
-  startsWith = function(x, stub) substr(x, 1L, nchar(stub))==stub
-}
 # endsWith no longer used from #5097 so no need to backport; prevent usage to avoid dev delay until GLCI's R 3.1.0 test
 endsWith = function(...) stop("Internal error: use endsWithAny instead of base::endsWith", call.=FALSE)
 
@@ -114,6 +111,10 @@ brackify = function(x, quote=FALSE) {
   sprintf('[%s]', toString(x))
 }
 
+# convenience for specifying columns in some cases, e.g. by= and key=
+# caller should ensure length(x) == 1 & handle accordingly.
+cols_from_csv = function(x) strsplit(x, ',', fixed=TRUE)[[1L]]
+
 # patterns done via NSE in melt.data.table and .SDcols in `[.data.table`
 # was called do_patterns() before PR#4731
 eval_with_cols = function(orig_call, all_cols) {
@@ -148,7 +149,11 @@ is_utc = function(tz) {
 }
 
 # very nice idea from Michael to avoid expression repetition (risk) in internal code, #4226
-"%iscall%" = function(e, f) { is.call(e) && e[[1L]] %chin% f }
+`%iscall%` = function(e, f) {
+  if (!is.call(e)) return(FALSE)
+  if (is.name(e1 <- e[[1L]])) return(e1 %chin% f)
+  e1 %iscall% '::' && e1[[3L]] %chin% f
+}
 
 # nocov start #593 always return a data.table
 edit.data.table = function(name, ...) {

diff --git a/inst/tests/S4.Rraw b/inst/tests/S4.Rraw
@@ -102,3 +102,10 @@ if (TZnotUTC) {
        fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date",NA,NA)),
        ans, output=ans_print)
 }
+
+# S4 object in grouping output requiring growVector
+#   coverage test towards refactoring for #6180
+DT = data.table(a = rep(1:2, c(1, 100)))
+# Set the S4 bit on a simple object
+DT[, b := asS4(seq_len(.N))]
+test(6, DT[, b, by=a, verbose=TRUE][, isS4(b)], output="dogroups: growing")
diff --git a/inst/tests/nafill.Rraw b/inst/tests/nafill.Rraw
@@ -21,16 +21,6 @@ for (s in sugg) {
   if (!loaded) cat("\n**** Suggested package",s,"is not installed or has dependencies missing. Tests using it will be skipped.\n\n")
 }
 
-# Ensure an operation uses C-locale sorting (#3502). For test set-ups/comparisons that use base operations, which are
-#   susceptible to locale-specific sorting issues, but shouldn't be needed for data.table code, which always uses C sorting.
-# TODO(R>=3.3.0): use order(method="radix") as a way to avoid needing this helper
-with_c_collate = function(expr) {
-  old = Sys.getlocale("LC_COLLATE")
-  on.exit(Sys.setlocale("LC_COLLATE", old))
-  Sys.setlocale("LC_COLLATE", "C")
-  expr
-}
-
 x = 1:10
 x[c(1:2, 5:6, 9:10)] = NA
 test(1.01, nafill(x, "locf"), INT(NA,NA,3,4,4,4,7,8,8,8))