Merge branch 'master' into condition-signals
MichaelChirico committed Apr 4, 2024
2 parents 6e13511 + fad5b17 commit cb61ff7
Showing 57 changed files with 791 additions and 423 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -1,6 +1,7 @@
.dir-locals.el
^\.Rprofile$
^data\.table_.*\.tar\.gz$
^config\.log$
^vignettes/plots/figures$
^\.Renviron$
^[^/]+\.R$
@@ -16,7 +17,6 @@
^\.graphics$
^\.github$

^\.appveyor\.yml$
^\.gitlab-ci\.yml$

^Makefile$
71 changes: 0 additions & 71 deletions .appveyor.yml

This file was deleted.

4 changes: 0 additions & 4 deletions .ci/README.md
@@ -35,10 +35,6 @@ Artifacts:

TODO document

### [Appveyor](./../.appveyor.yml)

TODO document

## CI tools

### [`ci.R`](./ci.R)
10 changes: 10 additions & 0 deletions .dev/README.md
@@ -1,5 +1,15 @@
# data.table developer

## Setup

To use the optional helper function `cc()`, set the project path and source `.dev/cc.R`. One way to do this is to create an additional `.Rprofile` in the `data.table` directory.

```r
# content of .Rprofile in the package directory
Sys.setenv(PROJ_PATH="~/git/data.table")
source(".dev/cc.R")
```

## Utilities

### [`cc.R`](./cc.R)
10 changes: 5 additions & 5 deletions .dev/cc.R
@@ -31,7 +31,7 @@ sourceDir = function(path=getwd(), trace = TRUE, ...) {
if(trace) cat("\n")
}

cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys.getenv("PROJ_PATH"), CC="gcc") {
cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys.getenv("PROJ_PATH"), CC="gcc", quiet=FALSE) {
if (!missing(cc_dir)) {
warning("'cc_dir' arg is deprecated, use 'path' argument or 'PROJ_PATH' env var instead")
path = cc_dir
@@ -55,13 +55,13 @@ cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys
old = getwd()
on.exit(setwd(old))
setwd(file.path(path,"src"))
cat(getwd(),"\n")
if (!quiet) cat(getwd(),"\n")
if (clean) system("rm *.o *.so")
OMP = if (omp) "" else "no-"
if (debug) {
ret = system(sprintf("MAKEFLAGS='-j CC=%s PKG_CFLAGS=-f%sopenmp CFLAGS=-std=c99\\ -O0\\ -ggdb\\ -pedantic' R CMD SHLIB -d -o data_table.so *.c", CC, OMP))
ret = system(ignore.stdout=quiet, sprintf("MAKEFLAGS='-j CC=%s PKG_CFLAGS=-f%sopenmp CFLAGS=-std=c99\\ -O0\\ -ggdb\\ -pedantic' R CMD SHLIB -d -o data_table.so *.c", CC, OMP))
} else {
ret = system(sprintf("MAKEFLAGS='-j CC=%s CFLAGS=-f%sopenmp\\ -std=c99\\ -O3\\ -pipe\\ -Wall\\ -pedantic\\ -Wstrict-prototypes\\ -isystem\\ /usr/share/R/include\\ -fno-common' R CMD SHLIB -o data_table.so *.c", CC, OMP))
ret = system(ignore.stdout=quiet, sprintf("MAKEFLAGS='-j CC=%s CFLAGS=-f%sopenmp\\ -std=c99\\ -O3\\ -pipe\\ -Wall\\ -pedantic\\ -Wstrict-prototypes\\ -isystem\\ /usr/share/R/include\\ -fno-common' R CMD SHLIB -o data_table.so *.c", CC, OMP))
# the -isystem suppresses strict-prototypes warnings from R's headers, #5477. Look at the output to see what -I is and pass the same path to -isystem.
# TODO add -Wextra too?
}
@@ -81,7 +81,7 @@ cc = function(test=FALSE, clean=FALSE, debug=FALSE, omp=!debug, cc_dir, path=Sys
.GlobalEnv[[Call$name]] = Call$address
for (Extern in xx$.External)
.GlobalEnv[[Extern$name]] = Extern$address
sourceDir(file.path(path, "R"))
sourceDir(file.path(path, "R"), trace=!quiet)
if (base::getRversion()<"4.0.0") rm(list=c("rbind.data.table", "cbind.data.table"), envir=.GlobalEnv) # 3968 follow up
.GlobalEnv$testDir = function(x) file.path(path,"inst/tests",x)
.onLoad()
10 changes: 5 additions & 5 deletions .github/workflows/R-CMD-check.yaml
@@ -3,15 +3,15 @@
on:
push:
branches:
- main
- master
pull_request:
branches:
- main
- master

name: R-CMD-check

concurrency:
group: ${{ github.event.pull_request.number || github.run_id }}
cancel-in-progress: true

jobs:
R-CMD-check:
runs-on: ${{ matrix.config.os }}
@@ -25,7 +25,7 @@ jobs:
# Rdatatable has full-strength GLCI which runs after merge. So we just need a few
# jobs (mainly test-coverage) to run on every commit in PRs so as to not slow down dev.
# GHA does run these jobs concurrently but even so reducing the load seems like a good idea.
# - {os: windows-latest, r: 'release'} # currently using AppVeyor which runs 32bit in 5 min and works
- {os: windows-latest, r: 'devel'}
# - {os: macOS-latest, r: 'release'} # test-coverage.yaml uses macOS
- {os: ubuntu-20.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"}
# - {os: ubuntu-20.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest", http-user-agent: "R/4.1.0 (ubuntu-20.04) R (4.1.0 x86_64-pc-linux-gnu x86_64 linux-gnu) on GitHub Actions" }
4 changes: 0 additions & 4 deletions .github/workflows/test-coverage.yaml
@@ -1,12 +1,8 @@
on:
push:
branches:
- main
- master
pull_request:
branches:
- main
- master

name: test-coverage

2 changes: 2 additions & 0 deletions .gitignore
@@ -9,9 +9,11 @@ Rplots.pdf
data.table_*.tar.gz
data.table.Rcheck
src/Makevars
.Rprofile

# Package install
inst/cc
config.log

# Emacs IDE files
.emacs.desktop
10 changes: 6 additions & 4 deletions .gitlab-ci.yml
@@ -14,11 +14,11 @@ variables:
## a non-UTC timezone, although, that's what we do routinely in dev.
R_REL_VERSION: "4.3"
R_REL_WIN_BIN: "https://cloud.r-project.org/bin/windows/base/old/4.3.2/R-4.3.2-win.exe"
RTOOLS_REL_BIN: "https://cloud.r-project.org/bin/windows/Rtools/rtools43/files/rtools43-5863-5818.exe"
RTOOLS_REL_BIN: "https://cloud.r-project.org/bin/windows/Rtools/rtools43/files/rtools43-5958-5975.exe"
RTOOLS43_HOME: "/c/rtools"
R_DEV_VERSION: "4.4"
R_DEV_WIN_BIN: "https://cloud.r-project.org/bin/windows/base/R-devel-win.exe"
RTOOLS_DEV_BIN: "https://cloud.r-project.org/bin/windows/Rtools/rtools43/files/rtools43-5863-5818.exe"
RTOOLS_DEV_BIN: "https://cloud.r-project.org/bin/windows/Rtools/rtools43/files/rtools43-5958-5975.exe"
RTOOLS44_HOME: "" ## in case R-devel will use new Rtools toolchain, now it uses 4.3 env var
R_OLD_VERSION: "4.2"
R_OLD_WIN_BIN: "https://cloud.r-project.org/bin/windows/base/old/4.2.3/R-4.2.3-win.exe"
@@ -211,8 +211,10 @@ test-lin-310-cran:
tags:
- shared-windows
before_script:
- curl.exe -s -o ../R-win.exe $R_BIN; Start-Process -FilePath ..\R-win.exe -ArgumentList "/VERYSILENT /DIR=C:\R" -NoNewWindow -Wait
- curl.exe -s -o ../rtools.exe $RTOOLS_BIN; Start-Process -FilePath ..\rtools.exe -ArgumentList "/VERYSILENT /DIR=C:\rtools" -NoNewWindow -Wait
- curl.exe -s -o ../R-win.exe $R_BIN --fail; if (!(Test-Path -Path ..\R-win.exe)) {Write-Error "R-win.exe not found, download failed?"}
- Start-Process -FilePath ..\R-win.exe -ArgumentList "/VERYSILENT /DIR=C:\R" -NoNewWindow -Wait
- curl.exe -s -o ../rtools.exe $RTOOLS_BIN --fail; if (!(Test-Path -Path ..\rtools.exe)) {Write-Error "rtools.exe not found, download failed?"}
- Start-Process -FilePath ..\rtools.exe -ArgumentList "/VERYSILENT /DIR=C:\rtools" -NoNewWindow -Wait
- $env:PATH = "C:\R\bin;C:\rtools\usr\bin;$env:PATH"
- Rscript.exe -e "source('.ci/ci.R'); install.packages(dcf.dependencies('DESCRIPTION', which='all'), repos=file.path('file://',getwd(),'bus/mirror-packages/cran'), quiet=TRUE)"
- cp.exe $(ls.exe -1t bus/build/data.table_*.tar.gz | head.exe -n 1) .
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1,5 +1,5 @@
# https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
* @mattdowle
* @jangorecki @michaelchirico

# melt
/R/fmelt.R @tdhock
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -81,5 +81,7 @@ Authors@R: c(
person("Olivier","Delmarcell", role="ctb"),
person("Josh","O'Brien", role="ctb"),
person("Dereck","de Mezquita", role="ctb"),
person("Michael","Czekanski", role="ctb")
person("Michael","Czekanski", role="ctb"),
person("Dmitry", "Shemetov", role="ctb"),
person("Nitish", "Jha", role="ctb")
)
5 changes: 4 additions & 1 deletion GOVERNANCE.md
@@ -64,7 +64,7 @@ Functionality that is out of current scope:

* Definition: permission to commit to, and merge PRs into, master branch.
* How to obtain this role: after a reviewer has a consistent history of careful reviews of others' PRs, then a current Committer should ask all other current Committers if they approve promoting the Reviewer to Committer, and it should be done if there is Consensus among active Committers.
* How this role is recognized: credited via role="aut" in DESCRIPTION (so they appear in Author list on CRAN), and added to https://github.com/orgs/Rdatatable/teams/maintainers which gives permission to merge PRs into master branch.
* How this role is recognized: credited via role="aut" in DESCRIPTION (so they appear in Author list on CRAN), and added to https://github.com/orgs/Rdatatable/teams/committers which gives permission to merge PRs into master branch.

## CRAN maintainer

Expand Down Expand Up @@ -123,6 +123,9 @@ data.table Version line in DESCRIPTION typically has the following meanings

# Governance history

Feb 2024: change team name/link maintainers to committers, to be consistent with role defined in governance.

Nov-Dec 2023: initial version drafted by Toby Dylan Hocking and
reviewed by Tyson Barrett, Jan Gorecki, Michael Chirico, Benjamin
Schwendinger.

7 changes: 4 additions & 3 deletions NAMESPACE
@@ -1,7 +1,8 @@
useDynLib("data_table", .registration=TRUE)

## For S4-ization
import(methods)
importFrom(methods, "S3Part<-", slotNames)
importMethodsFrom(methods, "[")
exportClasses(data.table, IDate, ITime)
##

Expand Down Expand Up @@ -130,11 +131,11 @@ S3method(melt, default)
# and many packages on CRAN call dcast.data.table() and/or melt.data.table() directly. See #3082.
export(melt.data.table, dcast.data.table)

import(utils)
importFrom(utils, capture.output, contrib.url, download.file, flush.console, getS3method, head, packageVersion, tail, untar, unzip)
export(update_dev_pkg)
S3method(tail, data.table)
S3method(head, data.table)
import(stats)
importFrom(stats, as.formula, na.omit, setNames, terms)
S3method(na.omit, data.table)

S3method(as.data.table, xts)
32 changes: 31 additions & 1 deletion NEWS.md
@@ -14,7 +14,29 @@
# 2:
```

2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654).
2. `cedta()` now returns `FALSE` if `.datatable.aware = FALSE` is set in the calling environment, [#5654](https://github.com/Rdatatable/data.table/issues/5654). Thanks @dvg-p4 for the request and PR.

3. `split.data.table` also accepts a formula for `f`, [#5392](https://github.com/Rdatatable/data.table/issues/5392), mirroring the same in `base::split.data.frame` since R 4.1.0 (May 2021). Thanks to @XiangyunHuang for the request, and @ben-schwen for the PR.

4. Namespace-qualifying `data.table::shift()`, `data.table::first()`, or `data.table::last()` will not deactivate GForce, [#5942](https://github.com/Rdatatable/data.table/issues/5942). Thanks @MichaelChirico for the proposal and fix. Namespace-qualifying other calls like `stats::sum()`, `base::prod()`, etc., continue to work as an escape valve to avoid GForce, e.g. to ensure S3 method dispatch.

5. `transpose` gains `list.cols=` argument, [#5639](https://github.com/Rdatatable/data.table/issues/5639). Use this to return output with list columns and avoid type promotion (an exception is `factor` columns which are promoted to `character` for consistency between `list.cols=TRUE` and `list.cols=FALSE`). This is convenient for creating a row-major representation of a table. Thanks to @MLopez-Ibanez for the request, and Benjamin Schwendinger for the PR.

6. Using `dt[, names(.SD) := lapply(.SD, fx)]` now works, [#795](https://github.com/Rdatatable/data.table/issues/795) -- one of our [most-requested issues (see #3189)](https://github.com/Rdatatable/data.table/issues/3189). Thanks to @brodieG for the report, 20 or so others for chiming in, and @ColeMiller1 for PR.

7. `fread`'s `fill` argument now also accepts an `integer` in addition to boolean values. `fread` always guesses the number of columns based on reading a sample of rows in the file. When `fill=TRUE`, `fread` stops reading and ignores subsequent rows when this estimate winds up too low, e.g. when the sampled rows happen to exclude some rows that are even wider, [#2727](https://github.com/Rdatatable/data.table/issues/2727) [#2691](https://github.com/Rdatatable/data.table/issues/2691) [#4130](https://github.com/Rdatatable/data.table/issues/4130) [#3436](https://github.com/Rdatatable/data.table/issues/3436). Providing an `integer` as argument for `fill` allows for a manual estimate of the number of columns instead, [#1812](https://github.com/Rdatatable/data.table/issues/1812) [#5378](https://github.com/Rdatatable/data.table/issues/5378). Thanks to @jangorecki, @christellacaze, @Yiguan, @alexdthomas, @ibombonato, @Befrancesco, @TobiasGold for reporting/requesting, and Benjamin Schwendinger for the PR.
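
As an illustration of item 6 above, here is a minimal sketch. The toy table, its column names, and the doubling function are assumptions made up for the example, and it presumes a data.table build that includes this change.

```r
library(data.table)

# Toy table; column names and the doubling function are illustrative only.
DT = data.table(id = 1:3, a = c(1, 2, 3), b = c(10, 20, 30))

# names(.SD) on the left of := updates exactly the columns selected
# by .SDcols, by reference; 'id' is left untouched.
DT[, names(.SD) := lapply(.SD, function(x) x * 2), .SDcols = c("a", "b")]

print(DT)
```

The long-standing alternative is to spell out the column vector on the left-hand side, e.g. `cols = c("a", "b"); DT[, (cols) := lapply(.SD, fx), .SDcols = cols]`, which continues to work.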
## BUG FIXES
1. `unique()` returns a copy in the case when `nrow(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
2. `dcast` handles coercion of `fill` to `integer64` correctly, [#4561](https://github.com/Rdatatable/data.table/issues/4561). Thanks to @emallickhossain for the bug report and @MichaelChirico for the fix.
3. Optimized `shift` per group produced wrong results when simultaneously subsetting, for example, `DT[i==1L, shift(x), by=group]`, [#5962](https://github.com/Rdatatable/data.table/issues/5962). Thanks to @renkun-ken for the report and Benjamin Schwendinger for the fix.
4. `dcast(fill=NULL)` only computes default fill value if necessary, which eliminates some previous warnings (for example, when fun.aggregate=min or max, warning was NAs introduced by coercion to integer range) which were potentially confusing, [#5512](https://github.com/Rdatatable/data.table/issues/5512), [#5390](https://github.com/Rdatatable/data.table/issues/5390). Thanks to Toby Dylan Hocking for the fix.
5. `fwrite(x, row.names=TRUE)` with `x` a `matrix` writes `row.names` when present, not row numbers, [#5315](https://github.com/Rdatatable/data.table/issues/5315). Thanks to @Liripo for the report, and @ben-schwen for the fix.
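
To make bug fix 1 above concrete, a minimal sketch follows. The one-row table and the added column `b` are assumptions for illustration, and the behavior shown presumes a build that includes the fix.

```r
library(data.table)

# One-row table: the nrow(x) <= 1 case covered by the fix.
DT = data.table(a = 1L)
U = unique(DT)

# Modify the result by reference. Because unique() now returns a copy,
# the new column should appear in U only, not in DT.
U[, b := 2L]

names(DT)
names(U)
```

Code that relied on `unique()` returning the same object for tiny inputs (for example, to update the original by reference through the result) would need an explicit assignment instead.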
## NOTES
@@ -26,6 +48,14 @@
4. Erroneous assignment calls in `[` with a trailing comma (e.g. ``DT[, `:=`(a = 1, b = 2,)]``) get a friendlier error since this situation is common during refactoring and easy to miss visually. Thanks @MichaelChirico for the fix.
5. Input files are now kept open during `mmap()` when running under Emscripten, [emscripten-core/emscripten#20459](https://github.com/emscripten-core/emscripten/issues/20459). This avoids an error in `fread()` when running in WebAssembly, [#5969](https://github.com/Rdatatable/data.table/issues/5969). Thanks to @maek-ies for the report and @georgestagg for the PR.
6. `dcast()` message about `fun.aggregate` defaulting to `length()` when aggregation is necessary, which could be confusing if duplicates were unexpected, now better explains the behavior and suggests alternatives, [#5217](https://github.com/Rdatatable/data.table/issues/5217). Thanks @MichaelChirico for the suggestion and @Nj221102 for the fix.
7. Updated a test relying on `>` working for comparing language objects to a string, which will be deprecated by R, [#5977](https://github.com/Rdatatable/data.table/issues/5977); no user-facing effect. Thanks to R-core for continuously improving the language.
8. OpenMP detection when building from source on Mac is improved, [#4348](https://github.com/Rdatatable/data.table/issues/4348). Thanks @jameshester and @kevinushey for the request and @kevinushey for the PR, @jameslamb for the advice and @s-u of R-core for ensuring CRAN machines are configured to support the expected setup.
# data.table [v1.15.0](https://github.com/Rdatatable/data.table/milestone/29) (30 Jan 2024)
## BREAKING CHANGE