Merge branch 'master' into implicit-return
aitap authored Feb 28, 2025
2 parents 677e84e + 3bb4e16 commit 57cef15
Showing 14 changed files with 191 additions and 19 deletions.
4 changes: 1 addition & 3 deletions .dev/CRAN_Release.cmd
Expand Up @@ -18,9 +18,7 @@ potools::check_untranslated_src(message_db)
potools::po_extract(custom_translation_functions = dt_custom_translators)

# 2) Open a PR with the new templates & contact the translators
# * zh_CN: @hongyuanjia
# * pt_BR: @rffontenelle
# * es: @rikivillalba
# using @Rdatatable/<lang>, e.g. @Rdatatable/chinese
## Translators to submit commits with translations to this PR
## [or perhaps, if we get several languages, each to open
## its own PR and merge to main translation PR]
35 changes: 35 additions & 0 deletions .github/CODE_OF_CONDUCT.md
@@ -0,0 +1,35 @@
As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct

Project members with the Committer role have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

By adopting this Code of Conduct, project members commit themselves to fairly and consistently apply these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.


## Reporting

Project members with the Committer role or the CRAN Maintainer role are pledged to promptly address any reported issues. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to any individual with this role.

Those who prefer to report in a way that is independent of the current Committers and Maintainer may instead contact the Community Engagement Coordinator by e-mailing [r.data.table\@gmail.com](mailto:r.data.table@gmail.com). Messages sent to this e-mail address will be visible only to the current Community Engagement Coordinator, a position always held by an individual who is not a Committer or CRAN Maintainer of the package.

The current Committers are Toby Dylan Hocking (@tdhock), Matt Dowle (@mattdowle), Arun Srinivasan (@arunsrinivasan), Jan Gorecki (@jangorecki), Michael Chirico (@MichaelChirico), and Benjamin Schwendinger (@ben-schwen).

The current CRAN Maintainer is Tyson Barrett (@tysonstanley).

The current Community Engagement Coordinator is Kelly Bodwin (@kbodwin).

All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. Complaint respondents are obligated to maintain confidentiality with regard to the reporter of an incident.

This Code of Conduct is adapted from the [Contributor Covenant, version 1.3.0](https://www.contributor-covenant.org/version/1/3/0/code-of-conduct/), available at [https://www.contributor-covenant.org/version/1/3/0/](https://www.contributor-covenant.org/version/1/3/0/), and the Swift Code of Conduct.
40 changes: 33 additions & 7 deletions GOVERNANCE.md
Expand Up @@ -72,6 +72,14 @@ Functionality that is out of current scope:
* How to obtain this role: (1) merge into master a PR adding role="cre" to DESCRIPTION, and (2) submit updated package to CRAN (previous CRAN Maintainer will have to confirm change by email to CRAN).
* How this role is recognized: credited via role="cre" in DESCRIPTION, so they appear as Maintainer on CRAN.

## Community Engagement Coordinator

* Definition: An individual who is involved in the project but does **not** also occupy the Committer or CRAN Maintainer role. In charge of maintaining The Raft blog, preparing Seal of Approval Applications, addressing Code of Conduct violations, and planning social or community events.

* How to obtain this role: At the discretion of the current Community Engagement Coordinator(s) in conversation with the current Committers.

* How this role is recognized: Holds the Owner role in the [rdatatable-community organization](https://github.com/orgs/rdatatable-community/people) on GitHub.

# Decision-making processes

## Definition of Consensus
Expand All @@ -98,19 +106,35 @@ There is no special process for changing this document. Submit a PR and ask for

Please also make a note in the change log under [`# Governance history`](#governance-history)

# Code of conduct
# Finances and Funding

There is currently no mechanism for the data.table project to receive funding as an entity.

Funding support for this project therefore may come in two forms:

## Individual external funding

As contributors of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
Any individual developer or community member of data.table may apply for and receive funding for their work on the project. Individuals or groups seeking funding support are strongly encouraged to consult directly with the data.table Project Members (by initiating an Issue on GitHub) to ensure funds are used meaningfully. Formally, however, decisions about use of funds are governed by the individual grantee(s) and their contract with the funding agency.

We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, etc.
There is no guarantee that funded work will be incorporated into the data.table package; any contributions, whether funded or unfunded, are subject to the same review process as outlined above.

Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
## Direct donations

Committers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. A person with special roles who does not follow the Code of Conduct may have their roles revoked.
Direct donations to the project may be made via GitHub Sponsorships, which allow individuals to fund a specific developer. If the current CRAN Maintainer offers a personal sponsorship option, donations may be made to them to support the project in general.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or emailing one or more of the Committers.
## Decision-making for future opportunities

This Code of Conduct is adapted from Tidyverse code of conduct.
We here outline a procedure for disbursing funds, should this project in the future become a directly fundable entity (e.g. an LLC or a subsidiary of an umbrella LLC).

Funds acquired by the data.table project will be disbursed at the discretion of the **Committers**, defined as above. The **CRAN Maintainer** will have authority to make final decisions in the event that no consensus is reached among Committers prior to deadlines for use of funds, and will be responsible for disbursement logistics.

# Code of conduct

The full Code of Conduct can be found [here](CODE_OF_CONDUCT.md), including details for reporting violations.

## Reporting Responsibility

Committers and the Community Engagement Coordinator pledge to address any publicly posted issues or privately sent messages regarding Code of Conduct violations in a respectful and timely manner.

# Version numbering

Expand All @@ -124,6 +148,8 @@ data.table Version line in DESCRIPTION typically has the following meanings

# Governance history

Feb 2025: add Finances and Funding section, update Code of Conduct section to be a brief summary and reference the broader CoC document.

Jan 2025: clarify that edits to governance should notify all committers, and that role names are proper nouns (i.e., upper-case) throughout.

Feb 2024: change team name/link maintainers to committers, to be consistent with role defined in governance.
3 changes: 3 additions & 0 deletions NAMESPACE
Expand Up @@ -206,3 +206,6 @@ S3method(format_list_item, data.frame)

export(fdroplevels, setdroplevels)
S3method(droplevels, data.table)

# sort_by added in R 4.4.0, #6662, https://stat.ethz.ch/pipermail/r-announce/2024/000701.html
if (getRversion() >= "4.4.0") S3method(sort_by, data.table)
4 changes: 3 additions & 1 deletion NEWS.md
Expand Up @@ -2,7 +2,9 @@

# data.table [v1.17.99](https://github.com/Rdatatable/data.table/milestone/35) (in development)

## NEW FEATURES

1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.

# data.table [v1.17.0](https://github.com/Rdatatable/data.table/milestone/34) (20 Feb 2025)

Expand All @@ -20,7 +22,7 @@ rowwiseDT(
a=,b=,c=, d=,
1, 2, "a", 2:3,
3, 4, "b", list("e"),
5, 6, "c", ~a+b,
5, 6, "c", ~a+b
)
#> a b c d
#> <num> <num> <char> <list>
14 changes: 13 additions & 1 deletion R/data.table.R
Expand Up @@ -2454,7 +2454,7 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
# same as split.data.frame - handling all exceptions, factor orders etc, in a single stream of processing was a nightmare in factor and drop consistency
# evaluate formula mirroring split.data.frame #5392. Mimics base::.formula2varlist.
if (inherits(f, "formula"))
f = eval(attr(terms(f), "variables"), x, environment(f))
f = formula_vars(f, x)
# be sure to use x[ind, , drop = FALSE], not x[ind], in case downstream methods don't follow the same subsetting semantics (#5365)
return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind, , drop = FALSE]))
}
Expand Down Expand Up @@ -2526,6 +2526,18 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
}
}

sort_by.data.table <- function(x, y, ...)
{
if (!cedta()) return(NextMethod()) # nocov
if (inherits(y, "formula"))
y <- formula_vars(y, x)
if (!is.list(y))
y <- list(y)
# use forder instead of base 'order'
o <- do.call(forder, c(unname(y), list(...)))
x[o, , drop=FALSE]
}

# TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want

copy = function(x) {
2 changes: 1 addition & 1 deletion R/print.data.table.R
Expand Up @@ -86,6 +86,7 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
if (show.indices) toprint = cbind(toprint, index_dt)
}
require_bit64_if_needed(x)
classes = classes1(toprint)
toprint=format.data.table(toprint, na.encode=FALSE, timezone = timezone, ...) # na.encode=FALSE so that NA in character cols print as <NA>

# FR #353 - add row.names = logical argument to print.data.table
Expand All @@ -100,7 +101,6 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
factor = "<fctr>", POSIXct = "<POSc>", logical = "<lgcl>",
IDate = "<IDat>", integer64 = "<i64>", raw = "<raw>",
expression = "<expr>", ordered = "<ord>")
classes = classes1(x)
abbs = unname(class_abb[classes])
if ( length(idx <- which(is.na(abbs))) ) abbs[idx] = paste0("<", classes[idx], ">")
toprint = rbind(abbs, toprint)
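The hunk above moves the `classes = classes1(...)` call so that it is computed from `toprint` (which may already include index columns) rather than from `x`. A minimal base-R sketch of why the lengths matter follows. It assumes the warning in #6806 arises from `rbind()` recycling a too-short header vector; the matrix contents and the `index:b` column name are illustrative stand-ins, not taken from the package internals.

```r
# Hypothetical stand-in for the formatted print matrix: two data columns
# plus one index column appended by show.indices = TRUE.
toprint <- cbind(a = c("1", "2"), b = c("2", "1"), `index:b` = c("2", "1"))

# Header built from the original table only: one entry short.
abbs_short <- c("<int>", "<int>")
# Header built from the widened table: one entry per printed column.
abbs_full <- c("<int>", "<int>", "<int>")

# Capture the recycling warning instead of letting it surface.
msg <- tryCatch(
  {
    rbind(abbs_short, toprint)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)

# With a full-length header, rbind() is silent and yields a 3 x 3 matrix.
ok <- rbind(abbs_full, toprint)
```

Here `msg` holds the text of the recycling warning, while `ok` is built without any warning.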
8 changes: 8 additions & 0 deletions R/utils.R
Expand Up @@ -212,3 +212,11 @@ rss = function() { #5515 #5517
round(ans / 1024.0, 1L) # return MB
# nocov end
}

formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
terms <- terms(f)
setNames(
eval(attr(terms, "variables"), x, environment(f)),
attr(terms, "term.labels")
)
}
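To see what `formula_vars()` returns, here is a self-contained sketch that mirrors its body using only base R and the stats package. The name `formula_vars_sketch` and the sample data frame are assumptions made for illustration; the real helper is internal to data.table.

```r
# Mirror of the helper above: evaluate the variables named by a one-sided
# formula inside `x`, and label them with the formula's term labels.
formula_vars_sketch <- function(f, x) {
  tt <- stats::terms(f)
  stats::setNames(
    eval(attr(tt, "variables"), x, environment(f)),
    attr(tt, "term.labels")
  )
}

df <- data.frame(a = c(1, 3, 2), b = c(10, 20, 30))

# A plain two-term formula yields a named list of the two columns.
vars <- formula_vars_sketch(~ a + b, df)

# An I() term is evaluated as an expression over the columns.
combined <- formula_vars_sketch(~ I(a + b), df)

# The resulting list can be handed to an ordering function; base order()
# is used here in place of data.table's forder().
o <- do.call(order, unname(vars))
sorted <- df[o, , drop = FALSE]
```

The sketch uses `order()` only so that it runs without data.table installed; `sort_by.data.table` in this commit passes the same kind of list to `forder()` instead.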
1 change: 1 addition & 0 deletions Seal_of_Approval.md
Expand Up @@ -31,3 +31,4 @@ Translates `data.table` syntax to a different syntax, or provides helper functio
Not necessarily directly connected to `data.table`, but deliberately follows the [core philosophies of `data.table`](https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package).

- [collapse](https://github.com/SebKrantz/collapse): Advanced and Fast Data Transformation in R.

34 changes: 34 additions & 0 deletions inst/tests/tests.Rraw
Expand Up @@ -21042,3 +21042,37 @@ test(2304.100, set(copy(DT), i=2L, j=c("L1", "L2"), value=list(list(NULL), list(

# the integer overflow in #6729 is only noticeable with UBSan
test(2305, { fread(testDir("issue_6729.txt.bz2")); TRUE })

if (exists("sort_by", "package:base")) {
# sort_by.data.table
DT1 = data.table(a=c(1, 3, 2, NA, 3), b=4:0)
DT2 = data.table(a=c("c", "a", "B")) # data.table uses C-locale and should sort_by if cedta()
DT3 = data.table(a=c(1, 2, 3), b=list(c("a", "b", "", NA), c(1, 3, 2, 0), c(TRUE, TRUE, FALSE, NA))) # list column

# sort_by.data.table: basics
test(2306.01, sort_by(DT1, ~a + b), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
test(2306.02, sort_by(DT1, ~I(a + b)), data.table(a=c(3, 2, 1, 3, NA), b=c(0L, 2L, 4L, 3L, 1L)))
test(2306.03, sort_by(DT2, ~a), data.table(a=c("B", "a", "c")))

# sort_by.data.table: list columns.
# NOTE 1: .formula2varlist works well with list columns.
# NOTE 2: 4 elem in DT of 3 row because forderv takes a list column as a DT.
test(2306.04, sort_by(DT3, ~b), DT3[order(b)]) # should be consistent.

# sort_by.data.table: additional C-locale sorting
test(2306.10, DT2[, sort_by(.SD, a)], data.table(a=c("B", "a", "c")))
test(2306.11, DT2[, sort_by(.SD, ~a)], data.table(a=c("B", "a", "c")))

# sort_by.data.table: various working interfaces
test(2306.20, sort_by(DT1, list(DT1$a, DT1$b)), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
test(2306.21, sort_by(DT1, DT1[, .(a, b)]), data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
test(2306.22, DT1[, sort_by(.SD, .(a, b))], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
test(2306.23, DT1[, sort_by(.SD, ~a + b)], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
test(2306.24, DT1[, sort_by(.SD, ~.(a, b))], data.table(a=c(1, 2, 3, 3, NA), b=c(4L, 2L, 0L, 3L, 1L)))
}

DT <- data.table(a = 1:2, b = 2:1)
setindex(DT, b)
# make sure that print(DT) doesn't warn due to the header missing index column types, #6806
# can't use output= here because the print() call is outside withCallingHandlers(...)
test(2307, { capture.output(print(DT, class = TRUE, show.indices = TRUE)); TRUE })
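The expected value `c("B", "a", "c")` in tests 2306.03, 2306.10 and 2306.11 reflects C-locale collation, in which uppercase ASCII letters sort before lowercase ones. The sketch below reproduces that ordering in base R without data.table, relying on the documented behaviour that `method = "radix"` always collates in the C locale. It is only an analogy for what `forder()` does, not a call into data.table.

```r
x <- c("c", "a", "B")

# Radix ordering in base R ignores the session locale and collates in the
# C locale, so "B" (byte 66) sorts before "a" (byte 97).
c_locale <- x[order(x, method = "radix")]

# The byte values make the ordering explicit.
bytes <- vapply(x, utf8ToInt, integer(1))
```

By contrast, `sort(x)` with the default method depends on the session's collation locale, which is why the tests pin the C-locale result explicitly.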
13 changes: 10 additions & 3 deletions man/setorder.Rd
Expand Up @@ -5,6 +5,7 @@
\alias{fastorder}
\alias{forder}
\alias{forderv}
\alias{sort_by}

\title{Fast row reordering of a data.table by reference}
\description{
Expand Down Expand Up @@ -32,6 +33,7 @@ setorderv(x, cols = colnames(x), order=1L, na.last=FALSE)
# optimised to use data.table's internal fast order
# x[order(., na.last=TRUE)]
# x[order(., decreasing=TRUE)]
# sort_by(x, ., na.last=TRUE, decreasing=FALSE) # R >= 4.4.0
}
\arguments{
\item{x}{ A \code{data.table}. }
Expand All @@ -46,7 +48,7 @@ when \code{b} is of type \code{character} as well. }
\code{order} must be either \code{1} or equal to that of \code{cols}. If
\code{length(order) == 1}, it is recycled to \code{length(cols)}. }
\item{na.last}{ \code{logical}. If \code{TRUE}, missing values in the data are placed last; if \code{FALSE}, they are placed first; if \code{NA} they are removed.
\code{na.last=NA} is valid only for \code{x[order(., na.last)]} and its
\code{na.last=NA} is valid only for \code{x[order(., na.last)]} and related \code{sort_by(x, .)} (\eqn{\R \ge 4.4.0}) and its
default is \code{TRUE}. \code{setorder} and \code{setorderv} only accept
\code{TRUE}/\code{FALSE} with default \code{FALSE}. }
}
Expand All @@ -71,8 +73,8 @@ sets the \code{sorted} attribute.

\code{na.last} argument, by default, is \code{FALSE} for \code{setorder} and
\code{setorderv} to be consistent with \code{data.table}'s \code{setkey} and
is \code{TRUE} for \code{x[order(.)]} to be consistent with \code{base::order}.
Only \code{x[order(.)]} can have \code{na.last = NA} as it is a subset operation
is \code{TRUE} for \code{x[order(.)]} and \code{sort_by(x, .)} (\eqn{\R \ge 4.4.0}) to be consistent with \code{base::order}.
Only \code{x[order(.)]} (and related \code{sort_by(x, .)}) can have \code{na.last = NA} as it is a subset operation
as opposed to \code{setorder} or \code{setorderv} which reorders the data.table
by reference.
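The three `na.last` settings described above can be illustrated with `base::order`, which the documentation names as the reference behaviour for `x[order(.)]` and `sort_by(x, .)`. This is a base-R sketch using the same columns as `DT1` in the tests; it does not call data.table itself.

```r
a <- c(1, 3, 2, NA, 3)
b <- 4:0

# Default na.last = TRUE: the row with the missing key goes last.
o_last <- order(a, b)

# na.last = FALSE: the row with the missing key goes first.
o_first <- order(a, b, na.last = FALSE)

# na.last = NA: rows with a missing key are dropped, so the result is
# shorter than the input. This is why it only makes sense for a subset
# operation and not for reordering by reference.
o_drop <- order(a, na.last = NA)
```

Applying `o_last` to the two columns gives `a = 1, 2, 3, 3, NA` and `b = 4, 2, 0, 3, 1`, which is the expected result in test 2306.01.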
Expand All @@ -96,6 +98,11 @@ was started in. By contrast, \code{"america" < "BRAZIL"} is always \code{FALSE}
If \code{setorder} results in reordering of the rows of a keyed \code{data.table},
then its key will be set to \code{NULL}.
Starting from \R 4.4.0, \code{sort_by(x, y, \dots)} is the S3 method for the generic \code{sort_by} for \code{data.table}s.
It uses the same formula or list interfaces as data.frame's \code{sort_by} but internally uses \code{data.table}'s fast ordering,
hence it behaves the same as \code{x[order(.)]} and takes the same optional named arguments and their defaults.

}
\value{
The input is modified by reference, and returned (invisibly) so it can be used
5 changes: 5 additions & 0 deletions src/dogroups.c
Expand Up @@ -535,6 +535,11 @@ SEXP growVector(SEXP x, const R_len_t newlen)
if (isNull(x)) error(_("growVector passed NULL"));
PROTECT(newx = allocVector(TYPEOF(x), newlen)); // TO DO: R_realloc(?) here?
if (newlen < len) len=newlen; // i.e. shrink
if (!len) { // cannot memcpy invalid pointer, #6819
keepattr(newx, x);
UNPROTECT(1);
return newx;
}
switch (TYPEOF(x)) {
case RAWSXP: memcpy(RAW(newx), RAW(x), len*SIZEOF(x)); break;
case LGLSXP: memcpy(LOGICAL(newx), LOGICAL(x), len*SIZEOF(x)); break;
11 changes: 8 additions & 3 deletions src/utils.c
Expand Up @@ -222,6 +222,14 @@ SEXP copyAsPlain(SEXP x) {
}
const int64_t n = XLENGTH(x);
SEXP ans = PROTECT(allocVector(TYPEOF(x), n));
// aside: unlike R's duplicate we do not copy truelength here; important for dogroups.c which uses negative truelength to mark its specials
if (ALTREP(ans))
internal_error(__func__, "copyAsPlain returning ALTREP for type '%s'", type2char(TYPEOF(x))); // # nocov
if (!n) { // cannot memcpy invalid pointer, #6819
DUPLICATE_ATTRIB(ans, x);
UNPROTECT(1);
return ans;
}
switch (TYPEOF(x)) {
case RAWSXP:
memcpy(RAW(ans), RAW(x), n*sizeof(Rbyte));
Expand Down Expand Up @@ -250,9 +258,6 @@ SEXP copyAsPlain(SEXP x) {
internal_error(__func__, "type '%s' not supported in %s", type2char(TYPEOF(x)), "copyAsPlain()"); // # nocov
}
DUPLICATE_ATTRIB(ans, x);
// aside: unlike R's duplicate we do not copy truelength here; important for dogroups.c which uses negative truelength to mark its specials
if (ALTREP(ans))
internal_error(__func__, "copyAsPlain returning ALTREP for type '%s'", type2char(TYPEOF(x))); // # nocov
UNPROTECT(1);
return ans;
}