Fix transform slowness (#5493)
* Fix 5492 by limiting the costly deparse to `nlines=1`

* Implementing PR feedbacks

* Added  inside

* Fix typo in name

* Idiomatic use of  inside

* Separating the deparse line limit to a different PR

---------

Co-authored-by: Michael Chirico <[email protected]>
OfekShilon and MichaelChirico committed Feb 17, 2024
1 parent bf49909 commit 0d24afb
Showing 3 changed files with 9 additions and 22 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@
 
 # data.table [v1.15.99](https://github.com/Rdatatable/data.table/milestone/30) (in development)
 
+## NOTES
+
+1. `transform` method for data.table sped up substantially when creating new columns on large tables. Thanks to @OfekShilon for the report and PR. The implemented solution was proposed by @ColeMiller1.
+
 # data.table [v1.15.0](https://github.com/Rdatatable/data.table/milestone/29) (30 Jan 2024)
 
 ## BREAKING CHANGE
23 changes: 4 additions & 19 deletions R/data.table.R
@@ -2345,25 +2345,10 @@ transform.data.table = function (`_data`, ...)
 # basically transform.data.frame with data.table instead of data.frame, and retains key
 {
   if (!cedta()) return(NextMethod()) # nocov
-  e = eval(substitute(list(...)), `_data`, parent.frame())
-  tags = names(e)
-  inx = chmatch(tags, names(`_data`))
-  matched = !is.na(inx)
-  if (any(matched)) {
-    .Call(C_unlock, `_data`) # fix for #1641, now covered by test 104.2
-    `_data`[,inx[matched]] = e[matched]
-    `_data` = as.data.table(`_data`)
-  }
-  if (!all(matched)) {
-    ans = do.call("data.table", c(list(`_data`), e[!matched]))
-  } else {
-    ans = `_data`
-  }
-  key.cols = key(`_data`)
-  if (!any(tags %chin% key.cols)) {
-    setattr(ans, "sorted", key.cols)
-  }
-  ans
+  `_data` = copy(`_data`)
+  e = eval(substitute(list(...)), `_data`, parent.frame())
+  set(`_data`, ,names(e), e)
+  `_data`
 }
 
 subset.data.table = function (x, subset, select, ...)
4 changes: 1 addition & 3 deletions man/transform.data.table.Rd
@@ -7,9 +7,7 @@
 \description{
   Utilities for \code{data.table} transformation.
 
-  \strong{\code{transform} by group is particularly slow. Please use \code{:=} by group instead.}
-
-  \code{within}, \code{transform} and other similar functions in \code{data.table} are not just provided for users who expect them to work, but for non-data.table-aware packages to retain keys, for example. Hopefully the (much) faster and more convenient \code{data.table} syntax will be used in time. See examples.
+  \code{within}, \code{transform} and other similar functions in \code{data.table} are not just provided for users who expect them to work, but for non-data.table-aware packages to retain keys, for example. Hopefully the faster and more convenient \code{data.table} syntax will be used in time. See examples.
 }
 \usage{
 \method{transform}{data.table}(`_data`, \ldots)
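
The rewritten method body can be tried outside the package with a short sketch like the one below. It is not part of the commit: the table `DT`, its columns and the temporary name `tmp` are made up for illustration, and the hand-written steps only approximate what `transform.data.table` now does (copy, evaluate the arguments against the copy, assign with `set`).

```r
library(data.table)

# Hypothetical example table (not from the commit), keyed on `id`
DT = data.table(id = c(2L, 1L, 3L), x = c(10, 20, 30), key = "id")

# transform() returns a modified copy; DT itself should stay unchanged
res = transform(DT, x = x * 2, y = x + 1)

# Roughly the same steps written out by hand, mirroring the new method body
tmp = copy(DT)                                       # work on a copy
e = eval(quote(list(x = x * 2, y = x + 1)), tmp, parent.frame())
set(tmp, , names(e), e)                              # assign by reference into the copy

stopifnot(identical(names(res), c("id", "x", "y")))  # existing column replaced, new one appended
stopifnot(identical(names(DT), c("id", "x")))        # input left untouched
stopifnot(isTRUE(all.equal(res, tmp)))               # both routes agree
```

Both `x` and `y` are computed from the original `x`, because the argument list is evaluated once before any column is assigned.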
