From fd0942ab7da9237c891b90f7d631335e30d5e5cc Mon Sep 17 00:00:00 2001
From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
Date: Thu, 27 Apr 2023 19:57:41 +0200
Subject: [PATCH 1/2] add subset and dogroups support for columns of type
 expression

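For reference, a minimal sketch of the usage this patch targets,
mirroring the tests added to inst/tests/tests.Rraw below. It assumes a
data.table build that includes this change. The comments naming the C
routine each call exercises are inferred from the diff and are not a
confirmed call trace.

```r
library(data.table)

# A data.table with a column of type 'expression' (same shape as the added tests)
dt = data.table(a = 1:2, b = expression(1, 2))

# Row subsetting: presumably reaches subsetVectorRaw(), which now treats
# EXPRSXP like VECSXP (both hold pointers to their elements)
dt[1, ]

# Grouping: presumably reaches dogroups(), whose size-0 check now lets
# EXPRSXP columns through instead of raising an internal error
dt[, b, by = a]
```

Per the added tests, the first call is expected to equal
data.table(a=1L, b=expression(1)) and the second to equal dt itself.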
---
 NEWS.md               | 2 ++
 inst/tests/tests.Rraw | 5 +++++
 src/dogroups.c        | 2 +-
 src/subset.c          | 2 +-
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 025a7651b..4fe5ac59b 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -561,6 +561,8 @@
     identical(DT1, DT2)                  # TRUE
     ```
 
+55. Subsetting or aggregating columns of type `expression` now works, [#5596](https://github.com/Rdatatable/data.table/issues/5596). Thanks to @tsp for the report, and @ben-schwen for the fix.
+
 ## NOTES
 
 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index 9117c0fcb..e339c8d08 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -18094,3 +18094,8 @@ test(2238.6, "a" %notin% integer(), TRUE)
 test(2238.7, "a" %notin% NULL, TRUE)
 test(2238.8, NA %notin% 1:5, TRUE)
 test(2238.9, NA %notin% c(1:5, NA), FALSE)
+
+# support column type 'expression' #5596
+dt = data.table(a=1:2, b=expression(1,2))
+test(2239.1, dt[1,], data.table(a=1L, b=expression(1)))
+test(2239.2, dt[,b,a], dt)
diff --git a/src/dogroups.c b/src/dogroups.c
index 5ddd1f672..cf5e80153 100644
--- a/src/dogroups.c
+++ b/src/dogroups.c
@@ -136,7 +136,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
 
   for(int i=0; i<length(SDall); ++i) {
     SEXP this = VECTOR_ELT(SDall, i);
-    if (SIZEOF(this)==0)
+    if (SIZEOF(this)==0 && TYPEOF(this)!=EXPRSXP) // allow expr type columns in data.table #5596
       error(_("Internal error: size-0 type %d in .SD column %d should have been caught earlier"), TYPEOF(this), i); // # nocov
     if (LENGTH(this) != maxGrpSize)
       error(_("Internal error: SDall %d length = %d != %d"), i+1, LENGTH(this), maxGrpSize); // # nocov
diff --git a/src/subset.c b/src/subset.c
index 215845179..8f82a3225 100644
--- a/src/subset.c
+++ b/src/subset.c
@@ -79,7 +79,7 @@ void subsetVectorRaw(SEXP ans, SEXP source, SEXP idx, const bool anyNA)
       for (int i=0; i<n; i++) {                     SET_STRING_ELT(ans, i, sp[idxp[i]-1]); }
     }
   } break;
-  case VECSXP : {
+  case VECSXP: case EXPRSXP: {
     const SEXP *sp = SEXPPTR_RO(source);
     if (anyNA) {
       for (int i=0; i<n; i++) { int elem = idxp[i]; SET_VECTOR_ELT(ans, i, elem==NA_INTEGER ? R_NilValue : sp[elem-1]); }

From 2d019e51a443527c3ad003e3ff561ec553739ad7 Mon Sep 17 00:00:00 2001
From: Michael Chirico <michaelchirico4@gmail.com>
Date: Sun, 8 Sep 2024 12:18:44 -0700
Subject: [PATCH 2/2] narrow NEWS item

---
 NEWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NEWS.md b/NEWS.md
index c1a2610ea..872b847ac 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -25,7 +25,7 @@ rowwiseDT(
 #> 3:     5     6      c ~a + b
 ```
 
-2. Subsetting or aggregating columns of type `expression` works, [#5596](https://github.com/Rdatatable/data.table/issues/5596). Thanks to @tsp for the report, and @ben-schwen for the fix.
+2. Limited support for subsetting or aggregating columns of type `expression`, [#5596](https://github.com/Rdatatable/data.table/issues/5596). Thanks to @tsp for the report, and @ben-schwen for the fix.
 
 ## BUG FIXES