Skip to content

Commit

Permalink
rbindlist segfault for fill=TRUE and usenames=FALSE (#5468)
Browse files Browse the repository at this point in the history
* add fix for fill=TRUE and usenames=FALSE
  • Loading branch information
ben-schwen authored Dec 8, 2023
1 parent 05a1be8 commit 1b130ef
Show file tree
Hide file tree
Showing 3 changed files with 17 additions and 12 deletions.
8 changes: 4 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@
# v1.14.4 0.4826 0.5586 0.6586 0.6329 0.7348 1.318 100
```

31. `rbind()` and `rbindlist()` now support `fill=TRUE` with `use.names=FALSE` instead of issuing the warning `use.names= cannot be FALSE when fill is TRUE. Setting use.names=TRUE.`
31. `rbind()` and `rbindlist()` now support `fill=TRUE` with `use.names=FALSE` instead of issuing the warning `use.names= cannot be FALSE when fill is TRUE. Setting use.names=TRUE.`, [#5444](https://github.com/Rdatatable/data.table/issues/5444). Thanks to @sindribaldur for testing dev and filing a bug report which was fixed before release.

```R
DT1
Expand Down Expand Up @@ -249,7 +249,7 @@
# 3: 3 NA
# 4: 4 NA
```

32. `fread()` already made a good guess as to whether column names are present by comparing the type of the fields in row 1 to the type of the fields in the sample. This guess is now improved when a column contains a string in row 1 (i.e. a potential column name) but all blank in the sample rows, [#2526](https://github.com/Rdatatable/data.table/issues/2526). Thanks @st-pasha for reporting, and @ben-schwen for the PR.

33. `fread()` can now read `.zip` and `.tar` directly, [#3834](https://github.com/Rdatatable/data.table/issues/3834). Moreover, if a compressed file name is missing its extension, `fread()` now attempts to infer the correct filetype from its magic bytes. Thanks to Michael Chirico for the idea, and Benjamin Schwendinger for the PR.
Expand All @@ -265,7 +265,7 @@
# 1: 1 3 a
# 2: 2 4 b
```

35. `weighted.mean()` is now optimised by group, [#3977](https://github.com/Rdatatable/data.table/issues/3977). Thanks to @renkun-ken for requesting, and Benjamin Schwendinger for the PR.

36. `as.xts.data.table()` now supports non-numeric xts coredata matrixes, [5268](https://github.com/Rdatatable/data.table/issues/5268). Existing numeric only functionality is supported by a new `numeric.only` parameter, which defaults to `TRUE` for backward compatability and the most common use case. To convert non-numeric columns, set this parameter to `FALSE`. Conversions of `data.table` columns to a `matrix` now uses `data.table::as.matrix`, with all its performance benefits. Thanks to @ethanbsmith for the report and fix.
Expand All @@ -282,7 +282,7 @@
# <int> <int>
# 1: 3 5
# 2: 4 6

DT[, sum(.SD), by=.I]
# I V1
# <int> <int>
Expand Down
5 changes: 4 additions & 1 deletion inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -14336,7 +14336,10 @@ test(2003.3, rbindlist(list(data.table(a=1:2), data.table(b=3:4)), fill=TRUE, us
test(2003.4, rbindlist(list(data.table(a=1:2,c=5:6), data.table(b=3:4)), fill=TRUE, use.names=FALSE),
data.table(a=c(1:4), c=INT(5,6,NA,NA)))
test(2003.5, rbindlist(list(data.table(a=1:2), data.table(b=3:4, c=5:6)), fill=TRUE, use.names=FALSE),
data.table(a=c(1:4), V1=INT(NA,NA,5,6)))
data.table(a=c(1:4), c=INT(NA,NA,5,6)))
# rbindlist segfault with fill=TRUE and usenames=FALSE #5444
test(2003.6, rbindlist(list(list(1), list(2,3)), fill=TRUE, use.names=FALSE), data.table(c(1,2), c(NA, 3)))
test(2003.7, rbindlist(list(list(1), list(2,factor(3))), fill=TRUE, use.names=FALSE), data.table(c(1,2), factor(c(NA, 3))))

# chmatch coverage for two different non-ascii encodings matching; issues mentioned in comments in chmatch.c #69 #2538 #111
x1 = "fa\xE7ile"
Expand Down
16 changes: 9 additions & 7 deletions src/rbindlist.c
Original file line number Diff line number Diff line change
Expand Up @@ -282,7 +282,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
for (int i=0; i<LENGTH(l); ++i) {
SEXP li = VECTOR_ELT(l, i);
if (!length(li)) continue;
int w = usenames ? colMap[i*ncol + j] : j; // colMap tells us which item to fetch for each of the final result columns, so we can stack column-by-column
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // colMap tells us which item to fetch for each of the final result columns, so we can stack column-by-column // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) continue; // column j of final result has no input from this item (fill must be true)
if (!foundName) {
SEXP cn=PROTECT(getAttrib(li, R_NamesSymbol));
Expand Down Expand Up @@ -333,9 +333,10 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
// before the savetl_init() because we have no hook to clean up tl if coerceVector fails.
if (coercedForFactor==NULL) { coercedForFactor=PROTECT(allocVector(VECSXP, LENGTH(l))); nprotect++; }
for (int i=0; i<LENGTH(l); ++i) {
int w = usenames ? colMap[i*ncol + j] : j;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) continue;
SEXP thisCol = VECTOR_ELT(VECTOR_ELT(l, i), w);
SEXP thisCol = VECTOR_ELT(li, w);
if (!isFactor(thisCol) && !isString(thisCol)) {
SET_VECTOR_ELT(coercedForFactor, i, coerceVector(thisCol, STRSXP));
}
Expand Down Expand Up @@ -366,9 +367,10 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
SET_TRUELENGTH(s,-k-1);
}
for (int i=0; i<LENGTH(l); ++i) {
int w = usenames ? colMap[i*ncol + j] : j;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) continue;
SEXP thisCol = VECTOR_ELT(VECTOR_ELT(l, i), w);
SEXP thisCol = VECTOR_ELT(li, w);
if (isOrdered(thisCol)) {
SEXP levels = getAttrib(thisCol, R_LevelsSymbol);
const SEXP *levelsD = STRING_PTR(levels);
Expand Down Expand Up @@ -403,7 +405,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
const int thisnrow = eachMax[i];
SEXP li = VECTOR_ELT(l, i);
if (!length(li)) continue; // NULL items in the list() of DT/DF; not if thisnrow==0 because we need to retain (unused) factor levels (#3508)
int w = usenames ? colMap[i*ncol + j] : j;
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
if (w==-1) {
writeNA(target, ansloc, thisnrow, false);
} else {
Expand Down Expand Up @@ -508,7 +510,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg)
const int thisnrow = eachMax[i];
if (thisnrow==0) continue;
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : j;
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
SEXP thisCol;
if (w==-1 || !length(thisCol=VECTOR_ELT(li, w))) { // !length for zeroCol warning above; #1871
writeNA(target, ansloc, thisnrow, false); // writeNA is integer64 aware and writes INT64_MIN
Expand Down

0 comments on commit 1b130ef

Please sign in to comment.