You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Without an explicit declaration of the columns to be returned from j in a non-equi join, the values used as conditions in the join can be changed without warning in the return.
Example code:
dt_lhs <- data.table(abc = LETTERS[1:5], num = 1:5)
dt_lhs[, `:=`(num_lwr = num - 1, num_upr = num + 1)]
dt_rhs <- data.table(xyz = rev(LETTERS)[1:5], num = 1:5)
# Version 1
dt_lhs[dt_rhs, on = .(num_lwr <= num, num_upr >= num), allow.cartesian = TRUE]
# Version 2
dt_lhs[dt_rhs, .(abc, x.num, x.num_lwr, x.num_upr, xyz),
on = .(num_lwr <= num, num_upr >= num), allow.cartesian = TRUE]
Version 1 yields this:
abc num num_lwr num_upr xyz
1: A 1 1 1 Z
2: B 2 1 1 Z
3: A 1 2 2 Y
4: B 2 2 2 Y
5: C 3 2 2 Y
6: B 2 3 3 X
7: C 3 3 3 X
8: D 4 3 3 X
9: C 3 4 4 W
10: D 4 4 4 W
11: E 5 4 4 W
12: D 4 5 5 V
13: E 5 5 5 V
and Version 2 yields this:
abc x.num x.num_lwr x.num_upr xyz
1: A 1 0 2 Z
2: B 2 1 3 Z
3: A 1 0 2 Y
4: B 2 1 3 Y
5: C 3 2 4 Y
6: B 2 1 3 X
7: C 3 2 4 X
8: D 4 3 5 X
9: C 3 2 4 W
10: D 4 3 5 W
11: E 5 4 6 W
12: D 4 3 5 V
13: E 5 4 6 V
This behavior was completely unexpected to me -- I think it's a bug unless I'm somehow missing a use-case. I stumbled on Arun's SO response here with a hint to explicitly declare the returned columns and can get a correct answer that way, but I think Version 1 and Version 2 would naturally have identical behavior...
As always, thank you.
The text was updated successfully, but these errors were encountered:
Thank you for reporting. It is a duplicate of #1700 (and many others already closed). We have pending PR for that, your are welcome to provide your review there to ensure it address your needs #3093.
Without an explicit declaration of the columns to be returned from
j
in a non-equi join, the values used as conditions in the join can be changed without warning in the return.Example code:
Version 1 yields this:
and Version 2 yields this:
This behavior was completely unexpected to me -- I think it's a bug unless I'm somehow missing a use-case. I stumbled on Arun's SO response here with a hint to explicitly declare the returned columns and can get a correct answer that way, but I think Version 1 and Version 2 would naturally have identical behavior...
As always, thank you.
The text was updated successfully, but these errors were encountered: