Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result from non-equi join without explicit declaration of columns in j #4426

Closed
sheffe opened this issue May 4, 2020 · 1 comment
Labels
duplicate non-equi joins rolling, overlapping, non-equi joins

Comments

@sheffe
Copy link

sheffe commented May 4, 2020

Without an explicit declaration of the columns to be returned from j in a non-equi join, the values used as conditions in the join can be changed without warning in the return.

Example code:

dt_lhs <- data.table(abc = LETTERS[1:5], num = 1:5)
dt_lhs[, `:=`(num_lwr = num - 1, num_upr = num + 1)]
dt_rhs <- data.table(xyz = rev(LETTERS)[1:5], num = 1:5)

# Version 1
dt_lhs[dt_rhs, on = .(num_lwr <= num, num_upr >= num), allow.cartesian = TRUE]

# Version 2
dt_lhs[dt_rhs, .(abc, x.num, x.num_lwr, x.num_upr, xyz), 
        on = .(num_lwr <= num, num_upr >= num), allow.cartesian = TRUE]

Version 1 yields this:

    abc num num_lwr num_upr xyz
 1:   A   1       1       1   Z
 2:   B   2       1       1   Z
 3:   A   1       2       2   Y
 4:   B   2       2       2   Y
 5:   C   3       2       2   Y
 6:   B   2       3       3   X
 7:   C   3       3       3   X
 8:   D   4       3       3   X
 9:   C   3       4       4   W
10:   D   4       4       4   W
11:   E   5       4       4   W
12:   D   4       5       5   V
13:   E   5       5       5   V

and Version 2 yields this:

    abc x.num x.num_lwr x.num_upr xyz
 1:   A     1         0         2   Z
 2:   B     2         1         3   Z
 3:   A     1         0         2   Y
 4:   B     2         1         3   Y
 5:   C     3         2         4   Y
 6:   B     2         1         3   X
 7:   C     3         2         4   X
 8:   D     4         3         5   X
 9:   C     3         2         4   W
10:   D     4         3         5   W
11:   E     5         4         6   W
12:   D     4         3         5   V
13:   E     5         4         6   V

This behavior was completely unexpected to me -- I think it's a bug unless I'm somehow missing a use-case. I stumbled on Arun's SO response here with a hint to explicitly declare the returned columns and can get a correct answer that way, but I think Version 1 and Version 2 would naturally have identical behavior...

As always, thank you.

@jangorecki jangorecki added duplicate non-equi joins rolling, overlapping, non-equi joins labels May 4, 2020
@jangorecki
Copy link
Member

jangorecki commented May 4, 2020

Thank you for reporting. It is a duplicate of #1700 (and many others already closed). We have pending PR for that, your are welcome to provide your review there to ensure it address your needs #3093.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate non-equi joins rolling, overlapping, non-equi joins
Projects
None yet
Development

No branches or pull requests

2 participants