Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional left join | left table column data is wrong #6495

Closed
dimuth opened this issue Sep 13, 2024 · 3 comments
Closed

Conditional left join | left table column data is wrong #6495

dimuth opened this issue Sep 13, 2024 · 3 comments

Comments

@dimuth
Copy link

dimuth commented Sep 13, 2024

Left table

internal_id   age    name value       date
                       <char> <num>  <char> <num>     <Date>
1: 01J7MGV511S3852XQNYADS8AAF     1   Alice   100 2024-08-01
2: 01J7MGV5118XVW8APQWESGHT90     2     Bob   150 2024-08-02
3: 01J7MGV511Y6F0KFT30H0XR07S     3 Charlie   200 2024-08-03
4: 01J7MGV511BEF4G7BKD3S5N7V0     4   David   250 2024-08-04
5: 01J7MGV511ENGEPZDW39JCRBYW     5     Eve   300 2024-08-05

Right table

age_col score event_date event_date_p
     <num> <num>     <Date>       <Date>
1:       2    85 2024-08-02   2024-08-02
2:       3    90 2024-08-03   2024-08-03
3:       4   300 2024-08-04   2024-08-04
4:       6    80 2024-08-06   2024-08-06
5:       7    70 2024-08-07   2024-08-07
6:       8    56 2024-08-07   2024-08-07
7:       9    75 2024-08-07   2024-08-07
8:      10    70 2024-08-07   2024-08-07

Joined table

internal_id   age    name value       date age_col score event_date_p
                       <char> <num>  <char> <num>     <Date>   <num> <num>       <Date>
1: 01J7MHDQ9E7CZ6S7M12SFTB0BT     2     Bob   150 2024-08-02       2    85   2024-08-02
2: 01J7MHDQ9ER0Y95DDZQBHDPX5T     3 Charlie   200 2024-08-02       2    85   2024-08-02
3: 01J7MHDQ9EQNMENZ88YCTXZJX4     4   David   250 2024-08-02       2    85   2024-08-02
4: 01J7MHDQ9EXCSE878D653KPYWH     5     Eve   300 2024-08-02       2    85   2024-08-02
5: 01J7MHDQ9ER0Y95DDZQBHDPX5T     3 Charlie   200 2024-08-03       3    90   2024-08-03
6: 01J7MHDQ9EQNMENZ88YCTXZJX4     4   David   250 2024-08-03       3    90   2024-08-03
7: 01J7MHDQ9EXCSE878D653KPYWH     5     Eve   300 2024-08-03       3    90   2024-08-03
8: 01J7MHDQ9EQNMENZ88YCTXZJX4     4   David   250 2024-08-04       4   300   2024-08-04
9: 01J7MHDQ9EXCSE878D653KPYWH     5     Eve   300 2024-08-04       4   300   2024-08-04

Joined_table <- left_table[right_table, on = .(date >= event_date), nomatch = 0L]

You can see the date columns of the joined table is different than expected. Seems it is a copy of the relevant column (event_date) of right table not from the left table (date).

Highly appreciate your guidance to fix this issue.

@myoung3
Copy link
Contributor

myoung3 commented Sep 13, 2024

So we can get a better understanding of the issue, please provide some code to reproduce the tables as well as the code you used to make the join. If you have the objects assigned in your session, you can pass the object to the dput() function and R will print portable code which will re-create the table in a new session. Paste that here and we can take a look. Thanks!

@dimuth
Copy link
Author

dimuth commented Sep 13, 2024

Thanks @myoung3 I updated the issue

@myoung3
Copy link
Contributor

myoung3 commented Sep 13, 2024

without the dput output this still isn't reproducible, but now that I see this is a non-equi/inequality join I think I understand and this is known result of being consistent with how join columns are named in base R, which is especially confusing when performing an non-equi join (which doesn't even exist in base R).

There is a possible PR in progress is #3093, but in the meantime I suggest just making a copy of the column you want to keep and joining on that. Something like this:

left_table[, date_copy := date]
Joined_table <- left_table[right_table, on = .(date_copy >= event_date), nomatch = 0L]

Since this is a duplicate of #1615 (and others, linked in the PR), I suggest closing.

@dimuth dimuth closed this as completed Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants