Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling join keeps only one on column, takes name from x table and values from i table #4005

Open
moodymudskipper opened this issue Oct 28, 2019 · 7 comments
Labels
joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins non-equi joins rolling, overlapping, non-equi joins

Comments

@moodymudskipper
Copy link

moodymudskipper commented Oct 28, 2019

This got me very confused :

library(data.table)
x <- data.table(x1=letters[1:3], x2=c(10,20,30))
y <- data.table(y1=letters[4:6], y2=c(11,21,31))
y2 <- x[y, roll = "nearest", on = .(x2 = y2)]
y2
#>    x1 x2 y1
#> 1:  a 11  d
#> 2:  b 21  e
#> 3:  c 31  f

I would much prefer to keep both x2 and y2, and if we must keep only one column I would much rather have the column name fit the original values.

The workaround I found looks quite awful, can I do better ?

x_ <- copy(x)
x_[, x3 := x2]
y2 <- x_[y, roll = "nearest", on = .(x3 = y2)]
y2[, y2 := x3]
y2[, x3 := NULL]
rm(x_)
y2
#>    x1 x2 y1 y2
#> 1:  a 10  d 11
#> 2:  b 20  e 21
#> 3:  c 30  f 31
@shrektan
Copy link
Member

A slightly better way to do this is:

library(data.table)
x <- data.table(x1=letters[1:3], x2=c(10,20,30))
y <- data.table(y1=letters[4:6], y2=c(11,21,31))
x[y, roll = "nearest", on = .(x2 = y2), .(x1, x2 = x.x2, y1, y2 = i.y2)]

@MichaelChirico
Copy link
Member

At a glance that's this PR?

#3093

@moodymudskipper
Copy link
Author

moodymudskipper commented Oct 28, 2019

@MichaelChirico yes, I think keeping both columns is the best solution, as this is a right join it feels both wrong to me to remove from the original table (x) and to the left table(i).

@shrektan, thanks, it does improve things here, in my real case I have many more columns and wish to keep all on both sides however so I don't think this can work.

@shrektan
Copy link
Member

I tweaked a little bit and hope it may help before the feature finally gets there. Haven't read #3093 through but I think the simplest implementation for this feature (well, occationally, I need this feature as well) is to just add an option in x[i, allcols = TRUE] ...

A more general wrapper...

library(data.table)
x <- data.table(x1=letters[1:3], x2=c(10,20,30))
y <- data.table(y1=letters[4:6], y2=c(11,21,31))
vars <- c(names(x), names(y))
vars[vars == "x2"] <- "x.x2"
vars[vars == "y2"] <- "i.y2"
x[y, roll = "nearest", on = .(x2 = y2), c(vars), with = FALSE]

@moodymudskipper
Copy link
Author

moodymudskipper commented Oct 28, 2019

Thanks @shrektan
Re the PR. allcols = TRUE sounds good but I think the default should be changed too, it doesn't make any sense to me.

@shrektan
Copy link
Member

Well, the default doesn't that bad... at least for the finance guys... like myself, because many cases that I encounter are more about "query" the x about "y" and most of the time I don't really need the common matched columns.

However, I incline to your opinion: either specifying the only desired columns or just return all the columns from x and y...

@jangorecki jangorecki added the joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins label Oct 30, 2019
@jangorecki jangorecki added the non-equi joins rolling, overlapping, non-equi joins label Apr 5, 2020
@matthewgson
Copy link

I second @moodymudskipper that naming convention on join is confusing and less intuitive.

Since DT[DT2] join is basically a right-join, users expect that the name and value of the right-table remain intact.
Currently, DT right join keeps the data from the right table (i) but takes the column name from the left table (x) which makes confusion.

I think the default behavior should change and be consistent that it also keeps the right column name as join result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
joins Use label:"non-equi joins" for rolling, overlapping, and non-equi joins non-equi joins rolling, overlapping, non-equi joins
Projects
None yet
Development

No branches or pull requests

5 participants