data.table join operation 2x slower #69

Open
tdhock opened this issue Nov 15, 2024 · 1 comment

tdhock (Owner) commented Nov 15, 2024

@DorisAmoakohene please create an atime test case and PR for Rdatatable/data.table#3928 (comment)

DorisAmoakohene (Collaborator) commented Nov 18, 2024

@tdhock


atime.list.3928 <- atime::atime_versions(
  pkg.path=tdir,  # tdir: path to a local git clone of Rdatatable/data.table (defined elsewhere)
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
    pkg_find_replace <- function(glob, FIND, REPLACE){
      atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
    }
    Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
    Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
    new.Package_ <- paste0(Package_, "_", sha)
    pkg_find_replace(
      "DESCRIPTION", 
      paste0("Package:\\s+", old.Package),
      paste("Package:", new.Package))
    pkg_find_replace(
      file.path("src","Makevars.*in"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      sprintf('packageVersion\\("%s"\\)', old.Package),
      sprintf('packageVersion\\("%s"\\)', new.Package))
    pkg_find_replace(
      file.path("src", "init.c"),
      paste0("R_init_", Package_regex),
      paste0("R_init_", gsub("[.]", "_", new.Package_)))
    pkg_find_replace(
      "NAMESPACE",
      sprintf('useDynLib\\("?%s"?', Package_regex),
      paste0('useDynLib(', new.Package_))
  },
  N=10^seq(1,3),
  setup={
    setDTthreads(1)
    # size the tables by N so that the timing actually varies across N
    aa <- data.table(a = seq(1, N), b = rep(0, N))
    bb <- data.table(a = seq(1, N), b = rep(1, N))
  },

  # update join: copy bb$b into aa$b for rows matching on column a
  expr=data.table:::`[.data.table`(aa, bb, b := i.b, on = .(a)),
  # "data.table_1.10.4-"="8b201fd28f5d4afcc4be026a5d9eb4bb6dd62955", # https://github.com/Rdatatable/data.table/commit/8b201fd28f5d4afcc4be026a5d9eb4bb6dd62955
  "data.table_1.12.2"="86034855f9b305e948d83014af89352fc42e27f2"  # https://github.com/Rdatatable/data.table/commit/86034855f9b305e948d83014af89352fc42e27f2
) 


Error in atime_versions_install(Package, normalizePath(pkg.path), new.Package.vec,  : 
  "C:/PROGRA~1/R/R-44~1.1/bin/R" CMD INSTALL -l "C:/Users/amoak/AppData/Local/R/win-library/4.4" C:\Users\amoak\AppData\Local\Temp\RtmpigoXAp\file832c5ae9668c/file832cef66fdf.86034855f9b305e948d83014af89352fc42e27f2 returned error status code 1

The issue compares
data.table_1.10.4-3 + R version 3.4.0 (2017-04-21)
vs
data.table_1.12.2 + R version 3.6.1 (2019-07-05)

and reports that the join operation is almost 2 times slower with the newer data.table (or possibly the newer R).
The slowdown most likely depends on the data.table version.
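As a quick check that does not depend on atime being able to install old versions, the same update join can be timed against whichever data.table version is currently installed. This is a minimal sketch, assuming only that data.table is installed; the helper name `time_update_join` and the `reps` parameter are hypothetical, and base `system.time()` is used instead of atime, so it measures one version per R session rather than comparing versions side by side.

```r
library(data.table)

# Hypothetical helper (not part of atime or data.table): time the update join
# from the atime test case for one table size N, averaged over `reps` runs.
time_update_join <- function(N, reps = 10L) {
  setDTthreads(1)  # single thread, matching the atime setup above
  aa <- data.table(a = seq_len(N), b = rep(0, N))
  bb <- data.table(a = seq_len(N), b = rep(1, N))
  elapsed <- system.time(
    for (r in seq_len(reps)) aa[bb, b := i.b, on = .(a)]  # copy bb$b into aa$b
  )[["elapsed"]]
  list(N = N, seconds = elapsed / reps, result = aa)
}

timings <- lapply(10^(1:3), time_update_join)
print(data.frame(
  N = sapply(timings, `[[`, "N"),
  seconds = sapply(timings, `[[`, "seconds")
))
print(packageVersion("data.table"))
```

Running this once under each R/data.table combination gives a rough comparison; atime remains the better tool once the old versions install.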

I am not able to install the data.table versions from the issue above (see the install error). This problem also came up last semester.
Just a heads-up: I will try to install these versions of R and run it on
