use arrow format as the memory model #5670

Closed
waynelapierre opened this issue Jul 29, 2023 · 4 comments
Comments

@waynelapierre

Since the Python packages pandas and polars use the Arrow format as their memory model, is there any plan to do the same in data.table?

@jangorecki
Member

jangorecki commented Jul 30, 2023

I doubt it. The algorithms in data.table are designed for the particular memory layout that R uses for data.frame. Switching to Arrow would mean that we could no longer switch between data.frame and data.table without making a copy, and it would require a lot of code rewriting. It could be considered, but what benefits do you expect? Without a good reason, it is rather unlikely.
Unfortunately, portability of binary formats is a myth, at least to date, for Feather, Arrow, Parquet, etc.

@eddelbuettel
Contributor

Duplicate of #5656

A couple of months ago I looked into exporting from arrow to data.table. I couldn't find anything obvious: the two memory models are simply very different, and I can see no way around a one-time copying / materialization cost.
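
One concrete difference between the layouts can be sketched in a few lines. This is a minimal illustration in plain Python, not code from arrow or data.table: the function name `arrow_int32_to_r_style` is hypothetical. It assumes an Arrow-style int32 column that tracks nulls in a separate validity bitmap, and an R-style integer vector that marks `NA` in-band with the sentinel `INT_MIN`. Going from the first to the second means writing a fresh buffer, which is the kind of one-time materialization cost mentioned above.

```python
from array import array

# R stores NA_integer_ in-band as INT_MIN inside the value vector itself.
R_NA_INTEGER = -2**31


def arrow_int32_to_r_style(values, validity, length):
    """Copy an Arrow-style int32 column into an R-style vector (hypothetical helper).

    values   -- contiguous int32 buffer (array of type "i")
    validity -- bytes bitmap, bit i set means slot i is non-null; None means no nulls
    length   -- number of slots in the column
    """
    out = array("i", [0]) * length  # newly allocated buffer: this is the copy
    for i in range(length):
        # Arrow validity bitmaps are least-significant-bit ordered:
        # slot i lives in byte i // 8, bit i % 8.
        is_valid = validity is None or (validity[i // 8] >> (i % 8)) & 1
        out[i] = values[i] if is_valid else R_NA_INTEGER
    return out


values = array("i", [10, 0, 30, 40])
validity = bytes([0b00001101])  # slots 0, 2 and 3 are valid; slot 1 is null
r_col = arrow_int32_to_r_style(values, validity, 4)
print(list(r_col))
```

The null slot ends up holding the sentinel rather than whatever value sat in the Arrow buffer, so the original buffer cannot simply be reinterpreted in place whenever nulls are present.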

@eitsupi
Contributor

eitsupi commented Aug 5, 2023

It should also be noted that the Arrow libraries (Arrow C++, arrow-rs, or the arrow2 crate) can be much more expensive to build than the current data.table.
The pandas dev team discussed whether to make pyarrow a required dependency (pyarrow is very difficult to build from source).

The polars R package has been removed from CRAN because of its build time; in other words, building polars takes longer than building any R package on CRAN.

@jangorecki
Member

AFAIR, @eddelbuettel would be a good person to comment on building R's arrow package.

4 participants