Excessive RAM usage for DBI::dbWriteTable() and dplyr::collect() #97

32 GB of RAM is used when writing a 16 GB file, and RAM use also momentarily reaches 32 GB when reading the same 16 GB file. From #72 (comment), by @SimonCoulombe.

Comments
I have started to investigate memory consumption with DBI::dbWriteTable() and dplyr::collect() in https://github.com/krlmlr/duckdb-mem. I'll also review memory usage for reading the table, and might reorganize that repository a bit.
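For context, a minimal sketch of the kind of write scenario being measured; the database path, table name, and data size below are placeholders, not the exact script from the duckdb-mem repo:

    library(DBI)
    library(duckdb)

    # Placeholder data; in the benchmark this is a ~16 GB data frame.
    bigdata <- data.frame(x = rnorm(1e7), y = rnorm(1e7))

    con <- dbConnect(duckdb::duckdb(), dbdir = "bench.duckdb")

    # Writing the in-memory data frame to DuckDB is where peak RAM use
    # was observed to reach roughly twice the size of the data.
    dbWriteTable(con, "straight_from_memory", bigdata, overwrite = TRUE)

    dbDisconnect(con, shutdown = TRUE)

    # Peak memory of the whole process can be checked from the shell, e.g.:
    #   /usr/bin/time -v Rscript write.R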
Added an analysis of the reading behavior to https://github.com/krlmlr/duckdb-mem. Indeed, it seems that reading also consumes at least twice the size of the data read, which is surprising. We'll need to trace memory allocations to understand what's going on here. This is likely an issue in the glue, whereas the …
Thanks for looking into this. I looked at the https://github.com/krlmlr/duckdb-mem repo and I didn't see a reading scenario that includes the to_arrow() function like in that comment. I was under the impression that it would only require 16 GB to read the 16 GB file instead of 32 GB. TIL from your repo that you can use /usr/bin/time to find out how much memory a process used at its peak, so I'll go check whether my to_arrow() solution actually works.

    bigdata <- tbl(con, "straight_from_memory") %>% collect()  # 32 GB peak
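A rough sketch of that Arrow-based read path, assuming the table written above already exists in a hypothetical bench.duckdb file; it could be compared against the plain collect() call in terms of peak memory:

    library(DBI)
    library(duckdb)
    library(dplyr)
    library(arrow)

    con <- dbConnect(duckdb::duckdb(), dbdir = "bench.duckdb")

    # Stream the table through Arrow instead of materializing it directly.
    bigdata <- tbl(con, "straight_from_memory") %>%
      to_arrow() %>%   # hand the DuckDB result to Arrow as a RecordBatchReader
      collect()        # materialize as a tibble (needs a recent arrow version)

    dbDisconnect(con, shutdown = TRUE)

    # Compare peak memory of this script against the plain collect() call,
    # e.g. with /usr/bin/time -v Rscript read_arrow.R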
Thanks, added Arrow. The usage is still 2x the data size, unfortunately.