Alternative DataFrame class(es) for OOC + speed #137
Comments
BTW pandas 2.0 will have a pyarrow backend... I wonder how that will work for bioframe.
Yup, I've already opened issues around the release candidate 😅. Not actually that sure how much the current pyarrow backend is changing, or if it's just not experimental anymore. But, while pyarrow will probably have better performance than pandas (especially with strings), I think backends like duckdb or polars have the much larger benefit of being able to work with out-of-core data efficiently.
I am collaborating with the bioframe authors on this project (not in a usable state yet): https://github.com/endrebak/poranges
Related to this, there is a request for input on defining a dataframe standard: https://data-apis.org/blog/dataframe_standard_rfc/
Hey all,
I was wondering if you had considered supporting alternative dataframe classes in this library? In particular I was thinking about the lazy/accelerated ones built on arrow (e.g. polars, datafusion).
I would hope that the current API could be amenable to this by `@singledispatch`ing functions to different backends. It could also be nice to take advantage of a backend that is able to work with out-of-core amounts of data and do optimizations based on column order.
I've also been having a good time interacting with annotation resources via `ibis`, which could integrate nicely with this kind of approach.
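To make the `@singledispatch` idea concrete, here is a rough sketch of what per-backend dispatch could look like. Everything in it is an assumption for illustration: `EagerFrame` and `LazyFrame` are hypothetical stand-ins for an in-memory frame (pandas-like) and a lazy/out-of-core frame (polars- or ibis-like), and `select_region` is a made-up function name, not bioframe's actual API.

```python
from dataclasses import dataclass, field
from functools import singledispatch
from typing import Callable, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


@dataclass
class EagerFrame:
    """Hypothetical stand-in for an in-memory frame (pandas-like)."""
    rows: List[Interval]


@dataclass
class LazyFrame:
    """Hypothetical stand-in for a lazy/out-of-core frame (polars/ibis-like)."""
    source: Callable[[], Iterable[Interval]]
    filters: List[Callable[[Interval], bool]] = field(default_factory=list)

    def collect(self) -> List[Interval]:
        # Rows are only read from the source when the result is requested.
        return [r for r in self.source() if all(f(r) for f in self.filters)]


def _overlaps(row: Interval, chrom: str, start: int, end: int) -> bool:
    # Half-open interval overlap on the same chromosome.
    return row[0] == chrom and row[1] < end and row[2] > start


@singledispatch
def select_region(df, chrom, start, end):
    # Fallback for frame types with no registered backend.
    raise NotImplementedError(f"no backend registered for {type(df).__name__}")


@select_region.register
def _(df: EagerFrame, chrom, start, end):
    # Eager backend: filter immediately and return a new in-memory frame.
    return EagerFrame([r for r in df.rows if _overlaps(r, chrom, start, end)])


@select_region.register
def _(df: LazyFrame, chrom, start, end):
    # Lazy backend: record the filter; nothing is read until collect().
    pred = lambda r: _overlaps(r, chrom, start, end)
    return LazyFrame(df.source, df.filters + [pred])


rows = [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 5, 15)]
eager_result = select_region(EagerFrame(rows), "chr1", 5, 25)
lazy_result = select_region(LazyFrame(lambda: iter(rows)), "chr1", 5, 25)
```

The public function stays the same for callers, and each backend decides whether to compute right away or defer. A real implementation would presumably register on the actual frame types of each library rather than on stand-in classes like these.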