Is your feature request related to a problem? Please describe.
pyarrow is a massive, monolithic dependency. It can be hard to install in some places, and can't currently be installed in Pyodide. It's certainly a monumental effort to get it to work in Pyodide, but I think it would be valuable for lonboard to wean off of pyarrow.
The core enabling factor here is the Arrow PyCapsule Interface. It allows Python Arrow libraries to exchange Arrow data at the C level without copying. This means that we can interface at no cost with any user who's already using pyarrow, without being required to use pyarrow ourselves. I've been promoting its use throughout the Python Arrow ecosystem (apache/arrow#39195 (comment)), and I'm hoping it grows into something as core to tabular data processing as the buffer protocol is to numpy.
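As a rough illustration of what "not being required to use pyarrow ourselves" could look like, here is a minimal sketch that detects Arrow-compatible input by duck typing on the dunder methods the PyCapsule Interface defines (`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`). The helper `arrow_capsule_kind` and the class `FakeStreamProducer` are hypothetical names made up for this example, not part of lonboard, arro3, or pyarrow, and the sketch only inspects objects; it does not export or import any real Arrow data.

```python
from typing import Any, Optional

# Dunder methods defined by the Arrow PyCapsule Interface, checked in this order.
_CAPSULE_METHODS = (
    ("__arrow_c_stream__", "stream"),
    ("__arrow_c_array__", "array"),
    ("__arrow_c_schema__", "schema"),
)


def arrow_capsule_kind(obj: Any) -> Optional[str]:
    """Return which PyCapsule export `obj` offers, or None if it offers none.

    Hypothetical helper: it never imports pyarrow, it only looks for the
    protocol methods on whatever object the caller passes in.
    """
    for method, kind in _CAPSULE_METHODS:
        if callable(getattr(obj, method, None)):
            return kind
    return None


class FakeStreamProducer:
    """Stand-in for a table-like object; it exports no real Arrow data."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


print(arrow_capsule_kind(FakeStreamProducer()))
print(arrow_capsule_kind([1, 2, 3]))
```

Any library that implements the interface should pass this kind of check without the consumer importing that library; the actual data handoff would then go through the capsule returned by the method.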
As part of working to build the ecosystem, I created arro3, a new, very minimal Python Arrow implementation that wraps the Rust Arrow implementation.
I think that it should be possible to swap out pyarrow for arro3, which is about 1% of the normal pyarrow installation size.
It's also good for the ecosystem if Lonboard demonstrates the benefits of modular Arrow libraries in Python.
Describe the solution you'd like
We'll keep pyarrow as a required dependency for GeoPandas/Pandas interop. pyarrow has implemented pyarrow.Table.from_pandas and that's not something I want to even think about replicating.
But aside from that, pretty much everything is doable in arro3 and geoarrow-rust.
pa.Table.from_arrays
CLI only:
Other notes: