-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support int_range(s) in streaming mode #20015
Comments
The behavior in the example can be approximated with: ...
# get indices
lf = lf.with_columns(
indices=pl.lit(list(range(1, 32)))
).explode("indices").filter(
pl.col("indices") <= pl.col("messages").list.len()
) But only if you know the maximum length of that list. Otherwise, I don't think this is possible without UDFs. |
This has come up multiple times for me, would also find it very useful |
It works for me on the new streaming engine: #20947 import polars as pl
import os
os.environ["POLARS_FORCE_NEW_STREAMING"] = "1"
lf = pl.LazyFrame({"messages": [[1, 2], [3, 4, 5], [4], [78, 9, 10]]})
(
lf.with_columns(
indices=pl.int_ranges(pl.col("messages").list.len())
)
.explode("indices")
.select(
messages=pl.col("messages").list.slice(0, pl.col("indices"))
)
.sink_parquet("20015.parquet")
) |
That's good news - unfortunately I also need some of the other functionality not yet implemented like |
Description
Using
int_range
orint_ranges
in streaming mode (LazyFrame.sink_*
) causes apolars.exceptions.InvalidOperationError
. That makes some operations (e.g. prefix explodes) impossible within streaming mode.Example:
The text was updated successfully, but these errors were encountered: