-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for the new streaming engine #20947
Comments
Information on the new streaming engine: pola-rs#20947.
what's the difference between out-of-core group_by/equi-join and streaming nodes group_by/equi-join? |
@deanm0000 Out-of-core operators will automatically spill to disk when the data you're processing is larger than available memory. |
What happens to the other non-supported features like the broken categoricals? Is there any way to opt out of streaming as soon as they are encountered? |
@daviskirk Categoricals are currently the only thing that are simply broken on the new streaming engine (to my knowledge). Almost all unsupported things automatically fall back to the eager engine (just for the parts of the computation that aren't supported), some give a hard error. My plan is to fix categoricals on the new streaming engine soon. |
Note
TLDR
Starting 1.23.0, the old streaming engine will no longer be maintained and might start becoming less usable with time. People currently using it should pin their versions to 1.22.0 or before.
The old streaming engine is being deprecated, and the new streaming engine is coming to replace it soon™. The new streaming engine is far along, but does not yet have 100% feature parity with the old streaming engine. At the same time, the maintenance burden of maintaining both the old and new streaming engine is becoming too large. Therefore, we are deprecating the old streaming engine starting on Polars version 1.23.0 and telling people to pin their Polars versions to any version before 1.23.0.
This issue tracks the progress on the new streaming engine. All non-supported (unless mentioned otherwise) features fall back to the in-memory engine.
The streaming engine can be used already by using
.collect(new_streaming=True)
or by setting thePOLARS_FORCE_NEW_STREAMING=1
. The physical plan can be visualized as a dot graph by usingPOLARS_VISUALIZE_PHYSICAL_PLAN=filename.dot
.Sources
Sinks
Out-of-core
Streaming Nodes
head
tail
(feat: Add negative slice support to new-streaming engine #21001)Aggregates
Plan translation to streaming
.over()
to group-by + joinDatatypes
The text was updated successfully, but these errors were encountered: