Feat/polars with columns, async with_columns pandas #1234

Merged
merged 6 commits into from
Nov 27, 2024
24 changes: 21 additions & 3 deletions docs/reference/decorators/with_columns.rst
@@ -2,17 +2,35 @@
with_columns
=======================

Pandas
--------------
We support the ``with_columns`` operation, which appends results as new columns to the original dataframe, for several libraries:

We have a ``with_columns`` option to run operations on columns of a Pandas dataframe and append the results as new columns.
Pandas
-----------------------

**Reference Documentation**

.. autoclass:: hamilton.plugins.h_pandas.with_columns
   :special-members: __init__


Polars (Eager)
-----------------------

**Reference Documentation**

.. autoclass:: hamilton.plugins.h_polars.with_columns
   :special-members: __init__


Polars (Lazy)
-----------------------

**Reference Documentation**

.. autoclass:: hamilton.plugins.h_polars_lazyframe.with_columns
   :special-members: __init__


PySpark
--------------

2 changes: 1 addition & 1 deletion examples/pandas/with_columns/README
@@ -2,6 +2,6 @@

We show the ability to use the familiar `with_columns` from either `pyspark` or `polars` on a Pandas dataframe.

To see the example look at the notebook.
To see the example look at the [notebook](notebook.ipynb).

![image info](./dag.png)
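
As a rough illustration of what a `with_columns`-style operation does conceptually, here is a minimal plain-Python sketch. It is not Hamilton's implementation: the helper `apply_with_columns`, the `Frame` alias (a dict of lists standing in for a dataframe), and `spend_per_signup_rounded` are hypothetical names introduced only for this example. The idea it shows is that each function's name becomes a new column and its parameter names select the input columns.

```python
import inspect
from typing import Callable, Dict, List

# Hypothetical stand-in for a dataframe: column name -> list of values.
Frame = Dict[str, List[float]]


def spend_per_signup(spend: List[float], signups: List[float]) -> List[float]:
    """Cost per signup, computed row by row."""
    return [s / n for s, n in zip(spend, signups)]


def spend_per_signup_rounded(spend_per_signup: List[float]) -> List[float]:
    """Depends on a column produced by another function."""
    return [round(v, 1) for v in spend_per_signup]


def apply_with_columns(frame: Frame, *functions: Callable[..., List[float]]) -> Frame:
    """Append one column per function; parameters are looked up by column name."""
    result = dict(frame)  # copy so the caller's frame is left untouched
    pending = list(functions)
    while pending:
        progressed = False
        for fn in list(pending):
            params = list(inspect.signature(fn).parameters)
            # Run a function only once all of its input columns exist.
            if all(p in result for p in params):
                result[fn.__name__] = fn(*(result[p] for p in params))
                pending.remove(fn)
                progressed = True
        if not progressed:
            missing = [fn.__name__ for fn in pending]
            raise ValueError(f"unresolvable inputs for: {missing}")
    return result


df = {"spend": [10.0, 20.0, 40.0], "signups": [1.0, 4.0, 16.0]}
# Functions may be passed in any order; dependencies are resolved by name.
out = apply_with_columns(df, spend_per_signup_rounded, spend_per_signup)
print(out["spend_per_signup"])
print(out["spend_per_signup_rounded"])
```

The original columns are kept and the computed ones are appended after them, which is the behaviour the README refers to as familiar from `pyspark` and `polars`.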
866 changes: 419 additions & 447 deletions examples/pandas/with_columns/notebook.ipynb

Large diffs are not rendered by default.

240 changes: 121 additions & 119 deletions examples/polars/notebook.ipynb

Large diffs are not rendered by default.

Binary file added examples/polars/with_columns/DAG_DataFrame.png
Binary file added examples/polars/with_columns/DAG_lazy.png
8 changes: 8 additions & 0 deletions examples/polars/with_columns/README
@@ -0,0 +1,8 @@
# Using with_columns with Polars

We show the ability to use the familiar `with_columns` from `polars`. Both `pl.DataFrame` and `pl.LazyFrame` are supported.

To see the example look at the [notebook](notebook.ipynb).

![image info](./DAG_DataFrame.png)
![image info](./DAG_lazy.png)
51 changes: 51 additions & 0 deletions examples/polars/with_columns/my_functions.py
@@ -0,0 +1,51 @@
import polars as pl

from hamilton.function_modifiers import config

"""
Notes:
1. This file is used for all the [ray|dask|spark]/hello_world examples.
2. It therefore showcases how you can write something once and not only scale it, but port it
   to different frameworks with ease!
"""


@config.when(case="millions")
def avg_3wk_spend__millions(spend: pl.Series) -> pl.Series:
    """Rolling 3 week average spend."""
    return (
        spend.to_frame("spend").select(pl.col("spend").rolling_mean(window_size=3) / 1e6)
    ).to_series(0)


@config.when(case="thousands")
def avg_3wk_spend__thousands(spend: pl.Series) -> pl.Series:
    """Rolling 3 week average spend."""
    return (
        spend.to_frame("spend").select(pl.col("spend").rolling_mean(window_size=3) / 1e3)
    ).to_series(0)


def spend_per_signup(spend: pl.Series, signups: pl.Series) -> pl.Series:
    """The cost per signup in relation to spend."""
    return spend / signups


def spend_mean(spend: pl.Series) -> float:
    """Shows function creating a scalar. In this case it computes the mean of the entire column."""
    return spend.mean()


def spend_zero_mean(spend: pl.Series, spend_mean: float) -> pl.Series:
    """Shows function that takes a scalar. In this case to zero mean spend."""
    return spend - spend_mean


def spend_std_dev(spend: pl.Series) -> float:
    """Function that computes the standard deviation of the spend column."""
    return spend.std()


def spend_zero_mean_unit_variance(spend_zero_mean: pl.Series, spend_std_dev: float) -> pl.Series:
    """Function showing one way to make spend have zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
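
To make the arithmetic in the functions above concrete, here is a small plain-Python sketch of the same calculations, using only the standard library rather than polars. The `rolling_mean` helper and the sample `spend` values are assumptions made for illustration. The helper returns `None` where a full window is not yet available, mirroring the nulls that polars' `rolling_mean` produces by default, and `statistics.stdev` is the sample standard deviation, which matches the default of polars' `std()`.

```python
import statistics
from typing import List, Optional


def rolling_mean(values: List[float], window_size: int) -> List[Optional[float]]:
    """Trailing-window mean; positions without a full window yield None."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)
        else:
            window = values[i + 1 - window_size : i + 1]
            out.append(sum(window) / window_size)
    return out


# Illustrative data only; not taken from the example notebook.
spend = [10.0, 10.0, 20.0, 40.0, 40.0, 50.0]

# Counterpart of avg_3wk_spend__thousands: rolling mean, then scale by 1e3.
avg_3wk_thousands = [None if v is None else v / 1e3 for v in rolling_mean(spend, 3)]

# Counterparts of spend_mean, spend_std_dev and spend_zero_mean_unit_variance.
mean = statistics.mean(spend)
std_dev = statistics.stdev(spend)  # sample standard deviation (ddof=1)
standardized = [(v - mean) / std_dev for v in spend]

print(avg_3wk_thousands)
print(round(statistics.mean(standardized), 9), round(statistics.stdev(standardized), 9))
```

The two `avg_3wk_spend__*` variants differ only in the divisor; the `@config.when` decorators choose one of them based on the `case` configuration value.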
47 changes: 47 additions & 0 deletions examples/polars/with_columns/my_functions_lazy.py
@@ -0,0 +1,47 @@
import polars as pl

from hamilton.function_modifiers import config

"""
Notes:
1. This file is used for all the [ray|dask|spark]/hello_world examples.
2. It therefore showcases how you can write something once and not only scale it, but port it
   to different frameworks with ease!
"""


@config.when(case="millions")
def avg_3wk_spend__millions(spend: pl.Expr) -> pl.Expr:
    """Rolling 3 week average spend."""
    return spend.rolling_mean(window_size=3) / 1e6


@config.when(case="thousands")
def avg_3wk_spend__thousands(spend: pl.Expr) -> pl.Expr:
    """Rolling 3 week average spend."""
    return spend.rolling_mean(window_size=3) / 1e3


def spend_per_signup(spend: pl.Expr, signups: pl.Expr) -> pl.Expr:
    """The cost per signup in relation to spend."""
    return spend / signups


def spend_mean(spend: pl.Expr) -> float:
    """Shows function creating a scalar. In this case it computes the mean of the entire column."""
    return spend.mean()


def spend_zero_mean(spend: pl.Expr, spend_mean: float) -> pl.Expr:
    """Shows function that takes a scalar. In this case to zero mean spend."""
    return spend - spend_mean


def spend_std_dev(spend: pl.Expr) -> float:
    """Function that computes the standard deviation of the spend column."""
    return spend.std()


def spend_zero_mean_unit_variance(spend_zero_mean: pl.Expr, spend_std_dev: float) -> pl.Expr:
    """Function showing one way to make spend have zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
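
The lazy variants above take and return `pl.Expr`, so they describe a computation rather than perform it; nothing is evaluated until the frame is collected. The sketch below mimics that idea with plain closures. `Expr`, `col`, `divide`, and `collect` here are hypothetical stand-ins written for this illustration and are not polars APIs.

```python
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]
# Hypothetical: a deferred column computation, evaluated only when called.
Expr = Callable[[Columns], List[float]]


def col(name: str) -> Expr:
    """Deferred reference to a column; nothing is read until evaluation."""
    return lambda columns: columns[name]


def divide(left: Expr, right: Expr) -> Expr:
    """Compose two deferred computations into a new one."""
    return lambda columns: [a / b for a, b in zip(left(columns), right(columns))]


def spend_per_signup(spend: Expr, signups: Expr) -> Expr:
    """Same shape as the lazy function above: expressions in, expression out."""
    return divide(spend, signups)


def collect(columns: Columns, **exprs: Expr) -> Columns:
    """Evaluate the deferred expressions and append them as new columns."""
    result = dict(columns)
    for name, expr in exprs.items():
        result[name] = expr(columns)
    return result


# Building the plan touches no data; it only composes closures.
plan = spend_per_signup(col("spend"), col("signups"))
data = {"spend": [10.0, 20.0, 40.0], "signups": [1.0, 4.0, 16.0]}
print(collect(data, spend_per_signup=plan)["spend_per_signup"])
```

Because the function bodies only combine expressions, the same definitions can be attached to a frame and evaluated later in one pass, which is the property the lazy example relies on.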