Support adjust=True in ewm_mean_by #21015
Comments
pandas uses Polars supports You can match pandas' output like this:
In [30]: tbl.with_columns(x_ewm=pl.col('x').ewm_mean(half_life=1))
Out[30]:
shape: (2, 3)
┌─────┬────────────┬──────────┐
│ x ┆ date ┆ x_ewm │
│ --- ┆ --- ┆ --- │
│ i64 ┆ date ┆ f64 │
╞═════╪════════════╪══════════╡
│ 1 ┆ 2025-01-01 ┆ 1.0 │
│ 3 ┆ 2025-01-02 ┆ 2.333333 │
└─────┴────────────┴──────────┘
In [31]: tbl.with_columns(x_ewm=pl.col('x').ewm_mean(half_life=1, adjust=False))
Out[31]:
shape: (2, 3)
┌─────┬────────────┬───────┐
│ x ┆ date ┆ x_ewm │
│ --- ┆ --- ┆ --- │
│ i64 ┆ date ┆ f64 │
╞═════╪════════════╪═══════╡
│ 1 ┆ 2025-01-01 ┆ 1.0 │
│ 3 ┆ 2025-01-02 ┆ 2.0 │
└─────┴────────────┴───────┘
I think this issue may need repurposing as a request to have a properly-implemented
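The two outputs above can be checked by hand. The following is a minimal sketch in plain Python (no Polars required), assuming the usual conversion alpha = 1 - exp(-ln(2) / half_life); the helper name `alpha_from_half_life` is made up for illustration.

```python
import math


def alpha_from_half_life(half_life: float) -> float:
    # Smoothing factor for a given half-life: alpha = 1 - exp(-ln(2) / half_life).
    return 1.0 - math.exp(-math.log(2.0) / half_life)


x_prev, x_curr = 1.0, 3.0
alpha = alpha_from_half_life(1.0)  # 0.5 for half_life=1

# adjust=True: weighted average, normalised by the sum of the weights.
adjusted = (x_curr + (1 - alpha) * x_prev) / (1 + (1 - alpha))

# adjust=False: plain recursion y_t = alpha * x_t + (1 - alpha) * y_{t-1}.
unadjusted = alpha * x_curr + (1 - alpha) * x_prev

print(round(adjusted, 6), round(unadjusted, 6))  # 2.333333 2.0
```

With half_life=1 the older observation gets half the weight of the newer one under adjust=True, which is where 2.333333 comes from.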
Thank you! Until the feature gets implemented, do you currently happen to have any proposed hack for obtaining By the way, while I understand that we match the parallel to To drive the point home, in the example below (using the same table as the original post) the average comes out to be almost 1. Instead, it should be close to 2.
tbl.with_columns(
x_ewm = pl.col('x').ewm_mean_by('date', half_life='1000d')
)
# shape: (2, 3)
# ┌─────┬────────────┬──────────┐
# │ x ┆ date ┆ x_ewm │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ date ┆ f64 │
# ╞═════╪════════════╪══════════╡
# │ 1 ┆ 2025-01-01 ┆ 1.0 │
# │ 3 ┆ 2025-01-02 ┆ 1.001386 │
# └─────┴────────────┴──────────┘
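The value 1.001386 is what a non-normalised, time-aware recurrence produces. The sketch below assumes that `ewm_mean_by` computes y_i = alpha_i * x_i + (1 - alpha_i) * y_{i-1} with alpha_i = 1 - exp(-ln(2) * dt_i / half_life), which is my reading of the Polars documentation rather than something confirmed in this thread. The function name `ewm_by_recurrence` is hypothetical, and times are given in days.

```python
import math


def ewm_by_recurrence(values, times, half_life):
    # Assumed recurrence: y_i = alpha_i * x_i + (1 - alpha_i) * y_{i-1},
    # where alpha_i depends on the time elapsed since the previous observation.
    out = [float(values[0])]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        alpha = 1.0 - math.exp(-math.log(2.0) * dt / half_life)
        out.append(alpha * values[i] + (1.0 - alpha) * out[-1])
    return out


# Two observations one day apart, half-life of 1000 days.
result = ewm_by_recurrence([1, 3], [0.0, 1.0], 1000.0)
print([round(v, 6) for v in result])  # [1.0, 1.001386]
```

Because the elapsed time is tiny relative to the half-life, alpha is close to 0 and the new observation barely moves the average away from the first value.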
Do you have a formula reference for how
There is a suggestion in pandas-dev/pandas#54328 (comment), but I'm not expert enough on this to know whether it's a justified approach. @alexander-beedie any chance I could get you or any quant you know to weigh in on this please?
Without looking through that entire thread, my proposal is this. If you'd like to use a non-recursive formula, just use the ones from the pandas documentation for adjust=True: If you instead prefer to do this recursively, I would use this formula: The results should match between these two approaches. The explanation is this:
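The formulas in the comment above did not survive on this page. As a hedged illustration only, here is the adjust=True weighted average from the pandas documentation next to one possible recursive formulation (running numerator and denominator). The recursive form is my own sketch and not necessarily the one the commenter had in mind; both function names are made up.

```python
def ewm_adjusted_direct(xs, alpha):
    # pandas adjust=True: y_t = sum_i (1-alpha)^i * x_{t-i} / sum_i (1-alpha)^i
    out = []
    for t in range(len(xs)):
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        num = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(num / sum(weights))
    return out


def ewm_adjusted_recursive(xs, alpha):
    # Same quantity, carrying the numerator and denominator forward:
    #   num_t = x_t + (1-alpha) * num_{t-1},  den_t = 1 + (1-alpha) * den_{t-1}
    out = []
    num = den = 0.0
    for x in xs:
        num = x + (1 - alpha) * num
        den = 1.0 + (1 - alpha) * den
        out.append(num / den)
    return out


xs = [1.0, 3.0, 2.0, 5.0]
direct = ewm_adjusted_direct(xs, 0.5)
recursive = ewm_adjusted_recursive(xs, 0.5)
print(direct[:2], recursive[:2])
```

The two functions agree on every element, and the second element equals 7/3, matching the 2.333333 shown earlier for half_life=1.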
Polars already uses that for It's for the time-based one that I'm asking what should be done. As highlighted in pandas-dev/pandas#54328, pandas' time-based The suggestion for time-based adjusted ewm in pandas-dev/pandas#54328 (comment) seems reasonable, I'd just appreciate it if an expert in the could confirm that
Thank you. Looking at the proposed discrete formula of: Perhaps I'm being obtuse, but I don't understand the rationale for including delta_t in the multiplication, in both the numerator and the denominator. The 0.5^((t1-t2)/lambda) factor already takes care of the appropriate time decay (assuming here that lambda is supposed to be half_life). I can see a justification for the proposed formula if you assume that the time series should be considered to "retain" its previous value for the duration of each interval. This seems rather strange. A more direct and natural interpretation is that you observe different values at discrete points in time and, when averaging, decay each observation according to how long ago it was made.
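The "decay each observation by its age" interpretation described above can be sketched as follows. This is an illustration of that interpretation only, not the formula proposed in the pandas thread and not an existing Polars or pandas API; `decayed_average` is a hypothetical name and times are in days.

```python
def decayed_average(values, times, half_life):
    # For each time t, weight every observation up to t by
    # 0.5 ** (age / half_life) and normalise by the sum of the weights.
    out = []
    for t in range(len(values)):
        weights = [0.5 ** ((times[t] - times[i]) / half_life) for i in range(t + 1)]
        num = sum(w * v for w, v in zip(weights, values))
        out.append(num / sum(weights))
    return out


# Same two observations as in the thread, one day apart.
long_half_life = decayed_average([1, 3], [0.0, 1.0], 1000.0)
short_half_life = decayed_average([1, 3], [0.0, 1.0], 1.0)
print(round(long_half_life[1], 6), round(short_half_life[1], 6))
```

With a 1000-day half-life the two observations get nearly equal weight, so the second value is just above 2; with a 1-day half-life it is 7/3, the same as the index-based adjust=True result.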
Checks
Reproducible example
Log output
Issue description
In the exponential moving average formula, you've forgotten to divide by the sum of weights. In the example above, this results in the average for 2025-01-02 being calculated as 2.0, which means it weights the two observations equally, even though one is current and the other is as old as the half-life, so it should get half the relative weight. The value should instead be 2.333. Note that pandas gets this right.
Expected behavior
The formula should be corrected to divide by sum(a_i) as the denominator.
Installed versions