Interpolating without averaging first. #146

Open
epa095 opened this issue Feb 9, 2022 · 2 comments

epa095 commented Feb 9, 2022

This is exactly the same question as the Stack Overflow question "Pandas timeseries resampling and interpolating together", except that I am wondering whether it can be solved with tempo instead of pandas.

Here is the gist of it. The data looks like this:

    tstamp               val
0  2016-09-01 00:00:00  57
1  2016-09-01 00:01:00  57
2  2016-09-01 00:02:23  57
3  2016-09-01 00:03:04  57
4  2016-09-01 00:03:58  58
5  2016-09-01 00:05:00  60

We want to resample to every minute. Notice that 00:04:00 is missing, so interpolation is needed. But we want to use the fact that 2 seconds earlier (00:03:58) the value was 58 and 60 seconds later (00:05:00) it is 60, so the value at 00:04:00 should be 58 + (2/62)*2 = 58.064516.
So we do not want to first resample into 1-minute buckets (e.g. with mean) and then interpolate between the buckets. Instead, we want to find the "correct" value at every whole minute by interpolating directly between the original observations.
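As a quick check of the arithmetic above, here is a minimal sketch in plain Python. The helper name `interpolate_at` is made up for illustration and is not part of pandas or tempo.

```python
from datetime import datetime


def interpolate_at(t, t0, v0, t1, v1):
    """Linearly interpolate between (t0, v0) and (t1, v1) at time t."""
    # Fraction of the way from t0 to t1, measured in seconds.
    fraction = (t - t0).total_seconds() / (t1 - t0).total_seconds()
    return v0 + fraction * (v1 - v0)


# The two observations surrounding the missing 00:04:00 point.
value = interpolate_at(
    datetime(2016, 9, 1, 0, 4, 0),
    datetime(2016, 9, 1, 0, 3, 58), 58,
    datetime(2016, 9, 1, 0, 5, 0), 60,
)
print(round(value, 6))  # 58.064516
```

Averaging into 1-minute buckets first would lose the 00:03:58 timestamp, so the result would no longer reflect how close that observation is to 00:04:00.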

The pandas solution is relatively easy:

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"tstamp": [
    datetime(2016, 9, 1, 0, 0, 0),
    datetime(2016, 9, 1, 0, 1, 0),
    datetime(2016, 9, 1, 0, 2, 23),
    datetime(2016, 9, 1, 0, 3, 4),
    datetime(2016, 9, 1, 0, 3, 58),
    datetime(2016, 9, 1, 0, 5, 0)], 
    "val": [57, 57, 57, 57, 58, 60]})


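# Index by timestamp so interpolation can use the actual time gaps.
d = df.set_index('tstamp')
t = d.index

# Regular 1-minute grid ('min' replaces the alias 'T', deprecated since pandas 2.2).
r = pd.date_range(t.min(), t.max(), freq='min')

# Add the grid points, interpolate by time, then keep only the grid points.
d = d.reindex(t.union(r)).interpolate('index').loc[r]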

d:

                           val
2016-09-01 00:00:00  57.000000
2016-09-01 00:01:00  57.000000
2016-09-01 00:02:00  57.000000
2016-09-01 00:03:00  57.000000
2016-09-01 00:04:00  58.064516
2016-09-01 00:05:00  60.000000

epa095 commented Feb 11, 2022

A common use case for this is sensors that only fire on a state change, e.g. an indicator of whether a valve is open or closed. So you have e.g.

    tstamp               val
0  2016-09-01 00:00:00  0
1  2016-09-01 00:01:02  1

Then it is important that at 2016-09-01 00:01:00 the correct value is 0 (the valve is closed).

This case could also be solved if the resample function had a "last" method that used the last value before the current bucket. I am still adding it to this issue, because I think a solution to this issue would also make interpolation with ffill give the right answer, and it is probably more general.
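For reference, the valve case can be sketched in pandas with the same reindex trick, using a forward fill instead of linear interpolation. This is only an illustration of the desired behaviour, not tempo code; the names `events`, `grid` and `held` are made up here, and the grid end time of 00:02:00 is an assumption added to show the value after the state change.

```python
import pandas as pd

# Sensor that only reports on state change: closed (0) at 00:00:00, open (1) at 00:01:02.
events = pd.DataFrame(
    {"val": [0, 1]},
    index=pd.to_datetime(["2016-09-01 00:00:00", "2016-09-01 00:01:02"]),
)

# Regular 1-minute grid to sample the state on.
grid = pd.date_range("2016-09-01 00:00:00", "2016-09-01 00:02:00", freq="min")

# Add the grid points, carry the last known state forward, keep only the grid points.
held = events.reindex(events.index.union(grid)).ffill().loc[grid]
print(held)
```

At 00:01:00 the forward fill picks up the last observation before that instant, so the valve is still reported as closed, and only at 00:02:00 does it show as open.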

tnixon added the enhancement (New feature or request) label on Apr 13, 2022

TorSy commented Aug 25, 2022

I'm looking at the same issue as described here.
I've solved this in pandas by re-indexing onto a time index that contains both the original timestamps and the new "regular" timestamps, then interpolating to fill the new regular timestamps. Finally, I remove the original data points, leaving only the points at regular intervals.
I've not been able to reproduce this approach in Tempo, because tsdf.interpolate requires a prior resampling and cannot work on a generic index containing NaN values.
I appreciate the challenges in building a distributable framework, but I wonder how hard it would be to implement this?

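Python code for the pandas solution:

import pandas as pd

# df is assumed to be indexed by timestamp (a DatetimeIndex).
def _create_new_index(df_in, freq="1min"):
    old_idx = df_in.index
    # Regular grid spanning the original time range.
    new_idx = pd.date_range(old_idx.min(), old_idx.max(), freq=freq)
    # Combined index: original timestamps plus the regular grid points.
    tot_idx = old_idx.union(new_idx).drop_duplicates()
    return new_idx, tot_idx

new_idx, tot_idx = _create_new_index(df)
# Interpolate on the combined index, then keep only the regular grid points.
df_resamp = df.reindex(tot_idx).interpolate('index').reindex(new_idx)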
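The same result can also be computed without building the combined index, by interpolating directly at the grid points with `numpy.interp`. The following is a rough sketch under the assumption that the series has a sorted `DatetimeIndex`; the function name `interpolate_on_grid` is hypothetical and this is not how tempo works internally.

```python
import numpy as np
import pandas as pd


def interpolate_on_grid(series, freq="1min"):
    """Linearly interpolate a time-indexed series onto a regular grid.

    Assumes the index is a DatetimeIndex sorted in increasing order.
    """
    origin = series.index.min()
    grid = pd.date_range(origin, series.index.max(), freq=freq)
    # Convert timestamps to elapsed seconds so np.interp can work on plain floats.
    x_known = (series.index - origin).total_seconds()
    x_grid = (grid - origin).total_seconds()
    values = np.interp(x_grid, x_known, series.to_numpy(dtype=float))
    return pd.Series(values, index=grid, name=series.name)


# Usage with the data from the first comment.
series = pd.Series(
    [57, 57, 57, 57, 58, 60],
    index=pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
    name="val",
)
result = interpolate_on_grid(series)
print(result)
```

Because this only needs the known points and the target grid for one series at a time, it may be easier to apply per group than the reindex approach, though that is an untested assumption.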
