use bfast for one dimentional time series data #42

TiaHao · 2022-02-03T10:30:26Z

Hi,

Thanks for the package, I am just wondering could I apply the algorithm on 1-d time series data? I checked the format of input data which should be (N,W,H) only for image data. Thank you!

mortvest · 2022-02-03T10:37:35Z

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

TiaHao · 2022-02-03T11:33:21Z

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

I just tried and it works! I would want to follow up on some other questions in case I understood wrongly:

The history period is for training the model, right? and the returned index is numbered for the monitoring period, which means the first date in the monitoring period is indexed as 0.
if the breaks returned only contain 1 non-negative value, does it means only one breakpoint is detected? how could I tune parameters to detect more breakpoints?
are the breakpoints detected for the trend component only or for both trend and seasonal components or for the original time series?

Thank you!

mortvest · 2022-02-03T11:42:20Z

Correct
The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory.
In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

TiaHao · 2022-02-03T14:54:33Z

Correct

The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory.

In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

Thank you! I found the branch and want to test on my data set, but I kept meeting this error:
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

my y is array([508104., 525865., 556608., ..., 538247., 534160., 553917.]),
x is 0 2017-01-01
1 2017-01-02
2 2017-01-03
3 2017-01-04
4 2017-01-05
...
1456 2020-12-27
1457 2020-12-28
1458 2020-12-29
1459 2020-12-30
1460 2020-12-31
Name: Date, Length: 1461, dtype: datetime64[ns]

Is the format of x not correct?

mortvest · 2022-02-03T15:12:28Z

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:

def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)

TiaHao · 2022-02-03T16:10:13Z

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:
def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)

Thank you so much, however this function can not generate 365 numbers of data points for daily data in a year since the maximum number np.linespace can generate between 2016 and 2017 is 100? would you have other suggestions to represent daily timestep? (would be 20170101,20170102...20201231 works?)

mortvest · 2022-02-03T16:18:18Z

I don't know if I understood your question completely, but this call:

arr = r_style_interval((2016, 1), (2016, 365), 365)

would return a vector of 365 values:

[2016. , 2016.00273973, ...., 2016.99726027]

TiaHao · 2022-02-03T16:38:40Z

I found the problem, I write np.set_printoptions(precision=2, linewidth=120) before, and did not restart the kernel, after restarting the kernel, it works exactly as it should be! Thank you so much!

Then I encounter this error again:
ValueError: All-NaN slice encountered

sorry to ask so many questions since I am quite fresh to python and bfast...

mortvest · 2022-02-04T15:30:21Z

Do you have NaNs in your dataset?
Can you include the whole error message and the code snippet that you have ran?

TiaHao · 2022-02-04T15:59:25Z

Hi,

I do not have NaNs in dataset.

I copy the code in bfast_1d_example.py in the bfast-experimental branch, and write the main as:
if name == "main":
plot("daily data", data, dates, 365, "harmonic")

data: an array contains daily data from 2017-2020, 1461 data points in total
dates = np.linspace(2017, 2021, num=365*3+366, endpoint=False)

It first runs and shows:
Plotting daily data
Writing to 1
Writing to 2
Writing to 3
Writing to 4
Writing to 5
Writing to 6
Writing to 7
Writing to 8
Writing to 9
Writing to 10
Writing to 11
Writing to 12
Writing to 13
...
many matrix and some contain lots of NaNs

Then error information:
ValueError Traceback (most recent call last)
/var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in
114
115 if name == "main":
--> 116 plot("daily data", data, dates, 365, "harmonic")

/var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in plot(name, y, x, f, season, level, h, max_iter, nan_clr)
37 y = y.reshape((y.shape[0], 1, 1))
38 vo = bfast.BFAST(frequency=f, season_type=season, level=level, h=h, max_iter=max_iter)
---> 39 vo.fit(y, x)
40 nans = np.isnan(y).any()
41

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/models.py in fit(self, Yt, ti)
413
414 # fit BFASTMonitor models
--> 415 self._model.fit(Yt=Yt,
416 ti=ti)
417 self._model_fitted = True

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit(self, Yt, ti)
119 pix_season_br,
120 pix_n_trend_br,
--> 121 pix_n_season_br = self.fit_single(y, ti, smod)
122
123 trend_global[:, i, j] = pix_trend

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit_single(self, Yt, ti, smod)
209 print("Breakpoints in trend detected")
210 print("Finding breakpoints in trend")
--> 211 bp_Vt = Breakpoints(sm.add_constant(ti1),
212 Vt1,
213 h=self.h,

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in init(self, X, y, h, max_breaks, use_mp, verbose)
63
64 ## breaks >= 2
---> 65 SSR_table = self.extend_SSR_table(SSR_table, self.max_breaks)
66
67 opt = self.extract_breaks(SSR_table, self.max_breaks).astype(int)

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in extend_SSR_table(self, SSR_table, breaks)
110 # map
111 break_SSR = np.vectorize(fun)(pot_index)
--> 112 opt = np.nanargmin(break_SSR)
113 my_SSR_table[i - h + 1, np.array((2, 3))] =
114 np.array((pot_index[opt], break_SSR[opt]))

<array_function internals> in nanargmin(*args, **kwargs)

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/numpy/lib/nanfunctions.py in nanargmin(a, axis)
497 mask = np.all(mask, axis=axis)
498 if np.any(mask):
--> 499 raise ValueError("All-NaN slice encountered")
500 return res
501

ValueError: All-NaN slice encountered

Thank you so much!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use bfast for one dimentional time series data #42

use bfast for one dimentional time series data #42

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 4, 2022

TiaHao commented Feb 4, 2022

use bfast for one dimentional time series data #42

use bfast for one dimentional time series data #42

Comments

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 3, 2022

TiaHao commented Feb 3, 2022

mortvest commented Feb 4, 2022

TiaHao commented Feb 4, 2022