Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use bfast for one dimentional time series data #42

Open
TiaHao opened this issue Feb 3, 2022 · 10 comments
Open

use bfast for one dimentional time series data #42

TiaHao opened this issue Feb 3, 2022 · 10 comments

Comments

@TiaHao
Copy link

TiaHao commented Feb 3, 2022

Hi,

Thanks for the package, I am just wondering could I apply the algorithm on 1-d time series data? I checked the format of input data which should be (N,W,H) only for image data. Thank you!

@mortvest
Copy link
Collaborator

mortvest commented Feb 3, 2022

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

@TiaHao
Copy link
Author

TiaHao commented Feb 3, 2022

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

I just tried and it works! I would want to follow up on some other questions in case I understood wrongly:

  1. The history period is for training the model, right? and the returned index is numbered for the monitoring period, which means the first date in the monitoring period is indexed as 0.
  2. if the breaks returned only contain 1 non-negative value, does it means only one breakpoint is detected? how could I tune parameters to detect more breakpoints?
  3. are the breakpoints detected for the trend component only or for both trend and seasonal components or for the original time series?

Thank you!

@mortvest
Copy link
Collaborator

mortvest commented Feb 3, 2022

  1. Correct
  2. The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory.
  3. In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

@TiaHao
Copy link
Author

TiaHao commented Feb 3, 2022

  1. Correct
  2. The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory.
  3. In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

Thank you! I found the branch and want to test on my data set, but I kept meeting this error:
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

my y is array([508104., 525865., 556608., ..., 538247., 534160., 553917.]),
x is 0 2017-01-01
1 2017-01-02
2 2017-01-03
3 2017-01-04
4 2017-01-05
...
1456 2020-12-27
1457 2020-12-28
1458 2020-12-29
1459 2020-12-30
1460 2020-12-31
Name: Date, Length: 1461, dtype: datetime64[ns]

Is the format of x not correct?

@mortvest
Copy link
Collaborator

mortvest commented Feb 3, 2022

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:

def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)

@TiaHao
Copy link
Author

TiaHao commented Feb 3, 2022

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:

def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)

Thank you so much, however this function can not generate 365 numbers of data points for daily data in a year since the maximum number np.linespace can generate between 2016 and 2017 is 100? would you have other suggestions to represent daily timestep? (would be 20170101,20170102...20201231 works?)

@mortvest
Copy link
Collaborator

mortvest commented Feb 3, 2022

I don't know if I understood your question completely, but this call:

arr = r_style_interval((2016, 1), (2016, 365), 365)

would return a vector of 365 values:

[2016. , 2016.00273973, ...., 2016.99726027]

@TiaHao
Copy link
Author

TiaHao commented Feb 3, 2022

I found the problem, I write np.set_printoptions(precision=2, linewidth=120) before, and did not restart the kernel, after restarting the kernel, it works exactly as it should be! Thank you so much!

Then I encounter this error again:
ValueError: All-NaN slice encountered

sorry to ask so many questions since I am quite fresh to python and bfast...

@mortvest
Copy link
Collaborator

mortvest commented Feb 4, 2022

Do you have NaNs in your dataset?
Can you include the whole error message and the code snippet that you have ran?

@TiaHao
Copy link
Author

TiaHao commented Feb 4, 2022

Hi,

I do not have NaNs in dataset.

I copy the code in bfast_1d_example.py in the bfast-experimental branch, and write the main as:
if name == "main":
plot("daily data", data, dates, 365, "harmonic")

data: an array contains daily data from 2017-2020, 1461 data points in total
dates = np.linspace(2017, 2021, num=365*3+366, endpoint=False)

It first runs and shows:
Plotting daily data
Writing to 1
Writing to 2
Writing to 3
Writing to 4
Writing to 5
Writing to 6
Writing to 7
Writing to 8
Writing to 9
Writing to 10
Writing to 11
Writing to 12
Writing to 13
...
many matrix and some contain lots of NaNs

Then error information:
ValueError Traceback (most recent call last)
/var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in
114
115 if name == "main":
--> 116 plot("daily data", data, dates, 365, "harmonic")

/var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in plot(name, y, x, f, season, level, h, max_iter, nan_clr)
37 y = y.reshape((y.shape[0], 1, 1))
38 vo = bfast.BFAST(frequency=f, season_type=season, level=level, h=h, max_iter=max_iter)
---> 39 vo.fit(y, x)
40 nans = np.isnan(y).any()
41

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/models.py in fit(self, Yt, ti)
413
414 # fit BFASTMonitor models
--> 415 self._model.fit(Yt=Yt,
416 ti=ti)
417 self._model_fitted = True

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit(self, Yt, ti)
119 pix_season_br,
120 pix_n_trend_br,
--> 121 pix_n_season_br = self.fit_single(y, ti, smod)
122
123 trend_global[:, i, j] = pix_trend

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit_single(self, Yt, ti, smod)
209 print("Breakpoints in trend detected")
210 print("Finding breakpoints in trend")
--> 211 bp_Vt = Breakpoints(sm.add_constant(ti1),
212 Vt1,
213 h=self.h,

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in init(self, X, y, h, max_breaks, use_mp, verbose)
63
64 ## breaks >= 2
---> 65 SSR_table = self.extend_SSR_table(SSR_table, self.max_breaks)
66
67 opt = self.extract_breaks(SSR_table, self.max_breaks).astype(int)

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in extend_SSR_table(self, SSR_table, breaks)
110 # map
111 break_SSR = np.vectorize(fun)(pot_index)
--> 112 opt = np.nanargmin(break_SSR)
113 my_SSR_table[i - h + 1, np.array((2, 3))] =
114 np.array((pot_index[opt], break_SSR[opt]))

<array_function internals> in nanargmin(*args, **kwargs)

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/numpy/lib/nanfunctions.py in nanargmin(a, axis)
497 mask = np.all(mask, axis=axis)
498 if np.any(mask):
--> 499 raise ValueError("All-NaN slice encountered")
500 return res
501

ValueError: All-NaN slice encountered

Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants