Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential to specify time series splitter #23

Closed
almostintuitive opened this issue May 24, 2023 · 7 comments
Closed

potential to specify time series splitter #23

almostintuitive opened this issue May 24, 2023 · 7 comments
Assignees
Labels
question Further information is requested

Comments

@almostintuitive
Copy link

Hi! Thank you for the super useful library!
We'd love to use it on time series tasks, but for that, we'd need the internal splitter to respect the temporal dimension of the data.
We're happy to contribute a PR that makes this possible.
Is there anything specific regarding the public API that we should keep in mind of, in order for our PR to be accepted?

Thank you!

@ThomasBury
Copy link
Owner

ThomasBury commented May 25, 2023

Dear @almostintuitive, thank you for your kind words. I'm glad to hear that you find it valuable.

Regarding GrootCV, the intended usage of the cross-validation scheme is indeed for tabular data. While it is possible to add additional columns (lags) to the data, as you pointed out, the RepeatedKFold approach will not work in that case.

However, if you can devise a way to modularize the fold generator, specifically the code snippet you referred to here, the remaining code should remain unaffected.

In terms of coding standards, I strive to adhere to PEP and scikit-learn conventions, as well as leveraging existing generators and objects. Regarding the preprocessing, I have duplicated (and improved a bit) some transformers from Feature-Engine. The reason for this is that I needed sample weights. Although ARFS may not be on par with the same level of quality, I make efforts to align with those standards.

At the moment, I don't have comprehensive unit testing in place. However, I do provide NB (notebooks) for ad-hoc testing purposes, which serve as integration tests for the new feature(s) and I re-run them to check that they don't break existing ones and to update the documentation and tutorial at the same time.

If that sounds reasonable, please feel free to submit a PR (or several). It would be best if the PR is kept simple and focused, with smaller changes that are easier to track. This approach will be especially helpful since I mainly work on this package during nighttime/evening. Thank you for your understanding and for contributing!

@ThomasBury ThomasBury added the question Further information is requested label May 25, 2023
@ThomasBury ThomasBury self-assigned this May 25, 2023
@Jnorm911
Copy link

Bump

@CMobley7
Copy link

Optuna's implementation here, https://optuna.readthedocs.io/en/stable/reference/generated/optuna.integration.lightgbm.LightGBMTunerCV.html, allows you to send in your own folds; so,

tscv = TimeSeriesSplit(n_splits=cv if cv is not None else 5)
folds = list(tscv.split(X_train))

Though mlxtend's SFS, https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.feature_selection/#sequentialfeatureselector, allows you to use TimeSeriesSplit directly with the cv argument. Optuna's would probably be a quick add, while mlxtend's would likely take longer to add. Not sure though.

@notuntoward
Copy link

Tabular methods are very commonly used for time series forecasting -- boosted trees are quite often the best time series forecasting methods available. So it would be very useful if ARFS had a way to specify causal cross-validation.

As @cmobley mentioned, optuna allows causal cross validation, and so does the tuner I use with boosted trees, scikit-optimize. For example, skopt.BayesSearchCV accepts a standard sklearn TimeSeriesSplit object.

Would be great if ARFS function did too.

@ThomasBury
Copy link
Owner

Hello, I finally had time to add support for user-defined splitter 3771d49

@notuntoward
Copy link

Thanks, should we be seeing this change on the website docs yet, or in pip?

@ThomasBury
Copy link
Owner

Thanks, should we be seeing this change on the website docs yet, or in pip?

Hi @notuntoward , both:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants