-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
potential to specify time series splitter #23
Comments
Dear @almostintuitive, thank you for your kind words. I'm glad to hear that you find it valuable. Regarding GrootCV, the intended usage of the cross-validation scheme is indeed for tabular data. While it is possible to add additional columns (lags) to the data, as you pointed out, the RepeatedKFold approach will not work in that case. However, if you can devise a way to modularize the fold generator, specifically the code snippet you referred to here, the remaining code should remain unaffected. In terms of coding standards, I strive to adhere to PEP and scikit-learn conventions, as well as leveraging existing generators and objects. Regarding the preprocessing, I have duplicated (and improved a bit) some transformers from Feature-Engine. The reason for this is that I needed sample weights. Although ARFS may not be on par with the same level of quality, I make efforts to align with those standards. At the moment, I don't have comprehensive unit testing in place. However, I do provide NB (notebooks) for ad-hoc testing purposes, which serve as integration tests for the new feature(s) and I re-run them to check that they don't break existing ones and to update the documentation and tutorial at the same time. If that sounds reasonable, please feel free to submit a PR (or several). It would be best if the PR is kept simple and focused, with smaller changes that are easier to track. This approach will be especially helpful since I mainly work on this package during nighttime/evening. Thank you for your understanding and for contributing! |
Bump |
Optuna's implementation here, https://optuna.readthedocs.io/en/stable/reference/generated/optuna.integration.lightgbm.LightGBMTunerCV.html, allows you to send in your own folds; so,
Though mlxtend's SFS, https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.feature_selection/#sequentialfeatureselector, allows you to use TimeSeriesSplit directly with the cv argument. Optuna's would probably be a quick add, while mlxtend's would likely take longer to add. Not sure though. |
Tabular methods are very commonly used for time series forecasting -- boosted trees are quite often the best time series forecasting methods available. So it would be very useful if ARFS had a way to specify causal cross-validation. As @cmobley mentioned, optuna allows causal cross validation, and so does the tuner I use with boosted trees, scikit-optimize. For example, skopt.BayesSearchCV accepts a standard sklearn TimeSeriesSplit object. Would be great if ARFS function did too. |
Hello, I finally had time to add support for user-defined splitter 3771d49 |
Thanks, should we be seeing this change on the website docs yet, or in pip? |
Hi @notuntoward , both:
|
Hi! Thank you for the super useful library!
We'd love to use it on time series tasks, but for that, we'd need the internal splitter to respect the temporal dimension of the data.
We're happy to contribute a PR that makes this possible.
Is there anything specific regarding the public API that we should keep in mind of, in order for our PR to be accepted?
Thank you!
The text was updated successfully, but these errors were encountered: