Develop player projection system #2
Comments
After thinking about this some more -- the problem with hierarchical time-series modelling is that you need a model for every time series at the lowest level (i.e., one model per player). This is... not ideal. I could
One large model will present some challenges with train/test splitting. Target-based encoders like James-Stein use knowledge about the target to create an ordering between the categorical levels (the players). Ideally, you would fit your encoder on some training set and then only transform your test set. For multiple time-series projection, this means we can't exclude players from the training set at all, only specific observations. We would have to implement cross-validation similar to this article by Hyndman; each fold in cross-validation would contain successively larger training sets (could use this splitter by

Building several smaller models based on player clusters would solve this problem, since we wouldn't encode any player identifier. In this case, we can combine Hyndman's rolling window cross-validation with a "holdout" strategy, where each fold contains multiple iterations. I.e., if we have 5 players in the pool,
This iterative approach tests how well the model generalizes across players, and also how well it forecasts as it gathers more data -- we can use
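A minimal sketch of how the holdout strategy might combine with expanding-window splits, using only the standard library. The function names, the `(player, period, target)` row layout, and the toy data are all assumptions made for illustration; nothing here comes from the project's code or from the linked splitter.

```python
def expanding_window_folds(periods, n_folds, min_train=1):
    """Yield (train_periods, test_period) pairs with a growing training window."""
    ordered = sorted(set(periods))
    # Each of the last n_folds periods serves once as the test period.
    first_test = max(min_train, len(ordered) - n_folds)
    for i in range(first_test, len(ordered)):
        yield ordered[:i], ordered[i]


def holdout_rolling_folds(rows, holdout_players, n_folds):
    """Combine a player holdout with expanding-window splits.

    rows: list of (player, period, target) tuples (hypothetical layout).
    Training uses only non-held-out players up to the cutoff; testing uses
    only held-out players in the period right after the cutoff.
    """
    periods = [period for _, period, _ in rows]
    for train_periods, test_period in expanding_window_folds(periods, n_folds):
        cutoff = set(train_periods)
        train = [r for r in rows if r[0] not in holdout_players and r[1] in cutoff]
        test = [r for r in rows if r[0] in holdout_players and r[1] == test_period]
        yield train, test


# Toy data: 5 players, 4 periods each (values are made up for illustration).
rows = [(f"p{p}", t, float(p + t)) for p in range(5) for t in range(4)]
folds = list(holdout_rolling_folds(rows, holdout_players={"p4"}, n_folds=2))
for train, test in folds:
    print(len(train), "training rows ->", len(test), "test rows")
```

Looping over different `holdout_players` sets would give one fold per held-out group, each with several iterations. For the one-large-model case, the same expanding windows apply with an empty holdout set on the training side, and the encoder would be fit on `train` only.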
More things to consider with this approach:
Looking at the documentation for target encoding and the James-Stein encoder, it looks like both assume the target variable is normally distributed. It might be worth creating a transformer for beta target encoding (their implementation is here). NOTE: I should read up on beta target encoding from this paper. There are improvements to this procedure proposed here.

UPDATE: I've read up on the procedure. In the initial paper,
In the proposal, steps 1-3 are the same, and
In this scenario, our final prediction for the target will be the average of the predictions from the K submodels.
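As a rough illustration of the averaging idea, here is a sketch that pairs a beta-binomial style target encoder with K leave-one-fold-out fits. It is an assumption-laden stand-in, not the procedure from either linked paper: the shrinkage formula is one common formulation, `n_prior` is a hypothetical smoothing strength, targets are assumed to lie in [0, 1], and each "submodel" is just the encoder itself rather than a real downstream model.

```python
def beta_encode(train_rows, n_prior=10.0):
    """Return (encoding dict, prior mean) from (category, target) rows.

    Targets are assumed to lie in [0, 1]. n_prior is a hypothetical
    smoothing strength, not a value taken from any paper.
    """
    prior_mean = sum(y for _, y in train_rows) / len(train_rows)
    sums, counts = {}, {}
    for cat, y in train_rows:
        sums[cat] = sums.get(cat, 0.0) + y
        counts[cat] = counts.get(cat, 0) + 1
    # Prior Beta(a0, b0) with a0 + b0 = n_prior; the posterior adds observed totals.
    a0 = prior_mean * n_prior
    b0 = (1.0 - prior_mean) * n_prior
    encoding = {}
    for cat in sums:
        alpha = a0 + sums[cat]
        beta = b0 + counts[cat] - sums[cat]
        encoding[cat] = alpha / (alpha + beta)  # posterior mean
    return encoding, prior_mean


def fit_submodels(rows, k=3, n_prior=10.0):
    """Fit K encoders, each leaving out one fold (rows assigned round-robin)."""
    return [
        beta_encode([r for i, r in enumerate(rows) if i % k != fold], n_prior)
        for fold in range(k)
    ]


def predict(submodels, cat):
    """Average the K submodel outputs; unseen categories fall back to the prior mean."""
    preds = [enc.get(cat, prior) for enc, prior in submodels]
    return sum(preds) / len(preds)


# Made-up (player, target) rows for illustration only.
rows = [("a", 1.0), ("a", 1.0), ("a", 0.0), ("b", 0.0),
        ("b", 0.0), ("b", 1.0), ("a", 1.0)]
submodels = fit_submodels(rows, k=3)
print(round(predict(submodels, "a"), 3), round(predict(submodels, "b"), 3))
```

In a real pipeline each fold's encoder would feed a fitted model, and `predict` would average those models' outputs instead of the encoded values.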
Once I've established the SPA ratings per game, I'd like to build out a player projection system. At a base level, we have a hierarchical time-series forecasting problem. However, I think that applying hierarchical clustering to the time series can help define the groups within the player-level data (perhaps by position, play style, etc.) before looking at optimal reconciliation. Open questions: