Develop player projection system #2

ak-gupta opened this issue Feb 21, 2021 · 4 comments

Once I've established the SPA ratings per game, I'd like to build out a player projection system. At a base level, we have a hierarchical time-series forecasting problem. However, I think applying hierarchical clustering to the time series can help define the groups within the player-level data (perhaps by position, play style, etc.) before looking at optimal reconciliation. Open questions:

  • Unified time labels: we'll have players with different ages/experience. How should we align them?
  • SPA transformations: should we transform the raw SPA ratings?
ak-gupta commented Nov 26, 2021

After thinking about this some more -- the problem with hierarchical time-series modelling is that you need a model for every time series at the lowest level (i.e. one model per player). This is... not ideal. I could

  • create a larger model and use the James-Stein encoder to encode the player identifier. This way, we have one large projection model. I'd investigate
    • simple regression with age and player ID predicting impact (xgboost as well as elastic net and spline models), and
    • vector autoregression (VAR) models with age and player ID predicting impact.
  • use unsupervised clustering with dynamic time warping (DTW) and barycenter averaging to create clusters of players based on their current career arc (see the sketch after this list). Then, I can investigate cluster-specific projection models with no contrast encoding.
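For the clustering route, here's a minimal sketch assuming the tslearn library (my choice for illustration; no library is named above): its `TimeSeriesKMeans` with `metric="dtw"` computes cluster centers via DTW barycenter averaging. The file name and column names are placeholders:

```python
import pandas as pd
from tslearn.clustering import TimeSeriesKMeans
from tslearn.utils import to_time_series_dataset

# Hypothetical input: one row per (player, season) with an SPA rating.
df = pd.read_csv("spa_ratings.csv")  # columns: player_id, season, spa

# One (possibly ragged) series per player, ordered by season.
player_ids = [pid for pid, _ in df.groupby("player_id")]
series = [
    grp.sort_values("season")["spa"].to_numpy()
    for _, grp in df.groupby("player_id")
]
X = to_time_series_dataset(series)  # pads ragged series with NaN

# k-means under the DTW metric; centroids come from barycenter averaging.
km = TimeSeriesKMeans(n_clusters=4, metric="dtw", random_state=0)
labels = km.fit_predict(X)
clusters = pd.Series(labels, index=player_ids)
```

Each resulting cluster would then get its own projection model, with no player encoding needed.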

One large model will present some challenges with train/test splitting. Contrast encoders like James-Stein use knowledge about the target to create an ordering between the categorical levels (the players). Ideally, you would fit your encoder on a training set and then only transform your test set. For multiple time-series projection, this means we can't exclude players from the training set at all, only specific observations. We would have to implement cross-validation similar to this article by Hyndman; each fold would contain a successively larger training set (we could use this splitter from scikit-learn). A sketch of the pattern follows.
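Here is a minimal sketch of that fold structure, assuming the splitter above is scikit-learn's `TimeSeriesSplit` (the link target isn't visible here, so that's a guess) and using `category_encoders.JamesSteinEncoder`. Splitting on season keeps every player in each training fold while holding out later observations:

```python
import pandas as pd
from category_encoders import JamesSteinEncoder
from sklearn.model_selection import TimeSeriesSplit
from xgboost import XGBRegressor

df = pd.read_csv("spa_ratings.csv")  # hypothetical: player_id, season, age, spa
seasons = sorted(df["season"].unique())

# Each fold trains on a successively larger prefix of seasons.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(seasons):
    train = df[df["season"].isin({seasons[i] for i in train_idx})]
    test = df[df["season"].isin({seasons[i] for i in test_idx})]

    # Fit the contrast encoder on the training fold only, to avoid leakage.
    encoder = JamesSteinEncoder(cols=["player_id"])
    X_train = encoder.fit_transform(train[["player_id", "age"]], train["spa"])
    X_test = encoder.transform(test[["player_id", "age"]])

    model = XGBRegressor().fit(X_train, train["spa"])
    print(model.score(X_test, test["spa"]))  # R^2 on the held-out seasons
```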

Building several smaller models based on player clusters would solve this problem, since we wouldn't encode any player identifier. In this case, we can combine Hyndman's rolling-window cross-validation with a "holdout" strategy, where each fold contains multiple iterations. For example, if we have 5 players in the pool,

  • Fold 1 trains on players 1-4; tests on player 5
    • Iteration 1 trains using 3 seasons of data, projects to season 6
    • Iteration 2 trains using 4 seasons of data, projects to season 7
    • Iteration 3 trains using 5 seasons of data, projects to season 8
    • Iteration 4 trains using 6 seasons of data, projects to season 9
    • ...
  • Fold 2 trains on players 1-3, 5; tests on player 4
  • Fold 3 trains on players 1, 2, 4, 5; tests on player 3
  • ...

This iterative approach tests not only how well the model generalizes across players but also how well it forecasts as more data accumulates -- we can use scikit-learn's RFE model selection methodology for inspiration on how to handle the iterative nature. A sketch of the nested scheme follows.
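A minimal sketch of the nested scheme, reusing the hypothetical data frame from above. For simplicity it projects the season immediately after the training window, rather than the specific season offsets listed above:

```python
import pandas as pd

df = pd.read_csv("spa_ratings.csv")  # hypothetical: player_id, season, spa
players = sorted(df["player_id"].unique())
seasons = sorted(df["season"].unique())
MIN_TRAIN = 3  # iteration 1 trains on the first 3 seasons

for holdout in players:  # outer loop: leave one player out per fold
    pool = df[df["player_id"] != holdout]
    target = df[df["player_id"] == holdout]

    # Inner loop: expanding training window, one extra season per iteration.
    for n in range(MIN_TRAIN, len(seasons)):
        train = pool[pool["season"].isin(seasons[:n])]
        test = target[target["season"] == seasons[n]]
        if test.empty:
            continue  # the held-out player has no data for that season
        # ... fit a cluster-specific model on `train`, project `test`,
        # and record the error for this (fold, iteration) pair
```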

ak-gupta commented
> * create a larger model and use the [James-Stein encoder](https://kiwidamien.github.io/james-stein-encoder.html) to encode the player identifier.
>
> ...
> One large model will present some challenges with train/test splitting. Contrast encoders like James-Stein use knowledge about the target to create an ordering between the categorical levels (the players). Ideally, you would fit your encoder on a training set and then only transform your test set. For multiple time-series projection, this means we can't exclude players from the training set at all, only specific observations.

More things to consider with this approach:

  • We'd have to create our own "rolling" James-Stein contrast encoder, since we're analyzing the same players over time. For example, the encoding for a given player at age 30 should be based only on performance before that age (see the sketch after this list).
  • Refitting the encoder -- we would need to refit the encoder when a new player enters the pool. This might not be an issue (maybe we refit the entire model every time we generate projections, so inaccuracies from the previous round of projections aren't carried into the next set).
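Here's a minimal sketch of the rolling encoder idea. It deliberately simplifies the James-Stein weight to a count-based shrinkage factor, and every encoded value uses only observations that come strictly earlier in time; column names are placeholders:

```python
import pandas as pd

def rolling_shrunk_encoding(df, cat_col="player_id", target_col="spa",
                            time_col="season", k=10.0):
    """Leakage-free rolling target encoding: shrink each player's mean of
    *prior* observations toward the global mean of prior observations."""
    df = df.sort_values(time_col)
    # Running global mean of the target, excluding the current row.
    global_mean = df[target_col].expanding().mean().shift(1)
    # Per-player running mean and count of strictly earlier observations.
    grp = df.groupby(cat_col)[target_col]
    prior_mean = grp.transform(lambda s: s.expanding().mean().shift(1))
    prior_count = grp.cumcount()
    # More history -> trust the player's own mean more. This count-based
    # weight stands in for the variance-based James-Stein weight.
    weight = prior_count / (prior_count + k)
    encoded = weight * prior_mean + (1 - weight) * global_mean
    # Players with no history fall back to the global running mean; the
    # very first observation overall stays NaN.
    return encoded.fillna(global_mean)
```

New players are handled automatically (they start at the global mean), which partly addresses the refitting concern in the second bullet.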

ak-gupta commented Nov 27, 2021

Looking at the documentation for target encoding and the James-Stein encoder -- it looks like they both assume the target variable has a normal distribution. It might be worth creating a transformer for beta target encoding (their implementation is here).

NOTE: I should read up on beta target encoding from this paper. There are improvements to this procedure proposed here.

UPDATE

I've read up on the procedure. In the initial paper, you:

  1. Empirically examine the distribution of the target variable,
  2. Choose a conjugate prior based on the observed distribution,
  3. Use the conjugate-prior formulation to parameterize the posterior distribution, and
  4. For each categorical variable,
    • For each level in the variable,
      • Calculate the posterior distribution.
      • Represent the level by Q moments of its posterior distribution (sketched after this list).
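A minimal sketch of the per-level encoding in step 4, assuming a binary target so that the Beta prior is conjugate to the Bernoulli likelihood (the continuous SPA case would need a different conjugate pair), with Q = 2 moments:

```python
import pandas as pd

def beta_encode(train, col, target, alpha0=1.0, beta0=1.0):
    """Encode each level of `col` with the first two moments of its
    Beta posterior, assuming a binary 0/1 target."""
    stats = train.groupby(col)[target].agg(["sum", "count"])
    alpha = alpha0 + stats["sum"]                 # prior + successes
    beta = beta0 + stats["count"] - stats["sum"]  # prior + failures
    return pd.DataFrame({
        "posterior_mean": alpha / (alpha + beta),
        "posterior_var": alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)),
    })  # map these back onto train/test rows by level
```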

In the proposal, steps 1-3 are the same, and then:

  4. For each categorical variable,
    • For each level in the categorical variable,
      • Calculate the posterior distribution.
  5. Generate K copies of the training set, where each categorical variable is encoded by sampling from the posterior distribution for its level.

In this scenario, our final prediction for the target is the average of the predictions from the K submodels (a sketch follows).
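A minimal sketch of the sampling variant, where `alpha` and `beta` are per-level Series of posterior parameters (computed as in the previous sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def sampled_copies(train, col, alpha, beta, k=5):
    """Create K copies of the training set, each replacing `col` with an
    independent draw from every level's Beta posterior."""
    copies = []
    for _ in range(k):
        draws = pd.Series(rng.beta(alpha.to_numpy(), beta.to_numpy()),
                          index=alpha.index)
        copy = train.copy()
        copy[col + "_enc"] = copy[col].map(draws)
        copies.append(copy.drop(columns=[col]))
    return copies

# Fit one submodel per copy; average the K submodels' predictions.
```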
