Provide a way to get test set predictions from evaluate #837

Open
ericphanson opened this issue Sep 12, 2022 · 7 comments

Comments

@ericphanson

Like https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html (pointed out by @josephsdavid!)

Currently, I am doing it manually, which works fine:

using DataFrames, MLJ, MLJBase, StableRNGs

# df: source table with a :features column and a :label column
X = DataFrame(df.features)
y = df.label

stratified_cv = StratifiedCV(; nfolds=6,
                               shuffle=true,
                               rng=StableRNG(123))

# train/test row indices for each fold
tt_pairs = MLJBase.train_test_pairs(stratified_cv, 1:nrow(X), y)

cv = []
predictions = DataFrame()
for (train_indices, test_indices) in tt_pairs
    model = ...  # model under evaluation
    mach = machine(model, X[train_indices, :], y[train_indices])
    MLJ.fit!(mach)

    push!(cv, (; machine=mach, train_indices, test_indices))

    # out-of-fold predictions for this fold's test rows
    ŷ = MLJ.predict(mach, X[test_indices, :])

    append!(predictions, hcat(df[test_indices, :], DataFrame(:prediction => ŷ)))
end

It would be nice if evaluate could give the predictions as well, since it needs to generate them anyway.

@ablaom (Member) commented Sep 12, 2022

Thanks @ericphanson for flagging this. There was a request for this a while ago by @CameronBieganek, but I can't find it just now.

This might introduce scaling issues for large datasets, in particular those with multiple targets (think of time series, for example), and the problem is compounded by nested resampling, as when evaluating a TunedModel. So including predictions in the output of evaluate should probably be opt-in. Or, like scikit-learn, we could have a separate function?

Another minor issue is which "prediction" to return, or whether to return more than one kind. For a probabilistic predictor, some metrics require predict_mode (or predict_mean/predict_median) and some just predict. Exposing the output of predict makes the most sense, but I think the user can limit operations to, say, just predict_mode, so that predict is never actually called. Probably the simplest design is to force the predict call anyway (if our return-predictions option is on) and always return that?
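
For anyone unfamiliar with the distinction, a minimal sketch (ConstantClassifier and the iris data are just stand-ins for any probabilistic classifier and dataset):

using MLJ

X, y = @load_iris
model = ConstantClassifier()          # any probabilistic classifier would do
mach = machine(model, X, y) |> fit!

predict(mach, X)[1]       # a UnivariateFinite distribution over the classes
predict_mode(mach, X)[1]  # the modal class label, as deterministic measures require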

The function where all this happens, and which would need to add the desired predictions to its return value, is here.

@ericphanson (Author)

I am not very familiar with the predict_* functions; are they ever more than just post-processing of predict? Anyway, I do see that operations is passed into evaluate!, so maybe that can determine what kind of predictions you get back?

It sounds like the most straightforward approach is to add a return_predictions keyword argument: if true, an extra table with something like row index and prediction is added to the output object.
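
For concreteness, a hypothetical sketch of what that might look like (neither the return_predictions keyword nor the predictions field exists today; the names are placeholders):

e = evaluate(model, X, y;
             resampling=StratifiedCV(nfolds=6, shuffle=true, rng=StableRNG(123)),
             measures=[log_loss],
             return_predictions=true)   # hypothetical keyword

e.predictions   # hypothetical field: e.g. a table with :row, :fold and :prediction columns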

However that kind of design always feels like perhaps we aren't "inverting control to the caller" and that a more compositional flow might be better overall. E.g. I could imagine evaluate being implemented as the simple composition of training over folds, predicting over folds, and evaluating those w/ metrics, and exposing each layer with an API function.
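
To illustrate the layering I have in mind (every function name below other than train_test_pairs is hypothetical, not existing MLJ API):

folds    = MLJBase.train_test_pairs(StratifiedCV(nfolds=6), 1:nrow(X), y)
machines = fit_over_folds(model, X, y, folds)        # hypothetical: one fitted machine per fold
ŷ        = predict_over_folds(machines, X, folds)    # hypothetical: out-of-fold predictions
result   = evaluate_predictions(ŷ, y, [log_loss])    # hypothetical: apply measures per fold
# evaluate(model, X, y; ...) would then just compose these three layers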

@ablaom (Member) commented Sep 14, 2022

> However that kind of design always feels like perhaps we aren't "inverting control to the caller" and that a more compositional flow might be better overall. E.g. I could imagine evaluate being implemented as the simple composition of training over folds, predicting over folds, and evaluating those w/ metrics, and exposing each layer with an API function.

Yes, a compositional approach sounds better. I probably don't have the bandwidth for that kind of a refactor but if someone else was interested...

@ablaom (Member) commented Sep 14, 2022

I'm curious, what is your use case for collecting the out-of-sample predictions? Are you doing some kind of model stacking, perhaps? We do have Stack for that.
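
For reference, a minimal Stack sketch; the component models below are placeholders and require their respective packages to be installed:

Linear = @load LinearRegressor pkg=MLJLinearModels     # placeholder component models;
Tree   = @load DecisionTreeRegressor pkg=DecisionTree  # each requires its own package

stack = Stack(; metalearner=Linear(),
                resampling=CV(nfolds=3),
                lin=Linear(),
                tree=Tree())
mach = machine(stack, X, y) |> fit!   # out-of-sample base-model predictions are handled internally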

@ericphanson (Author)

No, I just want to do my own evaluation on the predictions. In this case, I have multichannel data, and my model is trained to work on each channel independently. But in addition to the evaluation on that task, I also want to combine predictions over channels and then evaluate the aggregated results. I could probably do this by formulating a new composite model (I think?), but if I could just get the predictions directly, I could do whatever evaluation I want.

I have also come across this need other times, e.g. I want to plot prediction vs label for my whole dataset (can be important if you don't have a lot of data). CV lets you get useful predictions for all data points, even if there are really n_folds different models supplying them.

Another case is evaluating on different stratifications of the data. E.g. what if I wanted to know how my performance varies by channel (for models trained on all channels; I don't want to move one channel entirely into the test set). If I have all the predictions, it's easy to do any kind of evaluation I need.
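
With the out-of-fold predictions table from the loop above, that kind of sliced evaluation is just a group-by; a sketch assuming the table has :channel, :prediction and :label columns (the column names are illustrative):

using DataFrames, Statistics

# per-channel accuracy over the out-of-fold predictions; column names are illustrative
by_channel = combine(groupby(predictions, :channel),
                     [:prediction, :label] => ((ŷ, y) -> mean(ŷ .== y)) => :accuracy)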

@BenjaminDoran

Just wanted to add that I would also find it very helpful to be able to access the out-of-fold predictions from evaluate for the same reasons listed by Eric.

@ablaom (Member) commented Jun 6, 2024

Just a note that this is more doable now that we have separate PerformanceEvaluation and CompactPerformanceEvaluation types. Target predictions could be recorded in the first case but dropped in the second. A kwarg compact controls which is returned by evaluate!.
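
Roughly, usage might then look like this (compact is the existing keyword mentioned above; the predictions field is hypothetical):

e = evaluate(model, X, y; resampling=CV(nfolds=6), measures=[log_loss], compact=false)
e isa PerformanceEvaluation   # true -- this variant could also record out-of-fold predictions
e.predictions                 # hypothetical field

e_compact = evaluate(model, X, y; resampling=CV(nfolds=6), measures=[log_loss], compact=true)
e_compact isa CompactPerformanceEvaluation   # true -- predictions would be dropped here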
