[ENH] proba regression: reduction to multiclass classification #378

fkiraly · 2024-06-07T20:13:40Z

From the discussion today, a short design for a reducer to multiclass classification mentioned in #7.

Parameters are:

an sklearn classifier clf capable of multiclass classification
a bins arg, default = 10. Possible values are int, or an ordered list of float.

The algorithm works as follows:

if bins is int, replaces this arg internally by that many bins, with edges at the bins + 1 equally spaced quantiles of the empirical training distribution.
converts each training label into a multiclass label according to which bin it falls in
in fit, fits clf to this binned training data
in predict_proba, uses clf.predict_proba to obtain class probabilities, and uses these together with the bin edges from bins to obtain a Histogram distribution
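
The steps above can be sketched roughly as follows. This is a minimal illustration, not the skpro implementation: the helper names `make_bin_edges` and `to_bin_labels` are hypothetical, `LogisticRegression` stands in for any multiclass `clf`, and the final step of wrapping the result in a Histogram distribution is left out. It also assumes every bin contains at least one training label, so that the columns of `predict_proba` line up with the bins.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def make_bin_edges(y, bins=10):
    """Return sorted bin edges; an int means that many quantile bins."""
    if isinstance(bins, int):
        # bins + 1 equally spaced quantile levels give `bins` bins
        edges = np.quantile(y, np.linspace(0.0, 1.0, bins + 1))
        return np.unique(edges)  # drop duplicate edges caused by ties
    return np.asarray(bins, dtype=float)


def to_bin_labels(y, edges):
    """Map each target value to the index of the bin containing it."""
    # only interior edges are used, so values at or beyond the outer
    # edges fall into the first or last bin
    return np.digitize(y, edges[1:-1])


# synthetic regression data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] + 0.1 * rng.normal(size=200)

edges = make_bin_edges(y, bins=5)
clf = clone(LogisticRegression(max_iter=1000)).fit(X, to_bin_labels(y, edges))

bin_mass = clf.predict_proba(X[:3])      # one row per sample, one column per bin
bin_density = bin_mass / np.diff(edges)  # piecewise-constant density per bin
```

Each row of `bin_mass` together with `edges` is what a histogram distribution would need: the probability of each bin and where the bin starts and ends.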

One could also think about another algorithm where the bins are cumulative, i.e., the i-th class means the label lies between the lowest point and the upper edge of the i-th bin. This is also valid, but one needs to be careful that the resulting cdf is monotonic. Could be a choice of strategy.
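
To illustrate the monotonicity concern in the cumulative variant: if each column of a matrix holds an independently estimated value of the cdf at one interior bin edge, the rows need not be non-decreasing. The sketch below repairs this with a running maximum and then differences the cdf to get bin masses. The function name `cumulative_to_bin_mass` is hypothetical, and the running maximum is only one possible repair, chosen here as an assumption (isotonic regression would be another).

```python
import numpy as np


def cumulative_to_bin_mass(cum_probs):
    """Turn per-edge cdf estimates into non-negative bin masses.

    cum_probs[:, i] estimates P(y <= i-th interior edge) and may be
    non-monotone across columns.
    """
    cdf = np.clip(np.asarray(cum_probs, dtype=float), 0.0, 1.0)
    cdf = np.maximum.accumulate(cdf, axis=1)  # running max along the edges
    n = cdf.shape[0]
    # pad with cdf = 0 at the lowest edge and cdf = 1 at the highest edge
    full = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    return np.diff(full, axis=1)


raw = np.array([[0.2, 0.5, 0.4, 0.9]])  # third estimate violates monotonicity
mass = cumulative_to_bin_mass(raw)      # bin masses: 0.2, 0.3, 0.0, 0.4, 0.1
```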

FYI @ShreeshaM07, @SaiRevanth25.


ShreeshaM07 · 2024-06-09T09:44:03Z

Yes, I think this would be a good thing to implement once #335 is complete and merged.

ShreeshaM07 · 2024-06-26T07:41:16Z

I will be making the PR for this today. I had a few doubts needing clarification:

Since bins is going to represent the number of classes, wouldn't it make more sense to fetch it from the sklearn classifier using the classes_ attribute?
How do we take the other parameters of the different classifiers available in sklearn as input? They differ for each classifier, so do I take them as a kwargs argument from the user?

fkiraly · 2024-06-26T08:17:43Z

Since bins is going to represent the number of classes, wouldn't it make more sense to fetch it from the sklearn classifier using the classes_ attribute?

But that's available only once you've fitted it, which is later than construction. How would that work, logically?

How do we take the other parameters of the different classifiers available in sklearn as input? They differ for each classifier, so do I take them as a kwargs argument from the user?

No, you pass the entire classifier instance. As I'm saying above, parameters are clf - a classifier instance with its own parameters - and bins. I did not state expressly that clf is an instance, though that follows the common sklearn-like composition pattern: you use instances, not classes, so the parameters of the instance are passed along with it.
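
A minimal sketch of that composition pattern, assuming a hypothetical wrapper class `BinnedRegressorSketch` (not the skpro estimator) that receives already-binned labels: the caller configures the classifier instance, and the wrapper only stores it and fits a clone, so no classifier-specific kwargs are needed.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier


class BinnedRegressorSketch:
    """Hypothetical wrapper: takes a configured classifier instance."""

    def __init__(self, clf, bins=10):
        self.clf = clf    # instance, carrying its own hyperparameters
        self.bins = bins

    def fit(self, X, y_binned):
        # fit a clone so the instance passed by the user stays unfitted
        self.clf_ = clone(self.clf).fit(X, y_binned)
        return self


X = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]  # already-binned targets, for illustration only

est = BinnedRegressorSketch(
    RandomForestClassifier(n_estimators=25, random_state=0), bins=2
).fit(X, labels)
```

Swapping in a different classifier then means passing a different instance, with no change to the wrapper.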

ShreeshaM07 · 2024-06-26T08:20:08Z

No, you pass the entire classifier instance. As I'm saying above, parameters are clf - a classifier instance with its own parameters - and bins.

Oh, I thought I had to take input as strings like I did in the case of statsmodels. If I take the input as an sklearn classifier instance then that's not an issue at all.

ShreeshaM07 · 2024-06-26T08:21:50Z

But that's available only once you've fitted it, which is later than construction. How would that work, logically?

Since we are constructing the Histogram distribution only when we call predict_proba, that would mean it is already fitted. Is that not how we want it?

fkiraly · 2024-06-26T10:36:54Z

Oh, I thought I had to take input as strings like I did in the case of statsmodels. If I take the input as an sklearn classifier instance then that's not an issue at all.

Yes, inputs being strings is "bad design" if a viable alternative is the composition/strategy patterns. Because with strings, you always have to add the encoding manually, whereas in composition you can pass any component that is API compliant.

fkiraly · 2024-06-26T10:37:34Z

Since we are constructing the Histogram distribution only when we call predict_proba, that would mean it is already fitted. Is that not how we want it?

I think you still need the exact bins because you need to pass them to bins of the histogram distribution - knowing their number is not enough.
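
A small numeric illustration of why the count alone is not enough, using made-up bin masses and two made-up sets of edges: the same three class probabilities give different densities depending on where the bin edges lie.

```python
import numpy as np

bin_mass = np.array([0.2, 0.5, 0.3])  # e.g. one row of predict_proba, 3 classes

edges_a = np.array([0.0, 1.0, 2.0, 3.0])   # three equal-width bins
edges_b = np.array([0.0, 0.5, 2.5, 10.0])  # three bins of unequal width

# density on each bin = probability mass / bin width
density_a = bin_mass / np.diff(edges_a)  # 0.2, 0.5, 0.3
density_b = bin_mass / np.diff(edges_b)  # 0.4, 0.25, 0.04
```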

fkiraly added the labels good first issue (Good for newcomers), module:regression (probabilistic regression module), implementing algorithms (Implementing algorithms, estimators, objects native to skpro), and feature request (New feature or request) on Jun 7, 2024

Abhay-Lejith added this to 2024 May-Sep workstreams Jun 26, 2024

Abhay-Lejith moved this to In Progress in 2024 May-Sep workstreams Jun 26, 2024

fkiraly assigned ShreeshaM07 Jun 26, 2024

ShreeshaM07 mentioned this issue Jun 26, 2024 in [ENH] Multiclass classification reduction using Histograms #410 (merged, 5 tasks)

fkiraly closed this as completed in d3edf9f Jul 9, 2024

fkiraly closed this as completed in #410 Jul 9, 2024

github-project-automation bot moved this from In Progress to Done in 2024 May-Sep workstreams Jul 9, 2024
