Uniform Count Discretization requires breaking a set of values into $k$ bins with a roughly equal number of entries each. This works great for most continuous data, but has some corner cases when many values are repeated, since identical values must land in the same bin.
I have a problem with "a roughly equal number of entries" and would like to more rigorously define an optimal discretization scheme.
We ideally want $M/k$ entries per bin, where $M$ is the number of data points and $k$ is the number of bins.
If we use an L2 loss, the score of a particular discretization is simply $\sum_{i=1}^{k} (b_i - M/k)^2$, where $b_i$ is the number of entries in bin $i$.
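To make the score concrete, here is a minimal Python sketch that evaluates it for a given list of bin sizes. The function name `l2_bin_score` is hypothetical and only for illustration; it assumes $M$ is the sum of the bin sizes and $k$ is the number of bins.

```python
def l2_bin_score(bin_sizes):
    """L2 score of a discretization: sum over bins of (b - M/k)^2.

    Assumption: M is the total number of entries (sum of bin sizes)
    and k is the number of bins (length of the list).
    """
    k = len(bin_sizes)
    if k == 0:
        raise ValueError("need at least one bin")
    target = sum(bin_sizes) / k  # ideal entries per bin, M/k
    return sum((b - target) ** 2 for b in bin_sizes)


# 8 entries split evenly into 2 bins scores 0; an uneven 5/3 split scores 2.
print(l2_bin_score([4, 4]))  # 0.0
print(l2_bin_score([5, 3]))  # 2.0
```

A lower score means the bins are closer to equally populated, and a score of 0 means every bin holds exactly $M/k$ entries.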
Choosing the bin edges that minimize this score results in a dynamic programming problem.
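One way the dynamic program could look is sketched below in Python. This is an illustration under stated assumptions, not a proposed implementation: the function name `optimal_uniform_count_bins` is hypothetical, bin edges are only allowed between distinct values (so repeated values always share a bin), and every bin is required to be non-empty, which means $k$ cannot exceed the number of distinct values.

```python
from collections import Counter
from itertools import accumulate


def optimal_uniform_count_bins(values, k):
    """Return (loss, bin_sizes) minimizing sum((b - M/k)^2).

    Assumptions: edges fall only between distinct values, and each of
    the k bins contains at least one distinct value.
    """
    # Occurrence count of each distinct value, in sorted value order.
    counts = [c for _, c in sorted(Counter(values).items())]
    n = len(counts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct values")

    target = sum(counts) / k                  # ideal bin size M/k
    prefix = [0] + list(accumulate(counts))   # prefix[i] = entries in first i distinct values
    inf = float("inf")

    # cost[j][i]: best loss when the first i distinct values fill j bins.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0

    for j in range(1, k + 1):
        for i in range(j, n + 1):
            # The last bin holds distinct values s+1..i.
            for s in range(j - 1, i):
                if cost[j - 1][s] == inf:
                    continue
                b = prefix[i] - prefix[s]
                c = cost[j - 1][s] + (b - target) ** 2
                if c < cost[j][i]:
                    cost[j][i] = c
                    split[j][i] = s

    # Walk the stored split points backwards to recover the bin sizes.
    sizes = []
    i = n
    for j in range(k, 0, -1):
        s = split[j][i]
        sizes.append(prefix[i] - prefix[s])
        i = s
    sizes.reverse()
    return cost[k][n], sizes


# The four repeated 1s form one bin and the remaining values the other.
print(optimal_uniform_count_bins([1, 1, 1, 1, 2, 3, 4, 5], 2))  # (0.0, [4, 4])
```

This sketch runs in $O(k n^2)$ time for $n$ distinct values. It returns bin sizes rather than edge values; the edges can be recovered from the same split points if needed.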