-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TableReport outliers improvement? #1167
Comments
thanks @Vincent-Maladiere . I had considered turning off outlier detection when there are few unique values but actually having a wide range of values is still a problem if you have few. imagine you had values 1, 2, 3, and -1000 say to indicate some invalid value. if you don't cut the axis due to the -1000 1, 2, and 3 will all get squished together and you will lose the information. what we would like in this case is to realize that the actual values don't matter and treat the variable as categorical as you say. but I'm not sure that its' reasonable to assume that is the case whenever there are few unique values 🤔 |
Yes, that's tricky I agree. The best scenario would be for the user to notice that and convert to string or category dtypes. The main difference between my screenshot and yours is that on mine the outlier segment is bigger than the category displayed on the left, so the outliers are not really outliers, if that makes sense?
Let's wait a bit to get feedback and decide about this. |
Describe the bug
I don't think this is a bug, I'm just creating this issue so that we can observe a different behaviour I just saw.
Here is a series where the outlier detection is slightly off. This is an integer series comprised of {0, 80, 100}. Here, the values are actually categorical because they just distinguish states, and the choice of the state values 0 and 80 is arbitrary (it could have been 0, 1, 2, but for some reason it's 0, 80, 100).
Unless we tune the heuristic a little bit to avoid counting outliers when the cardinality is "very low", I don't see an easy improvement.
WDYT?
The column dataset:
state.csv
Steps/Code to Reproduce
Expected Results
No outliers
Actual Results
Some outliers
Versions
The text was updated successfully, but these errors were encountered: