-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Make dx
histogram behavior consistent with px
#1002
base: main
Are you sure you want to change the base?
Conversation
@@ -47,6 +47,27 @@ hist_3_bins = dx.histogram(setosa, x="SepalLength", nbins=3) | |||
hist_8_bins = dx.histogram(setosa, x="SepalLength", nbins=8) | |||
``` | |||
|
|||
### Bin and aggregate on different columns | |||
|
|||
If both `x` and `y` are specified, the histogram will be binned across one column and aggregated on the other. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would mention something like "If the plot orientation is vertical, the x
column will be binned and the y
column aggregated. The operations are flipped if the plot orientation is horizontal."
I don't know which orientation corresponds with which pairing
# only one should be set | ||
if hist_agg_label_h: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a bit clearer to make the args hist_agg_label
and hist_orientation
Column values must be numeric. If x is specified, | ||
the bars are drawn vertically by default. | ||
y: A column name or list of columns that contain y-axis values. | ||
Only one of x or y can be specified. If y is specified, the | ||
bars are drawn vertically. | ||
Column values must be numeric. If only y is specified, | ||
the bars are drawn horizontally by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure this has different behavior from bar in that the columns must be numeric. From the bar orientation docstring it looks like 1 value can be non-numeric
relabeled_agg_col = ( | ||
labels.get(self.agg_col, self.agg_col) if labels else self.agg_col | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ternary necessary here? Since there's a default empty dict for labels
, I think you can just get rid of the ternary since the else
should never hit? I guess unless a user specified labels=None
maybe?
elif self.histnorm == "probability": | ||
hist_agg_label = f"fraction of sum of {hist_agg_label}" | ||
elif self.histnorm == "percent": | ||
hist_agg_label = f"percent of sum of {hist_agg_label}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these X of sum of Y
? Looks like this is the case where histfunc
is not count or sum. So is there a histfunc
that should be required for these cases?
class UnivariateAwarePreprocessor: | ||
""" | ||
A preprocessor that stores useful args for plots where possibly one of x or y or both can be specified, | ||
which impacts the orientation of the plot in ways that affect the preprocessing. | ||
Should be inherited from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be abstract?
pivot_vars: The vars with new column names if a list was passed in | ||
list_var: The var that was passed in as a list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by what these do. Is pivot_vars
column renames? Is list_var
supposed to be something like x
or y
?
if self.args.get(self.agg_var): | ||
self.agg_col: str = ( | ||
pivot_vars["value"] | ||
if pivot_vars and list_var and list_var == self.agg_var | ||
else args[self.agg_var] | ||
) | ||
else: | ||
# if bar_var is not set, the value column is the same as the axis column | ||
# because both the axis bins and value are computed from the same inputs | ||
self.agg_col = self.bin_col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the comment in the else say agg_var
? The if only checks agg_var
, not bar_var
Fixes: #668
The following examples are now basically consistent.
During testing I discovered #1001 so the legend titles are not set in a couple of these, but that is beyond the scope of this ticket as that is a general plot by issue, not a histogram specific one.
Note a few intentional differences in the args. It's not expected that
dx
histograms andpx
histograms aggregate in the same exact way. Plotly infers the equivalent ofrange_bins
hence the extra argument. Additionally we havebarmode="group"
as our default whereas in Plotly it isbarmode="relative"
. I believe this was an intentional choice on our end because it's hard to compare the different traces withbarmode="relative"
although I don't have a record of it.