Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make dx histogram behavior consistent with px #1002

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

jnumainville
Copy link
Collaborator

Fixes: #668

The following examples are now basically consistent.
During testing I discovered #1001 so the legend titles are not set in a couple of these, but that is beyond the scope of this ticket as that is a general plot by issue, not a histogram specific one.

Note a few intentional differences in the args. It's not expected that dx histograms and px histograms aggregate in the same exact way. Plotly infers the equivalent of range_bins hence the extra argument. Additionally we have barmode="group" as our default whereas in Plotly it is barmode="relative". I believe this was an intentional choice on our end because it's hard to compare the different traces with barmode="relative" although I don't have a record of it.

import plotly.express as px
import deephaven.plot.express as dx

df = px.data.tips()
fig = px.histogram(df, x="size", y="total_bill")
fig2 = px.histogram(df, x="size", y="total_bill", histfunc="count", nbins=6)
fig3 = px.histogram(df, x="size", y="total_bill", orientation="h", nbins=6)
fig4 = px.histogram(df, x="size", y="total_bill", orientation="h", histfunc="count", nbins=6)
fig5 = px.histogram(df, x=["tip", "total_bill"], y="size", nbins=6)
fig6 = px.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", nbins=6)
fig7 = px.histogram(df, x="size", y=["tip", "total_bill"], nbins=6)
fig8 = px.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", nbins=6)
fig9 = px.histogram(df, x=["tip", "total_bill"], y="size", histfunc="count", nbins=6)
fig10 = px.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", histfunc="count")
fig11 = px.histogram(df, x="size", y=["tip", "total_bill"], histfunc="count", nbins=6)
fig12 = px.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", histfunc="count", nbins=6)
fig13 = px.histogram(df, x="size", y=["tip", "total_bill"], histnorm="probability", barnorm="percent")
fig14 = px.histogram(df, x="size")
fig15 = px.histogram(df, y="size")

fig1d = dx.histogram(df, x="size", y="total_bill", nbins=6, range_bins=[0, 7])
fig2d = dx.histogram(df, x="size", y="total_bill", histfunc="count", nbins=6, range_bins=[0, 7])
fig3d = dx.histogram(df, x="size", y="total_bill", orientation="h", nbins=6, range_bins=[0, 60])
fig4d = dx.histogram(df, x="size", y="total_bill", orientation="h", histfunc="count", nbins=6, range_bins=[0, 60])
fig5d = dx.histogram(df, x=["tip", "total_bill"], y="size", nbins=6, range_bins=[0, 60], barmode="relative")
fig6d = dx.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", nbins=6, range_bins=[0, 7], barmode="relative")
fig7d = dx.histogram(df, x="size", y=["tip", "total_bill"], nbins=6, range_bins=[0, 7], barmode="relative")
fig8d = dx.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", nbins=6, range_bins=[0, 60], barmode="relative")
fig9d = dx.histogram(df, x=["tip", "total_bill"], y="size", histfunc="count", nbins=6, range_bins=[0, 60], barmode="relative")
fig10d = dx.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", histfunc="count", nbins=6, range_bins=[0, 7], barmode="relative")
fig11d = dx.histogram(df, x="size", y=["tip", "total_bill"], histfunc="count", nbins=6, range_bins=[0, 7], barmode="relative")
fig12d = dx.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", histfunc="count", nbins=6, range_bins=[0, 60], barmode="relative")
fig13d = dx.histogram(df, x="size", y=["tip", "total_bill"], histnorm="probability", barnorm="percent", range_bins=[0,7], nbins=6, barmode="relative")
fig14d = dx.histogram(df, x="size", nbins=6)
fig15d = dx.histogram(df, y="size", nbins=6)

@jnumainville jnumainville requested review from a team and mattrunyon and removed request for a team November 8, 2024 16:24
@@ -47,6 +47,27 @@ hist_3_bins = dx.histogram(setosa, x="SepalLength", nbins=3)
hist_8_bins = dx.histogram(setosa, x="SepalLength", nbins=8)
```

### Bin and aggregate on different columns

If both `x` and `y` are specified, the histogram will be binned across one column and aggregated on the other.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention something like "If the plot orientation is vertical, the x column will be binned and the y column aggregated. The operations are flipped if the plot orientation is horizontal."

I don't know which orientation corresponds with which pairing

Comment on lines +898 to +899
# only one should be set
if hist_agg_label_h:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a bit clearer to make the args hist_agg_label and hist_orientation

Comment on lines +346 to +350
Column values must be numeric. If x is specified,
the bars are drawn vertically by default.
y: A column name or list of columns that contain y-axis values.
Only one of x or y can be specified. If y is specified, the
bars are drawn vertically.
Column values must be numeric. If only y is specified,
the bars are drawn horizontally by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure this has different behavior from bar in that the columns must be numeric. From the bar orientation docstring it looks like 1 value can be non-numeric

Comment on lines +162 to +164
relabeled_agg_col = (
labels.get(self.agg_col, self.agg_col) if labels else self.agg_col
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ternary necessary here? Since there's a default empty dict for labels, I think you can just get rid of the ternary since the else should never hit? I guess unless a user specified labels=None maybe?

Comment on lines +185 to +188
elif self.histnorm == "probability":
hist_agg_label = f"fraction of sum of {hist_agg_label}"
elif self.histnorm == "percent":
hist_agg_label = f"percent of sum of {hist_agg_label}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these X of sum of Y? Looks like this is the case where histfunc is not count or sum. So is there a histfunc that should be required for these cases?

Comment on lines +6 to +10
class UnivariateAwarePreprocessor:
"""
A preprocessor that stores useful args for plots where possibly one of x or y or both can be specified,
which impacts the orientation of the plot in ways that affect the preprocessing.
Should be inherited from.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be abstract?

Comment on lines +14 to +15
pivot_vars: The vars with new column names if a list was passed in
list_var: The var that was passed in as a list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by what these do. Is pivot_vars column renames? Is list_var supposed to be something like x or y?

Comment on lines +49 to +58
if self.args.get(self.agg_var):
self.agg_col: str = (
pivot_vars["value"]
if pivot_vars and list_var and list_var == self.agg_var
else args[self.agg_var]
)
else:
# if bar_var is not set, the value column is the same as the axis column
# because both the axis bins and value are computed from the same inputs
self.agg_col = self.bin_col
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the comment in the else say agg_var? The if only checks agg_var, not bar_var

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

x and y behavior in dx.histogram is not fully consistent with px.histogram
3 participants