BEP 3: Charts interface
BEP 3 | Charts interface |
---|---|
Authors | Bryan Van de Ven, Fabio Pliger and Damián Avila |
Status | WIP |
Discussion | https://github.com/bokeh/bokeh/issues/1373 |
Implementation | https://github.com/bokeh/bokeh/issues/1387 |
This is a discussion BEP to design the next iteration of the charts interface.
Here I am really only concerned with the data formats that each plot accepts. I am including "facet" (and "group" and "stack") options to show what should be possible at some point (though not necessarily immediately), because they can relate to the structure of the data.
Histogram
generate a single histogram of a single series on one plot:
data = ...  # 1d iterable of scalars
Histogram(data)
generate multiple histograms of multiple series overlaid on one plot (a new color for each series):
data = ...  # list of lists, dict/ordered dict of lists, data frame
Histogram(data)
generate multiple histograms of multiple series on multiple separate plots:
data = ...  # list of lists, dict/ordered dict of lists, data frame
Histogram(data, facet=True)
NOTE: the "single series" is the special case. It should promote itself to the "multiple series" case automatically and transparently.
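The promotion rule in the NOTE above can be sketched in Python. This is a minimal, stdlib-only sketch assuming the input shapes listed for Histogram; `normalize_series` and `_is_sequence` are hypothetical helper names, and a data frame is detected by duck typing on a `.columns` attribute (an assumption, so pandas is not required).

```python
from collections import OrderedDict
from collections.abc import Mapping


def _is_sequence(x):
    # Strings count as scalars here, not as sequences of characters.
    return hasattr(x, "__iter__") and not isinstance(x, (str, bytes))


def normalize_series(data):
    """Hypothetical helper: promote any accepted input to name -> list of values.

    - 1d iterable of scalars      -> one series named "0" (the promoted single series)
    - list of lists               -> series named "0", "1", ...
    - dict / ordered dict of lists -> keys kept, insertion order preserved
    - DataFrame-like object       -> assumed to expose .columns and data[col]
    """
    if hasattr(data, "columns"):
        # Duck-typed data frame: one series per column (assumption).
        return OrderedDict((str(col), list(data[col])) for col in data.columns)
    if isinstance(data, Mapping):
        return OrderedDict((str(k), list(v)) for k, v in data.items())
    items = list(data)
    # A flat iterable of scalars is the "single series" case: wrap it in a list.
    if items and all(not _is_sequence(x) for x in items):
        items = [items]
    return OrderedDict((str(i), list(series)) for i, series in enumerate(items))
```

With this, `Histogram([1, 2, 3])` and `Histogram([[1, 2, 3]])` would reach the plotting code in exactly the same form, which is what "automatically and transparently" asks for.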
TimeSeries
For time series it is important to distinguish the index. The "column names" of the "data frame" are the names of the series, and all the points from each series are plotted.
There are two possibilities: one common index shared by all the series, or each series with a distinct index of its own.
If all the series have exactly the same "x" coordinates, here are some ideas:
generate a single time series on one plot:
index = ...  # 1d iterable of any sort (of datetime values)
data = ...  # 1d iterable of any sort
TimeSeries(index, data)
generate possibly multiple time series overlaid on one plot (a new color for each series):
index = ...  # 1d iterable of datetime values
data = ...  # list of lists, dict/ordered dict of lists
TimeSeries(index, data)
# OR
data = ...  # dataframe
TimeSeries(data)
generate multiple time series on multiple separate plots:
index = ...  # 1d iterable of datetime values
data = ...  # list of lists, dict/ordered dict of lists
TimeSeries(index, data, facet=True)
# OR
data = ...  # dataframe
TimeSeries(data, facet=True)
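For the common-index case, a hypothetical helper `time_series_layout` shows how one shared index could be paired with each named series, and how `facet=True` changes only how the series are grouped into plots, not the data itself. The palette and the `(name, index, values, color)` tuple shape are assumptions for illustration, and `data` is assumed to already be an ordered mapping of name to values.

```python
from collections import OrderedDict

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]  # illustrative colors only


def time_series_layout(index, data, facet=False):
    """Hypothetical sketch: pair one shared index with several named series.

    Returns a list of plots; each plot is a list of (name, index, values, color).
    facet=False -> one plot with every series overlaid, a new color per series.
    facet=True  -> one plot per series.
    """
    index = list(index)
    layers = []
    for i, (name, values) in enumerate(data.items()):
        values = list(values)
        # A common index only makes sense if every series has one value per index entry.
        if len(values) != len(index):
            raise ValueError(f"series {name!r} has {len(values)} points, index has {len(index)}")
        layers.append((name, index, values, PALETTE[i % len(PALETTE)]))
    return [[layer] for layer in layers] if facet else [layers]
```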
If instead each series has a distinct index of its own, then by definition there is only the "multiple series" case:
generate multiple time series overlaid on one plot (a new color for each series):
# here, index and series are both 1d iterables of scalars
data = ...  # 1d iterable of (index, series) pairs
TimeSeries(data)
# OR
TimeSeries(index0, series0, index1, series1, ...)
# OR
data = ...  # 1d iterable of dataframes
TimeSeries(data)
# OR
TimeSeries(df0, df1, df2, ...)
generate multiple time series on multiple separate plots:
# here, index and series are both 1d iterables of scalars
data = ...  # 1d iterable of (index, series) pairs
TimeSeries(data, facet=True)
# OR
data = ...  # 1d iterable of dataframes
TimeSeries(data, facet=True)
In the last case there may be fewer indices than series, because each data frame may have more than one non-index column. But conceptually this is no different from the general case.
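The two positional calling conventions for distinct indices could be folded into a single list of `(index, series)` pairs before plotting. A minimal sketch with a hypothetical `index_series_pairs` helper; it covers the pairs forms only, not the data frame forms.

```python
def index_series_pairs(*args):
    """Hypothetical sketch: accept TimeSeries(pairs) or TimeSeries(i0, s0, i1, s1, ...).

    Returns a list of (index, values) tuples, checking that each pair has matching lengths.
    """
    if len(args) == 1:
        # A single argument is a 1d iterable of (index, series) pairs.
        pairs = [tuple(p) for p in args[0]]
    elif len(args) % 2 == 0:
        # Alternating positional arguments: index0, series0, index1, series1, ...
        pairs = list(zip(args[0::2], args[1::2]))
    else:
        raise TypeError("expected one iterable of pairs or an even number of arguments")
    result = []
    for index, values in pairs:
        index, values = list(index), list(values)
        if len(index) != len(values):
            raise ValueError("each index must have the same length as its series")
        result.append((index, values))
    return result
```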
Scatter
Here there is no need to distinguish the index; what is always needed is pairs of x/y sequences that have the same length.
generate a single scatter series on one plot:
x = ...  # 1d iterable of any sort
y = ...  # 1d iterable of any sort
Scatter(x, y)
generate possibly multiple scatter series overlaid on one plot (a new color for each series):
# each xn, yn is a 1d iterable of scalars
Scatter((x0, y0), (x1, y1), (x2, y2))
# OR
data = ...  # groupby of a data frame (a new color for each group)
Scatter(data, x="x_column_name", y="y_column_name")
NOTE: also accepts facet=True to facet multiple scatters on different plots.
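Without pandas, the "groupby" form of Scatter reduces to building one x/y pair of equal-length sequences per group, i.e. the same shape as the `Scatter((x0, y0), (x1, y1), ...)` form. A stdlib-only sketch, assuming the rows arrive as a list of dicts; `scatter_groups` is a hypothetical name. Unlike pandas' `groupby`, which sorts group keys by default, this keeps groups in order of first appearance.

```python
from collections import OrderedDict


def scatter_groups(rows, x, y, by):
    """Hypothetical stand-in for Scatter(df.groupby(by), x=..., y=...).

    rows: list of dicts, one per record.
    Returns an ordered mapping group -> (xs, ys); each group would get its own color.
    """
    groups = OrderedDict()
    for row in rows:
        # setdefault creates the (xs, ys) lists the first time a group key is seen.
        xs, ys = groups.setdefault(row[by], ([], []))
        xs.append(row[x])
        ys.append(row[y])
    return groups
```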
BoxPlot
Box plots have a categorical x-axis with a box summary for the series associated with each category. In this case, the "column names" of the "data frame" are the categories, and the data in each column is reduced to a few statistical measures that define the summary box dimensions.
generate a box plot with summary boxes for each category:
data = ...  # list of lists, dict/ordered dict of lists, data frame, or groupby
BoxPlot(data)
NOTE: might also accept an order parameter to order the categorical axis
NOTE: can colormap by category
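The reduction of each category's column to "a few statistical measures" could look like the following. This is a sketch assuming quartiles from `statistics.quantiles` (Python 3.8+, default "exclusive" method) and the common 1.5 * IQR whisker rule; neither choice comes from this BEP, and `box_summary`/`box_plot_data` are hypothetical names. The optional `order` argument mirrors the proposed order parameter.

```python
import statistics


def box_summary(values):
    """Hypothetical sketch of the per-category box statistics (needs >= 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers reach the most extreme points still inside the fences.
        "lower": min(inside), "upper": max(inside),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }


def box_plot_data(data, order=None):
    # data: mapping of category -> values (the keys play the role of "column names").
    # order: optional sequence of categories that fixes the categorical axis order.
    categories = list(order) if order is not None else list(data)
    return {cat: box_summary(data[cat]) for cat in categories}
```

For example, `box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])` gives quartiles 2.5, 5.0 and 7.5, whiskers at 1 and 8, and flags 100 as an outlier.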
Violin
Inputs identical to BoxPlot (draws violin summaries instead)
Dot
Inputs identical to BoxPlot (draws a dot for each point in the series)
NOTE: options for jittering along the categorical dimension
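One way to implement the jitter option is to give each category a fixed slot on the axis and add a small random offset to every dot within that slot. In this sketch the slot centres `i + 0.5` and the default `width` are assumptions, and `jittered_dots` is a hypothetical name; passing a seed makes the output reproducible.

```python
import random


def jittered_dots(data, width=0.4, seed=None):
    """Hypothetical sketch of Dot's jitter option.

    data: mapping of category -> values. Category i is centred at x = i + 0.5
    (assumed slot layout) and each point is shifted by a uniform offset in
    [-width/2, width/2] along the categorical axis so coincident dots separate.
    Returns a list of (category, x, y) triples.
    """
    rng = random.Random(seed)
    points = []
    for i, (cat, values) in enumerate(data.items()):
        for v in values:
            points.append((cat, i + 0.5 + rng.uniform(-width / 2, width / 2), v))
    return points
```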
Line
Acceptable inputs are basically identical to Scatter, except that Line also has an additional version that auto-generates range(N) x-values:
generate a single line plot, implicitly using x=range(len(y)):
y = ...  # 1d iterable of scalars
Line(y)
generate multiple line plots, implicitly using x=range(len(y)) for each y series:
# each yn is a 1d iterable of scalars
Line(y0, y1, y2)
NOTE: important generalization: each positional argument is for a single line in all of these cases
NOTE: also accepts facet=True
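The "one positional argument per line" rule, including the implicit x=range(len(y)) form, can be sketched as below. `line_series` is a hypothetical helper; treating a 2-tuple of iterables as an explicit Scatter-style (x, y) pair, and anything else as a bare y series, is an assumption about how the two input styles would be told apart.

```python
def line_series(*args):
    """Hypothetical sketch: turn each positional argument into one (xs, ys) line."""
    pairs = []
    for arg in args:
        if isinstance(arg, tuple) and len(arg) == 2 and all(hasattr(a, "__iter__") for a in arg):
            xs, ys = list(arg[0]), list(arg[1])  # explicit (x, y) pair, as in Scatter
        else:
            ys = list(arg)
            xs = list(range(len(ys)))            # implicit x = range(len(y))
        if len(xs) != len(ys):
            raise ValueError("each x/y pair must have the same length")
        pairs.append((xs, ys))
    return pairs
```

Because each y series gets its own `range(len(y))`, lines of different lengths can be mixed in one call.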
With regard to data inputs, I believe this is largely identical to Line.
With regard to data inputs, I believe this is largely similar to TimeSeries + Line.
NOTE: offers the option to stack. The hassle here will be computing the intermediate coordinates in the general case where there is more than one set of x coordinates (i.e. where the data is more like Line with multiple lines, instead of like TimeSeries with a common index).
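The intermediate-coordinate problem for stacking can be handled by resampling every series onto the union of all x values before adding them up. A sketch, assuming linear interpolation, sorted and duplicate-free x values per series, and a contribution of 0 outside a series' own x extent; `interp` and `stack_areas` are hypothetical names.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Linear interpolation of y at x; 0.0 outside the series' x extent (assumption)."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = bisect_left(xs, x)  # first position with xs[i] >= x
    if xs[i] == x:
        return float(ys[i])
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def stack_areas(series):
    """Hypothetical sketch: stack (xs, ys) series that do NOT share an index.

    The intermediate coordinates are the sorted union of all x values; each series
    is interpolated onto that grid and added on top of the ones before it.
    Returns the grid and one (bottom, top) pair of y-lists per series.
    """
    grid = sorted({x for xs, _ in series for x in xs})
    bottom = [0.0] * len(grid)
    layers = []
    for xs, ys in series:
        top = [b + interp(x, xs, ys) for b, x in zip(bottom, grid)]
        layers.append((bottom, top))
        bottom = top
    return grid, layers
```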
HeatMap
NOTE: we should deprecate the CategoricalHeatmap option. There should just be one HeatMap, and it should "do the right thing" with either categorical or scalar ranges.
This one is a little different: the main input conceptually needs to be a dense 2D array of data, but there are a couple of more general cases we should accept.
For the dense 2D array of data case, the x and y ranges can either be lists of categories OR scalar (numerical) bounds.
data = ...  # dense 2D array of data
x = ...  # sequence of categories OR (start, end) range
y = ...  # sequence of categories OR (start, end) range
HeatMap(data, x, y)
NOTE: data can either be scalar (numerical) data, or can be categorical data, examples:
(country * year * primary export [category])
(country * year * total rainfall [scalar])
This really will only affect the auto-choice of color mapper, from a data perspective they are conceptually identical.
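Both decisions described above (scalar range vs. categories for x and y, and the auto-choice of color mapper) can be made from the inputs alone. A sketch with hypothetical helpers `axis_coords` and `mapper_kind`; the convention that a 2-tuple of numbers means a (start, end) range while a list means categories is an assumption, and the returned mapper names are placeholders, not Bokeh classes.

```python
from numbers import Number


def axis_coords(spec, n):
    """Hypothetical sketch: resolve an x or y spec for n cells along one axis.

    A (start, end) tuple of numbers is a scalar range -> n evenly spaced cell centres.
    Anything else is treated as a sequence of n categories and returned as-is.
    """
    if isinstance(spec, tuple) and len(spec) == 2 and all(isinstance(v, Number) for v in spec):
        start, end = spec
        step = (end - start) / n
        return [start + step * (i + 0.5) for i in range(n)]
    cats = list(spec)
    if len(cats) != n:
        raise ValueError("need one category per row/column of data")
    return cats


def mapper_kind(data):
    """Pick a color mapper from the cell values: all numeric -> "linear", else "categorical"."""
    values = [v for row in data for v in row]
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    return "linear" if numeric else "categorical"
```

In the examples above, a total-rainfall grid would get `"linear"` and a primary-export grid would get `"categorical"`, while the data handling is otherwise identical.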
[To be completed]
Bar
Because of all the stacking/grouping/faceting possibilities, and how these directly relate to the structure of the input data, Bar is actually one of the most complicated cases.
One important distinction to maintain is that the x-axis is categorical by nature.
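To show how stacking depends on the structure of the input data, here is a sketch of stacked-bar geometry for an ordered mapping of series, with one value per category on the categorical x-axis. `stacked_bars` is a hypothetical name, and non-negative values are assumed (negative values would need their own stack below zero).

```python
def stacked_bars(categories, data):
    """Hypothetical sketch of the "stack" option for Bar.

    categories: the categorical x-axis factors.
    data: ordered mapping of series name -> one value per category.
    Returns, per series, a list of (category, bottom, top) rectangles.
    """
    running = {cat: 0 for cat in categories}  # current stack height per category
    bars = {}
    for name, values in data.items():
        if len(values) != len(categories):
            raise ValueError(f"series {name!r} needs one value per category")
        bars[name] = []
        for cat, v in zip(categories, values):
            bars[name].append((cat, running[cat], running[cat] + v))
            running[cat] += v
    return bars
```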
NOTE: includes "pie" chart as a special case
I believe this is largely similar to Bar, but with fewer options for grouping.
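For the "pie" special case noted above, wedge geometry follows directly from each value's share of the total. A sketch, assuming the pie is the `inner_radius=0` case of a ring-shaped (donut-style) layout; `wedges` and its radius parameters are hypothetical.

```python
import math


def wedges(values, inner_radius=0.0, outer_radius=1.0):
    """Hypothetical sketch: angular extent of each wedge, counter-clockwise from 0 rad.

    inner_radius=0 gives the pie special case.
    Returns a list of (start_angle, end_angle, inner_radius, outer_radius).
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    out, start = [], 0.0
    for v in values:
        end = start + 2 * math.pi * v / total  # wedge size proportional to the value
        out.append((start, end, inner_radius, outer_radius))
        start = end
    return out
```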
The basic and always available interface is the "pile of kwargs" interface:
Histogram(
    df, width=400, height=400,
    title="some cool stuff", facet="wrap",
    server="localhost", name="mychart",
    notebook=True, file="mychart.html"
)
However!
Epiphany: The current "method chaining" can be kept, and used perfectly in service of context managers:
with Histogram(df) as chart:
    chart.width(400).height(400)
    chart.title("some cool stuff").facet("wrap")
    chart.server("localhost").name("mychart")
    chart.notebook()
    chart.file("mychart.html")
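The two option styles need not be separate code paths: if every chained method only records an option and returns the chart, the kwargs form and the chaining form fill the same store, and `__exit__` becomes the natural place to produce output. A minimal sketch; `ChartSketch` is a hypothetical stand-in, not a Bokeh class, and its `show()` only records that output would happen. In this sketch the kwargs form does not render on its own; when it should is left open.

```python
class ChartSketch:
    """Hypothetical stand-in for a chart object (not Bokeh's API)."""

    OPTIONS = ("width", "height", "title", "facet", "server", "name", "notebook", "file")

    def __init__(self, data, **kwargs):
        # "Pile of kwargs" interface: validate names, then store them.
        unknown = set(kwargs) - set(self.OPTIONS)
        if unknown:
            raise TypeError(f"unknown options: {sorted(unknown)}")
        self.data = data
        self.options = dict(kwargs)
        self.shown = False

    def __getattr__(self, name):
        # Called only for missing attributes: each known option becomes a chainable
        # setter, e.g. chart.width(400) stores the value and returns the chart.
        if name in type(self).OPTIONS:
            def setter(value=True):
                self.options[name] = value
                return self
            return setter
        raise AttributeError(name)

    def show(self):
        # Placeholder for real output (file, notebook, server): only records the call.
        self.shown = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.show()
        return False  # never swallow exceptions raised inside the block
```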
For static embedding cases, all charts will expose chart.plot, which is a bokeh.objects.Plot that can be passed to any of the static embedding functions:
components(chart.plot, INLINE)
file_html(chart.plot, CDN, "my plot")
autoload_static(chart.plot, CDN, "static/plots")
For server embedding, all charts will expose chart.script, which has an implementation along the lines of:
@property
def script(self):
    if self.session is None:
        return None
    return autoload_server(self.plot, self.session)