README updates for Altair 5.3, add GOVERNANCE.md #480

Merged: 6 commits, Apr 3, 2024
21 changes: 0 additions & 21 deletions .github/workflows/semgrep.yml

This file was deleted.

1 change: 1 addition & 0 deletions GOVERNANCE.md
@@ -0,0 +1 @@
The VegaFusion project is governed by the documents that reside in the [Vega Organizational GitHub repository](https://github.com/vega/.github/).
65 changes: 31 additions & 34 deletions README.md
@@ -4,7 +4,7 @@

---

VegaFusion provides serverside acceleration for the [Vega](https://vega.github.io/) visualization grammar. While not limited to Python, an initial application of VegaFusion is the acceleration of the [Altair](https://altair-viz.github.io/) Python interface to [Vega-Lite](https://vega.github.io/vega-lite/).
VegaFusion provides serverside acceleration for the [Vega](https://vega.github.io/) visualization grammar. While not limited to Python, an initial application of VegaFusion is the acceleration of the [Vega-Altair](https://altair-viz.github.io/) Python interface to [Vega-Lite](https://vega.github.io/vega-lite/).

The core VegaFusion algorithms are implemented in Rust. Python integration is provided using [PyO3](https://pyo3.rs/v0.15.1/) and JavaScript integration is provided using [wasm-bindgen](https://github.com/rustwasm/wasm-bindgen).

@@ -13,14 +13,16 @@ The core VegaFusion algorithms are implemented in Rust. Python integration is pr
## Documentation
See the documentation at https://vegafusion.io

## Project Status
VegaFusion is a young project, but it is already fairly well tested and used in production at Hex. The integration test suite includes image comparisons with over 600 specifications from the Vega, Vega-Lite, and Altair galleries.
## History
VegaFusion was developed by Jon Mease and acquired by [Hex Technologies](https://hex.tech/) in 2022. Hex donated VegaFusion to the Vega Project in 2024 and continues to support its development and maintenance.

VegaFusion's integration with Vega-Altair was initially developed outside of Altair in the `vegafusion` Python package. As of Vega-Altair version 5.3, all of these integrations have been incorporated into the upstream Vega-Altair package.

## Quickstart 1: Overcome `MaxRowsError` with VegaFusion
The VegaFusion mime renderer can be used to overcome the Altair [`MaxRowsError`](https://altair-viz.github.io/user_guide/faq.html#maxrowserror-how-can-i-plot-large-datasets) by performing data-intensive aggregations on the server and pruning unused columns from the source dataset. First install the `vegafusion` Python package with the `embed` extras enabled
The Vega-Altair [`"vegafusion"` data transformer](https://altair-viz.github.io/user_guide/large_datasets.html#vegafusion-data-transformer) can be used to overcome the Altair [`MaxRowsError`](https://altair-viz.github.io/user_guide/faq.html#maxrowserror-how-can-i-plot-large-datasets) by performing data-intensive aggregations on the server and pruning unused columns from the source dataset. First install the `altair` Python package with the `all` extras enabled

```bash
pip install "vegafusion[embed]"
pip install "altair[all]>=5.3"
```

Then open a Jupyter notebook (either the classic notebook or a notebook inside JupyterLab), and create an Altair histogram of a 1 million row flights dataset
@@ -41,18 +43,23 @@ delay_hist
```
```
---------------------------------------------------------------------------
MaxRowsError Traceback (most recent call last)
...
MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000). For information on how to plot larger datasets in Altair, see the documentation
MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
>> import altair as alt
>> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
```

This results in an Altair `MaxRowsError`, as by default Altair is configured to allow no more than 5,000 rows of data to be sent to the browser. This is a safety measure to avoid crashing the user's browser. The VegaFusion mime renderer can be used to overcome this limitation by performing data intensive transforms (e.g. filtering, binning, aggregation, etc.) in the Python kernel before the resulting data is sent to the web browser.
This results in an Altair `MaxRowsError`, as by default Altair is configured to allow no more than 5,000 rows of data to be sent to the browser. This is a safety measure to avoid crashing the user's browser. The `"vegafusion"` data transformer can be used to overcome this limitation by performing data intensive transforms (e.g. filtering, binning, aggregation, etc.) in the Python kernel before the resulting data is sent to the web browser.

Run these two lines to import and enable the VegaFusion mime renderer
The `"vegafusion"` data transformer is enabled like this:

```python
import vegafusion as vf
vf.enable()
alt.data_transformers.enable("vegafusion")
```

Now the chart displays quickly without errors
@@ -62,14 +69,13 @@ delay_hist
![Flight Delay Histogram](https://user-images.githubusercontent.com/15064365/209973961-948b9d10-4202-4547-bbc8-d1981dcc8c4e.png)
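
To see why pre-evaluating the transforms keeps the chart under the row limit, here is a minimal plain-Python sketch of binning and counting before anything is sent to the browser. It is an illustration only, not VegaFusion's implementation (which runs in Rust on DataFusion): the `bin_and_count` helper, its equal-width bins, and the synthetic `delays` data are all assumptions made for this example.

```python
import random
from collections import Counter


def bin_and_count(values, maxbins=30):
    """Aggregate raw values into at most `maxbins` equal-width histogram rows.

    Hypothetical helper: Vega's bin transform chooses "nice" step sizes,
    whereas this sketch simply divides the data range evenly.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / maxbins or 1.0  # avoid a zero step when all values are equal
    # Map each value to a bin index, clamping the maximum into the last bin.
    counts = Counter(min(int((v - lo) / step), maxbins - 1) for v in values)
    return [
        {"bin_start": lo + i * step, "bin_end": lo + (i + 1) * step, "count": n}
        for i, n in sorted(counts.items())
    ]


random.seed(0)
# Synthetic stand-in for the flights "delay" column (assumed for illustration).
delays = [random.gauss(10, 40) for _ in range(100_000)]
rows = bin_and_count(delays)
print(len(delays), "input rows ->", len(rows), "rows sent to the browser")
```

However large the input, the browser receives at most `maxbins` rows, which is well below the 5,000 row default limit.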

## Quickstart 2: Extract transformed data
By default, data transforms in an Altair chart (e.g. filtering, binning, aggregation, etc.) are performed by the Vega JavaScript library running in the browser. This has the advantage of making the charts produced by Altair fully standalone, not requiring access to a running Python kernel to render properly. But it has the disadvantage of making it difficult to access the transformed data (e.g. the histogram bin edges and count values) from Python. Since VegaFusion evaluates these transforms in the Python kernel, it's possible to access then from Python using the `vegafusion.transformed_data()` function.
By default, data transforms in an Altair chart (e.g. filtering, binning, aggregation, etc.) are performed by the Vega JavaScript library running in the browser. This has the advantage of making the charts produced by Altair fully standalone, not requiring access to a running Python kernel to render properly. But it has the disadvantage of making it difficult to access the transformed data (e.g. the histogram bin edges and count values) from Python. Since VegaFusion evaluates these transforms in the Python kernel, it's possible to access them from Python using the `chart.transformed_data()` function.

For example, the following code demonstrates how to access the histogram bin edges and counts for the example above:

```python
import pandas as pd
import altair as alt
import vegafusion as vf

flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
@@ -79,7 +85,7 @@ delay_hist = alt.Chart(flights).mark_bar().encode(
    alt.X("delay", bin=alt.Bin(maxbins=30)),
    alt.Y("count()")
)
vf.transformed_data(delay_hist)
delay_hist.transformed_data()
```
| | bin_maxbins_30_delay | bin_maxbins_30_delay_end | __count |
|---:|-----------------------:|---------------------------:|----------:|
@@ -103,30 +109,23 @@ vf.transformed_data(delay_hist)
| 17 | 360 | 380 | 100 |

## Quickstart 3: Accelerate interactive charts
While the VegaFusion mime renderer works great for non-interactive Altair charts, it's not as well suited for [interactive](https://altair-viz.github.io/user_guide/interactions.html) charts visualizing large datasets. This is because the mime renderer does not maintain a live connection between the browser and the python kernel, so all the data that participates in an interaction must be sent to the browser.

To address this situation, VegaFusion provides a [Jupyter Widget](https://ipywidgets.readthedocs.io/en/stable/) based renderer that does maintain a live connection between the chart in the browser and the Python kernel. In this configuration, selection operations (e.g. filtering to the extents of a brush selection) can be evaluated interactively in the Python kernel, which eliminates the need to transfer the full dataset to the client in order to maintain interactivity.

The VegaFusion widget renderer is provided by the `vegafusion-jupyter` package.

```bash
pip install "vegafusion-jupyter[embed]"
```
As shown above, the `"vegafusion"` data transformer can be combined with Vega-Altair's standard [renderers](https://altair-viz.github.io/user_guide/display_frontends.html), and this configuration will work well to scale non-interactive charts. However, because the standard renderers do not support passing information from the browser back to the Python kernel, VegaFusion is unable to evaluate transforms that are referenced by selections. To support this use case, the `"vegafusion"` data transformer may be combined with Altair's [`JupyterChart`](https://altair-viz.github.io/user_guide/jupyter_chart.html). Because `JupyterChart` (when used in an environment that supports Jupyter Widgets) provides a two-way connection between the browser and the Python kernel, selection operations (e.g. filtering to the extents of a brush selection) can be evaluated interactively in the Python kernel, which eliminates the need to transfer the full dataset to the browser in order to maintain interactivity.

Instead of enabling the mime render with `vf.enable()`, the widget renderer is enabled with `vf.enable_widget()`. Here is a full example that uses the widget renderer to display an interactive Altair chart that implements linked histogram brushing for a 1 million row flights dataset.
This mode is activated by enabling the `"vegafusion"` data transformer and either enabling the `"jupyter"` renderer, or using `JupyterChart` directly.

```python
import pandas as pd
import altair as alt
import vegafusion as vf

vf.enable_widget()
alt.data_transformers.enable("vegafusion")
alt.renderers.enable("jupyter")

flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)

brush = alt.selection(type='interval', encodings=['x'])
brush = alt.selection_interval(encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
@@ -141,7 +140,7 @@ base = alt.Chart().mark_bar().encode(
# gray background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_selection(brush)
).add_params(brush)

# blue highlights on the selected data
highlight = base.transform_filter(brush)
@@ -155,28 +154,26 @@ chart = alt.layer(
    "time",
    "hours(datum.date)"
).repeat(column=["distance", "delay", "time"])

chart
```

https://user-images.githubusercontent.com/15064365/209974420-480121b4-b206-4bb2-b473-0c663e38ea5e.mov


Histogram binning, aggregation, and selection filtering are now evaluated in the Python kernel process with efficient parallelization, and only the aggregated data (one row per histogram bar) is sent to the browser.

You can see that the VegaFusion widget renderer maintains a live connection to the Python kernel by noticing that the Python [kernel is running](https://experienceleague.adobe.com/docs/experience-platform/data-science-workspace/jupyterlab/overview.html?lang=en#kernel-sessions) as the selection region is created or moved. You can also notice the VegaFusion logo in the dropdown menu button.
You can see that the JupyterChart widget maintains a live connection to the Python kernel by noticing that the Python [kernel is running](https://experienceleague.adobe.com/docs/experience-platform/data-science-workspace/jupyterlab/overview.html?lang=en#kernel-sessions) as the selection region is created or moved.
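
The round trip described above can be sketched in plain Python: the browser reports only the brush extent, and the kernel filters and re-aggregates before replying. This is a conceptual sketch under stated assumptions, not VegaFusion's or `JupyterChart`'s actual code. The `histogram` and `on_brush` helpers, the fixed bin width, and the four-row `flights` sample are hypothetical.

```python
from collections import Counter


def histogram(rows, field, step):
    """Count rows per fixed-width bin of `field`; returns {bin_start: count}."""
    counts = Counter((r[field] // step) * step for r in rows)
    return dict(sorted(counts.items()))


def on_brush(rows, brush_field, extent, target_field, step):
    """Re-aggregate `target_field` using only the rows inside the brushed interval.

    Hypothetical stand-in for the work done in the kernel when a selection changes.
    """
    lo, hi = extent
    selected = [r for r in rows if lo <= r[brush_field] <= hi]
    return histogram(selected, target_field, step)


# Tiny illustrative sample; the real dataset stays in the Python kernel.
flights = [
    {"delay": 5, "distance": 120},
    {"delay": 12, "distance": 480},
    {"delay": 47, "distance": 450},
    {"delay": 63, "distance": 900},
]

# The browser sends the brush extent; the kernel replies with aggregated rows only.
highlight = on_brush(flights, "delay", (0, 50), "distance", step=200)
print(highlight)  # {0: 1, 400: 2}
```

Only the small dictionary of bin counts crosses the wire on each brush movement, so the cost of an interaction depends on the number of bars rather than the number of source rows.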

## Motivation for VegaFusion
Vega makes it possible to create declarative JSON specifications for rich interactive visualizations that are fully self-contained. They can run entirely in a web browser without requiring access to an external database or a Python kernel.

For datasets of a few thousand rows or fewer, this architecture results in extremely smooth and responsive interactivity. However, this architecture does not scale very well to datasets of hundreds of thousands of rows or more. This is the problem that VegaFusion aims to solve.

## DataFusion integration
[Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow. VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language. In addition to being quite fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic. For example, VegaFusion defines many custom UDFs that are designed to implement the precise semantics of the Vega expression language and the Vega expression functions.
[Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow. VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language. In addition to being quite fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic. For example, VegaFusion defines many custom UDFs that are designed to implement the precise semantics of the Vega expression language and Vega expression functions.

# License
As of version 1.0, VegaFusion is licensed under the [BSD-3](https://opensource.org/licenses/BSD-3-Clause) license. This is the same license used by Vega, Vega-Lite, and Altair.

Prior versions were released under the [AGPLv3 license](https://www.gnu.org/licenses/agpl-3.0.en.html).
VegaFusion is licensed under the [BSD-3](https://opensource.org/licenses/BSD-3-Clause) license. This is the same license used by Vega, Vega-Lite, and Vega-Altair.

# About the Name
There are two meanings behind the name "VegaFusion"