Is it possible to use Vegafusion outside jupyter #266

prenigma · 2023-03-20T12:14:30Z

prenigma
Mar 20, 2023

Hi Team,

First of all, thanks a lot for the great work you are putting into VegaFusion, I really like it!

As the source has been evolving a lot in the last couple of months, I am wondering whether it is now feasible to use VegaFusion and the brushing and selection capabilities of Vega outside Jupyter, for example in a JavaScript web application where the VegaFusion client communicates with the VegaFusion server via gRPC. Is vegafusion-embed the intended solution? Can you please share an example of the above?

Thanks a lot in advance,
Fouad

jonmmease · 2023-03-20T13:10:19Z

jonmmease
Mar 20, 2023
Collaborator

Hi @prenigma, thanks for the kind words!

The architecture of VegaFusion is certainly designed to support the scenario you're describing. But I'll be honest that this hasn't been proven in practice yet, and we don't have a solid example of this scenario at the moment. The closest example is in javascript/vegafusion-chart-editor.

This example includes a super-simple Vega editor that communicates with a VegaFusion server instance over gRPC-Web. And yes, vegafusion-embed is the intended client entry point for non-Jupyter usage. Unfortunately, I'm currently having issues getting around a CORS policy error when running the example (FWIW, I created this example more than a year ago on a Linux desktop, and I'm now hitting the CORS issue on a Mac). Definitely give that a try and see if you run into the same error. Using gRPC-Web isn't a requirement for client-server communication, but (apart from this CORS error) it's the easiest way to have the client communicate directly with the VegaFusion server without any middleware.

The Jupyter Widget's usage of vegafusion-embed shows the alternative to using gRPC-Web.

https://github.com/hex-inc/vegafusion/blob/c172637a965ea13ef89035129b76bb28c8b2e847/python/vegafusion-jupyter/src/widget.ts#L126-L139

Here, the vegafusion-embed constructor is passed a callback function which accepts a protobuf message in binary form, and is responsible for routing those bytes to an instance of the vegafusion-runtime. In the Jupyter widget case, these messages are routed over Jupyter comms to the vegafusion-runtime instance running in the Python kernel (in the vegafusion-python-embed package). But these bytes could also be routed over websockets to some middleware application and then dispatched to a vegafusion-server instance over (non-web) gRPC.

If you have more thoughts on the architecture you'd like to end up with, I'd be happy to work with you on building an example that fits your use case.

0 replies

prenigma · 2023-03-20T14:39:27Z

prenigma
Mar 20, 2023
Author

Thanks a lot, Jon, for the quick and detailed answer. I will try what you suggested and get back to you as soon as I have tried the proposed directions!

0 replies

jonmmease · 2023-03-24T00:38:15Z

jonmmease
Mar 24, 2023
Collaborator

Hi @prenigma, I think I got all of the issues worked out in the chart editor example over in #276. Let me know if you're interested in trying to get this working. The changes to vegafusion-server will be included in version 1.1.0 early next week.

0 replies

prenigma · 2023-03-24T12:51:09Z

prenigma
Mar 24, 2023
Author

Hi Jon @jonmmease, thanks a lot for the update! I am really interested in trying it out! Can you also share the steps you executed to replicate what you presented on the PR?

Thanks!

0 replies

jonmmease · 2023-03-24T13:17:53Z

jonmmease
Mar 24, 2023
Collaborator

Sure. At the moment you'll need to clone the VegaFusion repo and compile vegafusion-server and vegafusion-wasm from source (when 1.1.0 is released you'll be able to download VegaFusion server from the [GitHub Releases](https://github.com/hex-inc/vegafusion/releases) page, and vegafusion-wasm and vegafusion-embed will be published to npm).

There are development setup instructions in BUILD.md. I know there's a lot there, so let me know if you run into issues. Once you have the development environment set up, here are the steps:

Compile vegafusion-wasm

In the vegafusion-wasm/ directory, run:

npm run build

Compile vegafusion-embed

In the javascript/vegafusion-embed directory, run:

npm install
npm run build

Compile and run vegafusion-server

In the vegafusion-server directory, run:

cargo run -- --port 50051 --web

Launch the chart editor

In the javascript/vegafusion-chart-editor directory, run:

npm install
npm run start

Open http://localhost:8081/

Once 1.1.0 is released, I think I'll move this chart editor example out into a separate repository and add instructions that only depend on the published versions of vegafusion-server, vegafusion-wasm, and vegafusion-embed. So feel free to wait until next week to try out that approach.

1 reply

prenigma Mar 25, 2023
Author

Thanks a lot Jon! It works like a charm!

jonmmease · 2023-03-25T01:04:02Z

jonmmease
Mar 25, 2023
Collaborator

I got 1.1.0 published this evening and moved the editor demo to https://github.com/hex-inc/vegafusion-demos/tree/main/apps/vegafusion-editor-grpc-web. See the README instructions there. With this approach you just download vegafusion-server 1.1.0 from GitHub releases, and launch the app with npm. No need to build any part of VegaFusion itself.

Let me know how it works out for you!

0 replies

prenigma · 2023-04-06T01:08:30Z

prenigma
Apr 6, 2023
Author

Hi Jon, thanks a lot for the work you put into the demo, it works very well! I spent the last couple of days studying your code further. My goal is to create a demo of the following architecture:

The python server will allow loading data/table e.g., pandas dataframe, and register this dataframe in the vegafusion-server.

The JS client/Webapp, like the vegafusion-editor example, will render the chart for a specific dataframe registered in the vegafusion-server.

In the vegafusion Python module, for example in the pre_transform_datasets method, I found a mention of inline datasets using the syntax 'vegafusion+dataset://{dataset_name}' or 'table://{dataset_name}'.

I also found two methods, set_connection and grpc_connect, under the runtime class in the vegafusion module, which are expected to connect to vegafusion-server and return the list of tables of a connection.

Can you please guide me further on how to build such an example architecture? I believe most of the functionality is available in your implementation, but I need your guidance on how to bring the different blocks together to make this architecture work. An example of how the vegafusion Python module, vegafusion-server and vegafusion-embed work together to render a chart for a specific table ('vegafusion+dataset://{dataset_name}' or 'table://{dataset_name}') would be very helpful.

Thanks a lot

0 replies

jonmmease · 2023-04-06T12:55:08Z

jonmmease
Apr 6, 2023
Collaborator

Hi @prenigma, this looks great. I'm happy to make another example, let me just clarify a few things.

vegafusion-server vs vegafusion in Python

If you're happy with the server portion of the app being written in Python, then it might be easiest to use VegaFusion embedded in Python (as the vegafusion + vegafusion-python-embed Python packages) instead of VegaFusion Server. I'm working toward giving the vegafusion Python package the ability to dispatch operations to either vegafusion-python-embed (which has the VegaFusion runtime embedded in Python) or vegafusion-server over gRPC. Interacting with vegafusion-python-embed is more efficient as there's less serialization and no network communication. There are basically two use cases I have in mind for the VegaFusion server workflow:

1. VegaFusion server could be running on a different machine across the network. This machine could be more powerful than the machine Python is running on, and multiple Python processes could connect to it simultaneously. Or Python may be running on an architecture that vegafusion-python-embed isn't built for, so this would make it possible to still use the pure Python vegafusion package and connect to VegaFusion Server running on a separate machine over the network.
2. If you're in a situation where multiple Python workers are serving the same app (e.g. using Gunicorn), then using VegaFusion Server avoids the need for every worker to load all of the data into memory.

Does either of those apply to your use case?

Also to note, the set_connection mechanism that the new DuckDB connection uses is only available for the embedded configuration (not when using VegaFusion server).

Pre-transform vs Live connection workflows

VegaFusion has two primary workflows:

pre-transform workflow

The pre-transform workflow starts with a Vega spec, pre-evaluates all of the transforms and creates a new Vega spec that has the transformed data inlined. This is how the Mime Renderer works, and the advantage is that it doesn't require a live connection between the client and server, and it doesn't require the vegafusion-embed package on the client since the resulting Vega spec can be rendered using the regular vega-embed library.

The vf.runtime.pre_transform_spec function can accept inline Pandas/Arrow DataFrames that are referenced from the Vega spec using table://{name} URLs.
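
A minimal sketch of that call, assuming the vegafusion and vegafusion-python-embed packages are installed. The helper names (`table_url`, `build_count_spec`, `pre_transform`) and the dataset name `movies` are made up for illustration, and the `pre_transform_spec` arguments shown (`local_tz`, `inline_datasets`) are an assumption that should be checked against the installed version:

```python
def table_url(name: str) -> str:
    """Build the table:// URL a Vega spec uses to reference an inline dataset."""
    return f"table://{name}"


def build_count_spec(dataset_name: str, field: str) -> dict:
    """Return a small Vega spec that counts rows per value of `field`.

    The data comes from an inline dataset referenced by a table:// URL,
    so the aggregation can run on the server side during pre-transform.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "data": [
            {
                "name": "source",
                "url": table_url(dataset_name),
                # With no ops given, Vega's aggregate transform computes a count.
                "transform": [{"type": "aggregate", "groupby": [field]}],
            }
        ],
    }


def pre_transform(spec: dict, inline_datasets: dict, local_tz: str = "UTC"):
    """Run the pre-transform workflow on `spec`.

    `inline_datasets` maps dataset names (the part after table://) to
    pandas or Arrow tables. The import is lazy so that the spec helpers
    above can be used without vegafusion installed.
    """
    import vegafusion as vf

    # Assumed signature: (spec, local_tz, ..., inline_datasets=...).
    return vf.runtime.pre_transform_spec(
        spec, local_tz, inline_datasets=inline_datasets
    )
```

With a pandas DataFrame `df`, calling `pre_transform(build_count_spec("movies", "genre"), {"movies": df})` would be the way to try this out; the result can then be rendered with the regular vega-embed library.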

The downside of this approach is that, because there's no live connection between the client and server, any data that is filtered based on selection interactions must be shipped to the client in the transformed spec. So it's not a good choice for examples like Interactive Average, Crossfilter, or Cross Highlight.

Live connection workflow

This needs a better name, but this refers to the workflow that VegaFusion Widget and the gRPC-Web demo use. In this case the vegafusion-embed library is used in the client to render the chart and maintain a live connection between the client and the VegaFusion runtime (either running inside vegafusion-python-embed or in VegaFusion Server). This has the advantage that interactive filtering can happen on the server, so the full dataset isn't sent to the client in the examples noted above.

A downside of this approach is that there's not a good way to register inline datasets and use table:// references. We should make this possible in the future, but it's a bit more complicated compared to the pre-transform workflow because it's hard to know when the runtime can delete the dataset, because there's no good way to know when a client is finished referencing it. The alternative (which VegaFusion widget currently uses) is to write datasets to feather/arrow files on disk and reference them from the Vega spec with URLs like "datasets/my_data.arrow".
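
As a rough sketch of that file-based alternative (not the widget's actual code): the helper below writes a DataFrame to an Arrow/Feather file and returns the relative URL to put in the Vega spec. `export_dataset` and the `datasets` directory are illustrative names; `to_feather` is the pandas method and needs pyarrow installed. It is also an assumption here that the runtime resolves the relative URL against the parent of the `datasets` directory.

```python
from pathlib import Path


def export_dataset(df, name: str, root: Path = Path("datasets")) -> str:
    """Write `df` to <root>/<name>.arrow and return a relative URL for the spec.

    `df` is expected to be a pandas DataFrame (anything with a
    `to_feather(path)` method works for this sketch).
    """
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.arrow"
    df.to_feather(path)
    # URL relative to the parent of `root`, e.g. "datasets/my_data.arrow".
    return f"{root.name}/{path.name}"


def dataset_entry(name: str, url: str) -> dict:
    """Build the Vega `data` entry that points at the exported file."""
    return {"name": name, "url": url}
```

Cleaning up the exported files is left to the application, since (as noted above) the runtime cannot know when a client has finished with a dataset.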

For the live connection workflow, a choice needs to be made for how the messages are transported between the Python server and the client. gRPC-Web is the most integrated approach, but this can be customized. For example, VegaFusion Widget routes these messages over Jupyter Comms. If gRPC-Web isn't an option, then it would also be possible to route the messages to a REST endpoint by base64 encoding the protobuf messages to strings (at the loss of some efficiency of course).
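
To make the REST idea concrete, here is a small framework-agnostic sketch using only the Python standard library. Everything named here is hypothetical: `wrap_runtime_for_rest`, the `{"message": ...}` JSON envelope, and `process_bytes`, which stands in for whatever function hands the raw protobuf bytes to the VegaFusion runtime and returns the raw response bytes.

```python
import base64
import json
from typing import Callable


def wrap_runtime_for_rest(
    process_bytes: Callable[[bytes], bytes]
) -> Callable[[str], str]:
    """Adapt a bytes-in/bytes-out handler to a JSON-string REST body.

    The request and response bodies are JSON objects with a single
    base64-encoded "message" field carrying the binary protobuf payload.
    """

    def handle(body: str) -> str:
        request = json.loads(body)
        raw_request = base64.b64decode(request["message"])
        raw_response = process_bytes(raw_request)
        encoded = base64.b64encode(raw_response).decode("ascii")
        return json.dumps({"message": encoded})

    return handle


def encode_message(raw: bytes) -> str:
    """Client-side counterpart: wrap raw bytes in the JSON envelope."""
    return json.dumps({"message": base64.b64encode(raw).decode("ascii")})


def decode_message(body: str) -> bytes:
    """Unwrap the JSON envelope back into raw bytes."""
    return base64.b64decode(json.loads(body)["message"])
```

The returned `handle` function can be mounted in any web framework's POST route. Base64 inflates the payload by roughly a third, which is the efficiency loss mentioned above.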

Summary

As a summary, given a Python server, here are the choices to make:

1. VegaFusion Server vs VegaFusion embedded in Python
2. pre_transform vs live connection workflow
   a. If a live connection, gRPC-Web or some other communication protocol

Let me know what you think!
-Jon

7 replies

prenigma Apr 10, 2023
Author

This is amazing Jon! Thanks a lot again!

jonmmease Sep 7, 2023
Collaborator

Hi @prenigma, hope all is well with you. This discussion crossed my mind today, and I was wondering whether you ended up trying to build anything on VegaFusion. No pressure, just interested to hear any feedback you'd care to share.

prenigma Sep 18, 2023
Author

Hi Jon, my apologies that I missed your message! Indeed, we worked on an internal POC, and it worked smoothly based on the example you shared using vegafusion-server. In the next few weeks I am planning to extend the POC so that it works not only with datasets in local storage but also with a database or an S3 bucket. I have seen that your code supports SQL, but I am still not sure how to make this work with/through the server.

jonmmease Sep 19, 2023
Collaborator

Sounds good! The DataFusion library that VegaFusion uses by default has support for reading from S3, though I haven't experimented with that yet.

If you can share, what SQL database are you considering using? The SQL support isn't fully wired up through VegaFusion server yet. What's still required is to write a Rust connection to the database that inputs an SQL string and returns the result as Arrow RecordBatches.

prenigma Sep 26, 2023
Author

Yes Jon, if you can point me to where to start experimenting with reading from S3, that would be very helpful. Regarding SQL, we would start with Postgres or MySQL, and the later plan is Snowflake. Thanks a lot again for your continuous support!

prenigma · 2023-11-09T09:56:31Z

prenigma
Nov 9, 2023
Author

Hi Jon, can you please point me to where to start in order to have VegaFusion server read from S3 instead of local files? Many thanks

7 replies

jonmmease Nov 12, 2023
Collaborator

Started working on this in #417. I'll publish an RC once I get the tests passing so that you can try it out.

prenigma Nov 15, 2023
Author

Thanks a lot Jon! I will wait until the RC is published!

jonmmease Nov 16, 2023
Collaborator

I ended up publishing this in version 1.4.4. To try it out, set the environment variables from https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html#method.from_env and provide a URL to VegaFusion of the form s3://bucket/path.arrow (or .json, .csv, .parquet).
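
A small sketch of the setup side of this, using only the standard library. The helper names are hypothetical, and the three environment variables listed are just the ones commonly used with `AmazonS3Builder::from_env`; which ones are actually needed depends on how credentials are provided, so the linked documentation remains the reference.

```python
import os
from typing import List, Mapping

# File extensions mentioned above as loadable from S3.
SUPPORTED_EXTENSIONS = (".arrow", ".json", ".csv", ".parquet")

# Commonly used variables (an assumption, not an exhaustive or required list).
COMMON_AWS_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def s3_url(bucket: str, key: str) -> str:
    """Build an s3://bucket/path URL, rejecting unsupported file types."""
    if not key.endswith(SUPPORTED_EXTENSIONS):
        raise ValueError(f"unsupported file extension in {key!r}")
    return f"s3://{bucket}/{key.lstrip('/')}"


def unset_aws_variables(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the commonly used AWS variables that are missing or empty."""
    return [name for name in COMMON_AWS_VARIABLES if not env.get(name)]


def s3_dataset_entry(name: str, bucket: str, key: str) -> dict:
    """Build a Vega `data` entry whose URL points at the S3 object."""
    return {"name": name, "url": s3_url(bucket, key)}
```

Checking `unset_aws_variables()` before launching the server is only a convenience for spotting a missing variable early.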

This release also adds experimental Parquet support, so you should be able to load Parquet files from S3 as well, in which case DataFusion does a bunch of smart stuff to minimize what gets downloaded.

Let me know how it goes!

prenigma Nov 20, 2023
Author

Thanks a lot dear Jon for the great work, I am testing it and I will update you very soon!

prenigma Nov 23, 2023
Author

Thanks a lot Jon, the S3 connection together with Parquet files works very well! I will update you as I run further tests/scenarios. Thanks again for the amazing work!

jonmmease · 2024-10-17T12:49:42Z

jonmmease
Oct 17, 2024
Collaborator

Hi @prenigma, I wanted to check in to see if you're still making use of this workflow. I'm working on simplifying VegaFusion for version 2, and I'm debating whether to keep vegafusion-server and vegafusion-wasm around in their current form. But I'd welcome any feedback you have here.

0 replies

prenigma · 2024-11-04T10:25:18Z

prenigma
Nov 4, 2024
Author

Hi Jon, apologies, I missed your message! Yes, I am still using the workflow with vegafusion-server on the backend and vegafusion-wasm on the frontend. I am also wondering whether we could further scale vegafusion-server by integrating it with datafusion-ballista. I would be happy to brainstorm further with you. Thanks

1 reply

jonmmease Nov 10, 2024
Collaborator

Thanks for getting back to me. It's great to hear that this approach is working for you! This will continue to work fine in VegaFusion 2.0. Integrating VegaFusion with Ballista is a very interesting prospect, though I'll admit I haven't looked closely into how interfacing with Ballista works.
