By the end of this section, you will be able to create the same chart from a local data source (a CSV file on your computer) and an external data source (a CSV file on the web).
To get started, you’ll be using the mpg dataset, which can be found here. If you are unfamiliar with this dataset, you can read the documentation, which describes the MPG dataset and explains each variable in greater detail.
Go ahead and download the MPG CSV to a folder on your hard drive. It doesn't matter where you put it, as long as you can find it again. You can then create a variable for the file location. For example:
# Path.expand/1 resolves the leading "~" to your home directory
mpg_file = Path.expand("~/Documents/data/mpg.csv")
Before you can work with your CSV file, you will need to know what a DataFrame is. According to Databricks, DataFrames are “a data structure that organizes data into a 2-dimensional table of rows and columns.” You can think of a DataFrame as a table in a spreadsheet without the GUI. Each column is a variable (or field) and each row is a record (or observation). DataFrames are the fundamental structure that data scientists use to work with, transform, and even model data.
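To make the rows-and-columns idea concrete, here is a minimal sketch that builds a tiny DataFrame by hand. The three rows are illustrative values, not the real dataset, and the Explorer version requirement is an assumption you may need to adjust.

```elixir
# Standalone sketch. In Livebook, add Explorer in the setup cell instead.
# The version requirement is an assumption; adjust it to your setup.
Mix.install([{:explorer, "~> 0.8"}])

alias Explorer.DataFrame

# Each key is a column (variable); each position in the lists is a row (observation).
cars =
  DataFrame.new(
    model: ["a4", "a4", "malibu"],
    displ: [1.8, 2.0, 2.4],
    hwy: [29, 31, 30]
  )

DataFrame.n_rows(cars)    # number of observations
DataFrame.n_columns(cars) # number of variables
DataFrame.names(cars)     # column names as strings
```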
To work with DataFrames in Elixir, you will need to use the Explorer package. Explorer allows you to read data from different formats (e.g., CSV, Text, and Parquet) into a DataFrame for further work. Explorer is built upon the Rust library Polars, which is one of the fastest DataFrame libraries around. Here’s a quick table of stats to show you how Polars compares to many other DataFrame libraries.

Note: You can find more Polars benchmarks here.
The point is, Explorer will give you plenty of horsepower to manipulate, change and edit your data however you see fit by building upon one of the most powerful DataFrame libraries around.
As with most libraries in Elixir, you will probably want to alias the modules you use, so you don’t have to write Explorer.DataFrame or Explorer.Series every time. This can be accomplished with the following code:
alias Explorer.{DataFrame, Series}
Explorer makes creating DataFrames simple and straightforward. To create your MPG DataFrame, use the from_csv!() function and give it the mpg_file variable you created earlier.
mpg = DataFrame.from_csv!(mpg_file)
You should see a Livebook output something like this:

If your first column name happens to be blank, it will look like what is highlighted in the red box above. This unnamed column is supposed to be the row ID. It shows up blank because Polars (and therefore Explorer) uses column-oriented DataFrames rather than row-oriented ones, where row IDs matter. You can read more about how Polars differs from “typical” DataFrame libraries like Pandas here.
Since it is unnecessary going forward, you can add the following code to remove the column.
mpg =
  DataFrame.from_csv!(mpg_file)
  # Remove old row id column
  # (newer Explorer releases use DataFrame.discard([""]) for this)
  |> DataFrame.select([""], :drop)
Livebook should now output this:

Note: Explorer automatically shows you the number of rows and columns in the top-left corner. For example, the MPG DataFrame above has 234 rows and 11 columns (234 × 11).
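If you would rather check the dimensions in code than read them off the output, a sketch along these lines should work. To stay self-contained it writes a hypothetical two-row sample file to the temp directory instead of relying on your downloaded mpg.csv; the file name and the Explorer version requirement are assumptions.

```elixir
# Standalone sketch. In Livebook, add Explorer in the setup cell instead.
Mix.install([{:explorer, "~> 0.8"}])

alias Explorer.DataFrame

# Hypothetical sample file, written to the temp directory for illustration only.
sample_file = Path.join(System.tmp_dir!(), "mpg_sample.csv")

File.write!(sample_file, """
manufacturer,displ,hwy
audi,1.8,29
audi,2.0,31
""")

sample = DataFrame.from_csv!(sample_file)

DataFrame.names(sample) # column names read from the header row
DataFrame.shape(sample) # {rows, columns}
```

Calling DataFrame.shape(mpg) on your own DataFrame gives you the same pair of numbers that Livebook shows in the top-left corner.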
Smart Cells let you add predefined functionality to Livebook with no code. Where the Code button lets you write code and the Block button lets you write Markdown, the Smart button provides a GUI to create a Smart Cell.

Smart Cells can do multiple things like create charts, connect to databases, create maps and generate SQL queries.

For this use case, you’ll be using the Chart option.
As you begin to dig into the mpg DataFrame, say you want to understand the relationship between engine displacement and highway miles per gallon. To accomplish that, you’re going to want to create a scatterplot that shows displ on the x-axis and hwy on the y-axis.
To get started, simply click on the Chart option, as shown below.

You should be presented with a form that looks like this:

To create the chart, simply change the x and y fields to displ and hwy, set both types to quantitative, and click Evaluate in the top-left corner to generate the chart.

You should now see a chart below your Smart Cell like this:

The beauty of Smart Cells is that you don’t need to know anything about creating charts (or even writing Elixir) to start generating great-looking charts. In addition, Smart Cells have buttons in the top-right corner that let you see the underlying code that was generated for you.

Playing around and seeing what code is generated is a great way to start learning how to use the VegaLite library.
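For reference, the code behind a scatterplot Smart Cell looks roughly like the sketch below. The exact output depends on your kino_vega_lite version, so treat it as an approximation rather than a verbatim copy. The small cars DataFrame stands in for mpg, its values are illustrative, and the version requirements are assumptions.

```elixir
# Standalone sketch. In Livebook, these dependencies belong in the setup cell.
Mix.install([{:explorer, "~> 0.8"}, {:vega_lite, "~> 0.1"}])

alias Explorer.DataFrame

# Stand-in for the mpg DataFrame (illustrative values only).
cars = DataFrame.new(displ: [1.8, 2.0, 2.8], hwy: [29, 31, 26])

chart =
  VegaLite.new()
  # Embed only the columns the chart actually uses
  |> VegaLite.data_from_values(cars, only: ["displ", "hwy"])
  |> VegaLite.mark(:point)
  |> VegaLite.encode_field(:x, "displ", type: :quantitative)
  |> VegaLite.encode_field(:y, "hwy", type: :quantitative)
```

Unlike a URL-based chart, this approach copies the DataFrame's values into the chart specification itself.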
Note: As of the time of this writing, Smart Cells require your data to be saved on your computer. There is an issue on the kino_vega_lite repository to add external data sources.
External data sources are those that are not located directly on your computer. For instance, the R community has an excellent collection of datasets here. Each dataset has a link to the CSV file and documentation that describes the data as well as defines what each variable is. For your next step, remake the same chart using the same dataset that is now located externally.
You already brought VegaLite into your environment in the last post when you installed it. To save yourself from typing VegaLite every time you want to use a function, you can alias the package as Vl. Your code will look like this:
alias VegaLite, as: Vl
You’re going to create a variable, mpg_url, with the URL for the MPG dataset from the collection of R datasets mentioned above.
mpg_url = "https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/mpg.csv"
When using VegaLite, the first step is always to define a new chart with the Vl.new() function.
Note: This is also where you can pass the width and height options to set the dimensions of the chart.
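As a small sketch of that note, the snippet below sets both dimensions and then inspects the resulting specification. The values 400 and 300 are arbitrary examples, and the version requirement is an assumption.

```elixir
# Standalone sketch. In Livebook, VegaLite is already installed from the setup cell.
Mix.install([{:vega_lite, "~> 0.1"}])

alias VegaLite, as: Vl

# Width and height are given in pixels; 400 and 300 are arbitrary example values.
chart = Vl.new(width: 400, height: 300)

# to_spec/1 returns the underlying Vega-Lite specification as a map,
# which is handy for checking what each step added.
Vl.to_spec(chart)
```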
For this chart, your code should look like this:
Vl.new()
Instead of downloading the file yourself, you can let the VegaLite package handle that for you with the Vl.data_from_url() function. All you need to do is pass it the mpg_url variable, which points to where the data is hosted. When completed, your code for this step should look like this:
|> Vl.data_from_url(mpg_url)
No chart can be created without specifying the mark type. Marks define the shapes used to draw a chart and are set with the Vl.mark() function. There are a multitude of marks you can use, but today the :point type is what you need. When completed, the code should look like this:
|> Vl.mark(:point)
For any encoding to work properly, you need to specify a variable from your dataset and what type that variable is. It is possible to omit the type, but you may get some wonky results. This is due to Vega-Lite defaulting to a :nominal (string or category) data type. For this example, you want to define x as displ and set its data type to :quantitative. Your code should look like this:
|> Vl.encode_field(:x, "displ", type: :quantitative)
For y, you can follow the same process as for x, except that the variable you need to encode is hwy. Your code should look like this:
|> Vl.encode_field(:y, "hwy", type: :quantitative)
That’s it! Because Elixir is a functional language that uses the pipe operator (|>), you simply combine the steps, much like you would follow a baking recipe. The whole thing should look like this:
Vl.new()
|> Vl.data_from_url(mpg_url)
|> Vl.mark(:point)
|> Vl.encode_field(:x, "displ", type: :quantitative)
|> Vl.encode_field(:y, "hwy", type: :quantitative)
Note: Elixir uses the pipe operator (|>) to pass the output of one step to the input of the next step. This saves you from having to create intermediate variable names and makes your code more concise and readable by showing the transformation of data from top to bottom.
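To see what the pipe operator buys you outside of charting, here is a small sketch in plain Elixir that computes the same value twice, once with nested calls and once with pipes. The numbers are arbitrary.

```elixir
# Without pipes: nested calls have to be read inside-out.
nested = Enum.sum(Enum.map(Enum.filter(1..10, &(rem(&1, 2) == 0)), &(&1 * &1)))

# With pipes: each step's result becomes the first argument of the next call.
piped =
  1..10
  |> Enum.filter(&(rem(&1, 2) == 0)) # keep the even numbers
  |> Enum.map(&(&1 * &1))            # square each one
  |> Enum.sum()                      # add them up

# Both versions produce the same result.
nested == piped
```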
VegaLite has a typical pattern it expects and follows. Below is an example showing the normal “recipe” used to create a chart.
Vl.new() # Initialize a Vega-Lite Chart
|> Vl.data_from_*(data) # Read in data
|> Vl.transform() # Manipulate the data - optional
|> Vl.param() # Add interactivity - optional
|> Vl.mark(:mark) # Specify the shape used
|> Vl.encode_field(:var, "field_name", type: :quantitative) # Encode channels (:x, :y, etc...) with specific fields from the dataset
# ... encode as many channels as you wish
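As one concrete instance of that recipe, the sketch below fills in each step with inline data and an optional filter transform, skipping the interactivity step. The rows are illustrative values rather than the full MPG dataset, and the filter threshold, chart size, and version requirement are assumptions chosen for the example.

```elixir
# Standalone sketch. In Livebook, VegaLite is already installed from the setup cell.
Mix.install([{:vega_lite, "~> 0.1"}])

alias VegaLite, as: Vl

# Illustrative rows only, not the full MPG dataset.
rows = [
  %{"displ" => 1.8, "hwy" => 29},
  %{"displ" => 2.8, "hwy" => 26},
  %{"displ" => 5.3, "hwy" => 20}
]

chart =
  Vl.new(width: 400, height: 300)            # initialize the chart
  |> Vl.data_from_values(rows)               # read in data
  |> Vl.transform(filter: "datum.hwy > 25")  # optional: keep rows above 25 mpg
  |> Vl.mark(:point)                         # specify the shape used
  |> Vl.encode_field(:x, "displ", type: :quantitative)
  |> Vl.encode_field(:y, "hwy", type: :quantitative)
```

The filter is written as a Vega expression string, where datum refers to the current row.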
Congratulations! You just created your first DataFrame and chart! Now that you know how to use Smart Cells, you can create great-looking charts without coding. This can be a great way to learn how the library works and a great starting point for further customization. You finished up by building the same chart from scratch with an external data source.
In the next section, you’ll learn how to work with encoding different channels (i.e., x, y, color, size, etc…) in a chart.