17‐visualization

Gabi Keane edited this page Jan 12, 2024 · 2 revisions

[wip]

Our final stage!

If you’re reading this, congratulations on getting to (or skipping to) the end of our tutorial set. Data visualizations were the last topic we wanted to cover, not because they are an optional or unimportant aspect of a research edition, but because the technical skills they require depend largely on the individual research data. For this reason, we focus on two kinds of data visualization skills: third-party tools like Mapbox, and do-it-yourself SVG (scalable vector graphics) visualizations like the one we will make in this tutorial. Both approaches require some learning before you can evaluate whether the tool and its implementation are right for your project. These tutorials don’t include much new XQuery or other coding instruction, so we can demonstrate how we approach the goals without making it unnecessarily complicated.

Planning a visualization

At this point, it should be no surprise: start from the research questions when planning a visualization. When you start from a type of graph or chart, selected from a library or modelled after a chart you have seen elsewhere, you are automatically bound by the restrictions of that specific visualization. If you set out to make a bar graph, you are stuck with just two variables, arranged in a way that allows for minimal comparison along the x-axis. Often, the graphs we look at are on a time-based x-axis. For much edition research data, time is not a primary variable.

When we first began drafting visualizations for this project, we took an approach similar to the one we used when wireframing. After stating our research questions and goals for the visualization, we drafted by hand on notecards. Many of those notecards featured confusing, simplistic, or otherwise unreadable graphs, but the process of drawing them helped us refine what worked. If we had tried to do that much iteration in our application, it would have been a lot more work.

Our research questions for this visualization were: Is there a relationship (beyond mere coincidence) between an article’s length and its relationship to physical space? Are short, reported articles more or less likely to make significant use of specific place reference than the longform pieces of the later decades?

We can answer these questions without visualizing them. We could report average length and average place count, take maximums and minimums, read any outliers, and draw conclusions. But this approach minimizes the outliers, which are perhaps the most interesting; it forecloses any discovery or closer reading by the user; and it summarizes in places where complexity might be useful. We want the edition to open up discovery, so our visualization should include a way to link back to the articles themselves. If you’re running the application on the stage 17 branch, you can take a look at our final visualization and click the links that lead back to reading views of the articles. This feature is also present on our web hosted edition: https://hoax.obdurodon.org/visualize
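For comparison, the summary-only approach described above might look like the minimal sketch below. The element names match the model used later in this tutorial, but the namespace URI, the `m:data` wrapper, and every sample value are assumptions made for illustration and are not taken from the edition's data.

```xquery
xquery version "3.1";
(: Assumption: this URI stands in for whatever namespace the model really uses :)
declare namespace m = "http://www.obdurodon.org/model";

(: Hypothetical sample data; the ids and counts are invented :)
declare variable $data as element(m:data) :=
  <m:data>
    <m:by-article>
      <m:article><m:id>a</m:id><m:word-count>250</m:word-count><m:place-count>4</m:place-count></m:article>
      <m:article><m:id>b</m:id><m:word-count>400</m:word-count><m:place-count>1</m:place-count></m:article>
      <m:article><m:id>c</m:id><m:word-count>1450</m:word-count><m:place-count>7</m:place-count></m:article>
    </m:by-article>
  </m:data>;

let $articles := $data/descendant::m:by-article/m:article
let $word-counts := $articles/m:word-count ! xs:integer(.)
let $place-counts := $articles/m:place-count ! xs:integer(.)
return
  <summary>
    <word-count avg="{avg($word-counts)}" min="{min($word-counts)}" max="{max($word-counts)}"/>
    <place-count avg="{avg($place-counts)}" min="{min($place-counts)}" max="{max($place-counts)}"/>
  </summary>
```

The result is compact, but it no longer says which article produced which number, and that connection is exactly what a linked visualization preserves.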

Drafting the pipeline

As always, we follow the same model and view pattern as before: we isolate and organize information in the model and format it in the view. In this case, that means we will draw SVG in the view.

Here is a basic guide to writing SVG: https://jenkov.com/tutorials/svg/index.html. Like XHTML, SVG is an XML vocabulary with a defined set of elements and attributes. As with all the new technologies we introduced in this guide, we recommend becoming only as familiar with it as you need to be to achieve your specific goals. In this case, we will be drawing lines, circles, rectangles, and text; we rarely need SVG elements beyond those four.
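As a quick orientation, the sketch below uses each of those four elements once, written as an XQuery direct element constructor so it can be returned from a view. The coordinates, colors, and label text are arbitrary placeholders rather than values from our visualization.

```xquery
xquery version "3.1";

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120" width="400">
  <!-- line: drawn from (x1, y1) to (x2, y2) -->
  <line x1="10" y1="100" x2="190" y2="100" stroke="black" stroke-width="1"/>
  <!-- rect: x and y locate the upper left corner -->
  <rect x="20" y="40" width="30" height="60" fill="lightgray" stroke="black"/>
  <!-- circle: cx and cy locate the center, r is the radius -->
  <circle cx="35" cy="30" r="5" fill="steelblue"/>
  <!-- text: x and y locate the baseline anchor point -->
  <text x="35" y="115" text-anchor="middle" font-size="8">label</text>
</svg>
```

Saving the output as an .svg file and opening it in a browser is a fast way to check that each shape lands where you expect.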

Creating the model

This visualization reformulates and displays a lot of data we have already used in other interfaces. That makes writing the model for this visualization a lot easier; we have already calculated and stored all of the facets we want to use.

You can view the full module code for visualization.xql in this branch. Rather than repeat the XQuery writing process here, we challenge you to try to reverse engineer the model code based on what you can see in the final visualization. Is it what you expected?

We added quite a few extra data points to the model, in part because we wanted to enable some experimentation. This graph could also be used to understand the relationships between ghost mentions, time, and article length. Our research questions were more aligned with questions of space and place, so we chose to display one visualization that explores and exposes that question.

Creating the view

Here is the final view code: https://github.com/Pittsburgh-NEH-Institute/hoaXed/blob/17-visualization/views/visualization-to-html.xql

Rather than record each iterative step of building this visualization, we will provide a few strategies and examples below for approaching SVG visualizations in general. This advice will be largely programming language agnostic, even though we are using XQuery. However, if you are using a visualization library, the advice may not be as relevant.

Draw axes and define the view window

One of the most challenging aspects of SVG when you begin is defining the viewBox and drawing in the right coordinate space. This viewBox tutorial introduces those complexities and provides some solutions. It is most effective to figure this part out before you begin writing any XQuery to draw SVG. If you cannot see what you are drawing, it becomes a lot more difficult to troubleshoot.
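One common way to make the coordinate space easier to reason about is sketched below; it is an illustration of the general idea, not necessarily the setup used in our view. In SVG, y values grow downward, so this sketch puts the chart origin at (0, 0), draws upward using negative y values, and shifts the viewBox so the negative region is visible. The variable names and sizes are hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical plot dimensions in user units :)
let $width := 300
let $height := 200
let $margin := 40
return
  (: viewBox is "min-x min-y width height"; the negative minimums
     expose the space above and to the left of the origin :)
  <svg xmlns="http://www.w3.org/2000/svg"
       viewBox="{-$margin} {-($height + $margin)} {$width + 2 * $margin} {$height + 2 * $margin}">
    <!-- x-axis: runs right from the origin -->
    <line x1="0" y1="0" x2="{$width}" y2="0" stroke="black"/>
    <!-- y-axis: runs up from the origin, so y2 is negative -->
    <line x1="0" y1="0" x2="0" y2="{-$height}" stroke="black"/>
    <!-- marker at the origin, useful while debugging the window -->
    <circle cx="0" cy="0" r="3" fill="red"/>
  </svg>
```

With these values the viewBox attribute evaluates to "-40 -240 380 280", which leaves a 40-unit margin on every side of the axes. If the red marker and both axes are visible, anything you draw inside them will be visible too.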

Build with test data first

At the top of the document, you can find a small sample of the XML output that comes from the model. We first built the component that visualizes a single article: a rectangle, two circles, and a line. Next, we left the test data in place, but built the same rectangle for each <m:article> element present in the model. This provided a few opportunities for troubleshooting: we knew the SVG would render properly before introducing any variables to it. It also helped us experiment with how to position each rectangle, because our initial for loop drew them all on top of each other. They are equal width, so we can easily predict how big the x-axis should be, but how do we know where each individual rectangle should start?

We solved this using the following for loop:

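for $article at $pos in ($data/descendant::m:by-article/m:article =>
        sort((), function($a) { $a/m:date }))
(: $pos is the article's position in the date-sorted sequence :)
let $title as xs:string := $article/m:title ! string(.)
(: third character of the date: the decade digit when the date begins with a four-digit year :)
let $decade as xs:integer := $article/m:date ! substring(., 3, 1) ! xs:integer(.)
(: third and fourth characters of the date :)
let $year as xs:string := $article/m:date ! substring(., 3, 2)
(: relative URL of the article's reading view :)
let $link as xs:string := ("read?id=" || $article/m:id)
let $word-count as xs:integer := $article/m:word-count ! xs:integer(.)
let $place-count as xs:integer := $article/m:place-count ! xs:integer(.)
(: horizontal position of this article's rectangle; each rectangle is 30 units wide :)
let $x-pos as xs:integer := $pos * 30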

The $pos variable provides the position of the current article in the sequence we are looping over, which is sorted by date. We then multiply that position by 30, the width of each rectangle, to determine where the rectangle should be drawn.
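To see the positioning logic in isolation, here is a self-contained sketch that applies the same loop header to a tiny set of test data and draws one rectangle per article. The namespace URI, the `m:data` wrapper, the sample values, the `$bar-width` and `$bar-height` variables, and the scaling of word count to height are all assumptions made for illustration; the real view draws more than this.

```xquery
xquery version "3.1";
(: Assumption: this URI stands in for whatever namespace the model really uses :)
declare namespace m = "http://www.obdurodon.org/model";

(: Hypothetical test data, deliberately out of date order :)
declare variable $data as element(m:data) :=
  <m:data>
    <m:by-article>
      <m:article><m:id>b</m:id><m:date>1868-03-01</m:date><m:word-count>900</m:word-count></m:article>
      <m:article><m:id>a</m:id><m:date>1804-01-06</m:date><m:word-count>400</m:word-count></m:article>
      <m:article><m:id>c</m:id><m:date>1872-11-20</m:date><m:word-count>1100</m:word-count></m:article>
    </m:by-article>
  </m:data>;
declare variable $bar-width as xs:integer := 30;

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 -120 150 140">{
  (: sorting inside the "in" expression means $pos counts the sorted sequence :)
  for $article at $pos in ($data/descendant::m:by-article/m:article
      => sort((), function($a) { $a/m:date }))
  let $x-pos as xs:integer := $pos * $bar-width
  (: hypothetical scaling: 10 words per unit of height :)
  let $bar-height as xs:integer := ($article/m:word-count ! xs:integer(.)) idiv 10
  return
    <rect x="{$x-pos}" y="{-$bar-height}" width="{$bar-width}" height="{$bar-height}"
          fill="lightgray" stroke="black">
      <title>{$article/m:id ! string(.)}</title>
    </rect>
}</svg>
```

With this data the rectangles for a, b, and c start at x = 30, 60, and 90. Note that the sort happens before the positional variable is bound. If the articles were instead sorted with an order by clause inside the FLWOR, $pos would still reflect the original document order and the rectangles would not line up chronologically.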

Iterate; or make one change at a time

If you take one thing away from this tutorial, let it be this: make one change, test it, troubleshoot, and then move on. Introducing two new variables at once makes it much harder to tell which of them caused a problem. In the example above, we drew all the rectangles on top of each other before we calculated the position. We did both of those things before trying to sort the articles by date.

Revising and adding to the visualization