17‐visualization

djbpitt edited this page Dec 29, 2024 · 4 revisions

The final stage!

If you’re reading this, congratulations on getting (or skipping) to the end of our tutorial set. Data visualizations were the last topic we wanted to cover, not because they are an optional or unimportant aspect of a research edition, but because the technical skills they require are largely dependent on project-specific research data. For this reason, we are focusing on two kinds of data visualization skills: 1) third-party tools like Mapbox and 2) do-it-yourself SVG (Scalable Vector Graphics) visualizations like the one we will make in this tutorial. Both approaches require some learning in order to evaluate whether the tool and its implementation are a good fit for your project. These tasks don’t include much new XQuery or other code, so in this section we focus on how we approach the goals, rather than on details of the code itself.

Planning a visualization

By now the following principle should not come as a surprise: start from the research questions when planning a visualization. When you start from a type of graph or chart that you’ve selected from a library or modeled after an existing visualization, you implicitly commit yourself to inherited design decisions. For example, if you set out to create a bar chart, you are committed to variables represented on the X and Y axes. You can enrich your visualization by coloring the bars or varying their width (see, for example, our Results of 2012 US presidential election), but a bar chart is fundamentally a representation of the relationship between a range of X values and an associated range of Y values.

When we first began drafting visualizations for this project, we approached the task much as we approach wireframing. After stating our research questions and goals for the visualization, we drafted potential visual representations by hand on notecards. Many of those notecards featured confusing, simplistic, or otherwise unreadable graphs, which is part of the process, since drawing, assessing, and revising helped us arrive at a visualization that worked. Performing that much iteration within our application would have meant writing a lot of code that we would ultimately discard. Because we were focused at this point in the development on the result, rather than on how to create it, sketching by hand was more efficient and more effective than implementing all of the variations we considered.

Our research questions for this visualization were: Is there a relationship (beyond mere coincidence) between an article’s length and its relationship to physical space? Are short, reported articles more or less likely to make significant use of specific place references than the longform pieces of the later decades?

We can answer these questions without visualizing the data. For example, we could report average length and average place count, identify maximum and minimum values, read any outliers, and draw conclusions. But this approach sidelines the outliers, forecloses any discovery or closer reading by the user, and summarizes in places where detail might be useful. We want the edition to open up discovery, so our visualization should include a way to link back to the articles themselves. If you’re running the application on the stage 17 branch, you can take a look at our final visualization and click the links that point to reading views of the articles. This feature is also present on our web-hosted edition: https://hoax.obdurodon.org/visualize, reproduced below:

[Screenshot of the final visualization]

We discuss how to read the visualization, and the reasons behind specific design decisions, at https://hoax.obdurodon.org/visualize, but key features include:

  • The red dots represent word count (note the red labeling on the left-side Y axis) and the blue dots represent place count (note the blue labeling on the right-side Y axis).
  • We let the data determine the scaling: the maximum word and place counts are both at the maximum Y height.
  • For most articles the blue dot is above the red one; where that is reversed, it shows a relatively large place count with respect to word count, and the length of the connecting bar between the two dots makes it easier to see the relative proportions.
  • The background color changes by decade, which makes it easier to see that the decade with the largest representation is the thirties, followed by the naughts, followed by the fifties.
  • We don’t see much of a trend in terms of the count of place references over time, but word count trends downward except for an 1852 outlier, which reveals, when read, that it is a relatively literary (rather than journalistic) text.
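The data-driven scaling described in the second bullet can be sketched roughly as follows. This is a minimal illustration rather than the code from our view: the function name local:scale, the plot height of 300, and the sample counts are all assumptions made up for the example.

```xquery
xquery version "3.1";

(: Hypothetical plot height in SVG user units :)
declare variable $max-height as xs:integer := 300;

(: Map a count to a height so that $max-value reaches the full $height :)
declare function local:scale(
  $value as xs:double,
  $max-value as xs:double,
  $height as xs:double
) as xs:double {
  $value div $max-value * $height
};

(: Made-up counts for three articles; not taken from the edition :)
let $word-counts := (1200, 300, 450)
let $place-counts := (4, 9, 2)
let $max-words := max($word-counts)
let $max-places := max($place-counts)
for $i in 1 to count($word-counts)
return
  <point
    word-height="{local:scale($word-counts[$i], $max-words, $max-height)}"
    place-height="{local:scale($place-counts[$i], $max-places, $max-height)}"/>
```

Because each series is divided by its own maximum, the largest word count (1200) and the largest place count (9) both map to the full height of 300, even though the raw numbers are on very different scales.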

Drafting the pipeline

The pipeline for this view is the same as for any other: we isolate and organize information in the model and then format it in the view. In this case, that means we draw SVG in the view.

If you are new to SVG, you can find a good basic guide at https://jenkov.com/tutorials/svg/index.html. Like HTML, SVG is an XML vocabulary. As with other technologies that we introduce in this guide, you can accomplish a lot without learning all of the details, and we recommend focusing first on the features you need in order to achieve your specific goals. In this case, we will be drawing lines, circles, rectangles, and text, and these four types of objects are often all you need to create informative and insightful visualizations.
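To give a sense of how little markup those four object types require, here is a small self-contained sketch written as an XQuery direct element constructor. The coordinates, sizes, and colors are arbitrary values chosen for illustration and are not the ones used in our visualization.

```xquery
xquery version "3.1";

(: One of each shape: a rectangle, a line, two circles, and a text label :)
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120" width="400">
  <rect x="10" y="10" width="30" height="100" fill="lightgray"/>
  <line x1="25" y1="30" x2="25" y2="90" stroke="black" stroke-width="2"/>
  <circle cx="25" cy="30" r="5" fill="red"/>
  <circle cx="25" cy="90" r="5" fill="blue"/>
  <text x="50" y="65" font-size="10">one article</text>
</svg>
```

Saving the result with an .svg extension and opening it in a browser is a quick way to check that the shapes appear where you expect them.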

Creating the model

This visualization reformulates and displays a lot of data we have already used in other interfaces. Reusing that data simplifies creating the model, since we have already calculated and stored all of the facets we want to use. You can view the full module code for visualization.xql in this branch. Rather than repeat the XQuery writing process here, we challenge you to try to reverse engineer the model code based on what you can see in the final visualization. Is it what you expected?

We added quite a few extra data points to the model, in part because we wanted to enable some experimentation. This graph could also be used to understand the relationships between ghost mentions, time, and article length. Our research questions were more aligned with questions of space and place, so we chose to display one visualization that explores and exposes that question.

Creating the view

Here is the final view code: https://github.com/Pittsburgh-NEH-Institute/hoaXed/blob/17-visualization/views/visualization-to-html.xql

Rather than record each iterative step of building this visualization, we outline a few strategies and examples below for approaching SVG visualizations in general. This advice will be largely programming-language agnostic, and although a visualization library, if you are using one, might manage some of the details for you, knowing how SVG works can nonetheless help you make decisions and troubleshoot unexpected results.

Draw axes and define the view window

One of the most challenging aspects of SVG when you begin is defining the viewBox and drawing in the correct area of the infinite coordinate space. Our viewBox tutorial introduces those complexities and provides some solutions, and we recommend reading that now if you don’t have any prior experience with SVG. Most importantly, if your screen doesn’t render what you think you are drawing, understanding the SVG coordinate space can help you figure out whether you are failing to draw what you want to draw or whether you are drawing it correctly, but in a part of the space that is not rendered on the screen.
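As one possible illustration, the sketch below places the origin of the plot at (0,0), draws upward into negative Y values (in SVG, Y values increase downward), and computes a viewBox that covers the plot plus some padding. The variable names and dimensions are assumptions for the example, not values from our view code.

```xquery
xquery version "3.1";

(: Hypothetical plot dimensions and padding, in SVG user units :)
let $plot-width as xs:integer := 300
let $plot-height as xs:integer := 200
let $padding as xs:integer := 40
(: viewBox is "min-x min-y width height"; min-y is negative because we draw upward :)
let $view-box as xs:string := string-join((
    -$padding,
    -($plot-height + $padding),
    $plot-width + 2 * $padding,
    $plot-height + 2 * $padding
  ) ! string(.), " ")
return
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="{$view-box}">
    <!-- X axis, running right from the origin -->
    <line x1="0" y1="0" x2="{$plot-width}" y2="0" stroke="black"/>
    <!-- Y axis, running up from the origin into negative Y -->
    <line x1="0" y1="0" x2="0" y2="-{$plot-height}" stroke="black"/>
    <!-- A sample point in the middle of the plot -->
    <circle cx="150" cy="-100" r="5" fill="red"/>
  </svg>
```

With these values the viewBox is "-40 -240 380 280". If you change it to "0 0 380 280", the circle and the Y axis are still drawn correctly, but they fall outside the rendered region, which is the situation described above.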

Build with test data first

At the top of the document you can find a small sample of the XML output that comes from the model. We first built the component that visualizes a single article: a rectangle, two circles, and a line. Next, we left the test data in place, but let our XQuery build the same rectangle for each <m:article> element in the model. This two-step approach meant that we verified that we could render our SVG before introducing any variables into it. Had our XQuery-derived data then failed to render, we could have narrowed the troubleshooting to the output of the XQuery code.
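A rough sketch of that two-step approach might look like the following. The namespace URI, the function name local:article-group, the sample model, and all of the coordinates are hypothetical and stand in for whatever your own project uses.

```xquery
xquery version "3.1";

(: Hypothetical namespace URI; substitute the one your model actually uses :)
declare namespace m = "http://example.org/model";

(: The component for one article: a rectangle, a connecting line, and two circles :)
declare function local:article-group(
  $x as xs:integer,
  $word-y as xs:integer,
  $place-y as xs:integer
) as element() {
  <g xmlns="http://www.w3.org/2000/svg">
    <rect x="{$x}" y="-200" width="30" height="200" fill="lightgray"/>
    <line x1="{$x + 15}" y1="{$word-y}" x2="{$x + 15}" y2="{$place-y}" stroke="black"/>
    <circle cx="{$x + 15}" cy="{$word-y}" r="4" fill="red"/>
    <circle cx="{$x + 15}" cy="{$place-y}" r="4" fill="blue"/>
  </g>
};

(: Made-up sample of model output, used only to drive the loop :)
let $test-data :=
  <m:by-article>
    <m:article><m:id>a</m:id></m:article>
    <m:article><m:id>b</m:id></m:article>
    <m:article><m:id>c</m:id></m:article>
  </m:by-article>
return
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 -210 120 220">
    {
      (: Step 1 was a single call with literal values: local:article-group(0, -150, -40) :)
      (: Step 2: still literal values, but one component per article in the test data :)
      for $article in $test-data/m:article
      return local:article-group(0, -150, -40)
    }
  </svg>
```

The output contains three identical groups drawn at the same position, so the picture looks the same as in step 1. That is intentional at this stage: it confirms that the loop runs once per article before any real values are wired in.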

Our initial for loop drew all of the rectangles on top of one another, that is, at the same X position. They are of equal width, so we can easily predict how long the X axis should be, but how do we know the X position where each individual rectangle should start? We answered this question using the following for expression:

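(: Iterate over the articles in chronological order; $pos is each article's
   1-based position in the sorted sequence :)
for $article at $pos in (
    $data/descendant::m:by-article/m:article =>
    sort((), function($a) { $a/m:date })
  )
  let $title as xs:string := $article/m:title ! string(.)
  (: Third character of the date (the decade digit), then characters 3-4
     (the two-digit year) :)
  let $decade as xs:integer := $article/m:date ! substring(., 3, 1) ! xs:integer(.)
  let $year as xs:string := $article/m:date ! substring(., 3, 2)
  let $link as xs:string := ("read?id=" || $article/m:id)
  let $word-count as xs:integer := $article/m:word-count ! xs:integer(.)
  let $place-count as xs:integer := $article/m:place-count ! xs:integer(.)
  (: Each rectangle is 30 units wide, so the article at position n starts at X = n * 30 :)
  let $x-pos as xs:integer := $pos * 30
  (: the return clause, which draws the SVG for each article, follows :)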

The $pos variable provides each article’s position (counting from 1) within the sequence of articles after sorting by date. We multiply that position by 30, the width of each rectangle, to determine where the rectangle should be drawn: the first rectangle is drawn at an X position of 30, the next at 60, and so on.
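To see the positional variable at work end to end, here is a small self-contained sketch that sorts three made-up articles by date and draws one rectangle per article. The namespace URI, the wrapper element m:data, the sample dates, and the drawing details are assumptions for illustration; only the sorting and the $pos * 30 calculation follow the expression discussed above.

```xquery
xquery version "3.1";

(: Hypothetical namespace URI; substitute the one your model actually uses :)
declare namespace m = "http://example.org/model";

(: Made-up sample data, deliberately out of chronological order :)
declare variable $data as element(m:data) :=
  <m:data>
    <m:by-article>
      <m:article><m:id>b</m:id><m:date>1852-03-01</m:date></m:article>
      <m:article><m:id>a</m:id><m:date>1804-01-14</m:date></m:article>
      <m:article><m:id>c</m:id><m:date>1838-07-09</m:date></m:article>
    </m:by-article>
  </m:data>;

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 -60 150 80">
  {
    for $article at $pos in (
      $data/descendant::m:by-article/m:article =>
      sort((), function($a) { $a/m:date })
    )
    (: Rectangles are 30 units wide, so each one starts 30 units after the previous :)
    let $x-pos as xs:integer := $pos * 30
    return
      <g>
        <rect x="{$x-pos}" y="-50" width="30" height="50" fill="none" stroke="black"/>
        <text x="{$x-pos + 15}" y="10" text-anchor="middle" font-size="8">{
          $article/m:date ! substring(., 1, 4)
        }</text>
      </g>
  }
</svg>
```

The rectangles land at X positions 30, 60, and 90, labeled 1804, 1838, and 1852 from left to right, because $pos reflects the order after sorting rather than the order in the source data.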

Iterate; or make one change at a time

If you take one thing away from this tutorial, it should be this: make one change, test it, troubleshoot, and then move on. Introducing two new variables at once makes it harder to tell which one caused a problem. In the example above, we started by ignoring the X position and drawing all the rectangles on top of one another so that we could verify that we were extracting the data values correctly. Once we could see that result, we calculated the X position and spread out the rectangles along the X axis. We did both of those things before going back and sorting the articles by date.