Lecture 6

Multi-view composition, Interactivity and Themes

Abhijit Dasgupta, Jeff Jacobs, Anderson Monken, and Marck Vaisman

Georgetown University

Spring 2024

Agenda and Goals for Today

Lecture

  • Interaction providing a multi-dimensional view of the data
    • The primary purpose is to add context and understanding
    • Change the viewpoint
    • Look at subsets and linkages (filter, select, link)
    • Compare across visualization types and encodings
  • Providing users a modicum of control
    • Control structures
    • Linking multiple graphs (brushing/filtering)
  • Thematic elements and customization
    • Utilizing CSS via JS
    • Adding annotations, changing idioms

Lab

  • The coffee dataset to explore interactive visualizations
  • Discussing different decision choices to improve visualizations

A static multi-viewpoint approach

  • We’re quite familiar with this
    • facets, scatterplot matrices
    • layered graphics
  • This can get quite muddy, depending on how many things we’re trying to compare
  • This typically shows aggregate or overall patterns, which can be misleading
    • We can’t separate out individual observations from the whole
    • We can’t see which points correspond to the same observations

Making sense in a cluttered visualization

library(plotly)
data(txhousing, package = "ggplot2")
tx <- highlight_key(txhousing, ~city)
base <- plot_ly(tx, color = I("black")) |>
  group_by(city)
time_series <- base |>
  add_lines(x = ~date, y = ~median) |> 
  layout(
    title = "Housing prices in Texas",
    xaxis = list(title = ""),
    yaxis = list(title = "Median house price ($)"),
    width = 800,
    height=500
  )
highlight(
  time_series,
  on = "plotly_click",
  selectize=TRUE,
  dynamic=TRUE,
  persistent=TRUE
)

We saw this spaghetti plot earlier. We can use interactivity to select particular trajectories and identify the corresponding cities. We’ll see the reverse shortly, where we filter by city to highlight the matching trajectories.

Introducing Vega-Lite/Altair

```{ojs}
txhousing = transpose(tx)
selection = vl.selectPoint()

// One line per city; clicking a line keeps it opaque and fades the rest.
viewof line = vl.markLine()
  .data(txhousing)
  .params(selection)
  .encode(
    vl.x().fieldQ("date").axis({format: "d"}),
    vl.y().fieldQ("median").axis({format: "$d", title: "Median house price"}),
    vl.detail().fieldN("city"),
    vl.opacity().if(selection, vl.value(0.8)).value(0.1),
    vl.tooltip().fieldN("city")
  )
  .width(400)
  .height(300)
  .render()
```
```{python}
import altair as alt
import pandas as pd
from rdatasets import data

txhousing = data('ggplot2', 'txhousing')
alt.data_transformers.disable_max_rows()  # lift Altair's default row limit

# Hovering near a line selects it. Altair 5 renames these calls to
# alt.selection_point() and .add_params().
selection = alt.selection_single(on='mouseover', nearest=True)
alt.Chart(txhousing).mark_line().encode(
  x = "date",
  y = "median",
  detail = "city",
  opacity = alt.condition(selection, alt.value(0.8), alt.value(0.1)),
  tooltip = "city"
).add_selection(
  selection
)
```


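Vega-Lite and Altair work better for linked plots and some other interactions.

Note that for large datasets you might be better off working with Vega-Lite directly, since Altair by default raises an error for datasets with more than 5,000 rows. You can lift this limit with alt.data_transformers.disable_max_rows(), as in the code above.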

Interactivity and multiple views

A graphic is not ‘drawn’ once and for all; it is ‘constructed’ and reconstructed until it reveals all the relationships constituted by the interplay of the data. The best graphic operations are those carried out by the decision-maker themselves.

– Jacques Bertin

Hotels and multiple views

Re-arrange views to make sense …

and interpret

Aspects of change

  • Change the idioms
  • Change parameters for idioms
  • Ordering or choice of spatial arrangement
  • Using different visual channels
    • color, size, shape, orientation, etc.
  • Level of aggregation
  • Data partitions
  • Zooming in and out

Note

There is a large variety of attributes that you can change in a visualization. The choice of which to change depends on the data and the story you want to tell. We also consider the transition effects from one choice to the next, to make the transitions less jarring.

Multiple views and interactivity

  • Recall the three blind men and the elephant
  • You’re trying to shift your viewpoint (your camera, so to speak) so that you can get different and perhaps more complete views of your data
    • Data today is sufficiently rich and complex that a single view cannot do it justice. You either miss things or make things so cluttered that discerning detail becomes impossible
  • Reordering or sorting the data appropriately can give us insights into different patterns. This is especially true for categorical data
    • “The power of reordering lies in the privileged status of spatial position as the highest ranked visual channel”
  • However, you do want to still maintain the sanctity of the observation
    • We’re really interested in relationships between observations
    • We’ll see how filtering, brushing and linking allows this to happen

Introducing Vega, Vega-Lite and Altair

Interactive plotting in Python

https://sites.northwestern.edu/researchcomputing/2022/02/03/what-is-the-best-interactive-plotting-package-in-python/

First, let’s re-visit the Grammar of Graphics

Statistical graphic specifications are expressed in six statements:

| Element | Description |
|---|---|
| DATA | a set of data operations that create variables from datasets |
| TRANS | variable transformation |
| SCALE | scale transformations |
| COORD | a coordinate system |
| ELEMENT | graphs (e.g. points) and their aesthetic attributes (e.g. color) |
| GUIDE | one or more guides (axes, legends, etc.) |

  • Hadley Wickham implemented Wilkinson’s grammar in R with the popular ggplot2 package.

  • We get to re-use this mental model with Vega-Lite and Altair

Declarative vs. Imperative

Imperative

  • Specifies how something should be done

  • The specification and the execution are intertwined

  • e.g. “Put a red circle here and a blue circle there”

Declarative

  • Specifies what should be done (the system figures out how to get there)

  • Separates the specification from the execution

  • e.g. “Map <x> and <y> to a position and <attribute_z> to a color”

Declarative visualization lets you think about the data, the mappings and the relationships, rather than the mechanics of drawing them.
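
To make the contrast concrete, here is a minimal Python sketch. It is not tied to any plotting library, and the function name, field names and data are illustrative assumptions. The imperative version works out each rectangle by hand; the declarative version only states which field maps to which channel and leaves the drawing to a renderer.

```python
# Toy data, invented for illustration.
rows = [{"category": "A", "amount": 28}, {"category": "B", "amount": 55}]

def imperative_bars(rows, height=200, bar_width=40):
    """Imperative: we work out where every rectangle goes ourselves."""
    top = max(r["amount"] for r in rows)
    rects = []
    for i, r in enumerate(rows):
        bar_height = r["amount"] / top * height
        rects.append({"x": i * bar_width, "y": height - bar_height,
                      "width": bar_width, "height": bar_height})
    return rects

# Declarative: say what maps to what; a renderer decides how to draw it.
declarative_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

rects = imperative_bars(rows)
```

Changing the chart in the imperative version means editing the loop; in the declarative version it means editing the mapping.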

Today’s journey

Vega

A visualization grammar

Vega

Vega is a declarative language for

  • creating,
  • saving,
  • sharing

interactive visualizations.

It is built on D3.js, but adds a layer of abstraction

It is still quite granular, providing building blocks like data loading and transformation, scales, maps, axes, legends, and marks.

Declarative programming

Declarative programming is a non-imperative style of programming in which programs describe their desired results without explicitly listing commands or steps that must be performed

– Wikipedia

Vega

The Vega specification is written in JSON

Recall that JSON provides a hierarchical structure to record data.

We will demonstrate the capabilities of Vega and Vega-Lite through ojs cells in Quarto. These use the capabilities of Observable to embed JavaScript graphics into Quarto documents.

JSON looks like a Python dictionary in many ways, but note that JSON is a data storage format while the dictionary is a Python object
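
A small sketch of that distinction, using only Python's standard json module. The dictionary contents are a made-up fragment of a specification, chosen to show where the two notations differ.

```python
import json

# A Python dictionary holding a fragment of a Vega-style specification.
spec = {"width": 400, "round": True, "title": None,
        "values": [{"category": "A", "amount": 28}]}

text = json.dumps(spec)        # Python object -> JSON text (a str)
restored = json.loads(text)    # JSON text -> Python object again
```

In the JSON text, Python's True becomes true and None becomes null, which is why a specification copied from the Vega documentation is not always valid Python as written.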

Vega

We start by including the Vega definition in our document

```{ojs}
//| echo: fenced
//| code-fold: false
vega = require("https://cdn.jsdelivr.net/npm/vega@4/build/vega.js")
```

One of the interesting things about Observable, and therefore OJS cells, is that the order of the cells doesn’t matter: cells are evaluated reactively, in dependency order. In fact, in many Observable notebooks the call that loads Vega, Vega-Lite or D3 sits at the bottom of the notebook.

Note that we are loading the Vega JavaScript library from a CDN (content delivery network). An alternative would be to download the library and import it locally.

This method requires that you be connected to the internet.

Note also the different syntax for chunk options in ojs

Vega

We said that the definition for a Vega graph is written in JSON. Here’s an example of a full specification:

inputSpec = ({
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": 400,
  "height": 200,
  "padding": 5,

  "data": [
    {
      "name": "table",
      "values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43},
        {"category": "D", "amount": 91},
        {"category": "E", "amount": 81},
        {"category": "F", "amount": 53},
        {"category": "G", "amount": 19},
        {"category": "H", "amount": 87}
      ]
    }
  ],
  "signals": [
    {
      "name": "tooltip",
      "value": {},
      "on": [
        {"events": "rect:mouseover", "update": "datum"},
        {"events": "rect:mouseout",  "update": "{}"}
      ]
    }
  ],
     
  "scales": [
    {
      "name": "xscale",
      "type": "band",
      "domain": {"data": "table", "field": "category"},
      "range": "width",
      "padding": 0.05,
      "round": true
    },
    {
      "name": "yscale",
      "domain": {"data": "table", "field": "amount"},
      "nice": true,
      "range": "height"
    }
  ],

  "axes": [
    { "orient": "bottom", "scale": "xscale" },
    { "orient": "left", "scale": "yscale" }
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value": 0}
        },
        "update": {
          "fill": {"value": "steelblue"}
        },
        "hover": {
          "fill": {"value": "red"}
        }
      }
    },
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "bottom"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category", "band": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -2},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": [
            {"test": "isNaN(tooltip.amount)", "value": 0},
            {"value": 1}
          ]
        }
      }
    }
  ]
}
)

Vega

Since we’re using OJS, we first have to parse the input specification to a live dataflow.

parsedSpec = vega.parse(inputSpec)

This results in the following plot:

viewof view={
   const div = document.createElement('div');
   div.value = new vega.View(parsedSpec)
      .initialize(div)
      .run();
   return div;
}

Note, we’re using JS to

  • define a div
  • populate the div with a View of the parsed VegaJS specification
  • run Vega on that div
  • return the div to the web page

A note on parsing the Vega JSON specification

Vega parses an input specification to produce a dataflow graph

This graph is the basis of all necessary computations to visually encode the data

Nodes

  • These are operators that perform operations
    • calculate an aggregate
    • create a scale mapping

Edges

  • Dependencies between nodes

Once the input specification is parsed into a dataflow graph, you can instantiate a View component that makes an interactive graph using the vega-runtime library

A major advantage to modeling computation as a dataflow graph is the ability to perform efficient reactive updates. When parameters change or the input data is modified, the dataflow can re-evaluate only those nodes affected by the update.
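
The idea of re-evaluating only the affected nodes can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how Vega's runtime is implemented; the Node class, propagate function and node names are all hypothetical.

```python
class Node:
    """One operator in a toy dataflow graph."""
    def __init__(self, name, compute=None, inputs=()):
        self.name = name
        self.compute = compute          # None marks a source node
        self.inputs = list(inputs)
        self.dependents = []
        self.value = None
        self.runs = 0                   # how many times we evaluated
        for node in self.inputs:
            node.dependents.append(self)

    def evaluate(self):
        self.value = self.compute(*(n.value for n in self.inputs))
        self.runs += 1

def propagate(changed):
    """Re-evaluate only the nodes downstream of `changed`."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dep in node.dependents:
            visit(dep)
        order.append(node)
    visit(changed)
    # Reverse post-order is a valid dependency order for the visited nodes.
    for node in reversed(order):
        if node is not changed:
            node.evaluate()

data = Node("data")
data.value = [28, 55, 43]
height = Node("height")
height.value = 200
max_amount = Node("max_amount", max, [data])
px_per_unit = Node("px_per_unit", lambda top, h: h / top, [max_amount, height])
max_amount.evaluate()
px_per_unit.evaluate()

height.value = 400       # a parameter changes
propagate(height)        # only px_per_unit is recomputed
```

After the update, max_amount has still been evaluated only once, because nothing it depends on changed.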

Vega

Let’s get back to the Vega graph specification.

Visualization size

 "width": 400,
 "height": 200,
 "padding": 5,
 "autosize": "pad",

The width and height determine the size of the canvas where the data will be plotted.

The padding determines the margin between the plot and the border of the view

The autosize property controls how the final size is determined:

  1. "pad" adds extra space to accommodate all visual marks,
  2. "fit" fits the entire plot into the provided width and height, and
  3. "none" does no automatic sizing.

Vega: Data


  "data": [
    {
      "name": "table",
      "values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43},
        {"category": "D", "amount": 91},
        {"category": "E", "amount": 81},
        {"category": "F", "amount": 53},
        {"category": "G", "amount": 19},
        {"category": "H", "amount": 87}
      ]
    }
  ],

We have an array of data objects with fields named category (a string label) and amount (a number)

Data can be

  • loaded from the web using the url property (including JSON and CSV)
  • derived from a previously defined data set using the source property
  • left undefined and dynamically set when the visualization is constructed

Only one of the values, url, or source properties can be defined for a given data set

You can also modify data using transforms, like filtering, aggregation and layout operations.
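
As a rough illustration of these options, here is a Vega-style data array written as Python dictionaries. The file path and the data set names "large" and "dynamic" are hypothetical; the filter transform follows the form used in Vega specifications. The helper simply checks the "only one origin" rule.

```python
# A Vega-style "data" array written as Python dictionaries.
data = [
    # Loaded from a URL (hypothetical path, for illustration only).
    {"name": "table", "url": "data/table.json"},
    # Derived from "table", keeping only rows with amount > 50.
    {"name": "large", "source": "table",
     "transform": [{"type": "filter", "expr": "datum.amount > 50"}]},
    # Left undefined here; values would be supplied when the view is built.
    {"name": "dynamic"},
]

def origin_count(dataset):
    """Count how many of values/url/source a dataset defines."""
    return sum(key in dataset for key in ("values", "url", "source"))
```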

Vega: scales

Scales map data to visual values like positions and colors (think ggplot2)

"scales": [
    {
      "name": "xscale",
      "type": "band",
      "domain": {"data": "table", "field": "category"},
      "range": "width",
      "padding": 0.05,
      "round": true
    },
    {
      "name": "yscale",
      "domain": {"data": "table", "field": "amount"},
      "nice": true,
      "range": "height"
    }
  ],
  • domain specifies the data that is being encoded in that scale
    • Here we specify it dynamically from the data
    • You can also use an array of values
  • By default, quantitative domains include 0. To disable, use "zero": false in the scale definition
  • padding puts space between bars
  • nice: true makes the scale domain more readable and human-friendly

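The range settings of width and height are conveniences provided by Vega. In this case they map to the pixel ranges defined by the size of the visualization: [0, width] for the x scale and [height, 0] for the y scale, so that larger amounts are drawn higher up.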

Each scale needs a unique name attribute.
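
What a scale does can be sketched in plain Python. This is a simplified illustration, not Vega's implementation: the band scale ignores the padding and rounding options, and the domain [0, 100] is an assumed stand-in for the "nice" version of the data range.

```python
def linear_scale(domain, rng):
    """Map a number from `domain` linearly onto `rng`."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

def band_scale(categories, rng):
    """Give each category an equal-width band; returns (lookup, bandwidth).
    Simplified: ignores the padding and rounding that Vega applies."""
    r0, r1 = rng
    step = (r1 - r0) / len(categories)
    return (lambda c: r0 + categories.index(c) * step), step

width, height = 400, 200
# Assumed domain: [0, 100] stands in for the "nice" version of [0, 91].
yscale = linear_scale((0, 100), (height, 0))   # pixel y grows downwards
xscale, bandwidth = band_scale(list("ABCDEFGH"), (0, width))
```

Here xscale("B") gives the left edge of the second bar, and yscale maps an amount of 0 to the bottom of the plot.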

Vega: axes

"axes": [
 { "orient": "bottom", "scale": "xscale" },
 { "orient": "left", "scale": "yscale" }
],

You can further customize axes; see the axes documentation

Vega: Marks

 "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value": 0}
        },
        "update": {
          "fill": {"value": "steelblue"}
        },
        "hover": {
          "fill": {"value": "red"}
        }
      }
    },

This provides the specification of the marks. The encode property holds several encoding sets, each applied at a different point in a mark’s life:

  • enter specifies properties when a mark is first created

  • exit specifies properties when a mark is removed

  • update specifies properties that are applied whenever the mark is updated, including after a hover ends

  • hover specifies properties upon mouse hover

Within these sets, y and y2 refer to the top and bottom of the bars, respectively.
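
One way to picture how the encoding sets combine is as dictionaries merged in order. The sketch below is an assumption-laden simplification (the function name and the merge rule are illustrative, and exit is omitted), but it shows why fill sits in update rather than enter.

```python
# The encoding sets from the bar mark above, reduced to Python dictionaries.
enter = {"x": "category", "y": "amount"}
update = {"fill": "steelblue"}
hover = {"fill": "red"}

def visual_properties(hovered):
    """Assumed rule: later sets override earlier ones; update restores defaults."""
    props = {**enter, **update}
    if hovered:
        props.update(hover)
    return props
```

Because the update set is applied again when the hover ends, the bar returns to steelblue. If fill were set only in enter, nothing would reset it after the hover.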

Vega: signals

Signals are dynamic variables: expressions that are automatically re-evaluated when other signal values change or when input events occur

Each signal must have a unique name; an initial value can be supplied with the value property

"signals": [
    {
      "name": "tooltip",
      "value": {},
      "on": [
        {"events": "rect:mouseover", "update": "datum"},
        {"events": "rect:mouseout",  "update": "{}"}
      ]
    }
  ],

tooltip changes in response to mouseover and mouseout events on rect marks

"marks": [
    ...,
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "bottom"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category", "band": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -2},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": [
            {"test": "isNaN(tooltip.amount)", "value": 0},
            {"value": 1}
          ]
        }
      }
    }
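
The behaviour of the tooltip signal and the text mark can be mimicked in a few lines of Python. This is only a toy stand-in for the reactive machinery: the class and function names are hypothetical, and events are passed in by hand rather than coming from a browser.

```python
import math

class TooltipSignal:
    """Toy stand-in for the `tooltip` signal: holds the hovered datum."""
    def __init__(self):
        self.value = {}                      # initial value

    def handle(self, event, datum=None):
        if event == "rect:mouseover":
            self.value = datum               # "update": "datum"
        elif event == "rect:mouseout":
            self.value = {}                  # "update": "{}"

def label_opacity(tooltip):
    """Mirror of the fillOpacity rule: hide the label when no amount is set."""
    amount = tooltip.get("amount", math.nan)
    return 0 if math.isnan(amount) else 1

signal = TooltipSignal()
signal.handle("rect:mouseover", {"category": "D", "amount": 91})
shown = label_opacity(signal.value)
signal.handle("rect:mouseout")
hidden = label_opacity(signal.value)
```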

Vega: transforms

Vega allows very granular specification.

It is still less granular than D3.js, because it provides higher-level abstractions, for example built-in transforms such as loess, pivot and quantile.

Developing in Vega

  • We’ve been using ojs chunks in a Quarto document to develop Vega graphics. This is a modern solution

  • Vega can be developed online using the Vega Editor

    • Solutions can be transferred to Github quite easily
  • A great listing of how to specify axes and legends in Vega is available here

Moving from Vega to Vega-Lite

Vega-Lite

Vega-Lite is a high-level grammar of interactive graphics

It uses a declarative JSON syntax to specify visualizations for data analysis and presentation

Differences with Vega

  • Automatically produces components like axes, legends and scales using carefully designed rules
  • Meant for quick visualization authoring
  • Supports data transformations (aggregation, filtering, binning, sorting) and visual transformations (stacking, faceting)
  • More concise specification

Can you still use Vega?

Yes. Vega-Lite is to Vega as seaborn is to matplotlib: you can create graphics quickly, then drop down to the lower level for finer control.
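
To give a feel for the difference in size, here is a sketch of the earlier bar chart restated as a Vega-Lite specification, written as a Python dictionary. It covers only the bars; the hover colour and the tooltip text from the Vega version are left out.

```python
import json

# The bar chart data from the Vega example.
values = [{"category": c, "amount": a} for c, a in
          zip("ABCDEFGH", [28, 55, 43, 91, 81, 53, 19, 87])]

vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "width": 400,
    "height": 200,
    "data": {"values": values},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# No "scales" or "axes" entries: Vega-Lite derives them from the encoding.
spec_json = json.dumps(vega_lite_spec, indent=2)
```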

Using Vega-Lite in ojs

We first have to load the Vega-Lite API into our environment.

```{ojs}
//| echo: fenced
//| code-fold: false
import {vl} from "@vega/vega-lite-api-v5"
```

Data for Vega-Lite

Data is assumed to be a tidy data frame with named data columns. After importing, it is stored as an array of JavaScript objects.

As in Vega, you can import data as a URL, or an array of objects

You can play with a variety of standard example datasets available in the vega-datasets repo. These can be accessed in OJS with data = require('vega-datasets@1'). You can also access them from Python by running pip install vega_datasets and then importing the vega_datasets package

Data types

There are four basic data types:

| Type | Description | Function |
|---|---|---|
| Nominal (N) | Categorical data | fieldN |
| Ordinal (O) | Ordinal data | fieldO |
| Quantitative (Q) | Quantitative data | fieldQ |
| Temporal (T) | Temporal data | fieldT |

These are specified in the encoding steps to ensure the right kind of plotting is done.
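
Altair expresses the same type codes through shorthand strings such as "date:T". The sketch below is a simplified, hypothetical parser for that shorthand (not Altair's own implementation), just to show how a code maps to a Vega-Lite type name.

```python
TYPE_CODES = {"N": "nominal", "O": "ordinal",
              "Q": "quantitative", "T": "temporal"}

def expand_shorthand(shorthand):
    """Turn 'field:T' into a Vega-Lite style field definition."""
    field, _, code = shorthand.rpartition(":")
    if not field or code not in TYPE_CODES:
        raise ValueError(f"expected 'field:CODE', got {shorthand!r}")
    return {"field": field, "type": TYPE_CODES[code]}

date_def = expand_shorthand("date:T")
weather_def = expand_shorthand("weather:N")
```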

Vega-Lite: API vs JSON

exdat = data['cars.json']()
vl.markCircle()
   .params(vl.selectInterval().bind('scales'))
   .encode(
   vl.x().fieldQ('Horsepower').scale({'domain': [75, 150]}),
   vl.y().fieldQ("Miles_per_Gallon").scale({'domain': [20,40]}),
   vl.size().fieldQ("Cylinders"),
)
.data(exdat).render()
embed({
  "$schema":"https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"values": exdat},
  "mark": "circle",
  "params": [
    {
      "name": "name4",
      "bind": "scales",
      "select": {"type": "interval"}
    }
  ],
  "encoding": {
    "x": {
      "field":"Horsepower", "type": "quantitative",
      "scale": {"domain": [75,150]}
    },
    "y": {
      "field": "Miles_per_Gallon", "type": "quantitative",
      "scale": {"domain": [20,40]}
    },
    "size": {
      "field": "Cylinders", "type":"quantitative"
    }
  }
})
embed = require('vega-embed')

Typically we prefer the API rather than the JSON

Seattle temperatures

data = require('vega-datasets@1')
seattle_temps = data['seattle-weather.csv']()
printTable(seattle_temps.slice(0,5))
vl.markPoint()
   .data(seattle_temps)
   .encode(
      vl.x().fieldT('date')
         .axis({title: "Date", format: "%b %Y"}),
      vl.y().fieldQ('temp_max')
         .axis({title: "Maximum temperature (C)"})
   ) 
   .render()

Aggregation

vl.markPoint()
   .data(seattle_temps)
   .encode(
      vl.x().month('date')
         .axis({title: "Month", format: "%b"}), 
      vl.y().mean('temp_max')
         .scale({domain: [-5,40]})
         .axis({title: "Average Maximum Daily Temperature (C)"})
   )
   .render()
  • month extracts the month from the date
  • mean computes the average by month
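
The two operations can be spelled out by hand in Python. The readings below are made up for illustration and are not values from the Seattle dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up readings for illustration; not values from the Seattle dataset.
rows = [
    {"date": date(2012, 1, 1), "temp_max": 10.0},
    {"date": date(2012, 1, 2), "temp_max": 12.0},
    {"date": date(2013, 1, 1), "temp_max": 8.0},
    {"date": date(2012, 7, 1), "temp_max": 26.0},
    {"date": date(2012, 7, 2), "temp_max": 30.0},
]

by_month = defaultdict(list)
for row in rows:
    by_month[row["date"].month].append(row["temp_max"])   # month(date)

# mean(temp_max) within each month
monthly_mean = {m: mean(v) for m, v in sorted(by_month.items())}
```

Note that grouping by month pools the same month across years: the January 2012 and January 2013 readings end up in one group.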

Layering

```{ojs}
//| code-fold: false
//| output-location: column

line1= vl.markLine({color: "red"})
   .data(seattle_temps)
   .encode(
      vl.x().month('date'),
      vl.y().mean('temp_max')
   )

points1 = vl.markCircle()
   .data(seattle_temps)
   .encode(
      vl.x().month('date').axis({title: "Month", format: "%b"}),
      vl.y().fieldQ('temp_max').axis({title:"Maximum temperature"})
   )

vl.layer(line1, points1).render()
```

A bit more complex

```{ojs}
//| output-location: column
//| code-fold: false
plot1 = vl.markPoint({filled: true})
   .encode(
      vl.color().fieldN("weather").title("Weather"),
      vl.size().fieldQ("precipitation").scale({domain: [-1,50], range:[10,500]}).title("Precipitation"),
      vl.order().fieldQ("precipitation").sort("descending"),
      vl.x().timeMD("date").axis({title: "Date", format: "%b"}),
      vl.y().fieldQ("temp_max").scale({domain: [-5,40]}).axis({title:"Max Daily Temp (C)"})
)
   .width(800)

plot2 = vl.markBar()
   .encode(
      vl.color().fieldN("weather").title("Weather"),
      vl.x().count(),
      vl.y().fieldN("weather").title("Weather")
).width(800)

vl.vconcat(plot1, plot2)
   .data(seattle_temps)
   .autosize({type:'fit-x', contains: 'padding'})
   .render()
```

Interaction: brushing

brush1 = vl.selectInterval().encodings('x')
click1 = vl.selectPoint().encodings('color')

plot11 = vl.markPoint({filled: true})
   .encode(
      vl.color().value('lightgray')
         .if(brush1, vl.color().fieldN('weather').title("Weather")),
      vl.size().fieldQ("precipitation").scale({domain: [-1,50], range:[10,500]}).title("Precipitation"),
      vl.order().fieldQ("precipitation").sort("descending"),
      vl.x().timeMD("date").axis({title: "Date", format: "%b"}),
      vl.y().fieldQ("temp_max").scale({domain: [-5,40]}).axis({title:"Max Daily Temp (C)"})
)
   .width(800)
   .height(300)
   .params(brush1)
   .transform(vl.filter(click1))

plot21 = vl.markBar()
   .encode(
      vl.color().if(click1, vl.color().fieldN('weather')).value('lightgray').title("Weather"),
      vl.x().count(),
      vl.y().fieldN("weather").title("Weather")
).width(800)
.params(click1)
.transform(vl.filter(brush1))

vl.vconcat(plot11, plot21)
   .data(seattle_temps)
   .autosize({type:'fit-x', contains: 'padding'})
   .render()
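
The logic behind this pair of linked plots is cross-filtering: each plot owns one selection and is filtered by the other. A minimal Python sketch of that logic, with made-up rows and hypothetical function names, looks like this:

```python
from collections import Counter

# Made-up rows for illustration: day of year and weather type.
rows = [
    {"day": 10, "weather": "rain"}, {"day": 40, "weather": "snow"},
    {"day": 150, "weather": "sun"}, {"day": 200, "weather": "sun"},
    {"day": 210, "weather": "rain"},
]

def bar_counts(rows, brush):
    """Bars show counts only for rows inside the brushed x-interval."""
    lo, hi = brush
    return Counter(r["weather"] for r in rows if lo <= r["day"] <= hi)

def visible_points(rows, clicked):
    """The scatterplot keeps only rows whose weather type was clicked."""
    return [r for r in rows if clicked is None or r["weather"] == clicked]

summer_counts = bar_counts(rows, (100, 250))
rainy_points = visible_points(rows, "rain")
```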

Vega-Lite: Slider

cars = data['cars.json']()
isOrigin = vl.selectPoint('isOrigin')
   .fields("Origin")
   .bind('legend') // bind to legend interactions
   
isYear = vl.selectPoint('isYear')
   .fields('Year').value(1970)
   .bind(vl.slider(1970, 1980, 1).name("Year"))

show=vl.and(isOrigin, isYear)

vl.markCircle()
   .data(cars)
   .transform(
   vl.calculate('year(datum.Year)').as("Year")
).params(isOrigin, isYear)
.encode(
   vl.x().fieldQ("Horsepower"),
   vl.y().fieldQ('Miles_per_Gallon'),
   vl.color().if(show, vl.color().fieldN('Origin')).value('grey'),
   vl.opacity().if(show, vl.value(1.0)).value(0.2)
).render()

Layers

weather = data['weather.csv']()
printTable(weather.slice(0,5))
vl.markArea({opacity: 0.3})
   .data(weather)
   .encode(
      vl.x().month('date'),
      vl.y().average('temp_max'),
      vl.y2().average('temp_min'),
      vl.color().fieldN('location')
).render()

Facets

vl.markBar()
   .data(weather)
   .transform(vl.filter('datum.location=="Seattle"'))
   .encode(
    vl.x().fieldQ('temp_max').bin(true).title('Temperature (°C)'),
      vl.y().count(),
      vl.color().fieldN('weather'),
      vl.column().fieldN('weather')
).width(150)
.height(150)
.render();

Concatenating graphs

base = vl.markLine()
    .data(weather)
    .encode(
      vl.x().month('date').title(null),
      vl.color().fieldN('location')
    )
    .width(240)
    .height(180);

temp = base.encode(vl.y().average('temp_max'));
precip = base.encode(vl.y().average('precipitation'));
wind = base.encode(vl.y().average('wind'));

vl.hconcat(temp, precip, wind).render();

Vega-Lite and Altair

We’ve been looking at the Vega-Lite JavaScript API, rather than the JSON specification.

This is very similar to what we will see in Altair

Altair


Linking plots

Let’s start with a scatterplot matrix


```{ojs}
datasets = require('vega-datasets@1')
// Named cars_url because `cars` is already defined in the slider example.
cars_url = datasets['cars.json'].url

vl.markCircle()
    .data(cars_url)
    .encode(
      vl.x().fieldQ(vl.repeat('column')),
      vl.y().fieldQ(vl.repeat('row')),
      vl.color().fieldO('Cylinders'),
    )
    .width(140)
    .height(140)
    .repeat({
      column: ['Acceleration', 'Horsepower', 'Miles_per_Gallon'],
      row: ['Miles_per_Gallon', 'Horsepower', 'Acceleration']
    })
    .render();
```

We can’t tell which observations are actually in which part of each subplot.
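
What linking adds is a shared notion of identity: a selection made in one panel is a set of observations, and every other panel can flag the same ones. A rough Python sketch of that idea follows; the three cars are made up (not rows from cars.json) and the brush function is hypothetical.

```python
from itertools import product

# Three made-up cars for illustration (not rows from cars.json).
cars = [
    {"Acceleration": 12.0, "Horsepower": 130, "Miles_per_Gallon": 18.0},
    {"Acceleration": 15.5, "Horsepower": 90, "Miles_per_Gallon": 27.0},
    {"Acceleration": 19.0, "Horsepower": 60, "Miles_per_Gallon": 36.0},
]
fields = ["Acceleration", "Horsepower", "Miles_per_Gallon"]

def brush(x_field, x_range, y_field, y_range):
    """Indices of the cars that fall inside a rectangle drawn in one panel."""
    return {i for i, c in enumerate(cars)
            if x_range[0] <= c[x_field] <= x_range[1]
            and y_range[0] <= c[y_field] <= y_range[1]}

selected = brush("Horsepower", (80, 140), "Miles_per_Gallon", (15, 30))

# Every panel can now flag the same cars, whatever fields it plots.
highlight = {(x, y): [i in selected for i in range(len(cars))]
             for x, y in product(fields, fields)}
```

The selection is stored as observation indices rather than pixel positions, which is what lets it carry over to panels with different axes.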

Adding more depth: selecting & linking

Code link

Changing views: linking different idioms

We can look at linking different idioms that are based on the same dataset

Here, we can select a range of release years to see if the association between the two ratings changes or not.

Adding controls for filtering

Dynamic queries

A dynamic query:

  • represents a query graphically,
  • provides visible limits on the query range,
  • provides a graphical representation of the data and query result,
  • gives immediate feedback of the result after every query adjustment,
  • and allows novice users to begin working with little training.

The idea here is to enable rapid exploration of the data to identify patterns. This can be achieved using controls like sliders, radio buttons, and menus
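
Stripped of the graphics, a dynamic query is a filter that is re-run after every change to a control. The sketch below uses made-up records and hypothetical function names; in a real visualization the returned rows would be redrawn rather than counted.

```python
# Made-up records for illustration.
cars = [
    {"name": "a", "year": 1970, "origin": "USA"},
    {"name": "b", "year": 1975, "origin": "Japan"},
    {"name": "c", "year": 1980, "origin": "Europe"},
]

def dynamic_query(rows, year_range, origins):
    """Rows matching the slider range and the ticked origins."""
    lo, hi = year_range
    return [r for r in rows if lo <= r["year"] <= hi and r["origin"] in origins]

def on_control_change(year_range, origins):
    """Called after every adjustment so feedback is immediate."""
    result = dynamic_query(cars, year_range, origins)
    return f"{len(result)} of {len(cars)} cars shown"

status = on_control_change((1970, 1976), {"USA", "Japan"})
```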

Radio buttons

Plotly