Lecture 6

Multi-view composition, Interactivity and Themes

Abhijit Dasgupta, Jeff Jacobs, Anderson Monken, and Marck Vaisman

Georgetown University

Spring 2024

Agenda and Goals for Today

Lecture

Interaction providing a multi-dimensional view of the data
- The purpose is to add context and understanding, primarily
- Change the viewpoint
- Look at subsets and linkages (filter, select, link)
- Compare across visualization types and encodings
Providing users a modicum of control
- Control structures
- Linking multiple graphs (brushing/filtering)
Thematic elements and customization
- Utilizing CSS via JS
- Adding annotations, changing idioms

Lab

The coffee dataset to explore interactive visualizations
Discussing different decision choices to improve visualizations

A static multi-viewpoint approach

We’re quite familiar with this
- facets, scatterplot matrices
- layered graphics

This can get quite muddy, depending on how many things we’re trying to compare
This typically shows aggregate or overall patterns, which can be misleading
- We can’t separate out individual observations from the whole
- We can’t see which points correspond to the same observations

Making sense in a cluttered visualization

library(plotly)
data(txhousing, package = "ggplot2")
tx <- highlight_key(txhousing, ~city)
base <- plot_ly(tx, color = I("black")) %>% 
  group_by(city)
time_series <- base |> 
  group_by(city) |> 
  add_lines(x = ~date, y = ~median) |> 
  layout(
    title = "Housing prices in Texas",
    xaxis = list(title = ""),
    yaxis = list(title = "Median house price ($)"),
    width = 800,
    height=500
  )
highlight(
  time_series,
  on = "plotly_click",
  selectize=TRUE,
  dynamic=TRUE,
  persistent=TRUE
)

We had seen this spaghetti plot earlier. We can use interactivity to select particular trajectories and identify the corresponding cities. We’ll see the reverse in a bit, when we can filter the cities to highlight them.

```{ojs}
// tx is assumed to have been passed in from R (e.g. with ojs_define());
// transpose() turns the column-oriented object into an array of row objects.
txhousing = transpose(tx)
selection = vl.selectPoint();

viewof line = vl.markLine()
  .data(txhousing)
  .params(selection)
  .encode(
    vl.x().fieldQ('date').axis({format: "d"}),
    vl.y().fieldQ("median").axis({format: "$d", title: "Median house price"}),
    vl.detail().fieldN("city"),
    vl.opacity().if(selection, vl.value(0.8)).value(0.1),
    vl.tooltip().fieldN("city")
  )
  .width(400)
  .height(300)
  .render()
```

```{python}
import altair as alt
import pandas as pd
from rdatasets import data

txhousing = data('ggplot2', 'txhousing')
alt.data_transformers.disable_max_rows()  # lift Altair's default row limit

# Altair 4 API; in Altair 5+ use alt.selection_point(...) and .add_params(...)
selection = alt.selection_single(on='mouseover', nearest=True)
alt.Chart(txhousing).mark_line().encode(
  x = "date",
  y = "median",
  detail = "city",
  opacity = alt.condition(selection, alt.value(0.8), alt.value(0.1)),
  tooltip = "city"
).add_selection(
  selection
)
```

Vega-Lite/Altair work better with linked plots and some interactions

Note that you might be better off working with Vega-Lite directly, since Altair by default refuses to plot datasets with more than 5,000 rows (the limit can be lifted with alt.data_transformers.disable_max_rows(), as in the Python code above)

Interactivity and multiple views

A graphic is not ‘drawn’ once and for all; it is ‘constructed’ and reconstructed until it reveals all the relationships constituted by the interplay of the data. The best graphic operations are those carried out by the decision-maker themselves.

– Jacques Bertin

Hotels and multiple views

Re-arrange views to make sense …

and interpret

Aspects of change

Change the idioms
Change parameters for idioms
Ordering or choice of spatial arrangement
Using different visual channels
- color, size, shape, orientation, etc.
Level of aggregation
Data partitions
Zooming in and out

Note

There is a large variety of attributes that you can change in a visualization. The choice of which to change depends on the data and the story you want to tell. We also consider the transition effects from one choice to the next, to make the transitions less jarring.

Multiple views and interactivity

Recall the three blind men and the elephant
You’re trying to shift your viewpoint (your camera, so to speak) so that you can get different and perhaps more complete views of your data
- Data today is sufficiently rich and complex that a single view cannot do it justice. You either miss things or make things so cluttered that discerning detail becomes impossible
Reordering or sorting the data appropriately can give us insights into different patterns. This is especially true for categorical data
- “The power of reordering lies in the privileged status of spatial position as the highest ranked visual channel”
However, you do want to still maintain the sanctity of the observation
- We’re really interested in relationships between observations
- We’ll see how filtering, brushing and linking allows this to happen

Les Miserables Co-occurrence

Introducing Vega, Vega-Lite and Altair

Interactive plotting in Python

https://sites.northwestern.edu/researchcomputing/2022/02/03/what-is-the-best-interactive-plotting-package-in-python/

First, let’s re-visit the Grammar of Graphics

Statistical graphic specifications are expressed in six statements:

Element	Description
DATA	a set of data operations that create variables from datasets
TRANS	variable transformation
SCALE	scale transformations
COORD	a coordinate system
ELEMENT	graphs (e.g. points) and their aesthetic attributes (e.g. color)
GUIDE	one or more guides (axes, legends, etc.)

Hadley Wickham implemented Wilkinson’s grammar in R with the popular ggplot2 package.
We get to re-use this mental model with Vega-Lite and Altair.

Declarative vs. Imperative

Imperative

Specifies how something should be done
The specification and the execution are intertwined
e.g. “Put a red circle here and a blue circle there”

Declarative

Specifies what should be done (the system figures out how to get there)
Separates the specification from the execution
e.g. “Map <x> and <y> to a position and <attribute_z> to a color”

Declarative visualization lets you think about the data, the mappings and the relationships rather than the mechanics of drawing them.
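
To make the contrast concrete, here is a minimal Python sketch. It does not use any real plotting library: `draw_imperative`, `SPEC`, `render` and the `palette` key are hypothetical names invented for illustration, and the "drawing" is just a list of tuples. The imperative function spells out each step, while the declarative version only states the mapping and leaves the steps to a generic renderer.

```python
# Toy data: each row is one observation.
rows = [
    {"x": 1, "y": 2, "group": "a"},
    {"x": 3, "y": 1, "group": "b"},
]


# Imperative: we spell out HOW each circle is drawn, one step at a time.
def draw_imperative(rows):
    commands = []
    for row in rows:
        colour = "red" if row["group"] == "a" else "blue"
        commands.append(("circle", row["x"], row["y"], colour))
    return commands


# Declarative: we only state WHAT is mapped to what ...
SPEC = {
    "mark": "circle",
    "encoding": {"x": "x", "y": "y", "color": "group"},
    "palette": {"a": "red", "b": "blue"},  # hypothetical key, for illustration
}


# ... and a generic renderer (the "system") works out the steps.
def render(spec, rows):
    enc = spec["encoding"]
    return [
        (spec["mark"], row[enc["x"]], row[enc["y"]],
         spec["palette"][row[enc["color"]]])
        for row in rows
    ]


# Both routes describe the same picture.
print(draw_imperative(rows) == render(SPEC, rows))
```

Changing the chart in the declarative version means editing `SPEC`, not rewriting the loop.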

Today’s journey

Vega

A visualization grammar

Vega

Vega is a declarative language for

creating,
saving,
sharing

interactive visualizations.

It is built on D3.js, but adds a layer of abstraction

It is still quite granular, providing building blocks like data loading and transformation, scales, maps, axes, legends, and marks.

Declarative programming

Declarative programming is a non-imperative style of programming in which programs describe their desired results without explicitly listing commands or steps that must be performed

– Wikipedia

Vega

The Vega specification is written in JSON

Recall that JSON provides a hierarchical structure to record data.

We will demonstrate the capabilities of Vega and Vega-Lite through ojs cells in Quarto. These use the capabilities of Observable to embed JavaScript graphics into Quarto documents.

JSON looks like a Python dictionary in many ways, but note that JSON is a data storage format while the dictionary is a Python object
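
A short sketch of that distinction, using only Python's standard `json` module. The dictionary below is a made-up fragment that merely resembles a Vega specification; it is not a complete or valid spec.

```python
import json

# A Python dictionary: a live object in memory (note True and None).
spec = {
    "width": 400,
    "round": True,
    "title": None,
    "values": [{"category": "A", "amount": 28}],
}

# Serialising turns it into JSON text: True becomes true, None becomes null.
text = json.dumps(spec)
print(text)

# Parsing the text gives back an equal dictionary.
round_trip = json.loads(text)
print(round_trip == spec)
```

The JSON text is what gets stored or sent to Vega; the dictionary is what Python code manipulates.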

Vega

We start by loading the Vega library in our document

```{ojs}
//| echo: fenced
//| code-fold: false
vega = require("https://cdn.jsdelivr.net/npm/vega@4/build/vega.js")
```

One of the interesting things about working in Observable (and therefore in ojs cells) is that the order of the cells doesn’t matter: cells are evaluated reactively, in dependency order rather than from top to bottom. In fact, if you look at Observable notebooks, the call that loads Vega, Vega-Lite or D3 is often at the bottom of the notebook!

Note that we are loading the Vega JavaScript library from a CDN, or content delivery network. An alternative would be to download the library and import it locally.

Loading from a CDN requires that you be connected to the internet.

Note also the different syntax for chunk options in ojs

Vega

We said that the definition for a Vega graph is written in JSON. Here’s an example of a full specification:

inputSpec = ({
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": 400,
  "height": 200,
  "padding": 5,

  "data": [
    {
      "name": "table",
      "values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43},
        {"category": "D", "amount": 91},
        {"category": "E", "amount": 81},
        {"category": "F", "amount": 53},
        {"category": "G", "amount": 19},
        {"category": "H", "amount": 87}
      ]
    }
  ],
  "signals": [
    {
      "name": "tooltip",
      "value": {},
      "on": [
        {"events": "rect:mouseover", "update": "datum"},
        {"events": "rect:mouseout",  "update": "{}"}
      ]
    }
  ],
     
  "scales": [
    {
      "name": "xscale",
      "type": "band",
      "domain": {"data": "table", "field": "category"},
      "range": "width",
      "padding": 0.05,
      "round": true
    },
    {
      "name": "yscale",
      "domain": {"data": "table", "field": "amount"},
      "nice": true,
      "range": "height"
    }
  ],

  "axes": [
    { "orient": "bottom", "scale": "xscale" },
    { "orient": "left", "scale": "yscale" }
  ],

  "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value": 0}
        },
        "update": {
          "fill": {"value": "steelblue"}
        },
        "hover": {
          "fill": {"value": "red"}
        }
      }
    },
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "bottom"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category", "band": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -2},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": [
            {"test": "isNaN(tooltip.amount)", "value": 0},
            {"value": 1}
          ]
        }
      }
    }
  ]
}
)

Vega

Since we’re using OJS, we first have to parse the input specification to a live dataflow.

parsedSpec = vega.parse(inputSpec)

This results in the following plot:

viewof view={
   const div = document.createElement('div');
   div.value = new vega.View(parsedSpec)
      .initialize(div)
      .run();
   return div;
}

Note, we’re using JS to

define a div
populate the div with a View of the parsed VegaJS specification
run Vega on that div
return the div to the web page

A note on parsing the Vega JSON specification

Vega parses an input specification to produce a dataflow graph

This graph is the basis of all necessary computations to visually encode the data

Nodes

These are operators that perform operations
- calculate an aggregate
- create a scale mapping

Edges

Dependencies between nodes

Once the input specification is parsed into a dataflow graph, you can instantiate a View component that makes an interactive graph using the vega-runtime library

A major advantage to modeling computation as a dataflow graph is the ability to perform efficient reactive updates. When parameters change or the input data is modified, the dataflow can re-evaluate only those nodes affected by the update.
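
The idea of re-evaluating only the affected nodes can be sketched in a few lines of Python. This is a toy model, not Vega's actual runtime: the `Node` class, the `update` function and the node names are all hypothetical, and the simple queue below assumes the graph has no node reachable by two different paths (a real implementation would evaluate nodes in topological order).

```python
class Node:
    """One operator in a toy dataflow graph."""

    def __init__(self, name, func=None, inputs=()):
        self.name = name
        self.func = func            # computes this node's value from its inputs
        self.inputs = list(inputs)  # upstream nodes (edges point from them to us)
        self.dependents = []        # downstream nodes
        self.value = None
        self.runs = 0               # how many times this node was evaluated
        for node in self.inputs:
            node.dependents.append(self)

    def evaluate(self):
        self.value = self.func(*[n.value for n in self.inputs])
        self.runs += 1


def update(source, new_value):
    """Set a source node, then re-evaluate only the nodes downstream of it."""
    source.value = new_value
    queue = list(source.dependents)
    while queue:
        node = queue.pop(0)
        node.evaluate()
        queue.extend(node.dependents)


# Two independent branches: data -> max_amount -> y_domain, and width -> x_range.
data = Node("data")
width = Node("width")
max_amount = Node("max_amount", lambda rows: max(rows), [data])
y_domain = Node("y_domain", lambda top: [0, top], [max_amount])
x_range = Node("x_range", lambda w: [0, w], [width])

update(data, [28, 55, 43, 91])
update(width, 400)

# New data arrives: only the data branch is recomputed.
update(data, [28, 55, 43, 91, 120])
print(y_domain.value, y_domain.runs, x_range.runs)
```

After the second data update, `x_range` has still been evaluated only once, because nothing it depends on changed.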

Vega

Let’s get back to the Vega graph specification.

Visualization size

 "width": 400,
 "height": 200,
 "padding": 5,
 "autosize": "pad",

The width and height determine the size of the canvas where the data will be plotted.

The padding determines the margin between the plot and the border of the view

The autosize property controls how the plot is sized. It can

add extra space to accommodate all visual marks ("pad"),
fit the entire plot into the provided width and height ("fit"), or
do no automatic sizing ("none")

Vega: Data


  "data": [
    {
      "name": "table",
      "values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43},
        {"category": "D", "amount": 91},
        {"category": "E", "amount": 81},
        {"category": "F", "amount": 53},
        {"category": "G", "amount": 19},
        {"category": "H", "amount": 87}
      ]
    }
  ],

We have an array of data objects with fields named category (a string label) and amount (a number)

Data can be

loaded from the web using the url property (including JSON and CSV)
derived from a previously defined data set using the source property
left undefined and dynamically set when the visualization is constructed

Only one of the values, url, or source properties can be defined for a given data set

You can also modify data using transforms, like filtering, aggregation and layout operations.
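
The "only one of values, url or source" rule can be expressed as a small check. This is a sketch in plain Python, not part of Vega: `data_source` is a hypothetical helper, the `data/table.json` path is made up, and the `filter` transform entry is written the way we believe Vega expects it (`type` plus an `expr` string).

```python
def data_source(entry):
    """Return which property supplies the data for one Vega data entry.

    Returns "values", "url" or "source", or None when the data set is
    left undefined (to be filled in when the view is constructed).
    """
    present = [key for key in ("values", "url", "source") if key in entry]
    if len(present) > 1:
        raise ValueError(
            f"data set {entry.get('name')!r} defines {present}; use only one"
        )
    return present[0] if present else None


inline = {"name": "table", "values": [{"category": "A", "amount": 28}]}
remote = {"name": "remote", "url": "data/table.json"}  # hypothetical path
derived = {
    "name": "large",
    "source": "table",  # derive from the data set named "table"
    "transform": [{"type": "filter", "expr": "datum.amount > 50"}],
}
dynamic = {"name": "later"}  # set when the visualization is constructed

for entry in (inline, remote, derived, dynamic):
    print(entry["name"], "->", data_source(entry))
```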

Vega: scales

Scales map data to visual values like positions and colors (think ggplot2)

"scales": [
    {
      "name": "xscale",
      "type": "band",
      "domain": {"data": "table", "field": "category"},
      "range": "width",
      "padding": 0.05,
      "round": true
    },
    {
      "name": "yscale",
      "domain": {"data": "table", "field": "amount"},
      "nice": true,
      "range": "height"
    }
  ],

domain specifies the data that is being encoded in that scale
- Here we specify it dynamically from the data
- You can also use an array of values
By default, quantitative domains include 0. To disable, use "zero": false in the scale definition
padding puts space between bars
nice: true makes the scale domain more readable and human-friendly

The range settings of width and height are conveniences provided by Vega; in this case they map to the pixel intervals [0, width] and [height, 0] defined by the size of the visualization.

Each scale needs a unique name attribute.
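
What a scale does can be sketched as a plain function from data values to pixels. The Python below is an illustration under stated assumptions, not Vega's code: `linear_scale` and `band_scale` are hypothetical helpers, the y domain is assumed to be [0, 100] (what one would expect after "nice" rounding of a maximum of 91), the single padding value is applied both between and around the bars, and the pixel rounding requested by "round": true is ignored.

```python
def linear_scale(domain, range_):
    """Map a quantitative value linearly from domain to range."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def band_scale(categories, range_, padding=0.0):
    """Return ({category: left edge}, bandwidth) for equally sized bands."""
    r0, r1 = range_
    n = len(categories)
    # One "step" is a band plus the gap that follows it.
    step = (r1 - r0) / max(1, n - padding + 2 * padding)
    # Centre the bands inside the range.
    start = r0 + ((r1 - r0) - step * (n - padding)) * 0.5
    bandwidth = step * (1 - padding)
    positions = {c: start + i * step for i, c in enumerate(categories)}
    return positions, bandwidth


width, height = 400, 200
yscale = linear_scale([0, 100], [height, 0])  # pixel 0 is the top of the plot
positions, bandwidth = band_scale(list("ABCDEFGH"), [0, width], padding=0.05)

print(yscale(0), yscale(100))
print(round(positions["A"], 2), round(bandwidth, 2))
```

Note the reversed y range: larger amounts map to smaller pixel values because screen coordinates start at the top.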

Vega: axes

"axes": [
 { "orient": "bottom", "scale": "xscale" },
 { "orient": "left", "scale": "yscale" }
],

You can further customize axes; see the axes documentation

Vega: Marks

 "marks": [
    {
      "type": "rect",
      "from": {"data":"table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value": 0}
        },
        "update": {
          "fill": {"value": "steelblue"}
        },
        "hover": {
          "fill": {"value": "red"}
        }
      }
    },

This provides the specification of the marks. There are different kinds of encoding sets within the encode property

enter specifies properties when a mark is first created
exit specifies properties when a mark is removed
update specifies updates
hover specifies properties upon mouse hover
y and y2 refer to the top and bottom of the bars, respectively.
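
The order in which these encoding sets are applied explains why the fill colour sits in update rather than enter. The Python below is a toy simulation, not Vega itself: `ENCODE`, `apply_set` and the pixel values are hypothetical, and it assumes (as we understand Vega's behaviour) that the update set is evaluated again when the pointer leaves a mark.

```python
# Illustrative pixel values standing in for the scaled fields.
ENCODE = {
    "enter": {"x": 10, "width": 47, "y": 18, "y2": 200},
    "update": {"fill": "steelblue"},
    "hover": {"fill": "red"},
}


def apply_set(mark, set_name):
    """Overlay one encoding set on the mark's current properties."""
    mark.update(ENCODE.get(set_name, {}))
    return mark


bar = {}
apply_set(bar, "enter")    # the mark is created: geometry is set once
apply_set(bar, "update")   # the update set runs and gives the bar its fill
apply_set(bar, "hover")    # the pointer moves over the bar
hovered_fill = bar["fill"]
apply_set(bar, "update")   # the pointer leaves: update restores the fill
restored_fill = bar["fill"]

print(hovered_fill, restored_fill)
```

Had the fill been set only in enter, nothing in this sketch would turn the bar back to steelblue after hovering.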

Vega: signals

Signals are dynamic variables: expressions that are automatically re-evaluated when other signal values change or when input events occur

Each signal must have a unique name and an initial value

"signals": [
    {
      "name": "tooltip",
      "value": {},
      "on": [
        {"events": "rect:mouseover", "update": "datum"},
        {"events": "rect:mouseout",  "update": "{}"}
      ]
    }
  ],

tooltip changes in response to mouseover and mouseout events on rect marks

"marks": [
    ...,
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "bottom"},
          "fill": {"value": "#333"}
        },
        "update": {
          "x": {"scale": "xscale", "signal": "tooltip.category", "band": 0.5},
          "y": {"scale": "yscale", "signal": "tooltip.amount", "offset": -2},
          "text": {"signal": "tooltip.amount"},
          "fillOpacity": [
            {"test": "isNaN(tooltip.amount)", "value": 0},
            {"value": 1}
          ]
        }
      }
    }
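
The interplay between the tooltip signal and the text mark can be mimicked in plain Python. This is only a sketch of the idea: the `Signal` class, its `on` and `dispatch` methods and the `label` helper are hypothetical, and events are dispatched by hand instead of coming from a browser.

```python
class Signal:
    """A named value that changes in response to events."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.handlers = {}  # event name -> function(datum) returning new value

    def on(self, event, update):
        self.handlers[event] = update

    def dispatch(self, event, datum=None):
        if event in self.handlers:
            self.value = self.handlers[event](datum)


tooltip = Signal("tooltip", {})
tooltip.on("rect:mouseover", lambda datum: datum)  # "update": "datum"
tooltip.on("rect:mouseout", lambda datum: {})      # "update": "{}"


def label(signal):
    """Text-mark properties derived from the signal's current value."""
    amount = signal.value.get("amount")
    return {"text": amount, "fillOpacity": 0 if amount is None else 1}


tooltip.dispatch("rect:mouseover", {"category": "D", "amount": 91})
shown = label(tooltip)
tooltip.dispatch("rect:mouseout")
hidden = label(tooltip)

print(shown, hidden)
```

The label is never shown or hidden explicitly; its opacity simply follows from whether the signal currently holds a datum.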

Vega: transforms

There is a lot of granular specification possible in Vega

It’s not as granular as D3.js, in that there are some abstractions like loess and pivot and quantile

Developing in Vega

We’ve been using ojs chunks in a Quarto document to develop Vega graphics. This is a modern solution
Vega can be developed online using the Vega Editor
- Solutions can be transferred to Github quite easily
A great listing of how to specify axes and legends in Vega is available here

Moving from Vega to Vega-Lite

Vega-Lite

Vega-Lite is a high-level grammar of interactive graphics

It uses a declarative JSON syntax to specify visualizations for data analysis and presentation

Differences with Vega

Automatically produces components like axes, legends and scales using carefully designed rules
Meant for quick visualization authoring
Supports data transformations (aggregation, filtering, binning, sorting) and visual transformations (stacking, faceting)
More concise specification
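
To see how much shorter the specification gets, here is the bar chart data from the Vega example written as a Vega-Lite specification, built as a Python dictionary and printed as JSON. This is a sketch: it omits the hover tooltip of the Vega version, and the v5 schema URL is our choice.

```python
import json

amounts = [28, 55, 43, 91, 81, 53, 19, 87]
values = [{"category": c, "amount": a} for c, a in zip("ABCDEFGH", amounts)]

vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": values},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# No "scales", "axes" or "marks" arrays: Vega-Lite derives them.
print(json.dumps(vega_lite_spec, indent=2))
```

The scales and axes that had to be written out by hand in Vega are inferred here from the field types.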

Can you still use Vega?

Yes. Vega-Lite :: Vega as seaborn :: matplotlib. You can create graphics quickly and then drop down for more fine control.

Using Vega-Lite in ojs

We first have to load the Vega-Lite API into our environment.

```{ojs}
//| echo: fenced
//| code-fold: false
import {vl} from "@vega/vega-lite-api-v5"
```

import {printTable} from '@uwdata/data-utilities'

Data for Vega-Lite

Data is assumed to be a tidy data frame with named data columns. After importing, it is stored as an array of JavaScript objects.

As in Vega, you can import data as a URL, or an array of objects

You can play with a variety of standard “book” datasets available in the vega-datasets repo. These can be accessed in OJS with data = require('vega-datasets@1'). You can also access them from Python by running pip install vega_datasets and then importing the vega_datasets library

Data types

There are four basic data types:

Type	Description	Function
Nominal (N)	Categorical data	`fieldN`
Ordinal (O)	Ordinal data	`fieldO`
Quantitative (Q)	Quantitative data	`fieldQ`
Temporal (T)	Temporal data	`fieldT`

These are specified in the encoding steps to ensure the right kind of plotting is done.
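
The one-letter type codes correspond to the full type names used in the JSON specification. The Python helper below is a hypothetical sketch (`channel` and `TYPE_NAMES` are our own names) that expands a 'field:Q' style shorthand, similar in spirit to the strings Altair accepts, into the JSON fragment for one encoding channel.

```python
TYPE_NAMES = {
    "N": "nominal",
    "O": "ordinal",
    "Q": "quantitative",
    "T": "temporal",
}


def channel(shorthand):
    """Turn 'field:Q' into the JSON fragment for one encoding channel."""
    field, _, code = shorthand.rpartition(":")
    if not field or code not in TYPE_NAMES:
        raise ValueError(f"expected 'field:N|O|Q|T', got {shorthand!r}")
    return {"field": field, "type": TYPE_NAMES[code]}


encoding = {
    "x": channel("date:T"),
    "y": channel("temp_max:Q"),
    "color": channel("weather:N"),
}
print(encoding)
```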

Vega-Lite: API vs JSON

API
JSON

exdat = data['cars.json']()
vl.markCircle()
   .params(vl.selectInterval().bind('scales'))
   .encode(
   vl.x().fieldQ('Horsepower').scale({'domain': [75, 150]}),
   vl.y().fieldQ("Miles_per_Gallon").scale({'domain': [20,40]}),
   vl.size().fieldQ("Cylinders"),
)
.data(exdat).render()

embed({
  "$schema":"https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"values": exdat},
  "mark": "circle",
  "params": [
    {
      "name": "name4",
      "bind": "scales",
      "select": {"type": "interval"}
    }
  ],
  "encoding": {
    "x": {
      "field":"Horsepower", "type": "quantitative",
      "scale": {"domain": [75,150]}
    },
    "y": {
      "field": "Miles_per_Gallon", "type": "quantitative",
      "scale": {"domain": [20,40]}
    },
    "size": {
      "field": "Cylinders", "type":"quantitative"
    }
  }
})

embed = require('vega-embed')

Typically we prefer the API rather than the JSON

Seattle temperatures

data = require('vega-datasets@1')
seattle_temps = data['seattle-weather.csv']()
printTable(seattle_temps.slice(0,5))

vl.markPoint()
   .data(seattle_temps)
   .encode(
      vl.x().fieldT('date')
         .axis({title: "Date", format: "%b %Y"}),
      vl.y().fieldQ('temp_max')
         .axis({title: "Maximum temperature (C)"})
   ) 
   .render()

Aggregation

vl.markPoint()
   .data(seattle_temps)
   .encode(
      vl.x().month('date')
         .axis({title: "Month", format: "%b"}), 
      vl.y().mean('temp_max')
         .scale({domain: [-5,40]})
         .axis({title: "Average Maximum Daily Temperature (C)"})
   )
   .render()

month extracts the month from the date
mean computes the average by month
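
What these two steps compute can be written out with the Python standard library. The observations below are made up for illustration (they are not the Seattle data); the point is only the grouping logic: the month is extracted from each date, so the same month from different years lands in one group, and the mean is taken within each group.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily observations: (date, maximum temperature in C).
observations = [
    (date(2012, 1, 1), 12.8),
    (date(2012, 1, 2), 10.6),
    (date(2013, 1, 1), 5.0),
    (date(2012, 7, 1), 25.0),
    (date(2013, 7, 1), 31.0),
]

by_month = defaultdict(list)
for day, temp_max in observations:
    by_month[day.month].append(temp_max)  # month(date): drop year and day

# mean(temp_max) within each month
monthly_mean = {m: mean(temps) for m, temps in sorted(by_month.items())}
print(monthly_mean)
```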

Layering

```{ojs}
//| code-fold: false
//| output-location: column

// red line: mean maximum temperature per month
line1 = vl.markLine({color: "red"})
   .data(seattle_temps)
   .encode(
      vl.x().month('date'),
      vl.y().mean('temp_max')
   )

// circles: the individual daily observations
points1 = vl.markCircle()
   .data(seattle_temps)
   .encode(
      vl.x().month('date').axis({title: "Month", format: "%b"}),
      vl.y().fieldQ('temp_max').axis({title: "Maximum temperature"})
   )

vl.layer(line1, points1).render()
```

A bit more complex

```{ojs}
//| output-location: column
//| code-fold: false
plot1 = vl.markPoint({filled: true})
   .encode(
      vl.color().fieldN("weather").title("Weather"),
      vl.size().fieldQ("precipitation").scale({domain: [-1,50], range:[10,500]}).title("Precipitation"),
      vl.order().fieldQ("precipitation").sort("descending"),
      vl.x().timeMD("date").axis({title: "Date", format: "%b"}),
      vl.y().fieldQ("temp_max").scale({domain: [-5,40]}).axis({title:"Max Daily Temp (C)"})
)
   .width(800)

plot2 = vl.markBar()
   .encode(
      // color by weather type so the bars share the legend with plot1
      vl.color().fieldN("weather").title("Weather"),
      vl.x().count(),
      vl.y().fieldN("weather").title("Weather")
).width(800)

vl.vconcat(plot1, plot2)
   .data(seattle_temps)
   .autosize({type:'fit-x', contains: 'padding'})
   .render()
```

Interaction: brushing

brush1 = vl.selectInterval().encodings('x')
click1 = vl.selectPoint().encodings('color')

plot11 = vl.markPoint({filled: true})
   .encode(
      vl.color().value('lightgray')
         .if(brush1, vl.color().fieldN('weather').title("Weather")),
      vl.size().fieldQ("precipitation").scale({domain: [-1,50], range:[10,500]}).title("Precipitation"),
      vl.order().fieldQ("precipitation").sort("descending"),
      vl.x().timeMD("date").axis({title: "Date", format: "%b"}),
      vl.y().fieldQ("temp_max").scale({domain: [-5,40]}).axis({title:"Max Daily Temp (C)"})
)
   .width(800)
   .height(300)
   .params(brush1)
   .transform(vl.filter(click1))

plot21 = vl.markBar()
   .encode(
      vl.color().if(click1, vl.color().fieldN('weather')).value('lightgray').title("Weather"),
      vl.x().count(),
      vl.y().fieldN("weather").title("Weather")
).width(800)
.params(click1)
.transform(vl.filter(brush1))

vl.vconcat(plot11, plot21)
   .data(seattle_temps)
   .autosize({type:'fit-x', contains: 'padding'})
   .render()
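
The cross-filtering logic behind these two linked plots can be sketched without any plotting at all. In the Python below, the rows are hypothetical and `bar_counts` and `visible_points` are our own helper names; an empty selection is represented by None and, as is the default in Vega-Lite, lets all rows through.

```python
from collections import Counter

# Hypothetical rows: (day of year, weather type).
days = [
    (10, "rain"), (40, "snow"), (150, "sun"),
    (200, "sun"), (220, "rain"), (340, "rain"),
]


def bar_counts(rows, brush=None):
    """Counts per weather type, restricted to the brushed x-interval."""
    if brush is not None:
        low, high = brush
        rows = [r for r in rows if low <= r[0] <= high]
    return Counter(weather for _, weather in rows)


def visible_points(rows, clicked=None):
    """Points kept in the scatter plot after clicking a bar."""
    return [r for r in rows if clicked is None or r[1] == clicked]


print(bar_counts(days))                    # nothing brushed: all rows counted
print(bar_counts(days, brush=(100, 250)))  # brush over mid-year days
print(visible_points(days, clicked="snow"))
```

Each plot defines one selection and is filtered by the other plot's selection, which is what makes the two views linked.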

Vega-Lite: Slider

cars = data['cars.json']()
isOrigin = vl.selectPoint('isOrigin')
   .fields("Origin")
   .bind('legend') // bind to legend interactions
   
isYear = vl.selectPoint('isYear')
   .fields('Year').value(1970)
   .bind(vl.slider(1970, 1980, 1).name("Year"))

show=vl.and(isOrigin, isYear)

vl.markCircle()
   .data(cars)
   .transform(
   vl.calculate('year(datum.Year)').as("Year")
).params(isOrigin, isYear)
.encode(
   vl.x().fieldQ("Horsepower"),
   vl.y().fieldQ('Miles_per_Gallon'),
   vl.color().if(show, vl.color().fieldN('Origin')).value('grey'),
   vl.opacity().if(show, vl.value(1.0)).value(0.2)
).render()

Layers

weather = data['weather.csv']()
printTable(weather.slice(0,5))

vl.markArea({opacity: 0.3})
   .data(weather)
   .encode(
      vl.x().month('date'),
      vl.y().average('temp_max'),
      vl.y2().average('temp_min'),
      vl.color().fieldN('location')
).render()

Facets

vl.markBar()
   .data(weather)
   .transform(vl.filter('datum.location=="Seattle"'))
   .encode(
    vl.x().fieldQ('temp_max').bin(true).title('Temperature (°C)'),
      vl.y().count(),
      vl.color().fieldN('weather'),
      vl.column().fieldN('weather')
).width(150)
.height(150)
.render();

Concatenating graphs

base = vl.markLine()
    .data(weather)
    .encode(
      vl.x().month('date').title(null),
      vl.color().fieldN('location')
    )
    .width(240)
    .height(180);

temp = base.encode(vl.y().average('temp_max'));
precip = base.encode(vl.y().average('precipitation'));
wind = base.encode(vl.y().average('wind'));

vl.hconcat(temp, precip, wind).render();

Vega-Lite and Altair

We’ve been looking at the Vega-Lite Javascript API, rather than the JSON specification.

This is very similar to what we will see in Altair

Altair

Click here!!

Linking plots

Why link visualizations

Most data is multidimensional

Hard to look at more than 3-5 dimensions in a single plot
Facets/trellis graphs allow us to see multiple views
- Cannot show which observations are aligned across the facets
We can look at multiple idioms and visualization types for the same data
- How do the same observations translate with respect to others across different visualizations

Interactive graphs to the rescue

link observations while looking at multiple viewpoints
Show related patterns across view points
Using controls to make simultaneous changes across multiple plots

Let’s start with a scatterplot matrix

```{ojs}
datasets = require('vega-datasets@1')
// named cars1 because cars is already defined above;
// cell names must be unique within an OJS document
cars1 = datasets['cars.json'].url

vl.markCircle()
    .data(cars1)
    .encode(
      vl.x().fieldQ(vl.repeat('column')),
      vl.y().fieldQ(vl.repeat('row')),
      vl.color().fieldO('Cylinders'),
    )
    .width(140)
    .height(140)
    .repeat({
      column: ['Acceleration', 'Horsepower', 'Miles_per_Gallon'],
      row: ['Miles_per_Gallon', 'Horsepower', 'Acceleration']
    })
    .render();
```

We can’t tell which observations are actually in which part of each subplot.

Adding more depth: selecting & linking

brush = vl.selectInterval()
    .resolve('global'); // resolve all selections to a single global instance

legend = vl.selectPoint()
    .fields('Cylinders')
    .bind('legend'); // bind to interactions with the color legend

brushAndLegend = vl.and(brush, legend);

vl.markCircle()
    .data(cars)
    .params(brush, legend)
    .encode(
      vl.x().fieldQ(vl.repeat('column')),
      vl.y().fieldQ(vl.repeat('row')),
      vl.color().if(brushAndLegend, vl.fieldO('Cylinders')).value('grey'),
      vl.opacity().if(brushAndLegend, vl.value(0.8)).value(0.1)
    )
    .width(140)
    .height(140)
    .repeat({
      column: ['Acceleration', 'Horsepower', 'Miles_per_Gallon'],
      row: ['Miles_per_Gallon', 'Horsepower', 'Acceleration']
    })
    .render();

Code link

Changing views: linking different idioms

We can look at linking different idioms that are based on the same dataset

movies = datasets['movies.json']()
{
  const brush = vl.selectInterval()
    .encodings('x'); // limit selection to x-axis (year) values
  
  // dynamic query histogram
  const years = vl.markBar({width: 4})
    .data(movies)
    .params(brush)
    .encode(
      vl.x().year('Release_Date').title('Films by Release Year'),
      vl.y().count().title(null)
    )
    .width(600)
    .height(50);
  
  // ratings scatter plot
  const ratings = vl.markCircle()
    .data(movies)
    .encode(
      vl.x().fieldQ('Rotten_Tomatoes_Rating').axis({title: "Rotten Tomatoes rating"}),
      vl.y().fieldQ('IMDB_Rating').axis({title: "IMDB rating"}),
      vl.tooltip().fieldN('Title'),
      vl.opacity().if(brush, vl.value(0.75)).value(0.05)
    )
    .width(600)
    .height(400);

  return vl.vconcat(years, ratings).spacing(5).render();
}

Here, we can select a range of release years to see if the association between the two ratings changes or not.

Altair code

import pandas as pd
import altair as alt

movies = "https://cdn.jsdelivr.net/npm/vega-datasets@1/data/movies.json"

brush = alt.selection_interval(
    encodings=['x'] # limit selection to x-axis (year) values
)

# dynamic query histogram
years = alt.Chart(movies).mark_bar().add_selection(
    brush
).encode(
    alt.X('year(Release_Date):T', title='Films by Release Year'),
    alt.Y('count():Q', title=None)
).properties(
    width=650,
    height=50
)

# scatter plot, modify opacity based on selection
ratings = alt.Chart(movies).mark_circle().encode(
    x='Rotten_Tomatoes_Rating:Q',
    y='IMDB_Rating:Q',
    tooltip='Title:N',
    opacity=alt.condition(brush, alt.value(0.75), alt.value(0.05))
).properties(
    width=650,
    height=400
)

alt.vconcat(years, ratings).properties(spacing=5)

Adding controls for filtering

Dynamic queries

A dynamic query:

represents a query graphically,
provides visible limits on the query range,
provides a graphical representation of the data and query result,
gives immediate feedback of the result after every query adjustment,
and allows novice users to begin working with little training.

The idea here is to enable rapid exploration of the data to identify patterns. This can be achieved using tools like sliders, radio buttons, and menus
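
Underneath the API calls, a control of this kind is a parameter in the JSON specification that is bound to an input widget. The Python sketch below builds such a fragment as a dictionary. The structure follows our reading of the Vega-Lite v5 parameter syntax and should be checked against the documentation; the genre list is a hypothetical subset of the values in the movies data.

```python
import json

genres = ["Action", "Comedy", "Drama"]  # hypothetical subset of Major_Genre

# A point selection over one field, driven by a drop-down menu.
select_genre = {
    "name": "Select",
    "select": {"type": "point", "fields": ["Major_Genre"]},
    "bind": {"input": "select", "options": genres},
    "value": [{"Major_Genre": genres[0]}],  # initial query
}

# Points matching the query are emphasised, the rest fade out.
opacity = {"condition": {"param": "Select", "value": 0.75}, "value": 0.05}

spec_fragment = {
    "params": [select_genre],
    "encoding": {"opacity": opacity},
}
print(json.dumps(spec_fragment, indent=2))
```

Every change to the menu updates the parameter, and the conditional opacity gives immediate visual feedback, which is the core of a dynamic query.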

Menus

import {uniqueValid} from '@uwdata/data-utilities'

genres = uniqueValid(movies, d => d.Major_Genre)
mpaa = ['G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']

selectGenre = vl.selectPoint("Select")
    .fields("Major_Genre")
    .init({Major_Genre: genres[0]})
    .bind(vl.menu(genres))
    
vl.markCircle()
    .data(movies)
    .params(selectGenre)
    .encode(
      vl.x().fieldQ('Rotten_Tomatoes_Rating'),
      vl.y().fieldQ('IMDB_Rating'),
      vl.tooltip().fieldN('Title'),
      vl.opacity().if(selectGenre, vl.value(0.75)).value(0.05)
    )
    .render();

Altair code

df = pd.read_json(movies) # load movies data
genres = df['Major_Genre'].unique() # get unique field values
genres = list(filter(lambda d: d is not None, genres)) # filter out None values
genres.sort()

mpaa = ['G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']

selectGenre = alt.selection_single(
    name='Select', # name the selection 'Select'
    fields=['Major_Genre'], # limit selection to the Major_Genre field
    init={'Major_Genre': genres[0]}, # use first genre entry as initial value
    bind=alt.binding_select(options=genres) # bind to a menu of unique genre values
)

alt.Chart(movies).mark_circle().add_selection(
    selectGenre
).encode(
    x='Rotten_Tomatoes_Rating:Q',
    y='IMDB_Rating:Q',
    tooltip='Title:N',
    opacity=alt.condition(selectGenre, alt.value(0.75), alt.value(0.05))
)

Radio buttons

{
  // point-value selection over [Major_Genre, MPAA_Rating] pairs
  // use specific hard-wired values as the initial selected values
  const selection = vl
    .selectPoint("Select")
    .fields("Major_Genre", "MPAA_Rating")
    .init({ Major_Genre: "Drama", MPAA_Rating: "R" })
    .bind({ Major_Genre: vl.menu(genres), MPAA_Rating: vl.radio(mpaa) });

  // scatter plot, modify opacity based on selection
  return vl
    .markCircle()
    .data(movies)
    .params(selection)
    .encode(
      vl.x().fieldQ("Rotten_Tomatoes_Rating"),
      vl.y().fieldQ("IMDB_Rating"),
      vl.tooltip().fieldN("Title"),
      vl.opacity().if(selection, vl.value(0.75)).value(0.05)
    )
    .render();
}

Altair code

# single-value selection over [Major_Genre, MPAA_Rating] pairs
# use specific hard-wired values as the initial selected values
selection = alt.selection_single(
    name='Select',
    fields=['Major_Genre', 'MPAA_Rating'],
    init={'Major_Genre': 'Drama', 'MPAA_Rating': 'R'},
    bind={'Major_Genre': alt.binding_select(options=genres), 'MPAA_Rating': alt.binding_radio(options=mpaa)}
)
  
# scatter plot, modify opacity based on selection
alt.Chart(movies).mark_circle().add_selection(
    selection
).encode(
    x='Rotten_Tomatoes_Rating:Q',
    y='IMDB_Rating:Q',
    tooltip='Title:N',
    opacity=alt.condition(selection, alt.value(0.75), alt.value(0.05))
)
