Lecture 2

Visual encodings, visualization integrity, color theory, elements of a theme

Abhijit Dasgupta, Jeff Jacobs, Anderson Monken, and Marck Vaisman

Georgetown University

Spring 2024

Agenda and Goals for Today

Lecture

Combining some theoretical and practical advice:

  • Some background research
  • Encodings
  • Visual Integrity
  • Color
  • Themes

Lab

  • Create a theme to use throughout the course

There are several sets of principles for good visualization design

Nathan Yau

Adjustment rules

  • Explain the encodings
  • Provide context
  • Focus on readability
  • Develop aesthetics

7 basic rules for making charts and graphs

  1. Check the data
  2. Explain encodings
  3. Label axes
  4. Include units
  5. Keep your geometry in check
  6. Include your sources
  7. Consider your audience

Ed Tufte

Integrity principles

  • Show data variation, not design variation
  • Do not use graphics to quote data out of context
  • Use clear, detailed, thorough labeling
  • Representation of numbers should be directly proportional to numerical quantities
  • Don’t use more dimensions than the data require
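Tufte turns the proportionality principle into a single number, the Lie Factor. The block below states his definition and works one hypothetical example, with numbers chosen only for illustration. A Lie Factor of 1 means the graphic is faithful to the data, and values far from 1 signal distortion.

```latex
\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}},
\qquad
\text{size of effect} = \frac{\lvert \text{second value} - \text{first value} \rvert}{\text{first value}}

% Hypothetical example: the data grow from 10 to 15, but the drawn bar grows from 1 cm to 3 cm
\text{Lie Factor} = \frac{(3 - 1)/1}{(15 - 10)/10} = \frac{2}{0.5} = 4
```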

Design principles

  • Show comparisons
  • Show causality
  • Use multivariate data
  • Completely integrate modes (like text, images, numbers)
  • Establish credibility
  • Focus on content

Noah Iliinsky

Four pillars of visualization

A successful visualization:

  1. Has clear purpose (why this visualization)
  2. Includes (only) the relevant content (what to visualize)
  3. Uses appropriate structure (how to visualize it)
  4. Has useful formatting (everything else)

Readers who land on your visualization may not have had the luxury of developing and answering questions the way you did.

Your audience wants to know the story, conclusions, and/or results; they don’t want to analyze the data - that’s your job!

You MUST care about your different audiences

Visualization for Analysis

  • visualizations for you and your team
  • team and audience know the context
  • tool for understanding datasets
  • iterate quickly to develop insights
  • rough drafts
  • can make changes later

Visualization for Presentation

  • audience external to you and team
  • content is likely new and audience has no context
  • designed to communicate useful information
  • takes significantly more time
  • publication ready

Nathan Yau’s four ways to adjust for these differences

  1. Explain the encodings
  2. Provide context
  3. Focus on readability
  4. Develop aesthetics

Explain the encodings

For example: what scale are you using? What does that color represent? Is this normal?

It’s better to err on the side of too much explanation than too little. With the former, people who are already familiar can gloss over the details and still read the chart. With the latter, people who are unfamiliar with the visual encodings will get stuck.

Provide context

When readers can decode the shapes, colors and geometries on your chart, you are more than halfway to producing an awesome chart.

However, readers also need to understand the context of the data.

Another context example

Improve readability

Charts should read like text. At the most basic level, it should be obvious what the chart is about and how to interpret it.

Develop aesthetics

  • Default settings in the tools are generic, designed to work with many datasets and visualization types

  • You can (and should) develop aesthetics to make your charts less ugly

Note

In this context, aesthetics means a visual style. Do not confuse this with the aes() call in ggplot2.

Using these guidelines

  • They’re more continuous than absolute. Your charts may need more or less explanation, more or less context, etc.

  • Depends on your audience and the purpose behind your chart:

    • If your audience is a small group who has the same background as you, then you might not need to provide as much context for the data you show.
    • If your audience is already excited about a dataset, then you probably don’t need to make it too flashy.
    • If you make charts for a research paper, there are probably publisher guidelines that you need to follow, which limits what you can do (sometimes a good thing).
  • Think of the above adjustments as continuous knobs that you can turn up or down. The more charts you make, the better you’ll get at deciding how much to turn.

Some foundational research

Pre-attentive processing: the ability of the low-level human visual system to effortlessly identify certain basic visual properties.

Example: let’s help our friend Homer find a donut that looks different!

What stands out? This is pre-attentive processing in action!

Tamara Munzner

Computer scientist, information visualization expert, and professor at University of British Columbia.

Tamara proposed a nested model analysis framework for visualization

Four levels, three questions:

Domain

  • Characterize the problems and data of a particular domain
  • Who are the target users?

Abstraction

  • Translate from the domain specifics to the visualization vocabulary
    • What is shown? → data abstraction
    • Why is the user looking at it? → task abstraction

Idiom

  • How is it shown?
    • Visual encoding idiom → how to draw
    • Interaction idiom → how to manipulate

Algorithm

  • Efficient computation

The What: Abstracting the Data

Why abstract the data?

  • Different attribute types → different representations
  • Different dataset types → different idioms available

What do you need to abstract?

  • Dataset type: (e.g. table, network, temporal, etc.)
  • Attribute types: (e.g. categorical, ordinal, quantitative)
  • Ordering direction: (e.g. sequential, diverging, cyclical)
  • Data availability: (e.g. dynamic, static)

Types of datasets

temporal!

Tables

  • Typically a flat Tidy Data table (by analysis unit)
  • Observations are rows, one item per row
  • Attributes are columns
  • May or may not have an identifier

Types of attributes

Categorical

  • No order
  • Example: names, countries, types
  • Must be represented with visual channels that don’t convey order

Ordered

Ordinal

  • Has implicit order
  • But, you can’t do arithmetic
  • Can be numerical (but should be treated as ordered categories, not as quantities)
  • Example: t-shirt sizes, grade in school, rankings

Quantitative

  • Also ordered
  • You can do arithmetic
  • Can be divergent or sequential
  • Example: age, temperature, earnings
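A quick way to see why ordinal data needs explicit handling: sorted as text, t-shirt sizes come out in alphabetical order, which is meaningless. The plain-Python sketch below supplies the order by hand; the sizes and the SIZE_ORDER list are hypothetical. Libraries offer the same idea as an ordered categorical type, for example factor(..., levels = ..., ordered = TRUE) in R or pd.Categorical(..., categories=..., ordered=True) in pandas.

```python
# Hypothetical t-shirt sizes; the order is domain knowledge supplied by hand.
SIZE_ORDER = ["S", "M", "L", "XL"]

sizes = ["M", "XL", "S", "L", "M"]

# Naive alphabetical sort treats the sizes as unordered text.
alphabetical = sorted(set(sizes))

# Ordinal sort uses the explicit order. A size's position in SIZE_ORDER is a rank,
# not a quantity, so arithmetic such as "XL minus S" has no meaning.
ordinal = sorted(set(sizes), key=SIZE_ORDER.index)

print(alphabetical)  # ['L', 'M', 'S', 'XL']
print(ordinal)       # ['S', 'M', 'L', 'XL']
```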

Types of attributes

Sequential

  • There is an infinite range with a clear minimum
  • You can perform arithmetic
  • Example: age, number of goals, price
  • Must be represented with visual channels that do convey order

Diverging

  • There is a middle point
  • And two opposite directions
  • Many times the middle point is not zero
  • Example: temperature, earnings, political affiliation index

Cyclic

  • There is a cycle in the values
  • Starting point may or may not be obvious
  • Can be represented with cyclical channels
  • Example: days of the week, hours in the day
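One common way to give a cyclic attribute a cyclical representation is to place each value on a circle, so the end of the cycle sits next to its start. In this plain-Python sketch (assuming a 24-hour cycle; the helper names are hypothetical), 23:00 and 00:00 end up close together, while on a linear axis they would sit at opposite ends.

```python
import math

def hour_to_point(hour, period=24):
    """Place an hour on a unit circle so the end of the cycle meets the start."""
    angle = 2 * math.pi * hour / period
    return math.cos(angle), math.sin(angle)

def distance(h1, h2):
    """Straight-line distance between two hours after placing them on the circle."""
    (x1, y1), (x2, y2) = hour_to_point(h1), hour_to_point(h2)
    return math.hypot(x1 - x2, y1 - y2)

# On a linear axis 23:00 and 00:00 are 23 units apart; on the circle they are neighbors.
print(round(distance(23, 0), 3))  # 0.261
print(round(distance(0, 12), 3))  # 2.0
```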

The Why? (more on that next week)

Reminder of model

The How?

Reminder of model

Marks and Channels

Marks are geometric primitives

Channels (encodings) control the appearance of marks

Channel (encoding) Types

Mark and Channel Examples

Points

  • Zero-dimensional
  • Convey position only
  • Additionally, can be size and shape coded

Lines

  • One-dimensional
  • Convey position and length
  • Can only be width coded

Areas

  • Two-dimensional
  • Are fully constrained

Mackinlay ’86

Another encoding guide (Noah Iliinsky)

Although encoding is often undertaken without much intention or deeper consideration, it has significant impact on the ability of the visualization to communicate knowledge accurately and efficiently.

Examples of Visual and Integrity Issues with encodings

Position (example 1)

Position allows you to compare values based on where they are placed with reference to a coordinate system.

Considerations

Be aware of the scales you are using (linear vs logarithmic)

  • The scale changes the interpretation of distance
  • It can also change the perceived patterns
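To see how the scale changes the interpretation of distance, the sketch below maps the same four values onto an axis 100 units long, once linearly and once on a log10 scale. The function names and the 100-unit axis are illustrative assumptions. On the log scale each tenfold increase gets the same distance, so equal distances mean equal ratios rather than equal differences.

```python
import math

def linear_position(value, vmin, vmax, length=100.0):
    """Distance along an axis of `length` units on a linear scale."""
    return (value - vmin) / (vmax - vmin) * length

def log_position(value, vmin, vmax, length=100.0):
    """Distance along the same axis on a log10 scale (all values must be > 0)."""
    span = math.log10(vmax) - math.log10(vmin)
    return (math.log10(value) - math.log10(vmin)) / span * length

values = [1, 10, 100, 1000]
for v in values:
    print(v, round(linear_position(v, 1, 1000), 1), round(log_position(v, 1, 1000), 1))
# 1 0.0 0.0
# 10 0.9 33.3
# 100 9.9 66.7
# 1000 100.0 100.0
```

On the linear axis, 1, 10 and 100 are squeezed into the first tenth of the axis; on the log axis they are evenly spaced.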

Position (example 2)

Position allows you to compare values based on where they are placed with reference to a coordinate system.

Considerations

  • Avoid overplotting since many points can occupy the same space and obscure one another

Solutions

  • Use transparency so that overlapping points make darker areas
  • Jitter: add small random noise so points no longer sit on top of each other
  • Use binning to show aggregate data per pixel
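Transparency is usually a single argument (alpha in both ggplot2 and matplotlib). Jittering and binning are easy to sketch by hand. The plain-Python example below uses hypothetical data and helper names: five points collapse onto two positions, separate after jittering, and are finally summarized as counts per bin.

```python
import math
import random
from collections import Counter

def jitter(values, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount] so identical values separate."""
    rng = random.Random(seed)  # seeded so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

def bin_counts(xs, ys, bin_size=1.0):
    """Count points per square bin; the counts could then be drawn as a heatmap."""
    return Counter(
        (math.floor(x / bin_size), math.floor(y / bin_size)) for x, y in zip(xs, ys)
    )

# Hypothetical data: five points, but only two distinct positions.
xs = [1, 1, 1, 2, 2]
ys = [3, 3, 3, 4, 4]

print(len(set(zip(xs, ys))))   # 2 visible markers before jittering
jx, jy = jitter(xs, seed=1), jitter(ys, seed=2)
print(len(set(zip(jx, jy))))   # 5 visible markers after jittering
print(bin_counts(xs, ys))      # Counter({(1, 3): 3, (2, 4): 2})
```

Binning uses the original values, not the jittered ones, so the noise cannot push a point into a neighboring bin.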

Length

Length is most commonly used in bar charts: the longer the bar, the greater the value. Don’t truncate bar charts; start the bars at zero so their entire length encodes the value!
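A small calculation shows why. With two hypothetical values, 90 and 100, the sketch below compares bar lengths drawn from a zero baseline with bars drawn from a truncated baseline of 85 (the helper name is also hypothetical).

```python
def bar_lengths(values, baseline):
    """Visible bar lengths when the value axis starts at `baseline` instead of 0."""
    return [v - baseline for v in values]

# Hypothetical values: the second is about 11% larger than the first.
values = [90, 100]
full = bar_lengths(values, baseline=0)        # [90, 100]
truncated = bar_lengths(values, baseline=85)  # [5, 15]

print(values[1] / values[0])        # true ratio, about 1.11
print(full[1] / full[0])            # same ratio: the lengths are faithful
print(truncated[1] / truncated[0])  # 3.0: the second bar looks three times as big
```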

Angle

Angles range from 0 to 360 degrees in a circle.

Considerations

  • Angles are most associated with pie charts, which show parts that make up a whole.
  • Don’t use too many categories (bar chart is better)
  • The sum of all percentages should equal 100%!
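Each slice’s angle is its share of 360 degrees. The sketch below (the function name is hypothetical) converts percentage shares into slice angles and refuses to proceed when the shares do not add up to 100%.

```python
import math

def pie_angles(percentages):
    """Convert percentage shares to slice angles in degrees."""
    total = sum(percentages)
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        raise ValueError(f"shares sum to {total}%, not 100%; a pie chart would misrepresent them")
    return [360.0 * p / 100.0 for p in percentages]

print(pie_angles([50, 30, 20]))  # [180.0, 108.0, 72.0]
```

Calling pie_angles([50, 30]) raises a ValueError instead of silently stretching the two slices to fill the circle.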

Don’t even think about this!

Slope

Slope is similar to angle. Line charts are the most common use of slope to encode data.

Considerations

  • Slope magnitude: steeper = greater change, flatter = lesser change
  • The aspect ratio, which changes how steep the same slope looks
  • Visual change should match the context of the change

Cleveland, McGill & McGill (1988) suggested that the average slope in a line chart should be \(45^\circ\), in order to make neutral comparisons between lines
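The rule of thumb can be turned into a calculation. The sketch below is one simple variant, assumed here for illustration: it picks the plot’s aspect ratio (height divided by width) so that the mean absolute slope of the line segments is drawn at 45°. Cleveland and later authors describe other banking methods, so treat this as an approximation rather than the exact method from the paper. The function name is hypothetical.

```python
def bank_to_45(xs, ys):
    """Aspect ratio (height / width) that draws the mean absolute segment slope at 45 degrees.

    Assumes xs is strictly increasing and ys is not constant.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Slope of each segment after both axes are rescaled to [0, 1]
    slopes = [
        abs((y2 - y1) / y_range) / ((x2 - x1) / x_range)
        for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
    ]
    mean_slope = sum(slopes) / len(slopes)
    # Drawn slope = rescaled slope * (height / width); make its mean equal 1, i.e. 45 degrees
    return 1 / mean_slope

print(bank_to_45([0, 1, 2, 3], [0, 2, 4, 6]))   # 1.0: a square plot already gives 45 degrees
print(bank_to_45([0, 1, 2, 3], [0, 10, 0, 10])) # about 0.333: a wide plot tames the zigzag
```

The result could then size a figure with width w and height w × ratio (in matplotlib, for instance, figsize=(w, w * ratio)).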

This is still a good rule of thumb

Area

Like length, area can be used to represent data with size, but with two dimensions instead of one.

Considerations

  • While the encoding might not be as precise from a visual perception perspective, area can provide a more intuitive, less abstract view for some types of data
  • Make sure you scale by area, not edge (remember, area grows with the square of the edge length)
    • This means you should encode the length of a side as \(\sqrt{x}\)
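A minimal sketch of the difference, assuming square marks and two hypothetical values where the second is four times the first: taking the square root keeps the drawn areas in the same 4:1 ratio as the data, while using the value directly as the edge length inflates the ratio to 16:1. The helper name is hypothetical. Note that matplotlib’s scatter already treats its s argument as an area (in points squared), so values passed there should be proportional to the data, not squared.

```python
import math

def square_side(value):
    """Side length of a square whose area is proportional to value."""
    return math.sqrt(value)

# Hypothetical values: the second is 4x the first.
values = [1, 4]

correct = [square_side(v) ** 2 for v in values]  # areas [1.0, 4.0]: 4x, as in the data
naive = [v ** 2 for v in values]                 # side = value -> areas [1, 16]: looks 16x
print(correct, naive)  # [1.0, 4.0] [1, 16]
```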

Volume

Volume can be used in the same way as area but has one more dimension.

Considerations

  • Make sure you scale by volume, not edge (remember, volume grows with the cube of the edge length)
    • This means you would encode the side of a “box” as \(x^{1/3}\)

For 3-D encodings, you need to take the volume as proportional to the data
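Written out, the rule for a cube with edge \(s\) encoding a value \(x\) is below, with one hypothetical worked case showing how badly edge scaling exaggerates.

```latex
V = s^{3} \propto x \;\Longrightarrow\; s \propto x^{1/3}

% Hypothetical example: one value is 8 times another
x_2 = 8x_1 \;\Rightarrow\; s_2 = 8^{1/3}\, s_1 = 2\, s_1
\qquad\text{(scaling the edge instead, } s_2 = 8 s_1 \text{, draws } 8^{3} = 512 \text{ times the volume)}
```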

Color

Color theory 101

Color encoding

Color as a visual encoding can be split into two categories: hue and saturation. Hue is what most people refer to as color (red, green, blue, etc.). Saturation is the amount of hue in a color.

  • Qualitative: every color represents a distinct attribute (category)
  • Sequential: color represents a range (saturation) from low to high (or vice-versa)
  • Diverging: two contrasting hues extend in opposite directions from a midpoint (a point of inflection in the data)
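The sketch below builds a sequential palette by blending from a light to a dark color, and a diverging palette by blending two hues toward a shared neutral midpoint. The helper names and hex colors are hypothetical. It blends linearly in RGB for simplicity, which is not perceptually uniform; curated palettes such as viridis or ColorBrewer are usually the better choice in practice.

```python
def hex_to_rgb(h):
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)

def interpolate(c1, c2, n):
    """n >= 2 evenly spaced colors from c1 to c2, blended linearly in RGB."""
    a, b = hex_to_rgb(c1), hex_to_rgb(c2)
    out = []
    for i in range(n):
        t = i / (n - 1)
        out.append(rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b))))
    return out

def sequential(low, high, n):
    """Low-to-high palette in a single direction."""
    return interpolate(low, high, n)

def diverging(neg, mid, pos, n_side):
    """n_side colors on each side of a shared midpoint color."""
    left = interpolate(neg, mid, n_side + 1)
    right = interpolate(mid, pos, n_side + 1)
    return left + right[1:]  # drop the duplicated midpoint

print(sequential("#ffffff", "#000000", 3))             # white, mid grey, black
print(diverging("#0000ff", "#ffffff", "#ff0000", 2))   # blue ... white ... red
```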

Other built-in color palettes are usually better than the defaults

Most of these palettes are available to both ggplot2 and matplotlib. For R, you may have to load packages like RColorBrewer or viridis.

Color can help provide context

However, working correctly with color can be hard!

Color is not sortable

The incredibly challenging task of sorting colours

Sorting colors in JavaScript
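A short demonstration of why there is no single “correct” color order: the same four colors sort differently by hue than by brightness. The sketch uses Python’s standard colorsys module for hue, and relative luminance computed with the sRGB transfer function and Rec. 709 weights for brightness; the color names and helper are illustrative.

```python
import colorsys

def relative_luminance(rgb):
    """Relative luminance of an sRGB color with channels in [0, 1]."""
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

colors = {
    "red": (1.0, 0.0, 0.0),
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

by_hue = sorted(colors, key=lambda name: colorsys.rgb_to_hsv(*colors[name])[0])
by_luminance = sorted(colors, key=lambda name: relative_luminance(colors[name]))

print(by_hue)        # ['red', 'yellow', 'green', 'blue']
print(by_luminance)  # ['blue', 'red', 'green', 'yellow']
```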

Color can convey implicit meaning!

Positive and negative association (emotional and cultural)

| Color | Positive keywords | Negative keywords |
|---|---|---|
| Blue | Life, survival, calm, cleansing, protection, divinity | Sadness, death, mourning |
| Red | Excitement, love, high fashion, glamour, strength, power, luck, prosperity | Danger, warning, death, aggression, mourning, communism |
| White | Purity, simplicity, innocence, weddings, sacred, sacrifice, equality | Death, bad luck, cowardice, surrender, cycle of death and rebirth |
| Black | Elegance, luxury, masculinity, maturity, age | Bad luck, death |
| Green | Environmentally friendly, good luck, nature, national color | Infidelity, jealousy, illness |
| Orange | Safety, sacred, fertility, love, health, happiness, bravery, innovation | Mourning |
| Purple | Magic, mystery, royalty, religious faith, ambiguity | Death, mourning |
| Pink | Femininity, love, romance, birth, tenderness, mentally stimulating, trust, architecture | Foreign color |

Consider color blindness

Consider printing

Most of the time your visualization will be displayed in full color. However, you may sometimes need to print without a color printer. Printed colors may not faithfully reproduce what is on screen, but printing in greyscale is another issue entirely: colors that differ only in hue can turn into the same shade of grey.
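A rough way to check a palette before printing in greyscale is to convert each color to a grey level and see whether the levels stay apart. The sketch below uses the ITU-R BT.601 luma weights (the same weights Pillow documents for its "L" conversion); the helper names and the 20-level threshold are hypothetical choices, not a standard.

```python
def grey_level(rgb):
    """Approximate greyscale value (0-255) using ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def distinguishable_in_grey(colors, min_diff=20):
    """True if every pair of colors differs by at least `min_diff` grey levels."""
    greys = sorted(grey_level(c) for c in colors)
    return all(b - a >= min_diff for a, b in zip(greys, greys[1:]))

red = (255, 0, 0)
green = (0, 130, 0)
yellow = (255, 255, 0)

print(distinguishable_in_grey([red, green]))   # False: both print as nearly the same grey
print(distinguishable_in_grey([red, yellow]))  # True
```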

Common color pitfalls

  • Encoding too much information or irrelevant information
  • Using nonmonotonic colors for non-categorical data values
  • Failure to design for color vision deficiency
  • Not creating associations with color
  • Not using contrasting colors to contrast information
  • Not making the important information stand out
  • Using too many colors

Tools and guides to help choose colors for your theme

The Grammar of Graphics

William S. Cleveland, in his 1994 book The Elements of Graphing Data, lists the “basic elements of graph construction” as scales, captions, plotting symbols, reference lines, keys, labels, panels, and tick marks.

In The Grammar of Graphics, published in 2005, Leland Wilkinson built on the work of Bertin and more formally defined the components of a graphic:

Statistical graphic specifications are expressed in six statements:

| Statement | Description |
|---|---|
| DATA | a set of data operations that create variables from datasets |
| TRANS | variable transformation (e.g. rank) |
| SCALE | scale transformations (e.g. log) |
| COORD | a coordinate system (e.g. polar) |
| ELEMENT | graphs (e.g. points) and their aesthetic attributes (e.g. color) |
| GUIDE | one or more guides (axes, legends, etc.) |
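To make the six statements concrete, here is a toy pipeline in plain Python where each step corresponds to one statement. It is not how ggplot2 or any other library is implemented; the dataset, function name, colors, and axis titles are hypothetical, chosen only to show how the pieces compose into a specification of a graphic.

```python
import math

# Hypothetical toy dataset (not from the lecture).
rows = [
    {"gdp": 1_000, "pop": 5, "region": "North"},
    {"gdp": 10_000, "pop": 20, "region": "South"},
    {"gdp": 100_000, "pop": 12, "region": "North"},
]

def build_graphic(rows):
    # DATA: create the variables the graphic needs from the dataset
    gdp = [r["gdp"] for r in rows]
    pop = [r["pop"] for r in rows]
    region = [r["region"] for r in rows]

    # TRANS: variable transformation, here the rank of population (1 = smallest)
    order = sorted(range(len(pop)), key=lambda i: pop[i])
    pop_rank = [0] * len(pop)
    for rank, i in enumerate(order, start=1):
        pop_rank[i] = rank

    # SCALE: scale transformation, here log10 on GDP
    gdp_log = [math.log10(v) for v in gdp]

    # COORD: Cartesian coordinates (identity mapping; a polar system would convert here)
    coords = list(zip(gdp_log, pop_rank))

    # ELEMENT: point graphs, with color as an aesthetic attribute
    palette = {"North": "#1b9e77", "South": "#d95f02"}
    points = [
        {"x": x, "y": y, "color": palette[reg]} for (x, y), reg in zip(coords, region)
    ]

    # GUIDE: axis titles and a legend that explains the color encoding
    guides = {"x_axis": "GDP (log10)", "y_axis": "population rank", "legend": palette}
    return points, guides

points, guides = build_graphic(rows)
for p in points:
    print(p)
```

Swapping a single statement (say, a different SCALE or COORD) changes the graphic without touching the others, which is the main appeal of the grammar.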

Hadley Wickham implemented Wilkinson’s grammar in R with the popular ggplot2 package.

https://slides.com/karlho/datavisualization_grammarofgraphics#/6/0/4

Lab