Lecture 2

Visual encodings, visualization integrity, color theory, elements of a theme

Abhijit Dasgupta, Jeff Jacobs, Anderson Monken, and Marck Vaisman

Georgetown University

Spring 2024

Agenda and Goals for Today

Lecture

Combining some theoretical and practical advice:

Some background research
Encodings
Visual Integrity
Color
Themes

Lab

Create a theme to use throughout the course

There are several sets of principles for good visualization design

Nathan Yau

Adjustment rules

Explain the encodings
Provide context
Focus on readability
Develop aesthetics

7 basic rules for making charts and graphs

Check the data
Explain encodings
Label axes
Include units
Keep your geometry in check
Include your sources
Consider your audience

Ed Tufte

Integrity principles

Show data variation, not design variation
Do not use graphics to quote data out of context
Use clear, detailed, thorough labeling
Representation of numbers should be directly proportional to numerical quantities
Don’t use more dimensions than the data require

Design principles

Show comparisons
Show causality
Use multivariate data
Completely integrate modes (like text, images, numbers)
Establish credibility
Focus on content

Noah Iliinsky

Four pillars of visualization

A successful visualization:

Has clear purpose (why this visualization)
Includes (only) the relevant content (what to visualize)
Uses appropriate structure (how to visualize it)
Has useful formatting (everything else)

Readers who land on your visualization may not have the same luxury of developing and answering questions like you did.

Your audience wants to know the story, conclusions, and/or results; they don’t want to analyze the data - that’s your job!

You MUST care about your different audiences

Visualization for Analysis

visualizations for you and your team
team and audience know the context
tool for understanding datasets
iterate quickly to develop insights
rough drafts
can make changes later

Visualization for Presentation

audience external to you and team
content is likely new and audience has no context
designed to communicate useful information
takes significantly more time
publication ready

Nathan Yau’s four ways to adjust for these differences

Explain the encodings
Provide context
Focus on readability
Develop aesthetics

Explain the encodings

For example: what scale are you using? What does that color represent? Is this normal?

It’s better to err on the side of too much explanation than it is too little. At least with the former, people can gloss over the details if they’re already familiar. They can still read the chart. With the latter, people who are unfamiliar with the visual encodings will get stuck.

Provide context

When readers can decode the shapes, colors, and geometries on your chart, you are more than halfway to producing an awesome chart.

However, readers also need to understand the context of the data.

Another context example

Improve readability

Charts should read like text. At the most basic level, it should be obvious what the chart is about and how to interpret it.

Develop aesthetics

Default settings in the tools are generic: they are designed to work with many datasets and visualization types
You can (and should) develop aesthetics to make your charts less ugly

Note

In this context, aesthetics means a visual style. Do not confuse this with the aes() call in ggplot2.
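As a rough illustration of developing a visual style, here is a minimal matplotlib sketch of a reusable theme. The dictionary name `COURSE_THEME`, every value in it, and the plotted numbers are assumptions made up for this example; they are a starting point, not a recommended style.

```python
import matplotlib.pyplot as plt

# Hypothetical theme: each setting is an illustrative choice, not a course standard.
COURSE_THEME = {
    "font.size": 12,
    "axes.titlesize": 16,
    "axes.titleweight": "bold",
    "axes.spines.top": False,     # drop the top and right borders
    "axes.spines.right": False,
    "axes.grid": True,
    "grid.color": "#dddddd",
    "grid.linewidth": 0.8,
    "figure.figsize": (8, 5),
    "legend.frameon": False,
}

# rc_context applies the theme only inside the with-block,
# so other figures keep the library defaults.
with plt.rc_context(COURSE_THEME):
    fig, ax = plt.subplots()
    ax.plot([2019, 2020, 2021, 2022], [3.1, 2.4, 3.8, 4.2], marker="o")  # made-up data
    ax.set_title("Example chart with a custom theme")
    ax.set_xlabel("Year")
    ax.set_ylabel("Value (units)")
```

To apply the same settings to every figure in a session, `plt.rcParams.update(COURSE_THEME)` could be called once at the top of a script or notebook.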

Using these guidelines

They’re more continuous than absolute. Your charts may need more or less explanations, more or less context, etc.
How far you go depends on your audience and the purpose behind your chart:
- If your audience is a small group who has the same background as you, then you might not need to provide as much context for the data you show.
- If your audience is already excited about a dataset, then you probably don’t need to make it too flashy.
- If you make charts for a research paper, there are probably publisher guidelines that you need to follow, which limits what you can do (sometimes a good thing).
Think of the above adjustments as continuous knobs that you can turn up or down. The more charts you make, the better you’ll get at deciding how much to turn.

Some foundational research

Pre-attentive processing: the ability of the low-level human visual system to effortlessly identify certain basic visual properties.

Example: let’s help our friend Homer find a donut that looks different!

What stands out? This is pre-attentive processing in action!

Tamara Munzner

Computer scientist, information visualization expert, and professor at the University of British Columbia.

Tamara proposed a nested model analysis framework for visualization

Four levels, three questions:

Domain

Characterize the problems and data of a particular domain
Who are the target users?

Abstraction

Translate from the domain specifics to the visualization vocabulary
- What is shown? → data abstraction
- Why is the user looking at it? → task abstraction

Idiom

How is it shown?
- Visual encoding idiom → how to draw
- Interaction idiom → how to manipulate

Algorithm

Efficient computation

The What: Abstracting the Data

Why abstract the data?

Different attribute types → different representations
Different dataset types → different idioms available

What do you need to abstract?

Dataset type: (e.g. table, network, temporal, etc.)
Attribute types: (e.g. categorical, ordinal, quantitative)
Ordering direction: (e.g. sequential, diverging, cyclical)
Data availability: (e.g. dynamic, static)

Types of datasets

temporal!

Tables

Typically a flat Tidy Data table (by analysis unit)
Observations are rows, one item per row
Attributes are columns
May or may not have an identifier

Types of attributes

Categorical

No order
Example: names, countries, types
Must be represented with visual channels that don’t convey order

Ordered

Ordinal

Has implicit order
But, you can’t do arithmetic
Can be numerical (but should be treated as categorical)
Example: t-shirt sizes, grade in school, rankings

Quantitative

Also ordered
You can do arithmetic
Can be divergent or sequential
Example: age, temperature, earnings
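The three attribute types above can be made explicit in code. This is a small pandas sketch; the column names and values are invented for illustration.

```python
import pandas as pd

# Made-up table: one observation per row, one attribute per column.
df = pd.DataFrame({
    "country": ["Chile", "Ghana", "Japan"],   # categorical: no order
    "shirt_size": ["M", "S", "L"],            # ordinal: ordered, no arithmetic
    "age": [34, 27, 45],                      # quantitative: ordered, arithmetic works
})

# Categorical attribute: unordered categories.
df["country"] = pd.Categorical(df["country"])

# Ordinal attribute: declare the order explicitly.
df["shirt_size"] = pd.Categorical(
    df["shirt_size"], categories=["S", "M", "L"], ordered=True
)

smallest = df["shirt_size"].min()   # comparisons are defined for ordered categories
mean_age = df["age"].mean()         # arithmetic is meaningful for quantitative data
```

Declaring the type up front helps plotting code pick a channel that matches it: an unordered channel for `country`, an ordered one for `shirt_size` and `age`.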

Types of attributes

Sequential

There is an infinite range with a clear minimum
You can perform arithmetic
Example: age, number of goals, price
Must be represented with visual channels that do convey order

Diverging

There is a middle point
And two opposite directions
Many times the middle point is not zero
Example: temperature, earnings, political affiliation index

Cyclic

There is a cycle in the values
Starting point may or may not be obvious
Can be represented with cyclical channels
Example: days of the week, hours in the day

The Why? (more on that next week)

Reminder of model

The How?:

Reminder of model

Marks and Channels

Marks are geometric primitives

Channels (encodings) control the appearance of marks

Channel (encoding) Types

Mark and Channel Examples

Points

Zero-dimensional
Convey position only
Additionally, can be size and shape coded

Lines

One-dimensional
Convey position and length
Can only be width coded

Areas

Two-dimensional
Are fully constrained

Mackinlay ’86

Another encoding guide (Noah Iliinsky)

Although encoding is often undertaken without much intention or deeper consideration, it has significant impact on the ability of the visualization to communicate knowledge accurately and efficiently.

Examples of Visual and Integrity Issues with encodings

Position (example 1)

Position allows you to compare values based on where they are placed with reference to a coordinate system.

Considerations

Be aware of the scales you are using (linear vs logarithmic)

The scale changes the interpretation of distance
It can also change the perceived patterns
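To see how the scale changes the perceived pattern, the following matplotlib sketch draws the same series on a linear and a logarithmic y-axis. The data is made up: a quantity that grows by 25% per step.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up series with constant 25% growth per step.
x = np.arange(0, 30)
y = 100 * 1.25 ** x

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear scale: equal distances mean equal differences; the curve looks explosive.
ax_lin.plot(x, y)
ax_lin.set_title("Linear scale")
ax_lin.set_ylabel("Value")

# Log scale: equal distances mean equal ratios; constant growth becomes a straight line.
ax_log.plot(x, y)
ax_log.set_yscale("log")
ax_log.set_title("Logarithmic scale")
ax_log.set_ylabel("Value (log scale)")
```

Labeling the log axis explicitly, as in the second panel, is one way to explain the encoding to readers who expect a linear scale.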

Position (example 2)

Position allows you to compare values based on where they are placed with reference to a coordinate system.

Considerations

Avoid overplotting since many points can occupy the same space and obscure one another

Solutions

Use transparency so that overlapping points make darker areas
Use jitter (add noise so points no longer sit on top of each other)
Use binning to show aggregate data per pixel
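The three solutions can be sketched side by side in matplotlib. The data below is simulated, and the alpha values, jitter width, and bin count are arbitrary choices for this example; hexagonal bins stand in for per-pixel aggregation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated discrete data: 5000 points land on at most 15 distinct positions.
rng = np.random.default_rng(0)
x = rng.integers(1, 6, size=5000).astype(float)
y = x + rng.integers(-1, 2, size=5000)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# 1. Transparency: stacked points accumulate into darker marks.
axes[0].scatter(x, y, alpha=0.02)
axes[0].set_title("Transparency")

# 2. Jitter: small uniform noise separates points that share a position.
jitter_x = x + rng.uniform(-0.3, 0.3, size=x.size)
jitter_y = y + rng.uniform(-0.3, 0.3, size=y.size)
axes[1].scatter(jitter_x, jitter_y, s=4, alpha=0.3)
axes[1].set_title("Jitter")

# 3. Binning: color each bin by the number of points that fall in it.
hb = axes[2].hexbin(jitter_x, jitter_y, gridsize=15, cmap="viridis")
fig.colorbar(hb, ax=axes[2], label="Points per bin")
axes[2].set_title("Binning")
```

Jitter changes the plotted positions, so it is worth mentioning in a caption when the chart is meant for presentation.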

Length

Length is most commonly used in the context of bar charts. The longer a bar is, the greater the value. Don’t truncate bar charts: start the axis at zero so that the full length of each bar encodes its value!
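A quick calculation shows why truncation misleads. The helper `visual_ratio` and the two values are hypothetical, introduced only for this sketch.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

# Made-up values: B is 10% larger than A.
values = {"A": 50.0, "B": 55.0}

true_ratio = values["B"] / values["A"]
honest = visual_ratio(values["B"], values["A"], baseline=0.0)      # axis starts at 0
truncated = visual_ratio(values["B"], values["A"], baseline=45.0)  # axis starts at 45
```

With the axis starting at 0 the bars differ by 10%, matching the data. With the axis starting at 45, bar B is drawn twice as long as bar A.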

Angle

Angles range from 0 to 360 degrees in a circle.

Considerations

Angles are most associated with pie charts. A pie chart is made up of parts that make up a whole.
Don’t use too many categories (bar chart is better)
The sum of all percentages should equal 100%!
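The part-to-whole requirement can be checked before drawing anything. This is a small sketch with a hypothetical helper, `shares_to_angles`, and made-up counts.

```python
def shares_to_angles(counts):
    """Convert raw counts to percentage shares and slice angles in degrees."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    shares = {name: 100 * value / total for name, value in counts.items()}
    angles = {name: 360 * value / total for name, value in counts.items()}
    return shares, angles

# Made-up counts for three categories.
shares, angles = shares_to_angles({"A": 50, "B": 30, "C": 20})
```

Computing shares from a single total keeps the slices summing to 100% (and 360 degrees). Percentages that come from overlapping groups, such as a survey question that allows multiple answers, would not pass this check and do not belong in a pie chart.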

Don’t even think about this!

Slope

Slope is similar to angle. Line charts are the most common use of slope to encode data.

Considerations

Slope magnitude: steeper = greater change, flatter = lesser change
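The aspect ratio changes the perceived slope: the same data looks steeper in a tall, narrow chart and flatter in a short, wide one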
Visual change should match the context of the change

Cleveland, McGill & McGill (1988) suggested that the average slope in a line chart should be \(45^\circ\), in order to make neutral comparisons between lines

This is still a good rule of thumb
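One simple way to approximate this rule is to pick the aspect ratio that makes the median absolute segment slope appear at 45 degrees. The sketch below is a simplified heuristic under that assumption, not the exact procedure from the 1988 paper, and `bank_to_45` is a hypothetical name.

```python
import numpy as np

def bank_to_45(x, y):
    """Return a height/width ratio for the plotting area such that the
    median absolute slope of the line segments is drawn at 45 degrees."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.abs(np.diff(y) / np.diff(x))   # data-space slope of each segment
    slopes = slopes[slopes > 0]                # ignore flat segments
    x_range = x.max() - x.min()
    y_range = y.max() - y.min()
    # On screen, a data slope s is drawn as s * (height/width) * x_range / y_range.
    # Setting that equal to 1 (45 degrees) and solving for height/width gives:
    return y_range / (np.median(slopes) * x_range)

# A zigzag that goes up and down one unit per step over four steps.
aspect = bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0])
```

For the zigzag the result is 0.25, meaning the plotting area should be four times as wide as it is tall. In matplotlib the value could be passed to `ax.set_box_aspect(aspect)`.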

Area

Like length, area can be used to represent data with size, but with two dimensions instead of one.

Considerations

While the encoding might not be as precise from a visual perception perspective, area can provide a more intuitive, less abstract view for some types of data
Make sure you scale by area, not edge (remember, area grows with the square of the edge length)
- This means you should encode the length of a side as \(\sqrt{x}\)

Volume

Volume can be used in the same way as area but has one more dimension.

Considerations

Make sure you scale by volume, not edge (remember, volume grows with the cube of the edge length)
- This means you would encode the side of a “box” as \(x^{1/3}\)

For 3-D encodings, you need to take the volume as proportional to the data
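The square-root and cube-root rules for area and volume follow one pattern, sketched here with a hypothetical helper, `edge_length`.

```python
def edge_length(value, dimensions):
    """Edge length of a square (dimensions=2) or cube (dimensions=3)
    whose area or volume is proportional to `value`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if dimensions not in (1, 2, 3):
        raise ValueError("dimensions must be 1, 2, or 3")
    return value ** (1.0 / dimensions)

# A value that quadruples should quadruple the area, so the side only doubles.
side_small = edge_length(1, 2)
side_large = edge_length(4, 2)
area_ratio = (side_large / side_small) ** 2

# Scaling the side by the value itself overstates the difference.
naive_area_ratio = (4 / 1) ** 2

# The same idea in three dimensions: 8 times the value, twice the edge.
cube_edge = edge_length(8, 3)
```

Here the correct encoding shows a 4x difference as a 4x area, while the naive one inflates it to 16x. Some libraries already take an area rather than a length; for example, the `s` argument of matplotlib's `scatter` is the marker size in points squared.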

Color

Color theory 101

Color encoding

Color as a visual encoding can be split into two categories: hue and saturation. Hue is what most people refer to as color (red, green, blue, etc.). Saturation is the amount of hue in a color.

Qualitative: every color represents a distinct attribute (category)
Sequential: color represents a range (saturation) from low to high (or vice-versa)
Diverging: two hues extend in opposite directions from a meaningful midpoint in the data

Other built-in color palettes are usually better than the defaults

Most of these palettes are available to both ggplot2 and matplotlib. For R, you may have to load packages like RColorBrewer or viridis.
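In matplotlib, the three kinds of palette map onto built-in colormaps. The sketch below picks `tab10`, `viridis`, and `RdBu` as one example of each kind; the scores and the midpoint of 50 are made up.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

qualitative = plt.get_cmap("tab10")    # distinct hues for categories
sequential = plt.get_cmap("viridis")   # low-to-high ordering
diverging = plt.get_cmap("RdBu")       # two hues around a midpoint

# Qualitative: one color per category, taken by integer index.
category_colors = [qualitative(i) for i in range(3)]

# Sequential: colors for values already scaled to the 0-1 range.
sequential_colors = sequential(np.array([0.0, 0.5, 1.0]))

# Diverging: the midpoint is often not zero, so pin it explicitly.
# Made-up scores from 20 to 100 with a neutral value of 50.
norm = TwoSlopeNorm(vmin=20, vcenter=50, vmax=100)
scores = np.array([20, 50, 100])
diverging_colors = diverging(norm(scores))
```

`TwoSlopeNorm` maps the chosen center to the middle of the colormap, so the neutral score of 50 gets the near-white color even though the range is not symmetric around it.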

Color can help provide context

However, working correctly with color can be hard!

Color is not sortable

The incredibly challenging task of sorting colours

Sorting colors in JavaScript

Color can convey implicit meaning!

Positive and negative association (emotional and cultural)

Color	Positive Keywords	Negative Keywords
Blue	Life, survival, calm, cleansing, protection, divinity	Sadness, death, mourning
Red	Excitement, love, high fashion, glamour, strength, power, luck, prosperity	Danger, warning, death, aggression, mourning, communism
White	Purity, simplicity, innocence, weddings, sacred, sacrifice, equality	Death, bad luck, cowardice, surrender, cycle of death and rebirth
Black	Elegance, luxury, masculinity, maturity, age	Bad luck, death
Green	Environmentally friendly, good luck, nature, national color	Infidelity, jealousy, illness
Orange	Safety, sacred, fertility, love, health, happiness, bravery, innovation	Mourning
Purple	Magic, mystery, royalty, religious faith, ambiguity	Death, mourning
Pink	Femininity, love, romance, birth, tenderness, mentally stimulating, trust, architecture	Foreign color

Consider color blindness

Consider printing

Most of the time your visualization will be displayed in full color. However, you may need to print sometimes and not have a color printer. Printed color reproduction may not be faithful to screen, but it’s another issue when printing in greyscale.
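A rough greyscale check can be done by converting each color to a single grey level. The sketch below uses the Rec. 601 luma weights applied directly to the RGB values, which is an approximation; `to_gray` is a hypothetical helper and the colors were chosen for illustration.

```python
def to_gray(hex_color):
    """Approximate grey level (0 = black, 1 = white) of a '#rrggbb' color,
    using Rec. 601 luma weights."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

# Two different hues that end up with almost the same grey level.
red_vs_green = abs(to_gray("#ff0000") - to_gray("#008000"))

# A dark and a light blue that stay clearly separated in greyscale.
dark_vs_light = abs(to_gray("#08306b") - to_gray("#c6dbef"))
```

The red and green pair differs by less than 0.01 on a 0 to 1 scale, so the two would be hard to tell apart on a greyscale printout, while the dark and light blues differ by more than 0.6.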

Common color pitfalls

Encoding too much information or irrelevant information
Using nonmonotonic colors for non-categorical data values
Failure to design for color vision deficiency
Not creating associations with color
Not using contrasting colors to contrast information
Not making the important information stand out
Using too many colors

Tools and guides to help choose colors for your theme

The Grammar of Graphics

William S. Cleveland, in his 1994 book The Elements of Graphing Data, lists the “basic elements of graph construction” as scales, captions, plotting symbols, reference lines, keys, labels, panels, and tick marks.

In The Grammar of Graphics (second edition published in 2005), Leland Wilkinson built on the work of Bertin and more formally defined the components of a graphic:

Statistical graphic specifications are expressed in six statements:

Statement	Description
DATA	a set of data operations that create variables from datasets
TRANS	variable transformation (e.g. rank)
SCALE	scale transformations (e.g. log)
COORD	a coordinate system (e.g. polar)
ELEMENT	graphs (e.g. points) and their aesthetic attributes (e.g. color)
GUIDE	one or more guides (axes, legends, etc.)

Hadley Wickham implemented Wilkinson’s grammar in R with the popular ggplot2 package.
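To make the six statements concrete, here is a toy Python data structure with one field per statement. `GraphicSpec`, its field layout, and the example file and column names are all hypothetical; this is not the API of ggplot2 or of any other library.

```python
from dataclasses import dataclass, field

@dataclass
class GraphicSpec:
    """Toy container with one field per statement in the table above."""
    data: str                                   # DATA: where the variables come from
    element: dict                               # ELEMENT: graph type and aesthetic attributes
    trans: list = field(default_factory=list)   # TRANS: variable transformations
    scale: dict = field(default_factory=dict)   # SCALE: scale transformations
    coord: str = "cartesian"                    # COORD: coordinate system
    guide: list = field(default_factory=list)   # GUIDE: axes, legends, etc.

# Hypothetical scatterplot specification.
spec = GraphicSpec(
    data="earnings.csv",
    element={"geom": "point", "x": "age", "y": "earnings", "color": "region"},
    scale={"y": "log"},
    guide=["x axis", "y axis", "color legend"],
)
```

Separating the pieces this way is the core idea of the grammar: changing `coord` or `scale` changes how the graphic is drawn without touching the data or the element definition.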

https://slides.com/karlho/datavisualization_grammarofgraphics#/6/0/4
