Visual encodings, visualization integrity, color theory, elements of a theme
Georgetown University
Spring 2024
Combining some theoretical and practical advice:
Adjustment rules
7 basic rules for making charts and graphs
Integrity principles
Design principles
Four pillars of visualization
A successful visualization:
Readers who land on your visualization may not have the same luxury of developing and answering questions like you did.
Your audience wants to know the story, conclusions, and/or results; they don’t want to analyze the data - that’s your job!
For example: what scale are you using? What does that color represent? Is this normal?
It’s better to err on the side of too much explanation than it is too little. At least with the former, people can gloss over the details if they’re already familiar. They can still read the chart. With the latter, people who are unfamiliar with the visual encodings will get stuck.
When readers can decode the shapes, colors, and geometries on your chart, you are more than halfway to producing an awesome chart.
However, readers also need to understand the context of the data.
Charts should read like text. At the most basic level, it should be obvious what the chart is about and how to interpret it.
Default settings in the tools are generic: they are designed to work with many datasets and visualization types.
You can (and should) develop aesthetics to make your charts less ugly
Note
In this context, aesthetics means a visual style. Do not confuse this with the aes() call in ggplot2.
They’re more continuous than absolute. Your charts may need more or less explanation, more or less context, etc.
Depends on your audience and the purpose behind your chart:
Think of the above adjustments as continuous knobs that you can turn up or down. The more charts you make, the better you’ll get at deciding how much to turn.
Pre-attentive processing: the ability of the low-level human visual system to effortlessly identify certain basic visual properties.
Example: let’s help our friend Homer find a donut that looks different!
Computer scientist, information visualization expert, and professor at University of British Columbia.
Four levels, three questions:
Reminder of model
Although encoding is often undertaken without much intention or deeper consideration, it has significant impact on the ability of the visualization to communicate knowledge accurately and efficiently.
Position allows you to compare values based on where they are placed with reference to a coordinate system.
Length is most commonly used in the context of bar charts. The longer a bar is, the greater the value. Don’t truncate bar charts; use length in its entirety!
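To see why truncation misleads, here is a minimal sketch that compares the ratio of two values with the ratio of their drawn bar lengths once the axis starts above zero. The function name `apparent_ratio` and the sample values are illustrative assumptions, not part of any plotting library.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at `baseline` instead of 0."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Two values that differ by 10%.
true_ratio = apparent_ratio(55, 50)            # axis starts at 0
truncated_ratio = apparent_ratio(55, 50, 45)   # axis starts at 45

print(f"true ratio: {true_ratio:.2f}, drawn ratio: {truncated_ratio:.2f}")
```

With the axis starting at 45, the first bar is drawn twice as long as the second even though the values differ by only 10%.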
Angles range from 0 to 360 degrees in a circle.
Slope is similar to angle. Line charts are the most common use of slope to encode data.
Cleveland, McGill & McGill (1988) suggested that the average slope in a line chart should be \(45^\circ\) in order to make neutral comparisons between lines.
This is still a good rule of thumb.
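One simple way to apply the 45-degree rule of thumb is to pick the plot's aspect ratio so that the median absolute slope of the line segments is 45 degrees on screen. The sketch below is a simplified heuristic under that assumption, not necessarily the exact procedure from the 1988 paper; `bank_to_45` is a hypothetical helper name.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Return a height/width aspect ratio at which the median absolute
    slope of the line segments is drawn at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    points = list(zip(xs, ys))
    # Absolute data-space slope of each non-flat, non-vertical segment.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0 and y1 != y0
    ]
    if not slopes:
        raise ValueError("need at least one sloped segment")
    # Screen slope = data slope * (height / y_range) / (width / x_range).
    # Set the median screen slope to 1 (45 degrees) and solve for height/width.
    return y_range / (x_range * median(slopes))


steady = bank_to_45([0, 1, 2, 3, 4], [0, 10, 20, 30, 40])
zigzag = bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0])
print(steady, zigzag)
```

A steadily rising line fits a square plot, while the zigzag series needs a plot about four times wider than it is tall to keep its segments near 45 degrees.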
Like length, area can be used to represent data with size, but with two dimensions instead of one.
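Because area grows with the square of a linear dimension, a common pitfall is to scale a circle's radius directly with the value, which exaggerates differences. A minimal sketch of the correction, assuming circles as the mark and a hypothetical helper `radius_for_value`:

```python
import math


def radius_for_value(value: float, max_value: float, max_radius: float = 1.0) -> float:
    """Radius of a circle whose AREA (not radius) is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# A value that is one quarter of the maximum gets half the radius,
# so its area is one quarter of the largest circle's area.
r = radius_for_value(25, 100)
area_ratio = (math.pi * r ** 2) / (math.pi * 1.0 ** 2)
print(r, area_ratio)
```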
Volume can be used in the same way as area but has one more dimension.
Color as a visual encoding can be split into two categories: hue and saturation. Hue is what most people refer to as color (red, green, blue, etc.). Saturation is the amount of hue in a color.
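The hue/saturation split can be inspected directly with Python's standard `colorsys` module. This sketch assumes the HSV color model, which is only one of several ways to define hue and saturation; `hue_and_saturation` is an illustrative helper name.

```python
import colorsys


def hue_and_saturation(r: int, g: int, b: int):
    """Return (hue in degrees, saturation in 0..1) for an 8-bit RGB color."""
    h, s, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s, 2)


pure_red = hue_and_saturation(255, 0, 0)      # fully saturated red
pale_red = hue_and_saturation(255, 128, 128)  # same hue, less saturation
print(pure_red, pale_red)
```

Both colors share the same hue (0 degrees, red); only the saturation drops, from 1.0 to about 0.5.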
Most of these palettes are available in both ggplot2 and matplotlib. For R, you may have to load packages like RColorBrewer or viridis.
The incredibly challenging task of sorting colours
Color | Positive Keywords | Negative Keywords |
---|---|---|
Blue | Life, survival, calm, cleansing, protection, divinity | Sadness, death, mourning |
Red | Excitement, love, high fashion, glamour, strength, power, luck, prosperity | Danger, warning, death, aggression, mourning, communism |
White | Purity, simplicity, innocence, weddings, sacred, sacrifice, equality | Death, bad luck, cowardice, surrender, cycle of death and rebirth |
Black | Elegance, luxury, masculinity, maturity, age | Bad luck, death |
Green | Environmentally friendly, good luck, nature, national color | Infidelity, jealousy, illness |
Orange | Safety, sacred, fertility, love, health, happiness, bravery, innovation | Mourning |
Purple | Magic, mystery, royalty, religious faith, ambiguity | Death, mourning |
Pink | Femininity, love, romance, birth, tenderness, mentally stimulating, trust, architecture | Foreign color |
Most of the time your visualization will be displayed in full color. However, you may sometimes need to print without a color printer. Printed color may not be faithful to the screen, and greyscale printing is a further problem: colors that differ in hue can print as nearly the same grey.
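A quick way to check a palette before printing is to estimate the grey level of each color. The sketch below uses the Rec. 601 luma weights as an approximation and ignores gamma correction and printer behavior, so treat the numbers as a rough guide; `grey_level` is a hypothetical helper.

```python
def grey_level(r: int, g: int, b: int) -> float:
    """Approximate grey level (0-255) of an 8-bit RGB color,
    using the Rec. 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


red = grey_level(255, 0, 0)
green = grey_level(0, 128, 0)
blue = grey_level(0, 0, 255)

# Red and this mid green are easy to tell apart on screen,
# but land on almost the same grey; blue is clearly darker.
print(round(red), round(green), round(blue))
```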
William S. Cleveland, in his 1994 book The Elements of Graphing Data, lists the “basic elements of graph construction” as scales, captions, plotting symbols, reference lines, keys, labels, panels, and tick marks.
In The Grammar of Graphics, published in 2005, Leland Wilkinson built off the work by Bertin and more formally defined the components of a graphic:
Statistical graphic specifications are expressed in six statements:
Statement | Description |
---|---|
DATA | a set of data operations that create variables from datasets |
TRANS | variable transformation (e.g. rank) |
SCALE | scale transformations (e.g. log) |
COORD | a coordinate system (e.g. polar) |
ELEMENT | graphs (e.g. points) and their aesthetic attributes (e.g. color) |
GUIDE | one or more guides (axes, legends, etc.) |
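To make the six statements concrete, here is a toy sketch that stores one graphic specification as a plain data structure. The class `GraphicSpec`, its field types, and the example values are all hypothetical; this is neither Wilkinson's own syntax nor any plotting library's API.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GraphicSpec:
    """One field per statement in the table above (illustrative only)."""
    data: str                                     # DATA: source of variables
    trans: list = field(default_factory=list)     # TRANS: variable transformations
    scale: dict = field(default_factory=dict)     # SCALE: scale transformations
    coord: str = "cartesian"                      # COORD: coordinate system
    element: dict = field(default_factory=dict)   # ELEMENT: graph type and aesthetics
    guide: list = field(default_factory=list)     # GUIDE: axes, legends, etc.


# Made-up example: a scatterplot with a log x-axis and a color legend.
spec = GraphicSpec(
    data="cars",
    trans=["rank(mpg)"],
    scale={"x": "log"},
    coord="polar",
    element={"type": "point", "color": "cylinders"},
    guide=["axis", "legend"],
)

statement_names = [f.name.upper() for f in fields(spec)]
print(statement_names)
```

Keeping each statement as a separate field mirrors the idea that a graphic is composed from independent parts, so changing the coordinate system or the guides does not require touching the data or the elements.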
Hadley Wickham implemented Wilkinson’s grammar in R with the popular ggplot2 package.
https://slides.com/karlho/datavisualization_grammarofgraphics#/6/0/4
DSAN 5200 | Spring 2024 | https://gu-dsan.github.io/5200-spring-2024/