Overview first, zoom & filter, then details on demand
– Ben Shneiderman, 1996
Interactive graphics allow you to add dimensions to your graph while keeping the information organized and accessible
Of course you can still implement all the graphical principles we’ve already learned for making good visual encodings of data
Colors
Shapes/markers
Ink ratio
Size
Tooltips
On/off mechanisms
Panels (of time, for example)
Facets
Control mechanisms (buttons, menus, sliders)
Kinds of interactions
Scroll and pan
Zoom
Open and close
Sort and re-arrange
Search and filter
Jennifer Tidwell
One can also consider increasing dimensions through interaction
Time
Location
Meta-data
Code
import plotly.express as px
import numpy as np

df = px.data.gapminder().query("year == 2007")
df["world"] = "world"  # in order to have a single root node

fig = px.treemap(
    df,
    path=['world', 'continent', 'country'],
    values='pop',
    color='lifeExp',
    color_continuous_scale='RdBu',
    color_continuous_midpoint=np.average(df['lifeExp'], weights=df['pop']),
    title='Treemap of life expectancy in the 2007 Gapminder dataset',
    labels={'world': 'World', 'lifeExp': 'Life expectancy', 'gdpPercap': 'GDP ($)'},
)
fig.update_traces(
    customdata=df,
    hovertemplate="Life Expectancy: %{color:.1f}<br>Population: %{customdata[4]:,}<br>GDP($): %{customdata[5]:.0f}",
)
fig.update_layout(width=1000, height=650)
fig.show()
Where do we see the advantages?
Being able to look at complex data in a targeted manner
Being able to contextualize complex data
Being able to clearly see patterns over time or over geographies
Being able to look at the full data while concentrating on a part
Toolsets
Web technologies to the fore
HTML
The markup language used to structure web content, e.g. paragraphs, headings, and data tables, or embedding images and videos in the page.
CSS
The language of style rules for customizing our HTML content, e.g. setting background colors and fonts, and laying out content.
Javascript
The scripting language that lets you programmatically modify content, control multimedia, animate images, and do pretty much everything else.
Static content (no interactivity)
HTML + CSS: Formatting and theming, but no user feedback or updating
Dynamic content (interactivity)
Javascript (JS) allows the browser to programmatically update the HTML content
Javascript
HTML can be dynamically and programmatically updated based on its Document Object Model (DOM)
Javascript (JS) can programmatically modify different components of the document
create/add
remove/delete
modify the content (HTML) or theme/look (CSS)
JS runs after the webpage is loaded and facilitates interactivity
Almost all the advanced visualization libraries we’ll describe in this class are JS libraries that create dynamic data visualizations in the web browser
A quick word about the HTML DOM
The DOM
The Document Object Model (DOM) is the programming interface (API) to represent and interact with an HTML (or XML) document.
The fundamental building block of HTML is the element
This is defined by a start tag, some content, and an end tag
It is a form of text markup that can be translated into visual elements by the web browser
<p class="foo"> This is a paragraph </p>
<p class="foo"> is the start tag
class="foo" is an example of an attribute: value pair
</p> is the end tag
Anything between the start and end tags is the content
The DOM
The DOM represents the HTML document as a tree of nodes, where each node represents a part of the structure and content of the document
<!doctype html>
<html lang="en">
<head>
<title>My blog</title>
<meta charset="utf-8">
<script src="blog.js"></script>
</head>
<body>
<h1>My blog</h1>
<div id="entry1">
<h2>Great day bird watching</h2>
<p>
Today I saw three ducks!
I named them
Huey, Louie, and Dewey.
</p>
<p>
I took a couple of photos ...
</p>
</div>
</body>
</html>
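To make the tree structure concrete, here is a minimal sketch using Python's standard-library xml.dom.minidom, which implements a subset of the same DOM interface that browsers expose to JavaScript. It is an analogy, not what the browser actually runs. The markup is a trimmed version of the blog page above with every tag closed (minidom parses XML, not forgiving HTML), and describe_tree is a hypothetical helper name.

```python
from xml.dom import minidom

# Trimmed version of the blog page; every tag is closed because
# minidom parses XML, not the more forgiving HTML syntax.
PAGE = """<html lang="en">
<head><title>My blog</title></head>
<body>
<h1>My blog</h1>
<div id="entry1">
<h2>Great day bird watching</h2>
<p>Today I saw three ducks!</p>
</div>
</body>
</html>"""


def describe_tree(node, depth=0):
    """Return one indented line per element or non-empty text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + f"<{node.tagName}>")
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    # Recurse into children; text nodes simply have none.
    for child in node.childNodes:
        lines.extend(describe_tree(child, depth + 1))
    return lines


document = minidom.parseString(PAGE)
tree_lines = describe_tree(document.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the printout is one step down the tree: html contains head and body, body contains h1 and div, and so on down to the text nodes.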
The DOM
The DOM serves as the API (Programming Interface) for Javascript
JavaScript can add/change/remove HTML elements
JavaScript can add/change/remove HTML attributes
JavaScript can add/change/remove CSS styles
JavaScript can react to HTML events
JavaScript can add/change/remove HTML events
The DOM
In JavaScript, you can find HTML elements in several ways
Finding HTML elements by id
document.getElementById("intro") returns the element with id="intro"
Finding HTML elements by tag name
document.getElementsByTagName("p") returns a list of all <p> elements
Finding HTML elements by class name
document.getElementsByClassName("intro") returns a list of all elements with class="intro"
Finding HTML elements by CSS selectors
document.querySelectorAll("p.intro") returns a list of all <p> elements with class="intro"
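The lookups and the add/change/remove operations described above can be tried without a browser. The following is a minimal sketch with Python's standard-library xml.dom.minidom, which offers getElementsByTagName, setAttribute, createElement and related methods under the same names as the browser DOM. The markup is a small fragment invented for illustration, and the class lookup is written by hand because minidom provides neither getElementsByClassName nor querySelectorAll.

```python
from xml.dom import minidom

# Small, well-formed fragment invented for this illustration.
document = minidom.parseString(
    '<body><p class="intro">Hello</p><p>World</p></body>'
)

# Find elements by tag name (same method name as in the browser DOM).
paragraphs = document.getElementsByTagName("p")

# Find by class: filter on the attribute by hand, since minidom does
# not provide getElementsByClassName or querySelectorAll.
intro = [p for p in paragraphs if p.getAttribute("class") == "intro"]

# Change an attribute and the text content of an existing element.
intro[0].setAttribute("class", "intro highlighted")
intro[0].firstChild.data = "Hello, DOM"

# Add a new element at the top, then remove an old one.
body = document.documentElement
new_heading = document.createElement("h2")
new_heading.appendChild(document.createTextNode("Added later"))
body.insertBefore(new_heading, body.firstChild)
body.removeChild(paragraphs[1])

result = body.toxml()
print(result)
```

In the browser, the same sequence of calls on the document object would update what is shown on screen immediately, which is the basis of interactivity.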
CSS (Cascading Style Sheets) is a very rich language for styling HTML elements. It can provide very granular control over how each element in a page is displayed.
Javascript can manipulate the CSS specification of HTML elements, so inputs provided on a web page can change how elements look.
A couple of rich resources that can act as examples and reference for CSS are
JavaScript is one of the most popular and in-demand programming languages
Primarily used by web developers to build web applications and servers
JS is native to every modern web browser
which means every computer has it available and it can be used with no special installation requirements
JS can also run outside the browser using Node.js, a JavaScript runtime environment built on Chrome’s V8 JavaScript engine
For data scientists, JavaScript provides a powerful computer language for interactive data visualizations as well as general data science workflows that run in the browser
Practically all modern dynamic data visualization toolkits run on JavaScript
We can write JavaScript directly in HTML files using the <script></script> element.
We can also separate the JavaScript from the HTML by writing JavaScript functions in a file (call it app.js, for example), and then load it into our HTML file using <script src="app.js"></script>
Much as we can separate the CSS from the HTML and load it using <link rel="stylesheet" href="mystyle.css">
JavaScript packages (like d3.js, plotly.js, arquero.js, and others) can be loaded into an HTML file. The <script> tags that load them are typically placed within the <head></head> tags in the HTML
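As a rough illustration of how these pieces fit together, the sketch below writes a minimal page from Python. The file names app.js and mystyle.css are the placeholder names used above and are not created here; the page title, the paragraph text, and the output file name demo_page.html are invented for the example.

```python
from pathlib import Path

# app.js and mystyle.css are the placeholder file names used in the
# text; they are not created here, so a browser would report them as
# missing unless you supply them.
PAGE = """<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Demo page</title>
  <link rel="stylesheet" href="mystyle.css">
  <script src="app.js"></script>
</head>
<body>
  <p id="intro">Waiting for JavaScript...</p>
  <script>
    // Inline JavaScript: runs when the parser reaches this point.
    document.getElementById("intro").textContent = "Updated by JavaScript";
  </script>
</body>
</html>
"""

# Hypothetical output file name, written to the current directory.
output_path = Path("demo_page.html")
output_path.write_text(PAGE, encoding="utf-8")
print(f"Wrote {output_path.resolve()}")
```

Opening the resulting file in a browser should show the paragraph text already replaced by the inline script.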
D3 is a JavaScript library for visualizing data. It is a low-level library that offers very granular flexibility and control.
It was developed by Mike Bostock in 2011 as his PhD work under Jeff Heer at Stanford, and proved transformative in the field of data visualization
It is not a charting library per se: rather than offering ready-made chart types, it creates and manipulates primitive graphical elements in SVG or Canvas (living in an HTML document), driven by data (D3 = Data-Driven Documents)
To make a stacked area chart, you might use
a CSV parser to load data,
a time scale for horizontal position (x),
a linear scale for vertical position (y),
an ordinal scale and categorical scheme for color,
a stack layout for arranging values,
an area shape with a linear curve for generating SVG path data,
Each of these elements needs to be specified separately
It appears to be aligned with the Grammar of Graphics model, but with more granularity
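To give a feel for what two of these building blocks do, here is a plain-Python sketch of a linear scale and of SVG path generation. It is an analogy written for illustration, not D3's implementation: make_linear_scale and line_path are hypothetical names, and the data and pixel sizes are made up.

```python
def make_linear_scale(domain, pixel_range):
    """Return a function mapping data values in `domain` to pixels."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        fraction = (value - d0) / (d1 - d0)
        return r0 + fraction * (r1 - r0)

    return scale


def line_path(points, x_scale, y_scale):
    """Build an SVG path string (M = move to, L = line to)."""
    coords = [f"{x_scale(x):.1f},{y_scale(y):.1f}" for x, y in points]
    return "M" + " L".join(coords)


# Made-up data: (year, value) pairs.
data = [(2000, 10), (2005, 40), (2010, 25)]

x_scale = make_linear_scale(domain=(2000, 2010), pixel_range=(0, 300))
# SVG's y axis points down, so the range is flipped to put 0 at the bottom.
y_scale = make_linear_scale(domain=(0, 50), pixel_range=(200, 0))

path = line_path(data, x_scale, y_scale)
print(path)  # M0.0,160.0 L150.0,40.0 L300.0,100.0
```

In D3 each of these steps is a separate, configurable component, which is where both the flexibility and the line count come from.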
Use D3 if you think it’s perfectly normal to write a hundred lines of code for a bar chart
– Amanda Cox
“D3 is overkill for throwing together a private dashboard or a one-off analysis. Don’t get seduced by whizbang examples: many of them took an immense effort to implement!” – https://d3js.org/what-is-d3
Interactive visualization toolkits
d3.js is granular and complicated for starting out
Fortunately there are several alternatives built upon d3.js to make life easier for people
It’s better to have programs that are human-readable
Plotly is a technical computing company headquartered in Montreal, Canada
Develops tools for data visualization, analytics, and statistics, as well as graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST
Dash is an open-source framework for building analytic applications in Python, R, and Julia (competes with Shiny)
Chart Studio Cloud is a free, online tool for creating interactive graphics in a point-and-click interface. However, as with any online resource, data privacy is a concern
Figure converters that turn matplotlib and ggplot2 graphs into interactive JS-based graphics.
The base graphing toolkit is plotly.js which is built on top of d3.js and stack.gl
Plotly.js: Interactive controls
Plotly plots have interactive controls to do the following:
Pan: Move around in the plot.
Box Select: Select a rectangular region of the plot to be highlighted.
Lasso Select: Draw a region of the plot to be highlighted.
Autoscale: Zoom to a “best” scale.
Reset axes: Return the plot to its original state.
Toggle Spike Lines: Show or hide lines to the axes whenever you hover over data.
Show closest data on hover: Show details for the nearest data point to the cursor.
Compare data on hover: Show the nearest data point to the x-coordinate of the cursor.
Trace: Describes a collection of data and the specifications about how you want the data displayed on the plotting surface, which is described by the trace type (scatter, box, etc.).
Data: Collection (list) of traces
Layout: Controls various structural and stylistic components of the figure (e.g. title, font, size, etc)
The Python (or R) wrappers translate Python/R commands into the JSON specification, built from these fundamental components, that plotly.js needs
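A minimal sketch of that structure, written as a plain Python dictionary and serialized with the standard json module. The values are made up; the key names (data, layout, type, mode, title.text) follow our reading of the plotly.js figure schema. Passing the same dictionary to plotly.graph_objects.Figure should give an equivalent figure; that step is left as a comment to keep the block free of third-party imports.

```python
import json

# One trace: the data plus how to draw it (trace type and mode).
scatter_trace = {
    "type": "scatter",
    "mode": "markers",
    "x": [1, 2, 3],
    "y": [4, 1, 2],
    "name": "made-up points",
}

# The figure: a list of traces ("data") and a "layout".
figure_spec = {
    "data": [scatter_trace],
    "layout": {
        "title": {"text": "A figure as plain JSON"},
        "width": 600,
        "height": 400,
    },
}

figure_json = json.dumps(figure_spec, indent=2)
print(figure_json)

# With plotly installed, the same dictionary can be turned into a figure:
#   import plotly.graph_objects as go
#   go.Figure(figure_spec).show()
```

Everything the wrappers add on top (Plotly Express, ggplotly, and so on) is convenience for producing a specification of this shape.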
Wrapping Plotly
Transforming ggplot
In R, the plotly package allows you to directly transform ggplot graphics into plotly web graphics using the ggplotly function. This is fantastic, since developing graphs in ggplot is more familiar.
You may be stuck with default settings though
ggplotly(plt)
This is not great, since
the theme isn’t exactly copied
the default tooltips (shown on mouseover) are unformatted
Transforming matplotlib
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import plotly.tools as tpl

gapminder = px.data.gapminder()
gm = gapminder.query("year==2007")

fig, ax = plt.subplots()
sns.scatterplot(gm, x="gdpPercap", y="lifeExp", size="pop", hue="continent", ax=ax)
ax.set_xlabel("GDP per capita ($)")
ax.set_ylabel("Life expectancy")

f = tpl.mpl_to_plotly(fig)
f.show()
The input arguments for a Plotly express function are similar to other libraries.
The typical data input is a Pandas data frame, list, or numpy array
The x argument is a string naming the column to be used on the x-axis.
The y argument can either be a string or a list of strings naming column(s) to be used on the y-axis.
Basic customization is straightforward
px.plotting_fn(
    dataframe,                     # Data frame being visualized
    x="column-for-x-axis",         # Accepts a string or a list of strings
    y=["columns-for-y-axis"],      # Accepts a string or a list of strings
    title="Overall plot title",    # Accepts a string
    labels={"column-for-x-axis": "X-axis title"},  # Maps column names to axis/legend labels
    width=width_in_pixels,         # Accepts an integer
    height=height_in_pixels,       # Accepts an integer
)
# Axis titles are layout properties, not px arguments; set them afterwards with
# fig.update_layout(xaxis_title="X-axis title", yaxis_title="Y-axis title")
IMPORTANT: To make stylistic changes, e.g. figure-size, we can use fig.update_layout()
Plotly has an R-based API that covers most but not all of Plotly’s capabilities.
Furthermore, the concepts from the python section, e.g. data, traces, layout, etc, also apply to the R case
import plotly.express as px
import seaborn as sns

tips = px.data.tips()
# print(tips)

fig = px.density_heatmap(
    tips,
    x="total_bill",
    y="tip",
    marginal_x="histogram",
    marginal_y="histogram",
    color_continuous_scale=px.colors.sequential.Viridis,
    nbinsx=50,
    nbinsy=50,
    labels=dict(total_bill="Total bill", tip="Tip"),
    title="Joint distribution of tip and total bill",
    width=500,
    height=500,
)
fig
Themes
There are several built-in themes in plotly. These can be modified (see the documentation)
Note that animations can be cool, but you really need to think whether they are necessary. Animations are not the same as interactivity, and you really need to have a good story (usually changes over time) to make good animated data visualizations.