Methods
Data Wrangling
- International Coffee Organization (ICO) Dataset:
- Choropleth Plot (figure 2): The data for the choropleth map was created by combining domestic coffee consumption and coffee importers’ consumption data. Several cleaning steps were needed before visualization. First, the overlapping Belgium, Luxembourg, and Belgium/Luxembourg rows were reconciled: the values missing from the Belgium and Luxembourg rows were present in the combined Belgium/Luxembourg row, so they were filled in from that row, which was then removed. Second, the data had each year as a separate column, so it was melted into long form with a single year column. Third, some country names were written in a format that would not be accepted for mapping purposes, so they were standardized (e.g., replacing “Russian Federation” with “Russia”). This made it much easier to plot the countries on the map. Finally, a continent column was added to the data frame based on each country.
- Import Export Plot (figure 9): The ICO dataset was provided as separate CSV files that needed to be combined. The dataset listed only countries, not continents or regions, so these variables were appended manually by cross-referencing the region listed for each country on Wikipedia. The data was then transformed into long form, combining every year into a single column, for the choropleth and line plot visualizations.
- Coffee Institute Data: For information regarding the coffee chains, a simple aggregation was performed to determine the number of Starbucks locations by country.
- Coffee Demand and Production Plots from Nationmaster: The data for each of these plots was originally formatted as a single CSV file per country, which required concatenation prior to plotting. Data was available for the years 2014-2019, and these plots show the mean value for each country over this period.
- Production Plot (figure 5): The data for the line plot came from 6 data files, one per country. The files were read in from Excel and then cleaned: rows containing headers and other information not needed for the visualization were removed, and a country column was added to each file’s rows to make plotting easier. The six datasets were then combined into one for plotting.
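The wrangling steps above can be sketched in pandas. This is only an illustrative sketch, not the project's actual code: the column names (`country`, `year`, `consumption`, `value`), the sample numbers, and the helper names `tidy_consumption` and `combine_country_frames` are all hypothetical, and it assumes that each of Belgium and Luxembourg simply takes the combined row's value wherever its own value is missing.

```python
import pandas as pd

nan = float("nan")

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["Belgium", "Luxembourg", "Belgium/Luxembourg", "Russian Federation"],
    "1990": [nan, nan, 10.0, 5.0],
    "1991": [4.0, 1.0, nan, 6.0],
})


def tidy_consumption(wide, combined="Belgium/Luxembourg",
                     parts=("Belgium", "Luxembourg"), renames=None):
    """Fill gaps from the combined row, drop it, then melt years into one column."""
    df = wide.set_index("country")
    for part in parts:
        # Assumption: a part takes the combined value wherever its own is missing.
        df.loc[part] = df.loc[part].fillna(df.loc[combined])
    df = df.drop(index=combined).reset_index()
    # Wide -> long: every year column becomes a row in a single "year" column.
    long = df.melt(id_vars="country", var_name="year", value_name="consumption")
    long["year"] = long["year"].astype(int)
    if renames:
        # Standardize names that a map would not recognize.
        long["country"] = long["country"].replace(renames)
    return long


def combine_country_frames(frames):
    """Tag each per-country frame with its country name and stack them."""
    tagged = [df.assign(country=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


long_df = tidy_consumption(wide, renames={"Russian Federation": "Russia"})

# Hypothetical per-country files, already read into data frames.
frames = {
    "Brazil": pd.DataFrame({"year": [2014, 2015], "value": [2.0, 4.0]}),
    "Vietnam": pd.DataFrame({"year": [2014, 2015], "value": [1.0, 2.0]}),
}
combined = combine_country_frames(frames)
means = combined.groupby("country")["value"].mean()  # mean over the period
```

Keeping the data in long form means the same frame can drive a year slider, a continent filter, and a per-country mean without further reshaping.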
Design Choices
Choice of Color Scheme
Our project uses a consistent color scheme ranging from shades of dark brown to light beige to evoke a coffee aesthetic. In general, darker shades of brown indicate a greater volume or intensity of a measure for any continuous representation. For categorical visualizations, the colors simply delineate different groups.
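As a rough illustration of the continuous mapping, the sketch below linearly interpolates between a light beige and a dark brown so that larger values get darker shades. The two hex colors and the function names are assumptions chosen for illustration, not the palette actually used in the project.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def coffee_color(value, vmin, vmax, light="#F5E6D3", dark="#3E2723"):
    """Map a value to a shade between light beige (low) and dark brown (high)."""
    # Position of the value within [vmin, vmax], clamped to [0, 1].
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    # Interpolate each RGB channel independently.
    rgb = [round(a + t * (b - a)) for a, b in zip(lo, hi)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)
```

For categorical plots, a fixed list of distinct browns would be used instead, since no ordering is implied.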
Figure 1: Innovative View - Top 10 Countries by Coffee Demand
This animated figure shows the top 10 countries arranged by their mean coffee demand during the period 2014-2019. There are espresso machine icons at the top of the plot, and the animation shows coffee being dripped into a cup. The length of the coffee drip represents the amount of coffee demand in that country. The animated puff of steam that rises from the cup after the coffee is poured draws the reader’s attention immediately to the country being represented. After the cup lands at the level of the plot that represents its amount, the actual demand in thousand metric tons is displayed.
This plot is innovative because it combines several libraries, namely ggplot2, gganimate, ggtext, and ggimage, to produce a visually appealing and informative result. In addition to showing the top ten countries by average coffee demand from 2014 to 2019, the plot includes a picture of a coffee maker for each country, which serves as a visual cue and reinforces the coffee theme. The geom_richtext() function from the ggtext package adds rich text and graphics to the plot, enhancing viewer engagement. The transition_states() function from the gganimate package animates the plot: the coffee makers appear to pour coffee into the corresponding cups as the animation progresses, adding a fun, aesthetic element. Finally, gganimate’s shadow_mark(), enter_grow(), enter_fade(), and exit_fade() functions produce smooth, visually striking transitions between the animation’s stages. Overall, the plot offers an engaging and informative way to examine patterns in the world’s coffee demand.
Figure 2: Choropleth of Coffee Consumption by Continent and Year
The amount of coffee consumed by a country is represented by the shade of brown filled within its borders. Darker shades of brown indicate greater consumption. The dropdown menu permits the user to filter by continent, facilitating greater focus on the area of interest. The slider at the bottom of the figure conveys the change in consumption over time. A constant colorbar maintains the same scale across all years and continents. The colorbar is located on the right side of the figure to avoid occlusion of the countries. The colorbar is also oriented horizontally to facilitate ease of interpretation. The figure is interactive, allowing the user to hover over a country to receive more information.
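One way to keep the colorbar constant is to compute the color range once over the whole dataset, rather than per year or per continent, and pass that fixed range to every frame of the map. The sketch below assumes long-form records with hypothetical `year` and `consumption` keys; the function names are illustrative only.

```python
def global_color_range(records):
    """Return (min, max) consumption over all years and countries."""
    values = [r["consumption"] for r in records if r["consumption"] is not None]
    return min(values), max(values)


def frames_by_year(records):
    """Group records into one frame per year, ordered for the slider."""
    frames = {}
    for r in records:
        frames.setdefault(r["year"], []).append(r)
    return dict(sorted(frames.items()))


# Hypothetical long-form records.
records = [
    {"country": "Brazil", "year": 1991, "consumption": 9.0},
    {"country": "Brazil", "year": 1990, "consumption": 8.0},
    {"country": "Russia", "year": 1990, "consumption": None},
    {"country": "Russia", "year": 1991, "consumption": 2.0},
]
color_min, color_max = global_color_range(records)
frames = frames_by_year(records)
```

Because every frame shares `color_min` and `color_max`, a given shade of brown means the same consumption level in every year and on every continent.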
Figure 3: Number of Starbucks Locations by Country
The number of Starbucks locations by country is represented in both traditional and innovative ways. For the traditional view, there is a histogram of the number of Starbucks locations by country. Darker shades of brown indicate greater quantities, following the general color scheme of the website. For the innovative view, a coffee cup is displayed with interactive bubbles inside of it. The area of each bubble represents the number of Starbucks locations in a country, for a more intuitive representation of the data.
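Encoding a count as bubble area rather than radius means the radius must scale with the square root of the count. The sketch below shows that relationship; the function name and the `max_radius` default are hypothetical and not taken from the project.

```python
import math


def bubble_radius(count, max_count, max_radius=50.0):
    """Radius such that bubble AREA is proportional to count.

    Area grows with radius squared, so the radius scales with sqrt(count).
    """
    if count <= 0 or max_count <= 0:
        return 0.0
    return max_radius * math.sqrt(count / max_count)
```

With this scaling, a country with a quarter of the largest count gets a bubble with half the radius and therefore a quarter of the area.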
Figure 4: Choropleth Coffee Chain Popularity by US State
Each US state is colored with the primary brand color of its most popular coffee chain. This design choice quickly communicates the meaning behind the color, since many readers will associate Starbucks with green, Dunkin’ Donuts with orange, etc. The non-sequential color palette also makes it easier to see regional distinctions.
Figure 5: Total Coffee Production Over Time
This plot depicts coffee production over time for the top 6 producing countries. We limited the plot to the top 6 so that it would be less crowded and easier to compare across countries.
Figure 6: Average Quality Scores for Each Category of Coffee Bean
This radar plot displays the quality ratings of coffee based on 6 factors. Again, greater “quality” is represented by a darker color. All properties of the plot are labeled on mouse-over, reducing room for misinterpretation. Using an area-based approach in the radar plot makes it easier to compare the rated factors. The coffee quality categories of “Outstanding”, “Excellent”, “Very Good”, and “Below Specialty Quality” are determined by the Coffee Quality Institute and are based on the Total Cup Scores for each coffee variant.
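The grouping of variants into these categories can be sketched as a simple binning of the total score. The cutoffs below (90, 85, and 80 points) are an assumption based on the commonly cited specialty coffee scoring bands; they are not stated in this report and may differ from the thresholds actually applied.

```python
def quality_category(total_cup_points):
    """Assign a quality category from a total cup score.

    The 90/85/80 cutoffs are assumed, not taken from the report.
    """
    if total_cup_points >= 90:
        return "Outstanding"
    if total_cup_points >= 85:
        return "Excellent"
    if total_cup_points >= 80:
        return "Very Good"
    return "Below Specialty Quality"
```

Averaging each of the 6 rated factors within a category then gives one radar trace per category.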
Figure 7: Average Coffee Quality Scores by Country of Origin
A circular bar plot shows quality ratings for every country’s coffee. The darker colors represent greater quality. Divisions between each quality category, combined with increasing bar heights that reflect the quality score, help highlight differences in coffee quality.
Figure 8: Category and Count of Coffee Variants for Coffee-Growing Regions in Ethiopia
A treemap plot shows the breakdown of coffee quality by coffee producing regions in Ethiopia. The colors represent different regions, while the area of the rectangle reflects the number of variants produced by that region relative to its quality grouping. The primary effect of these design choices is to delineate the various coffee producing regions in Ethiopia while still providing additional information.
Figure 9: Linked View - Import and Export of Coffee by Country and Continent Over Time
This plot uses faceting to show the imports and exports of coffee over time on the same scale. Each year is represented as a point on the line plot. Mousing over any point reveals additional information about the data, such as the year, country, continent, type of transaction, and amount of the transaction. There is a linked view in the form of a histogram at the bottom of the plot which shows the country’s total transaction amount during the period. The histogram is sorted and filters the top plots upon selection, allowing for comparison of similarly ranked countries. Finally, there is a dropdown to filter by continent which can make the plot less crowded.