class: center, middle, inverse, title-slide .title[ # ISA 401: Business Intelligence & Data Visualization ] .subtitle[ ## 14: Fundamentals of Data Visualization ] .author[ ###
Fadel M. Megahed, PhD
Professor of Information Systems and Business Analytics
Farmer School of Business
Miami University
@FadelMegahed
fmegahed
fmegahed@miamioh.edu
Automated Scheduler for Office Hours
] .date[ ### Fall 2024 ] --- # Refresher: Organization of this Course <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#../../figures/course_overview.png" alt="How the ISA 401/501 course is organized." width="100%" /> <p class="caption">How the ISA 401/501 course is organized.</p> </div> --- # Learning Objectives for Today's Class - Explain the concept of "graphical excellence" - Explain the theory of data graphics - Optimize visual encoding based on data types - Understand why color should be used sparingly and how to select appropriate colors (when color is a must) --- class: center, inverse, middle # Graphical Excellence --- # Non-graded activity: Terrible Charts
.panelset[ .panel[.panel-name[Activity] > Over the next 5 minutes, please identify the **1-2 main problems** in the charts in the following tabs. - Write down your answers in the editable area of each chart. - Discuss your answers with your neighboring classmates. - Be prepared to share these answers with the class. ] .panel[.panel-name[Russia's Defense Budget] .tiny[**Source:** The chart was embedded in [this tweet](https://twitter.com/CedScherer/status/1498593405408059394?s=20&t=b0aOBtP77mq0WZinrfaH5g) by Cedric Scherer; however, it is unclear which news outlet created the original chart.] .pull-left-2[ <img src="data:image/png;base64,#https://pbs.twimg.com/media/FMwSwFOWQAM9eek?format=jpg&name=large" alt="Defence budgets for Russia vs Ukraine (2020)" height="350px" style="display: block; margin: auto;" /> ] .pull-right-2[ .can-edit.key-activity1_russia[ **Main Issue(s):** .font70[(Insert below)] ] ] ] .panel[.panel-name[White House Economy Growth] .tiny[**Source:** The chart was created by the White House and shared via [this tweet from the verified White House account](https://twitter.com/WhiteHouse/status/1486709480351952901?s=20&t=b0aOBtP77mq0WZinrfaH5g). Note that the chart was later corrected.] .pull-left-2[ <img src="data:image/png;base64,#https://pbs.twimg.com/media/FKHaZFyWYAAP962?format=jpg&name=large" alt="A bar chart, created by The White House, capturing America's Growth Economy in the 21st century" height="350px" style="display: block; margin: auto;" /> ] .pull-right-2[ .can-edit.key-activity1_whitehouse[ **Main Issue(s):** .font70[(Insert below)] ] ] ] .panel[.panel-name[Tucker Carlson] .tiny[**Source:** The chart was created by Fox and was highlighted in [this Fox News Clip](https://video.foxnews.com/v/6274513508001#sp=show-clips).] 
.pull-left-2[ <img src="data:image/png;base64,#https://www.ft.com/__origami/service/image/v2/images/raw/https%3A%2F%2Fd1e00ek4ebabms.cloudfront.net%2Fproduction%2F8597019d-978b-40cf-a2a8-fa47610e62cc.png?fit=scale-down&source=next&width=700" alt="A bar chart, created by Fox News, capturing the decline in the number of Americans who self-identify as Christians." height="350px" style="display: block; margin: auto;" /> ] .pull-right-2[ .can-edit.key-activity1_tucker[ **Main Issue(s):** .font70[(Insert below)] ] ] ] ] --- # Graphical Excellence: What should Graphs Do? - **Show the data** - Lead to thinking about the **substance** rather than something else - Avoid **distorting** what the data have to say - Present **many numbers in a small space** - Make **large datasets coherent** - Encourage the eye to **compare different pieces of the data** - **Reveal the data at several levels of detail**, from a broad overview to the fine structure - Serve **a purpose:** description, exploration, tabulation, decoration - Be **closely integrated with the statistical & verbal descriptions of the data** .footnote[ <html> <hr> </html> **Source:** Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, Conn: Graphics Press, P. 13. ] --- # Show/Reveal the Data: Anscombe's Dataset **In a seminal paper, Anscombe stated:** > **Few of us escape being indoctrinated with these notions:** > - numerical **calculations are exact, but graphs are rough**; > - for any particular kind of **statistical data there is just one set of calculations constituting a correct statistical analysis**; > - performing **intricate calculations is virtuous**, whereas **actually looking at the data is cheating**. He proceeded by stating that > a computer should **make both calculations and graphs**. Both sorts of output should be studied; each will contribute to understanding. Now, let us consider his four datasets, each consisting of eleven (x,y) pairs. 
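
To see Anscombe's point numerically before looking at any plots, here is a minimal sketch that uses the `anscombe` data frame shipped with base R (columns `x1`..`x4` and `y1`..`y4`). The name `quartet_stats` and the choice of summaries are illustrative assumptions, not taken from the paper.

``` r
# Summarize each of the four (x, y) sets with the usual numerical statistics.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,
    mean_x = mean(x),
    mean_y = mean(y),
    var_x  = var(x),
    var_y  = var(y),
    cor_xy = cor(x, y)
  )
}))

# The rows come out nearly identical, which is why the graphs are needed.
print(round(quartet_stats, 2))
```

The summaries agree to about two decimal places across the four sets, so the calculations alone cannot tell them apart.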
.footnote[ <html> <hr> </html> **Source:** Anscombe, Francis J. 1973. "Graphs in Statistical Analysis." *The American Statistician* 27 (1): 17–21. ([PDF Link](https://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf)). ] --- count: false # Show/Reveal the Data: Anscombe's Dataset .font80[
] --- count: false # Show/Reveal the Data: Anscombe's Dataset .font80[
] --- count: false # Show/Reveal the Data: Anscombe's Dataset <img src="data:image/png;base64,#14_fundamentals_data_viz_files/figure-html/anscombe4-1.png" style="display: block; margin: auto;" /> --- # A Modern Version of Anscombe's Dataset <img src="data:image/png;base64,#../../figures/DinoSequential-1.gif" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Matejka, J. and Fitzmaurice, G. 2017. "Same Stats, Different Graphs." *Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems*. ([Blog Post Link](https://www.research.autodesk.com/publications/same-stats-different-graphs/)). ] --- # Substance
.panelset[ .panel[.panel-name[Activity] > In 5 minutes, please **sketch** a better (non-bubble) chart than the one used by William Playfair for plotting the populations of 22 European cities at the end of the 1700s. <img src="data:image/png;base64,#14_fundamentals_data_viz_files/figure-html/playfair_data-1.png" style="display: block; margin: auto;" /> ] .panel[.panel-name[Your Solution] **Ideally, on a piece of paper sketch out your solution.** Otherwise, please feel free to download the plot's data (using the code below) and use software of your choice to plot a better chart for the data.

``` r
pacman::p_load(tidyverse)

playfair = read.table("http://www.stat.uiowa.edu/~luke/data/Playfair") |>
  rownames_to_column(var = 'city') |> # converting row names to city var
  as_tibble() |> # converting it to a tibble
  arrange( desc(population) ) # arranging the rows in a descending order by population

write_csv(x = playfair, file = 'playfair_data.csv')
```

**Be prepared to share your solution with the entire class.** ] .panel[.panel-name[My Solution] <details> <summary>In my opinion, a <strong>dot chart is more effective</strong> than the bubble chart. The <strong>population would be mainly encoded using the position</strong>; you can still use area as a secondary encoding mechanism.</summary>
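
One way such a dot chart could be drawn is sketched below. It assumes the `playfair_data.csv` file written in the "Your Solution" panel, with columns `city` and `population`. The function name `plot_city_dots` and the styling are illustrative assumptions and not the code behind the original figure.

``` r
library(ggplot2)

# Dot chart: position on the x-axis is the primary encoding of population;
# point area is kept only as a secondary (redundant) encoding.
plot_city_dots <- function(df) {
  ggplot(df, aes(x = population, y = reorder(city, population))) +
    geom_point(aes(size = population)) +
    scale_size_area() +          # area (not radius) proportional to the value
    guides(size = "none") +      # the x-axis already tells the story
    labs(x = "Population", y = NULL) +
    theme_minimal()
}

# Usage: only runs if the CSV from the earlier panel is in the working directory.
if (file.exists("playfair_data.csv")) {
  playfair <- read.csv("playfair_data.csv")
  print(plot_city_dots(playfair))
}
```

Sorting the cities by population with `reorder()` lets the eye compare neighboring values along a common scale, which is the main advantage over the bubble layout.
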
</details> ] ] --- # Avoid Distortion of Data <img src="data:image/png;base64,#../../figures/integrity.jpg" alt="A distorted bar chart (with non-zero starting values for the y-axis and images that reduce our focus on the data values)" width="80%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Graph from Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, Conn: Graphics Press, P. 54. ] --- count: false # Avoid Distortion of Data <img src="data:image/png;base64,#../../figures/integrity2.jpg" alt="A distorted bar chart (with non-zero starting values for the y-axis)" width="90%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Nathan Yau (2012). [Fox News continues charting excellence](https://flowingdata.com/2012/08/06/fox-news-continues-charting-excellence/). ] --- # Avoid Distortion of Data: The Lie Factor <img src="data:image/png;base64,#../../figures/liefactor.jpg" alt="The Lie Factor" width="75%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Graph from Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, Conn: Graphics Press, P. 57. ] --- # Graphical Integrity Principles: A Summary - Clear, detailed, and thorough labeling and appropriate scales - Size of the graphic effect should be directly proportional to the numerical quantities (“lie factor”) - Show data variation, not design variation --- class: center, inverse, middle # Theory of Data Graphics --- # Definition of Data Ink - **Data-ink refers to the non-erasable ink used for presenting the data**. * If data-ink were removed from the image, the graphic would lose its content. 
$$ `\begin{split} \text{Data-ink ratio} &= \frac{\text{Data-ink}}{\text{Total ink used to print the graphic}} \\ \\ &= \text{proportion of a graphic's ink devoted to the} \\ & \quad \text{ non-redundant display of data-information} \\ \\ &= 1.0 - \text{proportion of a graphic that can be erased} \end{split}` $$ --- count: false # Definition of Data Ink - **Data-ink refers to the non-erasable ink used for presenting the data**. * If data-ink were removed from the image, the graphic would lose its content. <img src="data:image/png;base64,#../../figures/dataink2.jpg" alt="Data ink" width="65%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Slide Adapted from Hanspeter Pfister. Lecture Notes for CS 171. Harvard University. <http://cs171.org/> ] --- count: false # Definition of Data Ink - **Data-ink refers to the non-erasable ink used for presenting the data**. * If data-ink were removed from the image, the graphic would lose its content. <img src="data:image/png;base64,#../../figures/dataink3.jpg" alt="Data ink" width="65%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Slide Adapted from Hanspeter Pfister. Lecture Notes for CS 171. Harvard University. 
<http://cs171.org/> ] --- # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di1.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di1) ] --- count: false # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di2.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di2) ] --- count: false # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di3.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di3) ] --- count: false # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di4.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di4) ] --- count: false # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di5.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is 
from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di5) ] --- count: false # Focus on Data: Avoid Chartjunk **Chartjunk is the extraneous visual elements that distract from the message!!** <img src="data:image/png;base64,#http://www.tbray.org/ongoing/data-ink/di6.png" alt="Data ink" width="58%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** The image is from [The Data-Ink Ratio Example](http://www.tbray.org/ongoing/data-ink/di6) ] --- # Other Subjective Design Principles - **Aesthetics:** Attractive things are perceived as more useful than unattractive ones - **Style:** Communicates brand, process, who the designer is - **Playfulness:** Encourages experimentation and exploration - **Vividness:** Can make a visualization more memorable .footnote[ <html> <hr> </html> **Source:** Slide Adapted from Hanspeter Pfister. Lecture Notes for CS 171. Harvard University. <http://cs171.org/>. Information is based on comments by Pat Hanrahan in November 2007. Those can be found at <http://www.perceptualedge.com/blog/?p=199> ] --- class: center, inverse, middle # Data Models --- <img src="data:image/png;base64,#../../figures/stevens_science_paper.png" width="55%" style="display: block; margin: auto;" /> --- # Data Types (from S. Stevens, Theory of Scales) <img src="data:image/png;base64,#../../figures/datatypes.png" width="85%" style="display: block; margin: auto;" /> --- # Data Types: Explained .pull-left-2[ .font130[**Nominal:**] - Are `\(=\)` or `\(\neq\)` to other values - Apples, bananas, oranges, etc. .font130[**Ordinal:**] - Obey a `\(<\)` relationship - Small, medium and large .font130[**Quantitative:**] - Can do math on them - 50 inches, 53 inches, etc. ] .pull-right-2[ <img src="data:image/png;base64,#../../figures/datatypes1.png" width="85%" style="display: block; margin: auto;" /> ] --- # Quantitative Data Types (from S. 
Stevens) **Quantitative data can be further divided into:** .font130[**Intervals (Location of Zero Arbitrary):**] - Dates: Jan 19; Location: (Lat, Long) - Only differences (i.e., intervals) can be compared. .font130[**Ratio (Zero Fixed):**] - Measurements: Length, Weight, ... - Origin is meaningful, we can compute ratios, proportions, differences, etc. --- # Non-Graded Activity: Data Terminology
.panelset[ .panel[.panel-name[Activity] > In 2 minutes, please identify an appropriate data type for each column below. .font70[
] ] .panel[.panel-name[Your Solution] .can-edit.key-activity3[ **Data Types:** .font70[(Edit below)] - Order ID: ________________ - Order/Ship Date: ________________ - Order Priority: ________________ - Product Container: ________________ - Product Cost: ________________ ] ] ] --- # Data vs. Conceptual Models .font130[**From data model**] - 32.5, 54.0, -17.3, ... .font130[**Using a conceptual model:**] - Temperature .font130[**To data type:**] - Continuous to x significant digits, i.e., quantitative - Hot, warm, cold, i.e., ordinal - Burned vs. not burned, i.e., nominal --- # Image Model: Visual (Encoding) Variables <img src="data:image/png;base64,#../../figures/visualvariables.png" width="70%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Bertin (1967), Semiology of Graphics. ] --- # Mapping to Data Types <img src="data:image/png;base64,#../../figures/mappingdatatypes.png" width="65%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Slide Adapted from Hanspeter Pfister. Lecture Notes for CS 171. Harvard University. <http://cs171.org/> ] --- # Visual Channels and their Precision <img src="data:image/png;base64,#../../figures/franconeri2021.png" width="70%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Franconeri, et al. (2021). [The Science of Visual Data Communication: What Works](https://doi.org/10.1177/15291006211051956). To view the full size image, click [here](https://t.co/Rfr8qSj5HB). ] --- class: center, inverse, middle # Color Should Be Used Sparingly --- # Color
.panelset[ .panel[.panel-name[Activity] .pull-left[ - The following map of Nevada has been colored to indicate various geological features in each county. - Estimate the larger land area-more red, more green, or the same-and mark your answer on the mentimeter poll in the next panel. - **Please work fairly quickly, as if you were trying to gain an overall impression from a map**. ] .pull-right[ <img src="data:image/png;base64,#../../figures/cleveland1983.png" width="100%" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Your Solution] <html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 0px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='300' src='https://www.mentimeter.com/embed/fc5ffe4fa8894c11b2b73a67b70478b3/96b5a0976aac' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html> ] .panel[.panel-name[Results from 1983 Experiment] <details> <summary>The results from the original experiment were as follows</summary> <br> On the average, <span style="color:red">49 percent of the judgments were that red was bigger</span>, <span style="color:green">22 percent of the judgments were that green was bigger</span>, and 31 percent of the judgments were that they looked the same size. </details> ] ] --- # Simultaneous Contrast Affects Perception <img src="data:image/png;base64,#../../figures/sc2.png" width="85%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Slide Adapted from Hanspeter Pfister. Lecture Notes for CS 171. Harvard University. 
<http://cs171.org/> ] --- # Color Blindness > "About 1 in 12 men are color blind" -- NIH's At a glance: Color Blindness ### Color Blindness Can Distort a Person's Reading/Interpretation of a Chart - **Personal Check:** [To get a feel for color blindness (if you are not color blind), you can take this color blind test](https://enchroma.com/pages/test) - Given the high prevalence of color blindness, your charts **should** accommodate color-blind readers. **How?** * Use color sparingly * Use color-blind-friendly palettes, e.g., see <https://colorbrewer2.org/> --- # Color Brewer: Color Scales and their Selection <img src="data:image/png;base64,#../../figures/brewer1.png" width="100%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Brewer, Cynthia A. “Color use guidelines for data representation.” Proceedings of the Section on Statistical Graphics, American Statistical Association. 1999. ] --- class: inverse, center, middle # Recap --- # Summary of Main Points By now, you should be able to do the following: - Explain the concept of "graphical excellence" - Explain the theory of data graphics - Optimize visual encoding based on data types - Understand why color should be used sparingly and how to select appropriate colors (when color is a must) --- # Things to Do Prior to Next Class Please go through the following two supplementary readings and complete [assignment 08](https://miamioh.instructure.com/courses/223961/quizzes/667848). 1. [The Lie Factor and the Baseline Paradox](https://nightingaledvs.com/the-lie-factor-and-the-baseline-paradox/); especially noting what the authors mean by "baseline", how the lie factor may be ignored in time-series applications, and/or in applications involving a "ratio" scale. 2. [Useful junk? 
The effects of visual embellishment on comprehension and memorability of charts](https://dl.acm.org/doi/pdf/10.1145/1753326.1753716?casa_token=GiPCX3MfWRAAAAAA:bj4hH3e73nNkD5ldnL_ohbDqkA_dBhAdTR9HE3A215Mks4E6Lja3lJL90rIqlLXWu1puahbk5XZq), which presents an experimental counterargument to Tufte's case for simplicity (by quantifying vividness and recall of data from the more artistic charts). Note that they define **"ratio"** differently from how we defined it in class.