class: center, middle, inverse, title-slide .title[ # ISA 401: Business Intelligence & Data Visualization ] .subtitle[ ## 01: Introduction to BI and Data Viz ] .author[ ###
Fadel M. Megahed, PhD
Endres Associate Professor
Farmer School of Business
Miami University
@FadelMegahed
fmegahed
fmegahed@miamioh.edu
Automated Scheduler for Office Hours
] .date[ ### Spring 2024 ]

---

# Learning Objectives for Today's Class

- Describe **course objectives** and **structure**.
- Define **data visualization** and describe its **main goals**.
- Describe the **BI methodology** and its **major concepts**.

---
class: inverse, center, middle

# Course Design, Expectations, and Overview

---

# The Analytics Journey: Pre-Analytics [1]

- **Pre-Analytics/Data Management:** where one attempts to **extract** the needed *data* for analysis. Data can be either:

.div[
.pull-left[
## .center[.large[.large[.large[🥫]]]]
* Stale, uninteresting, convenient
* Highly processed and archived
* Examples: `iris`, `mtcars`, `titanic`
]

.pull-right[
## .center[.large[.large[.large[🍅]]]]
* Fresh, interesting, challenging
* Impactful
* Examples: [Cincinnati Open Data Portal](https://data.cincinnati-oh.gov/), [Ohio Data Portal](https://data.ohio.gov/wps/portal/gov/data/), [US Government's Open Data](https://www.data.gov/).
]
]

.footnote[
<html>
<hr>
</html>

While highly processed data can be useful for learning basic concepts, **real-world (often messy)** data are much more interesting to work with, **e.g., we can make useful and meaningful decisions from such data.** In this class, we will learn how to scrape, extract, and clean messy data, in addition to visualizing clean[ed] data.

Source: Slide inspired by [Kia Ora's What I mean by "data"](https://stats220.earo.me/01-intro.html#6).
]

---

# The Analytics Journey: Pre-Analytics [2]

### Non-Graded Class Activity #1
> _Take 5 minutes to discuss with your partner_

.panelset[
.panel[.panel-name[Activity]

- Go to <https://data.cincinnati-oh.gov/Safety/Traffic-Crash-Reports-CPD-/rvmt-pkmq/data>
- Download the data using the **Export** button and answer the following questions:
    * How many **observations/rows** and **columns** does the dataset have?
    * How many **crashes** are reported in the dataset?
]

.panel[.panel-name[Your Solution]

- .can-edit.key-activity1[Insert your solution here (Use Chrome as your browser to edit this part of the page)]
]

.panel[.panel-name[Fadel's Approach (No Solution Shown)]

```r
# Install the tidyverse if it is missing, then load it
if (!require(tidyverse)) {
  install.packages("tidyverse")
  library(tidyverse)
}

# Link obtained from the site: Export -> right-click on "CSV"
crashes = readr::read_csv("https://data.cincinnati-oh.gov/api/views/rvmt-pkmq/rows.csv?accessType=DOWNLOAD")

# Number of rows and columns
nrow(crashes)
ncol(crashes)

# Or alternatively
dim(crashes)

# Total number of crashes
# Will be discussed in class in greater detail
```
]
]

---

# The Analytics Journey: Descriptive [1]

**Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.

### Descriptive Statistics for 2 Categorical Variables

.small[
```
## $dayofweek
## 
##  FRI  SUN  SAT  WED  THU  TUE  MON
## 4262 2815 3303 3742 3921 3762 3432
## 
## $weather
## 
##                             1 - CLEAR                            2 - CLOUDY
##                                 17970                                  3730
##                              4 - RAIN                              6 - SNOW
##                                  2930                                   272
##                    99 - OTHER/UNKNOWN                  3 - FOG, SMOG, SMOKE
##                                   249                                    37
## 9 - FREEZING RAIN OR FREEZING DRIZZLE                       5 - SLEET, HAIL
##                                     3                                    29
##                 7 - SEVERE CROSSWINDS
##                                    17
```
]

---

# The Analytics Journey: Descriptive [2]

**Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.
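A minimal base-R sketch of how frequency tables for two categorical variables (like the output on the previous slide) and a bar chart of crashes per day could be produced. It uses a small, hypothetical data frame (`crashes_toy`) rather than the real Cincinnati export, and it assumes the columns are named `dayofweek` and `weather`, as the printed output suggests; the actual export may use different names. Note that it counts rows, which are not necessarily distinct crashes.

```r
# HYPOTHETICAL toy data standing in for the crash export (NOT the real counts).
# Column names `dayofweek` and `weather` are assumed from the printed output.
crashes_toy <- data.frame(
  dayofweek = c("FRI", "FRI", "SAT", "MON", "FRI", "SUN", "SAT"),
  weather   = c("1 - CLEAR", "4 - RAIN", "1 - CLEAR", "2 - CLOUDY",
                "1 - CLEAR", "6 - SNOW", "4 - RAIN"),
  stringsAsFactors = FALSE
)

# One frequency table per categorical column; lapply() returns a named list,
# which prints with the $dayofweek and $weather labels
freq_tables <- lapply(crashes_toy[c("dayofweek", "weather")], table)
print(freq_tables)

# Bar chart of records per day, with days in calendar order
# (explicit factor levels also keep days that have zero records)
day_levels <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
crashes_per_day <- table(factor(crashes_toy$dayofweek, levels = day_levels))
barplot(crashes_per_day,
        xlab = "Day of week",
        ylab = "Number of records (rows)",
        main = "Records per day (toy data)")
```

The factor step is a design choice: without it, `table()` sorts the days alphabetically and silently drops days with no records.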
### A Simple Visualization - A Bar Chart of Crashes Per Day

<img src="data:image/png;base64,#01_Introduction_files/figure-html/viz-1.png" style="display: block; margin: auto;" />

---

# The Analytics Journey: Descriptive [3]

**Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.

<img src="data:image/png;base64,#01_Introduction_files/figure-html/viz2a-1.png" style="display: block; margin: auto;" />

---

# The Analytics Journey: Descriptive [4]

**Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.

<img src="data:image/png;base64,#../../figures/crash_anim.gif" style="display: block; margin: auto;" />

---

# The Analytics Journey: Predictive [1]

**Predictive Analytics:** where **statistical** and **machine learning** models use one or more independent variables to predict an outcome variable of choice.

* **Many** consider this component to be the 🍰 aspect of the analytics journey.

* IMO, this is not always true; your success in this stage **hinges on**:
    + **Correct** ✅ data, i.e.,
        - *Do you actually capture the important predictors?*
        - *Is your data aggregated to the right level?*
    + **Cleaned** 🛀 data, i.e.,
        - *Is your data tidy?*
        - *Is your data technically correct?*
        - *Is your data consistent?*

---

# The Analytics Journey: Predictive [2]

**Predictive Analytics:** where **statistical** and **machine learning** models use one or more independent variables to predict an outcome variable of choice.

* With these constraints in place, you can now explore how to model the data using statistical and machine learning models.

* **Some recommendations:**
    + Start with the simplest model (which is often also the easiest to explain).
    + If you are happy with the predictive performance (i.e., further gains would be of no practical benefit), you are done 👏.
    + If not, ↩️ and try other models.

---

# The Analytics Journey: Prescriptive

**Prescriptive Analytics:** where **mathematical models** are used to make recommendations for business actions.

- Our **overarching goal** behind data/business analytics is to **make informed decisions based on what we have learned from the data**. Hence, this stage is where we build on what we learned during the *descriptive* and *predictive* stages to make more informed decisions.

- Imagine that you are a large trucking company (e.g., Amazon, FedEx, JB Hunt), and you have models that show **both** that:
    * Safety-critical events are associated with crashes.
    * The occurrence of safety-critical events can be reasonably predicted as a function of: (a) driver characteristics, (b) weather conditions, and (c) traffic conditions.

- **As a business analyst, what two reasonable questions would you attempt to approach/optimize for?**

---

# How does our Curriculum at Miami University Prepare you for this Journey?

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/ba_flow_chart.png" alt="Fadel's take on our ISA curriculum" width="100%" />
<p class="caption">My take on the courses within the business analytics major/minor at Miami University</p>
</div>

---

# ISA 401 Course: An Overview

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/course_overview.png" alt="How the ISA 401 course is organized." width="100%" />
<p class="caption">How the ISA 401 course is organized.</p>
</div>

---

# ISA 401 Course Objectives

Even though software will be used extensively, this is not a software class. **Instead, the focus is on understanding the underlying methods and mindset of how data should be approached.**

- Be capable of extracting, transforming and loading (ETL) data using multiple platforms (e.g.,
<svg viewBox="0 0 581 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#C3142D;" xmlns="http://www.w3.org/2000/svg"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>, Power BI and/or Tableau).
- Write basic scripts to preprocess and clean the data.
- Explore the data using visualization approaches that are based on sound human-factors principles.
- Understand how statistical/machine learning can capitalize on the insights generated from the data visualization process.
- Create interactive dashboards that can be used for business decision making, reporting and/or performance management.
- Be able to apply the skills from this class in your future career.

---

# Should you Care? Read this Job Ad

When I designed this course, I incorporated a lot of feedback from **industry collaborators, peer/leading academic programs, and state-of-the-art research advancements.** Thus, this is meant to be a hands-on, practically relevant course.

### Non-Graded Class Activity #2
> _Take 4 minutes for this activity_
.panelset[
.panel[.panel-name[Activity]

.small[
To demonstrate the practicality of this course, let us consider [this job ad](https://www.indeed.com/viewjob?jk=77f4cf1687882e41&tk=1eff1eg1op7cg800&from=serp&vjs=3) for the Data Scientist (6257U) - CED Data Scientist position at UC Berkeley.

- Open the job ad and read through its **responsibilities** and **required qualifications**.
- Compare them with the course objectives.
- **Document what you will learn in this course that will make you more competitive.**
]
]

.panel[.panel-name[Documentation Space]

- .can-edit.key-activity4[Insert your solution here (Use Chrome as your browser to edit this part of the page)]
]
]

---

# Should you Care? Recent Alumni Testimonials

<img src="data:image/png;base64,#../../figures/email1.jpg" width="100%" style="display: block; margin: auto;" /><br> <br><img src="data:image/png;base64,#../../figures/email2.jpg" width="100%" style="display: block; margin: auto;" />

---

# Instructional Approach

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/instructional_approach.png" alt="An overview of the instructional approach for ISA 401." width="100%" />
<p class="caption">An overview of the instructional approach for ISA 401.</p>
</div>

---

# How will I Evaluate your Learning?

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/evaluation.png" alt="An overview of the evaluation components for ISA 401." width="100%" />
<p class="caption">An overview of the evaluation components for ISA 401.</p>
</div>

---
class: inverse, center, middle

# Introductions: Getting to Know Each Other

---

# About Me – My route to Miami University

- Application of data-driven decisions (D3) on three continents.
- **Interests:** Applications in logistics, manufacturing, occupational safety & portfolios.
- **Collaborations with:** Aflac, GE Research, Gore, IBM Research, & Tennibot

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#01_Introduction_files/figure-html/my_map-1.gif" alt="My journey with data driven decisions." width="100%" />
<p class="caption">My journey with data-driven decision making.</p>
</div>

---

# Getting to Know Your Learning Objectives

<html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 6px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='315' src='https://www.mentimeter.com/app/presentation/3d85154a0564812d34da0bb42e349428/6c73bccc31d3/embed' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html>

---
class: inverse, center, middle

# So What is Data Visualization?

---

# What is Data Visualization?

Data visualization involves **presenting data in a graphical format**. In practice, it is an iterative process: obtaining the data, creating initial plot(s), and refining them to answer questions of interest (and, possibly, to make the plot aesthetically pleasing). For example, see [Cedric Scherer's visualization of the UNESCO data on global student-to-teacher ratios](https://www.cedricscherer.com/2019/05/17/the-evolution-of-a-ggplot-ep.-1/).
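A small base-R sketch of that iterative process, using the built-in `mtcars` data (one of the "canned" datasets mentioned earlier) instead of the UNESCO data: a quick first plot, then a refined version that sorts the bars, adds labels, and highlights the answer to one specific question. The question and styling choices are illustrative assumptions, not a reproduction of Cedric Scherer's code.

```r
# Step 1: get the data (mtcars ships with base R); count cars by number of cylinders
cyl_counts <- table(mtcars$cyl)

# Step 2: a quick, unpolished first look
barplot(cyl_counts)

# Step 3: refine the plot to answer an (illustrative) question:
# "Which number of cylinders is most common?"
cyl_sorted <- sort(cyl_counts, decreasing = TRUE)   # order bars by frequency
bar_colors <- ifelse(seq_along(cyl_sorted) == 1,    # highlight the tallest bar only
                     "steelblue", "grey70")
barplot(cyl_sorted,
        col  = bar_colors,
        xlab = "Number of cylinders",
        ylab = "Number of cars",
        main = "8-cylinder cars are the most common in mtcars")
```

Each step keeps the same data and only changes how it is encoded, which mirrors the "initial plot, then refine" loop described above.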
<img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/1e7033393a2c70dc1255c5d0f1c563e945519251/61035/img/evol-ggplot/evol-ggplot-1.gif" width="58%" style="display: block; margin: auto;" />

---

# The Goals of Data Visualization

- **Record** information
- **Analyze** data to support reasoning
    * Develop and assess hypotheses (EDA)
    * Reveal patterns
    * Discover errors in data
- **Communicate** ideas to others
    * Infographics
    * Statistical charts
    * Interactive charts
    * Dashboards
- **Interact with the data (which supports all the above)**

---

# Record Information

<html> <center> <blockquote class="twitter-tweet"><p lang="en" dir="ltr">I'm a sucker for clean tables. Last week, I used <a href="https://twitter.com/hashtag/RStats?src=hash&ref_src=twsrc%5Etfw">#RStats</a> and gtExtra magic to summarize by Peloton data.<br><br>This week, I couldn't resist taking reactablefmtr for a test drive too. <a href="https://twitter.com/kc_analytics?ref_src=twsrc%5Etfw">@kc_analytics</a>, this package is beautiful!<br><br>🔗: <a href="https://t.co/9KZHjRsJFM">https://t.co/9KZHjRsJFM</a> <a href="https://t.co/Z18ddDM9SR">pic.twitter.com/Z18ddDM9SR</a></p>— Tanya Shapiro (@tanya_shapiro) <a href="https://twitter.com/tanya_shapiro/status/1480648097533509640?ref_src=twsrc%5Etfw">January 10, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> </center> </html>

---

# Analyze Data

<img src="data:image/png;base64,#01_Introduction_files/figure-html/cincy_crashes-1.png" style="display: block; margin: auto;" />

---

# Reveal Patterns: The 1854 Cholera Outbreak

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/cholera.jpg" alt="The physician John Snow, dealing with a cholera outbreak, plotted the cases on a map of the city (see schematic above)."
width="50%" />
<p class="caption">The physician John Snow, dealing with a cholera outbreak, plotted the cases on a map of the city (see schematic above).</p>
</div>

.footnote[
<html>
<hr>
</html>

Source: Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of Massive Datasets (Third Edition). Cambridge University Press. The image is from Chapter 1, which can be accessed [here](http://infolab.stanford.edu/~ullman/mmds/ch1n.pdf).
]

---

# Reveal Patterns: COVID-19 Vaccination Rates

<img src="data:image/png;base64,#../../figures/animated_vaccine_map.gif" width="70%" style="display: block; margin: auto;" />

---

# Communicate Ideas: C. J. Minard, 1869

<img src="data:image/png;base64,#../../figures/minard.png" width="100%" style="display: block; margin: auto;" />

---

# Communicate Ideas
> _Take 5 minutes for this activity_
.panelset[
.panel[.panel-name[Activity]

.pull-left[
### Non-Graded Class Activity #3

.small[
- Who is the target audience?
- What data are represented in this visualization? Be specific.
- How are the data visually encoded?
- Do you like or dislike this visualization? Why?
- Would you create a visualization like this for a similar dataset? Why or why not?
]
]

.pull-right[
<img src="data:image/png;base64,#../../figures/wpost.jpg" width="77%" style="display: block; margin: auto;" />
]
]

.panel[.panel-name[Your Solution]

- .can-edit.key-activity5[Insert your solution here (Use Chrome as your browser to edit this part of the page)]
]
]

---

# Interact: Gapminder / Hans Rosling Example

<html> <center> <div style="max-width:854px"><div style="position:relative;height:0;padding-bottom:56.25%"><iframe src="https://embed.ted.com/talks/lang/en/hans_rosling_the_best_stats_you_ve_ever_seen" width="854" height="480" style="position:absolute;left:0;top:0;width:100%;height:100%" frameborder="0" scrolling="no" allowfullscreen></iframe></div></div> </center> </html>

---
class: inverse, center, middle

# Business Intelligence: From Visualizations to Dashboards to Insights

---

# What is Business Intelligence?

> "... to enable **interactive access (sometimes in real time)** to data, to enable manipulation of data, and to give business managers and analysts the ability to conduct appropriate analysis. By analyzing ... data, situations, and performances, decision makers get valuable insights that enable them to **make more informed and better decisions** ... BI is based on the **transformation of data to information, then to decisions, and finally to actions.**"

<img src="data:image/png;base64,#../../figures/stock_market.JPG" alt="A schematic of an interactive BI tool for stock market prediction" width="55%" style="display: block; margin: auto;" />

.footnote[
<html>
<hr>
</html>

**Quote** from Sharda, R., Delen, D., & Turban, E. (2013). Business Intelligence: A managerial perspective on analytics.
Prentice Hall Press. **Image Credit:** Joint work with Bin Weng.
]

---

# The BI Process

<img src="data:image/png;base64,#../../figures/bi_process.jpg" alt="A schematic of the different components of the business intelligence (BI) process" width="73%" style="display: block; margin: auto;" />

.footnote[
<html>
<hr>
</html>

**Image Credit:** Sharda, R., Delen, D., & Turban, E. (2013). Business Intelligence: A managerial perspective on analytics. Prentice Hall Press.
]

---
class: inverse, center, middle

# Recap

---

# Summary of Main Points

By now, you should be able to do the following:

- Describe **course objectives** and **structure**.
- Define **data visualization** and describe its **main goals**.
- Describe the **BI methodology** and its **major concepts**.

---

# 📝 Review and Clarification 📝

1. **Class Notes**: Take some time to revisit your class notes for key insights and concepts.
2. **Zoom Recording**: The recording of today's class will be made available on Canvas approximately 3-4 hours after the session ends.
3. **Questions**: Please don't hesitate to ask for clarification on any topics discussed in class. It's crucial not to let questions accumulate.

---

# 📖 Required Readings 📖

.font90[
#### 📈 R Prep
- [Workflow: Basics](https://r4ds.had.co.nz/workflow-basics.html)
- [Names and Values](https://adv-r.hadley.nz/names-values.html)
- [Vectors](https://adv-r.hadley.nz/vectors-chap.html)
- [Subsetting](https://adv-r.hadley.nz/subsetting.html)

#### 🤖 LLM Prep
- [A Very Gentle Introduction to Large Language Models without the Hype](https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e)
]

---

# 🎯 Assignment 🎯

- Complete [Assignment 01](https://miamioh.instructure.com/courses/213901/quizzes/611597) on Canvas to reinforce your understanding and application of the topics covered today as well as the assigned readings.