ISA 401: Business Intelligence & Data Visualization

class: center, middle, inverse, title-slide

.title[
# ISA 401: Business Intelligence & Data Visualization
]
.subtitle[
## 08: Tidy Data in <svg viewBox="0 0 581 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z" /></svg>
]
.author[
### <br>Fadel M. Megahed, PhD <br><br>Professor of Information Systems and Business Analytics <br> Farmer School of Business<br> Miami University<br><br> <a href="https://twitter.com/FadelMegahed"><svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z" /></svg> <span class="citation">@FadelMegahed</span></a> <br> <a href="https://github.com/fmegahed/"><svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 
23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z" /></svg> fmegahed</a> <br> <a href="mailto:fmegahed@miamioh.edu"><svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z" /></svg> fmegahed@miamioh.edu</a><br> <a href="https://calendly.com/fmegahed"><svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;fill:white;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z" /></svg> Automated Scheduler for 
Office Hours</a><br><br>
]
.date[
### Fall 2024
]

---

# Quick Refresher from Last Class

✅ Describe what an API is

✅ Download data using APIs

---

# Learning Objectives for Today's Class

- Define tidy data

- Perform pivot and rectangling operations in <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#C3142D;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg>

---
class: inverse, center, middle

# Tidy Data 🧹

---

background-image: url("https://images.unsplash.com/photo-1587654780291-39c9404d746b?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1950&q=80")
background-size: 90% 90%

.footnote[
<html>
<hr>
</html>
**Sources:** All three images are obtained from Unsplash
]

???

* each piece is a scalar, a vector, a tibble, or a list
* each piece is a vector function or a table function
* because they are standard, consistent blocks,
* you can assemble them into a simple Mario or a complex Nintendo scene
* you have worked with several datasets so far;
* hopefully you see the pattern: using the same dplyr verbs in different ways to solve various problems.

---

# The R for Data Science Workflow

.footnote[
<html>
<hr>
</html>
**Source:** Image is from Wickham, H., & Grolemund, G. (2017). "R for Data Science", O'Reilly. <https://r4ds.had.co.nz/introduction.html>
]

---

# The Rationale for Tidy Data

- The **tidy framework** provides a **consistent way to organize your data** in <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#C3142D;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg>.

- Getting your data into this format requires some **upfront work, but that work pays off in the long term.**

- Once you have tidy data and the tidy tools provided by packages in the `tidyverse`, you will spend **much less time munging data from one representation to another, allowing you to spend more time on the analytic questions at hand.**

.footnote[
<html>
<hr>
</html>
**Source:** Slide is based on Wickham, H., & Grolemund, G. (2017). "R for Data Science", O'Reilly. <https://r4ds.had.co.nz/tidy-data.html>.
]

---

background-image: url(https://github.com/allisonhorst/stats-illustrations/raw/main/rstats-artwork/tidydata_1.jpg)
background-size: 95% 95%

.footnote[
**Source:** Illustration is from the Openscapes blog [Tidy Data for reproducibility, efficiency, and collaboration](https://www.openscapes.org/blog/2020/10/12/tidy-data/) by Julia Lowndes and Allison Horst
]

???

* In a database, this is the schema.
* Tidy data principles restate, for data scientists, third normal form from database schema design: <https://en.wikipedia.org/wiki/Third_normal_form>.
* Tidy data is for human consumption.
* Tabular data is a column-oriented format.

---

background-image: url(https://github.com/allisonhorst/stats-illustrations/raw/main/rstats-artwork/tidydata_2.jpg)
background-size: contain

---

# tidy data <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M369.8 37.4c14.7 9.8 18.7 29.7 8.9 44.4L337.1 144H400c17.7 0 32 14.3 32 32s-14.3 32-32 32H294.5l-64 96H400c17.7 0 32 14.3 32 32s-14.3 32-32 32H187.8l-65.2 97.7c-9.8 14.7-29.7 18.7-44.4 8.9s-18.7-29.7-8.9-44.4L110.9 368H48c-17.7 0-32-14.3-32-32s14.3-32 32-32H153.5l64-96H48c-17.7 0-32-14.3-32-32s14.3-32 32-32H260.2l65.2-97.7c9.8-14.7 29.7-18.7 44.4-8.9z"/></svg> clean data

.blue[.center[The `movies` data is tidy but not clean.]]

``` r
movies <- tibble::as_tibble(jsonlite::read_json(
  "https://vega.github.io/vega-editor/app/data/movies.json",
  simplifyVector = TRUE))

movies |> 
  dplyr::relocate(Release_Date, US_DVD_Sales) |> # move cols to front
  dplyr::slice(37:39, 268:269) |> # filter specific row numbers
  print(width = 80) # print nicely
```

```
## # A tibble: 5 × 16
##   Release_Date US_DVD_Sales Title     US_Gross Worldwide_Gross Production_Budget
##   <chr>               <int> <chr>        <int>           <dbl>             <int>
## 1 9-Mar-94               NA Four Wed… 52700832       242895809           4500000
## 2 18-Oct-06              NA 51 Birch…    84689           84689            350000
*## 3 1963-01-01             NA 55 Days … 10000000        10000000          17000000
*## 4 <NA>                   NA Drei             0               0           7200000
## 5 16-Jan-98              NA The Dress    16556           16556           2650000
## # ℹ 10 more variables: MPAA_Rating <chr>, Running_Time_min <int>,
## #   Distributor <chr>, Source <chr>, Major_Genre <chr>, Creative_Type <chr>,
## #   Director <chr>, Rotten_Tomatoes_Rating <int>, IMDB_Rating <dbl>,
## #   IMDB_Votes <int>
```
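
One way the mixed `Release_Date` formats could be cleaned is sketched below. This is a minimal illustration rather than a prescribed approach: `release_date` is a hypothetical vector that copies the five values printed above, and the code assumes an English locale so that `%b` matches month abbreviations such as `Mar` and `Oct`.

``` r
# Hypothetical vector holding the five Release_Date values printed above
release_date <- c("9-Mar-94", "18-Oct-06", "1963-01-01", NA, "16-Jan-98")

# Try each format on the whole vector; values that do not match become NA,
# and coalesce() keeps the first non-missing result for each element
parsed <- dplyr::coalesce(
  as.Date(release_date, format = "%d-%b-%y"), # e.g., 9-Mar-94
  as.Date(release_date, format = "%Y-%m-%d")  # e.g., 1963-01-01
)

parsed
```

Two-digit years remain ambiguous: with `%y`, R maps 00 to 68 onto the 2000s and 69 to 99 onto the 1900s, so older releases stored with two-digit years would need an extra check.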

---

# Non-graded Activity: Tidy or Not?

.panelset[

.panel[.panel-name[Activity]

.small[
- In the next five panels, there are five tables, all displaying the number of TB cases documented by the World Health Organization in Afghanistan, Brazil, and China between 1999 and 2000.

- The data contains values associated with four variables (country, year, cases, and population), but each table organizes the values in a different layout.

- **Based on the information in the previous slide, please note which table(s) are tidy and, for those that are not, which rules are violated.**

- **Discuss your answer with your neighboring colleague.**

> _Note that you have a total of five minutes for this non-graded activity._

]
]

.panel[.panel-name[`table1`]

```
## table1 from the tidyr package is printed below:
```

```
## # A tibble: 6 × 4
##   country      year  cases population
##   <chr>       <dbl>  <dbl>      <dbl>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3 Brazil       1999  37737  172006362
## 4 Brazil       2000  80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
```

.can-edit.key-activity1[
tidy/not-tidy: ..............................

data observations? data variables? ...........

rules broken (if any): ........................
]

]

.panel[.panel-name[`table2`]

```
## table2 from the tidyr package is printed below:
```

```
## # A tibble: 12 × 4
##    country      year type            count
##    <chr>       <dbl> <chr>           <dbl>
##  1 Afghanistan  1999 cases             745
##  2 Afghanistan  1999 population   19987071
##  3 Afghanistan  2000 cases            2666
##  4 Afghanistan  2000 population   20595360
##  5 Brazil       1999 cases           37737
##  6 Brazil       1999 population  172006362
##  7 Brazil       2000 cases           80488
##  8 Brazil       2000 population  174504898
##  9 China        1999 cases          212258
## 10 China        1999 population 1272915272
## 11 China        2000 cases          213766
## 12 China        2000 population 1280428583
```

.can-edit.key-activity2[
tidy/not-tidy: .............................. & rules broken (if any): ........................
]
]

.panel[.panel-name[`table3`]

```
## table3 from the tidyr package is printed below:
```

```
## # A tibble: 6 × 3
##   country      year rate             
##   <chr>       <dbl> <chr>            
## 1 Afghanistan  1999 745/19987071     
## 2 Afghanistan  2000 2666/20595360    
## 3 Brazil       1999 37737/172006362  
## 4 Brazil       2000 80488/174504898  
## 5 China        1999 212258/1272915272
## 6 China        2000 213766/1280428583
```

.can-edit.key-activity3[
tidy/not-tidy: ..............................

data observations? data variables? ...................

rules broken (if any): ........................
]

]

.panel[.panel-name[`table4a`]

```
## table4a from the tidyr package is printed below:
```

```
## # A tibble: 3 × 3
##   country     `1999` `2000`
##   <chr>        <dbl>  <dbl>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766
```

.can-edit.key-activity4a[
tidy/not-tidy: ..............................

data observations? data variables? ..................

rules broken (if any): ........................
]

]

.panel[.panel-name[`table4b`]

```
## table4b from the tidyr package is printed below:
```

```
## # A tibble: 3 × 3
##   country         `1999`     `2000`
##   <chr>            <dbl>      <dbl>
## 1 Afghanistan   19987071   20595360
## 2 Brazil       172006362  174504898
## 3 China       1272915272 1280428583
```

.can-edit.key-activity4b[
tidy/not-tidy: ..............................

data observations? data variables? .......................

rules broken (if any): ........................
]

]

---
class: center, inverse, middle

# Getting Data into Tidy Format

---

# Key Functions from the `tidyr` <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:gold;overflow:visible;position:relative;"><path d="M50.7 58.5L0 160H208V32H93.7C75.5 32 58.9 42.3 50.7 58.5zM240 160H448L397.3 58.5C389.1 42.3 372.5 32 354.3 32H240V160zm208 32H0V416c0 35.3 28.7 64 64 64H384c35.3 0 64-28.7 64-64V192z"/></svg>

.pull-right-2[
.center[[<img src="https://raw.githubusercontent.com/rstudio/hex-stickers/master/PNG/tidyr.png" width="240px">](http://tidyr.tidyverse.org)]
]

.pull-left-2[
<div id="nirlwltgdy" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
  
  <table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false" style="-webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji'; display: table; border-collapse: collapse; line-height: normal; margin-left: auto; margin-right: auto; color: #333333; font-size: 16px; font-weight: normal; font-style: normal; background-color: #FFFFFF; width: auto; border-top-style: solid; border-top-width: 2px; border-top-color: #A8A8A8; border-right-style: none; border-right-width: 2px; border-right-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #A8A8A8; border-left-style: none; border-left-width: 2px; border-left-color: #D3D3D3;" bgcolor="#FFFFFF">
  <thead style="border-style: none;">
    <tr class="gt_col_headings" style="border-style: none; border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3;">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="type" style="border-style: none; color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: left;" bgcolor="#FFFFFF" valign="bottom" align="left">type</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="<span class='gt_from_md'><code>function()</code></span>" style="border-style: none; color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: left;" bgcolor="#FFFFFF" valign="bottom" align="left"><span class="gt_from_md"><code style="margin-top: 0; margin-bottom: 0;">function()</code></span></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="<span class='gt_from_md'><code>function()</code></span>" style="border-style: none; color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: left;" bgcolor="#FFFFFF" valign="bottom" align="left"><span class="gt_from_md"><code style="margin-top: 0; margin-bottom: 0;">function()</code></span></th>
    </tr>
  </thead>
  <tbody class="gt_table_body" style="border-style: none; border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3;">
    <tr style="border-style: none;"><td headers="type" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md"><strong style="margin-top: 0; margin-bottom: 0;">pivoting</strong></span></td>
<td headers="fun1()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">pivot_longer()</span></td>
<td headers="fun2()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">pivot_wider()</span></td></tr>
    <tr style="border-style: none;"><td headers="type" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md"><strong style="margin-top: 0; margin-bottom: 0;">splitting/combining</strong></span></td>
<td headers="fun1()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">separate()</span></td>
<td headers="fun2()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">unite()</span></td></tr>
    <tr style="border-style: none;"><td headers="type" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md"><strong style="margin-top: 0; margin-bottom: 0;">nesting/unnesting</strong></span></td>
<td headers="fun1()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">nest()</span></td>
<td headers="fun2()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">unnest()</span></td></tr>
    <tr style="border-style: none;"><td headers="type" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md"><strong style="margin-top: 0; margin-bottom: 0;">missing</strong></span></td>
<td headers="fun1()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">complete()</span></td>
<td headers="fun2()" class="gt_row gt_left" style="border-style: none; padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;" valign="middle" align="left"><span class="gt_from_md">fill()</span></td></tr>
  </tbody>
  
  
</table>
</div>
]

.footnote[
<html>
<hr>
</html>
**Source:** Slide is based on [Earo Wang's STAT 220 Slides](https://stats220.earo.me/06-tidy-data.html#15)
]
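
The last row of the table (`complete()` and `fill()`) is not demonstrated elsewhere in this deck, so a minimal sketch follows. The `sales` tibble, its column names, and its values are hypothetical and exist only to illustrate the two functions.

``` r
# Hypothetical data: region is recorded only on the first row of each block,
# and some region/quarter combinations are absent altogether
sales <- tibble::tibble(
  region  = c("East", NA, "West", NA),
  quarter = c(1, 2, 1, 3),
  revenue = c(10, 12, 8, 9)
)

# fill() carries the last non-missing region downward
filled <- tidyr::fill(sales, region)

# complete() adds a row for every region/quarter combination found in the data,
# so implicit gaps (e.g., East in quarter 3) become explicit NA values
completed <- tidyr::complete(filled, region, quarter)

completed
```

After `fill()`, every row has a region; after `complete()`, the table has one row for each of the 2 regions × 3 quarters, with `revenue` set to `NA` where no value was recorded.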

---

# Wide vs. Long Data

.footnote[
<html>
<hr>
</html>
**Source:** Image is from Garrick Aden-Buie's excellent [tidyexplain GitHub Repository](https://github.com/gadenbuie/tidyexplain/blob/main/images/static/png/original-dfs-tidy.png)
]

---

# `pivot_*()` to Transform Data between Wide and Long

.footnote[
<html>
<hr>
</html>
**Source:** Image is from Garrick Aden-Buie's excellent [tidyexplain GitHub Repository](https://github.com/gadenbuie/tidyexplain/blob/main/images/tidyr-pivoting.gif)
]

---

# The `pivot_longer()` Function

---

# `pivot_longer()` for table4a [1]

To tidy a dataset like this, we need to pivot the **offending columns into a new pair of variables**. To describe that operation we need **three parameters:**

- The set of columns whose names are values, not variables. In this example, those are the columns `1999` and `2000`.

- The name of the variable to move the column names to. Here it is `year`.

- The name of the variable to move the column values to. Here it’s `cases`.

---

# `pivot_longer()` for table4a [2]

<div class="figure" style="text-align: center">
<img src="https://d33wubrfki0l68.cloudfront.net/3aea19108d39606bbe49981acda07696c0c7fcd8/2de65/images/tidy-9.png" alt="Pivoting table4a into a longer, tidy form" width="100%" />
<p class="caption">Pivoting table4a into a longer, tidy form</p>
</div>

.footnote[
<html>
<hr>
</html>
**Source:** Slide is based on Wickham, H., & Grolemund, G. (2017). "R for Data Science", O'Reilly. <https://r4ds.had.co.nz/tidy-data.html>.
]

---

# `pivot_longer()` for table4a [3]

``` r
tidyr::pivot_longer(table4a, c(`1999`, `2000`), names_to = "year", values_to = "cases")
```

```
## # A tibble: 6 × 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Afghanistan 2000    2666
## 3 Brazil      1999   37737
## 4 Brazil      2000   80488
## 5 China       1999  212258
## 6 China       2000  213766
```
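
Note that `year` comes back as a character column because it was built from column names. The sketch below shows one possible follow-up, assuming tidyr 1.1.0 or later for the `names_transform` argument: it converts `year` to an integer while pivoting, applies the same step to `table4b`, and joins the two results. The object names `tidy4a`, `tidy4b`, and `tidy4` are illustrative.

``` r
# Pivot the cases table and convert the year names to integers on the way
tidy4a <- tidyr::pivot_longer(
  tidyr::table4a,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases",
  names_transform = list(year = as.integer)
)

# Same operation for the population table
tidy4b <- tidyr::pivot_longer(
  tidyr::table4b,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "population",
  names_transform = list(year = as.integer)
)

# Combine both tidy tables into one, matching on country and year
tidy4 <- dplyr::left_join(tidy4a, tidy4b, by = c("country", "year"))

tidy4
```

The joined result has one row per country and year, with `cases` and `population` as separate columns.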

---

# The `pivot_wider()` Function

---

# `pivot_wider()` for table2 [1]

- `pivot_wider()` is the opposite of `pivot_longer()`.

- You use it when an observation is scattered across multiple rows.

- For example, take table2: an observation is a country in a year, but each observation is spread across two rows.

``` r
head(table2, n = 3)
```

```
## # A tibble: 3 × 4
##   country      year type          count
##   <chr>       <dbl> <chr>         <dbl>
*## 1 Afghanistan  1999 cases           745
*## 2 Afghanistan  1999 population 19987071
## 3 Afghanistan  2000 cases          2666
```

---

# `pivot_wider()` for table2 [2]

.footnote[
<html>
<hr>
</html>
**Source:** Slide is based on Wickham, H., & Grolemund, G. (2017). "R for Data Science", O'Reilly. <https://r4ds.had.co.nz/tidy-data.html>.
]

---

# `pivot_wider()` for table2 [3]

``` r
tidyr::pivot_wider(table2, names_from = type, values_from = count)
```

```
## # A tibble: 6 × 4
*##   country      year  cases population
*##   <chr>       <dbl>  <dbl>      <dbl>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3 Brazil       1999  37737  172006362
## 4 Brazil       2000  80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
```
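
Because `pivot_wider()` and `pivot_longer()` undo each other, a round trip is a quick way to check one's understanding. The following is a minimal sketch; the object names `wide` and `long` are illustrative.

``` r
# Widen table2 so each country-year observation sits on a single row
wide <- tidyr::pivot_wider(tidyr::table2, names_from = type, values_from = count)

# Pivot back: the cases and population columns return to type/count pairs
long <- tidyr::pivot_longer(
  wide,
  cols = c(cases, population),
  names_to = "type",
  values_to = "count"
)

# Compare the round-trip result with the original table2
all.equal(long, tidyr::table2)
```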

---

# `separate()` for table3 [1]

`table3` has a different problem:

- we have one column (`rate`) that contains two variables (`cases` and `population`).

- To fix this problem, we’ll need the `separate()` function.

---

# `separate()` for table3 [2]

.footnote[
<html>
<hr>
</html>
**Source:** Slide is based on Wickham, H., & Grolemund, G. (2017). "R for Data Science", O'Reilly. <https://r4ds.had.co.nz/tidy-data.html>.
]

---

# `separate()` for table3 [3]

``` r
tidyr::separate(table3, rate, into = c("cases", "population"), convert = TRUE)
```

```
## # A tibble: 6 × 4
*##   country      year  cases population
*##   <chr>       <dbl>  <int>      <int>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3 Brazil       1999  37737  172006362
## 4 Brazil       2000  80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
```
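
The companion function `unite()` combines columns and therefore reverses `separate()`. The sketch below makes the separator explicit with `sep = "/"` and then rebuilds the original `rate` column; the object names `split3` and `rebuilt3` are illustrative.

``` r
# Split rate at the slash; convert = TRUE turns the pieces into integers
split3 <- tidyr::separate(
  tidyr::table3, rate,
  into = c("cases", "population"),
  sep = "/",
  convert = TRUE
)

# unite() pastes the two columns back together and drops the inputs
rebuilt3 <- tidyr::unite(split3, "rate", cases, population, sep = "/")

rebuilt3
```

Stating `sep` explicitly avoids relying on the default, which splits at any run of non-alphanumeric characters.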

---

# Non-graded Class Activity

.panelset[

.panel[.panel-name[Activity]

.small[

> _In this five-minute non-graded activity, please do the following:_

- Go to <https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/>

- Download the data for `Deaths` by clicking on the tab to the right of the page.

- **Tidy this data based on the information you have learned in today's class.**

]
]

.panel[.panel-name[`Your Solution`]
In your RStudio session, please read the data, load the required packages, and write the code needed to transform the `deaths` data into a tidy format.
]
]

---
class: center, inverse, middle

# Recap

---

# Summary of Main Points

By now, you should be able to do the following:

- Define tidy data

---

background-image: url(https://github.com/allisonhorst/stats-illustrations/raw/main/rstats-artwork/tidydata_5.jpg)
background-size: contain

# Advantages of Tidy Data

* one set of consistent tools for different datasets
* easier for automation and iteration

???

* To work with messy data, you have to switch gears every time
* and learn new tools that work only for that specific dataset
* It is much more pleasant to work with tidy data, and it helps you build good taste in data analysis
* You spend less time fighting with different tools and focus more on the analysis, because there is one set of consistent tools
* This is the {tidyverse} philosophy: work with tidy data structures
* You can build an automated analysis workflow and feed it different datasets.

---

# Things to Do Prior to Next Class

Please go through the following two supplementary readings and complete [assignment 06: tidy data](https://miamioh.instructure.com/courses/223961/quizzes/665878).

.pull-left[
.center[[<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" height="400px">](https://r4ds.had.co.nz)]
]
.pull-right[
* [Tidy data](https://r4ds.had.co.nz/tidy-data.html)
* [{tidyr} cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf)
]