Summarizing Data

class: center, middle, inverse, title-slide

# Summarizing Data
## DATA 606 - Statistics & Probability for Data Analytics
### Jason Bryer, Ph.D. and Angela Lui, Ph.D.
### September 8, 2021

---

# Agenda

.pull-left[.font130[
* Questions
* Homework Presentations
* Data wrangling
	* Data types
	* Descriptive statistics
* Data visualization
	* Grammar of graphics
	* Types of graphics
]]
.pull-right[
<img src='images/data_wrangler.png' alt='Data Wrangler' width='100%' />
.right[.font60[ Image source: [@allison_horst](https://twitter.com/allison_horst) ]]
]

---
# One Minute Paper Results

.pull-left[
**What was the most important thing you learned during this class?**
<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]
.pull-right[
**What important question remains unanswered for you?**
<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]

---
# Announcements

There is an error in the Homework 2 R Markdown file. The `OIdata` package is no longer available on CRAN. All the data for the textbook have been moved to the `openintro` package, and some of the datasets were renamed in that conversion. The `heartTr` data frame is now `heart_transplant`. You have two options:

1. If you already started the homework, search and replace `heartTr` with `heart_transplant`.

2. If not, the Rmd file has been updated on GitHub: https://github.com/jbryer/DATA606Fall2021/blob/master/Homework/Homework2.Rmd

___________

I made a few minor updates to the `DATA606` package. You can update by reinstalling:

```r
remotes::install_github('jbryer/DATA606')
```

---
# Homework Presentations

* 1.13 Lisa Szydziak

* 1.37 Esteban Aramayo

* 1.43 Mauricio Claudio

---
# Workflow

.center[
<img src='images/data-science-wrangle.png' alt = 'Data Science Workflow' width='1000' />
]

.font80[Source: [Wickham & Grolemund, 2017](https://r4ds.had.co.nz)]

---
# Tidy Data

.center[
<img src='images/tidydata_1.jpg' height='500' />
]

See Wickham (2014) [Tidy data](https://vita.had.co.nz/papers/tidy-data.html).

---
# Types of Data

.pull-left[
* Numerical (quantitative)
	* Continuous
	* Discrete
]
.pull-right[
* Categorical (qualitative)
	* Regular categorical
	* Ordinal
]
.center[
<img src='images/continuous_discrete.png' height='400' />
]

---
# Data Types in R
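
The data types on the previous slide map onto R's basic vector classes. The following is a minimal sketch with invented values; the vector names are hypothetical and do not come from `legosets`.

```r
# Toy vectors, one per data type (values are invented for illustration)
price  <- c(9.99, 19.99, 4.5)                   # continuous  -> numeric (double)
pieces <- c(67L, 109L, 158L)                    # discrete    -> integer
theme  <- factor(c('Duplo', 'Gear', 'Duplo'))   # categorical -> factor
size   <- factor(c('small', 'large', 'medium'),
                 levels = c('small', 'medium', 'large'),
                 ordered = TRUE)                # ordinal     -> ordered factor

class(price)    # "numeric"
class(pieces)   # "integer"
class(theme)    # "factor"
class(size)     # "ordered" "factor"

# Ordered factors support comparisons between levels
size < 'large'  # TRUE FALSE TRUE
```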

---
# Data Types / Descriptives / Visualizations

Data Type    |  Descriptive Stats                            | Visualization
-------------|-----------------------------------------------|-------------------|
Continuous   | mean, median, mode, standard deviation, IQR   | histogram, density, box plot
Discrete     | contingency table, proportional table, median | bar plot
Categorical  | contingency table, proportional table         | bar plot
Ordinal      | contingency table, proportional table, median | bar plot
Two quantitative | correlation                               | scatter plot
Two qualitative  | contingency table, chi-squared            | mosaic plot, bar plot
Quantitative & Qualitative | grouped summaries, ANOVA, t-test | box plot
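
---
# Data Types / Descriptives (cont.)

The pairings in the table can be tried on a small data frame. This sketch uses invented values in a hypothetical `toy` data frame; only the column names are borrowed from `legosets`.

```r
toy <- data.frame(
  US_retailPrice = c(4.99, 9.99, 19.99, 29.99, 199.99),
  pieces         = c(38, 69, 219, 251, 1600),
  category       = c('Normal', 'Gear', 'Normal', 'Normal', 'Gear')
)

# Continuous: center and spread
mean(toy$US_retailPrice); median(toy$US_retailPrice)
sd(toy$US_retailPrice);   IQR(toy$US_retailPrice)

# Categorical: contingency and proportional tables
tab <- table(toy$category)
tab
prop.table(tab)

# Two quantitative variables: correlation
cor(toy$pieces, toy$US_retailPrice)

# Quantitative by qualitative: grouped summary
med <- tapply(toy$US_retailPrice, toy$category, median)
med
```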

---
# Robust Statistics

Median and IQR are more robust to skewness and outliers than mean and SD. Therefore,

* for skewed distributions it is often more helpful to use median and IQR to describe the center and spread

* for symmetric distributions it is often more helpful to use the mean and SD to describe the center and spread
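
---
# Robust Statistics (cont.)

A quick way to see the difference is to add one extreme value to a small sample and compare how each statistic reacts. The numbers below are invented for illustration.

```r
prices         <- c(5, 8, 10, 12, 15)   # invented, roughly symmetric
prices_outlier <- c(prices, 800)        # same values plus one extreme price

mean(prices);   median(prices)          # 10 and 10
mean(prices_outlier)                    # jumps to about 141.7
median(prices_outlier)                  # moves only to 11

sd(prices);  sd(prices_outlier)         # about 3.8 versus about 322.5
IQR(prices); IQR(prices_outlier)        # 4 versus 5.75
```

The mean and SD are pulled strongly toward the single outlier, while the median and IQR barely move.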

---
# About `legosets` <img src="images/hex/brickset.png" class="title-hex">

To install the `brickset` package:

```r
remotes::install_github('jbryer/brickset')
```

To load the `legosets` dataset:

```r
data('legosets', package = 'brickset')
```

The `legosets` data has 16355 observations of 34 variables.

.code70[

```r
names(legosets)
```

```
##  [1] "setID"                 "name"                  "year"                  "theme"                
##  [5] "themeGroup"            "subtheme"              "category"              "released"             
##  [9] "pieces"                "minifigs"              "bricksetURL"           "rating"               
## [13] "reviewCount"           "packagingType"         "availability"          "agerange_min"         
## [17] "US_retailPrice"        "US_dateFirstAvailable" "US_dateLastAvailable"  "UK_retailPrice"       
## [21] "UK_dateFirstAvailable" "UK_dateLastAvailable"  "CA_retailPrice"        "CA_dateFirstAvailable"
## [25] "CA_dateLastAvailable"  "DE_retailPrice"        "DE_dateFirstAvailable" "DE_dateLastAvailable" 
## [29] "height"                "width"                 "depth"                 "weight"               
## [33] "thumbnailURL"          "imageURL"
```
]

---
# Structure (`str`) <img src="images/hex/brickset.png" class="title-hex">

.code50[

```r
str(legosets)
```

```
## 'data.frame':	16355 obs. of  34 variables:
##  $ setID                : int  7693 7695 7697 7698 25534 7418 7419 6020 22704 7421 ...
##  $ name                 : chr  "Small house set" "Medium house set" "Medium house set" "Large house set" ...
##  $ year                 : int  1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
##  $ theme                : chr  "Minitalia" "Minitalia" "Minitalia" "Minitalia" ...
##  $ themeGroup           : chr  "Vintage" "Vintage" "Vintage" "Vintage" ...
##  $ subtheme             : chr  NA NA NA NA ...
##  $ category             : chr  "Normal" "Normal" "Normal" "Normal" ...
##  $ released             : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ pieces               : int  67 109 158 233 NA 1 1 60 65 NA ...
##  $ minifigs             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ bricksetURL          : chr  "https://brickset.com/sets/1-8" "https://brickset.com/sets/2-8" "https://brickset.com/sets/3-6" "https://brickset.com/sets/4-4" ...
##  $ rating               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ reviewCount          : int  0 0 1 0 0 0 0 1 0 0 ...
##  $ packagingType        : chr  "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ availability         : chr  "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ agerange_min         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ US_retailPrice       : num  NA NA NA NA NA 1.99 NA NA 4.99 NA ...
##  $ US_dateFirstAvailable: Date, format: NA NA NA NA ...
##  $ US_dateLastAvailable : Date, format: NA NA NA NA ...
##  $ UK_retailPrice       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ UK_dateFirstAvailable: Date, format: NA NA NA NA ...
##  $ UK_dateLastAvailable : Date, format: NA NA NA NA ...
##  $ CA_retailPrice       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ CA_dateFirstAvailable: Date, format: NA NA NA NA ...
##  $ CA_dateLastAvailable : Date, format: NA NA NA NA ...
##  $ DE_retailPrice       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DE_dateFirstAvailable: Date, format: NA NA NA NA ...
##  $ DE_dateLastAvailable : Date, format: NA NA NA NA ...
##  $ height               : num  NA NA NA NA NA ...
##  $ width                : num  NA NA NA NA NA ...
##  $ depth                : num  NA NA NA NA NA NA NA NA 5.08 NA ...
##  $ weight               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ thumbnailURL         : chr  "https://images.brickset.com/sets/small/1-8.jpg" "https://images.brickset.com/sets/small/2-8.jpg" "https://images.brickset.com/sets/small/3-6.jpg" "https://images.brickset.com/sets/small/4-4.jpg" ...
##  $ imageURL             : chr  "https://images.brickset.com/sets/images/1-8.jpg" "https://images.brickset.com/sets/images/2-8.jpg" "https://images.brickset.com/sets/images/3-6.jpg" "https://images.brickset.com/sets/images/4-4.jpg" ...
```

]

---
# RStudio Environment tab can help <img src="images/hex/rstudio.png" class="title-hex">

---
class: hide-logo
# Table View

.font60[

Interactive table (a `DT` htmlwidget in the rendered slides) showing 100 rows of `legosets`, 10 per page. The first five rows:

setID | name | year | theme | themeGroup | category | US_retailPrice | pieces | minifigs | rating
------|------|------|-------|------------|----------|----------------|--------|----------|-------
22576 | Stunt Pilot and Plane | 1985 | Duplo | Pre-school | Normal | NA | 2 | 1 | 0
24302 | The IMP for the Enterprise | 2010 | Serious Play | Educational | Normal | NA | NA | NA | 0
30406 | LEGO 4 stud Red Storage Brick Drawer | 2020 | Gear | Miscellaneous | Gear | NA | NA | NA | 0
4291 | Slammer Raptor | 2002 | Racers | Racing | Normal | 10 | 144 | NA | 0
23553 | Optosensors (4.5V) and Discs | 1986 | Service Packs | Miscellaneous | Normal | NA | 4 | NA | 0

]

---
# Data Wrangling Cheat Sheet <img src="images/hex/dplyr.png" class="title-hex">

.center[
<a href='https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf' target='_new'><img src='images/data-transformation.png' width='700' /></a>
]

---
# Tidyverse vs Base R <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/pipe.png" class="title-hex">

.center[
<a href='images/R_Syntax_Comparison.jpeg' target='_new'><img src="images/R_Syntax_Comparison.jpeg" width='700' /></a>
]

---
# Pipes `%>%` <img src="images/hex/magrittr.png" class="title-hex">

.font90[
The pipe operator (`%>%`), introduced by the `magrittr` R package, chains R operations: it takes the output of the left-hand side and passes it as the first argument to the function on the right-hand side. In base R, to get a proportional table you first call `table` and then `prop.table`.
]

.pull-left[
You can do this in two steps:

```r
tab_out <- table(legosets$category)
prop.table(tab_out)
```

Or as nested function calls.

```r
prop.table(table(legosets$category))
```
]
.pull-right[
Using the pipe (`%>%`) operator, we can chain these calls in what is arguably a more readable format:

```r
table(legosets$category) %>% prop.table()
```
]

<hr />

```
## 
##        Book  Collection    Extended        Gear      Normal       Other      Random 
## 0.028798533 0.032100275 0.025191073 0.143564659 0.713420972 0.054050749 0.002873739
```

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_filter_sm.png' width='800' />
]

---
# Logical Operators

* `!a` - TRUE if a is FALSE
* `a == b` - TRUE if a and b are equal
* `a != b` - TRUE if a and b are not equal
* `a > b` - TRUE if a is strictly larger than b
* `a >= b` - TRUE if a is larger than or equal to b
* `a < b` - TRUE if a is strictly smaller than b
* `a <= b` - TRUE if a is smaller than or equal to b
* `a %in% b` - TRUE if a is in b where b is a vector

```r
which( letters %in% c('a','e','i','o','u') )
```

```
## [1]  1  5  9 15 21
```
* `a | b` - TRUE if a *or* b is TRUE
* `a & b` - TRUE if a *and* b are TRUE
* `isTRUE(a)` - TRUE if a is TRUE
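
---
# Logical Operators (cont.)

These operators are vectorized, and most of them return `NA` when an input is missing. The sketch below uses an invented vector `a` with one missing value to show how that plays out.

```r
a <- c(1, 5, NA, 10)   # invented values; NA marks a missing entry
b <- 5

a > b                  # FALSE FALSE    NA  TRUE
a >= b & a != 10       # FALSE  TRUE    NA FALSE
a %in% c(5, 10)        # FALSE  TRUE FALSE  TRUE  (%in% never returns NA)
!is.na(a) & a > b      # FALSE FALSE FALSE  TRUE

# isTRUE() is TRUE only for a single TRUE value, never for NA or longer vectors
isTRUE(a[4] > b)       # TRUE
isTRUE(a[3] > b)       # FALSE
isTRUE(a > b)          # FALSE
```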

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

```r
mylego <- legosets %>% filter(themeGroup == 'Educational' & year > 2015)
```

### Base R

```r
mylego <- legosets[legosets$themeGroup == 'Educational' & legosets$year > 2015,]
```

<hr />

```r
nrow(mylego)
```

```
## [1] 61
```
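
---
# Filter and Missing Values

The two approaches differ when the condition evaluates to `NA`: `filter()` drops those rows, whereas `[` returns them as rows filled with `NA`. The sketch below shows the base R behavior on an invented `toy` data frame, along with two ways to drop the `NA` cases.

```r
toy <- data.frame(
  themeGroup = c('Educational', 'Educational', 'Basic', NA),
  year       = c(2016, NA, 2018, 2020)
)

keep <- toy$themeGroup == 'Educational' & toy$year > 2015
keep                       # TRUE NA FALSE NA

nrow(toy[keep, ])          # 3: one real match plus two all-NA rows
nrow(toy[which(keep), ])   # 1: which() keeps only the TRUE positions
nrow(subset(toy, themeGroup == 'Educational' & year > 2015))  # 1
```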

---
# Select <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

```r
mylego <- mylego %>% select(setID, pieces, theme, availability, US_retailPrice, minifigs)
```

### Base R

```r
mylego <- mylego[,c('setID', 'pieces', 'theme', 'availability', 'US_retailPrice', 'minifigs')]
```

<hr />

```r
head(mylego, n = 4)
```

```
##   setID pieces     theme    availability US_retailPrice minifigs
## 1 26803    103 Education {Not specified}             NA        6
## 2 26689    142 Education {Not specified}             NA        4
## 3 26804     98 Education {Not specified}             NA        6
## 4 26277    188 Education     Educational          78.95       NA
```

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_relocate.png' width='800' />
]

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

```r
mylego %>% relocate(where(is.numeric), .after = where(is.character)) %>% head(n = 3)
```

```
##       theme    availability setID pieces US_retailPrice minifigs
## 1 Education {Not specified} 26803    103             NA        6
## 2 Education {Not specified} 26689    142             NA        4
## 3 Education {Not specified} 26804     98             NA        6
```

### Base R

```r
mylego2 <- mylego[,c('theme', 'availability', 'setID', 'pieces', 'US_retailPrice', 'minifigs')]
head(mylego2, n = 3)
```

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/rename_sm.jpg' width='1000' />
]

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

```r
mylego %>% dplyr::rename(USD = US_retailPrice) %>% head(n = 3)
```

```
##   setID pieces     theme    availability USD minifigs
## 1 26803    103 Education {Not specified}  NA        6
## 2 26689    142 Education {Not specified}  NA        4
## 3 26804     98 Education {Not specified}  NA        6
```

### Base R

```r
names(mylego2)[5] <- 'USD'
head(mylego2, n = 3)
```

```
##       theme    availability setID pieces USD minifigs
## 1 Education {Not specified} 26803    103  NA        6
## 2 Education {Not specified} 26689    142  NA        4
## 3 Education {Not specified} 26804     98  NA        6
```
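
---
# Rename (cont.)

Renaming by position breaks if the column order changes. A slightly more defensive base R sketch looks the column up by name instead; the `toy` data frame here is invented.

```r
toy <- data.frame(theme = 'Education', setID = 26803, US_retailPrice = NA_real_)

# Match on the current name, so the code survives column reordering
names(toy)[names(toy) == 'US_retailPrice'] <- 'USD'
names(toy)   # "theme" "setID" "USD"
```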

---
# Mutate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_mutate.png' width='700' />
]

---
# Mutate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

```r
mylego %>% filter(!is.na(pieces) & !is.na(US_retailPrice)) %>% 
	mutate(Price_per_piece = US_retailPrice / pieces) %>% head(n = 3)
```

```
##   setID pieces     theme availability US_retailPrice minifigs Price_per_piece
## 1 26277    188 Education  Educational          78.95       NA       0.4199468
## 2 25949    280 Education  Educational         224.95       NA       0.8033929
## 3 25954      1 Education  Educational          14.95       NA      14.9500000
```

### Base R

```r
mylego2 <- mylego[!is.na(mylego$pieces) & !is.na(mylego$US_retailPrice),]
mylego2$Price_per_piece <- mylego2$US_retailPrice / mylego2$pieces
head(mylego2, n = 3)
```

This returns the same values as the `dplyr` version above; the only difference is that base R subsetting keeps the original row names.
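
---
# Mutate (cont.)

Base R's `transform()` is another way to add a derived column without repeating the data frame name. A minimal sketch on an invented `toy` data frame, with column names borrowed from `legosets`:

```r
toy <- data.frame(
  setID          = c(1, 2, 3, 4),
  pieces         = c(188, NA, 280, 1),
  US_retailPrice = c(78.95, 10.00, NA, 14.95)
)

# Keep rows where both inputs are present, then add the derived column
complete <- toy[complete.cases(toy[, c('pieces', 'US_retailPrice')]), ]
complete <- transform(complete, Price_per_piece = US_retailPrice / pieces)
complete
```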

---
# Group By and Summarize <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.code80[

```r
legosets %>% group_by(themeGroup) %>% summarize(mean_price = mean(US_retailPrice, na.rm = TRUE),
												sd_price = sd(US_retailPrice, na.rm = TRUE),
												median_price = median(US_retailPrice, na.rm = TRUE),
												n = n(),
												missing = sum(is.na(US_retailPrice)))
```

```
## # A tibble: 15 × 6
##    themeGroup       mean_price sd_price median_price     n missing
##    <chr>                 <dbl>    <dbl>        <dbl> <int>   <int>
##  1 Action/Adventure      31.3     29.9         20.0   1280     462
##  2 Basic                 13.1     12.8          7.99   843     473
##  3 Constraction          15.1     14.0          9.99   501     125
##  4 Educational           89.0    107.          59.7    452     294
##  5 Girls                 23.4     22.6         15.0    677     225
##  6 Historical            25.5     27.7         15.0    473     125
##  7 Junior                18.6     13.2         17.8    228      93
##  8 Licensed              42.9     58.3         25.0   2060     467
##  9 Miscellaneous         14.3     20.8          6.99  4925    2117
## 10 Model making          52.8     65.1         30.0    582     166
## 11 Modern day            31.2     33.7         20.0   1723     763
## 12 Pre-school            23.8     19.4         20.0   1487     699
## 13 Racing                24.8     30.2         10      270      59
## 14 Technical             60.8     68.1         40.0    550     137
## 15 Vintage                9.71     9.56         7.50   304     264
```
]
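
---
# Group By and Summarize (cont.)

For comparison, a grouped summary can be sketched in base R with `split()` and `sapply()`. The `toy` data frame and its values are invented; this is one possible base R analogue, not the only one.

```r
toy <- data.frame(
  themeGroup     = c('Basic', 'Basic', 'Technical', 'Technical', 'Technical'),
  US_retailPrice = c(5, 15, 40, NA, 60)
)

# Split the prices into one vector per group, then summarize each vector
groups <- split(toy$US_retailPrice, toy$themeGroup)
summary_tab <- data.frame(
  themeGroup   = names(groups),
  mean_price   = sapply(groups, mean, na.rm = TRUE),
  median_price = sapply(groups, median, na.rm = TRUE),
  n            = sapply(groups, length),
  missing      = sapply(groups, function(x) sum(is.na(x))),
  row.names    = NULL
)
summary_tab
```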

---
# Describe and Describe By

```r
library(psych)
describe(legosets$US_retailPrice)
```

```
##    vars    n  mean sd median trimmed   mad min    max  range skew kurtosis   se
## X1    1 9886 28.52 42  14.99   20.14 14.83   0 799.99 799.99 5.62    58.91 0.42
```

```r
describeBy(legosets$US_retailPrice, group = legosets$availability, mat = TRUE, skew = FALSE)
```

```
##      item                group1 vars    n      mean        sd   min    max  range         se
## X11     1       {Not specified}    1 3197  24.24484 36.282072  0.60 789.99 789.39  0.6416833
## X12     2           Educational    1    9 140.95000 86.358265 14.95 244.95 230.00 28.7860885
## X13     3        LEGO exclusive    1 1066  28.79797 70.954538  0.00 799.99 799.99  2.1732094
## X14     4    LEGOLAND exclusive    1    7  12.70429  6.447591  4.99  19.99  15.00  2.4369603
## X15     5              Not sold    1    1  12.99000        NA 12.99  12.99   0.00         NA
## X16     6           Promotional    1  167   9.19485 23.667555  0.00 249.99 249.99  1.8314504
## X17     7 Promotional (Airline)    1   11  15.79455  6.614819  5.00  28.00  23.00  1.9944429
## X18     8                Retail    1 4824  29.82030 33.270049  1.95 399.99 398.04  0.4790158
## X19     9      Retail - limited    1  600  44.64837 57.391438  0.40 379.99 379.59  2.3429956
## X110   10               Unknown    1    4   2.24750  1.253671  1.00   3.99   2.99  0.6268356
```
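
---
# Describe (cont.)

Several columns in the `describe()` output have direct base R counterparts. The sketch below assumes `describe()`'s defaults of a 10% trimmed mean and the scaled median absolute deviation; the prices are invented.

```r
x <- c(1.99, 4.99, 9.99, 14.99, 19.99, 29.99, 49.99, 99.99, 199.99, 799.99)

mean(x, trim = 0.1)       # trimmed mean: drops the lowest and highest 10% first
mad(x)                    # median absolute deviation, scaled by 1.4826
sd(x) / sqrt(length(x))   # standard error of the mean
diff(range(x))            # range as a single number
```

As in the real data, the trimmed mean and median sit well below the mean when the distribution is right-skewed.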

---
class: middle
# Grammar of Graphics

.center[
<img src="images/ggplot2_masterpiece.png" height="550" />
]

---
# Data Visualizations with ggplot2 <img src="images/hex/ggplot2.png" class="title-hex">

* `ggplot2` is an R package that provides an alternative framework based upon Wilkinson’s (2005) Grammar of Graphics.

* `ggplot2` is, in general, more flexible for creating "prettier" and complex plots.

* It works by creating layers of different types of objects/geometries (e.g., bars, points, lines, polygons).

* `ggplot2` has at least three ways of creating plots:
     1. `qplot`
     2. `ggplot(...) + geom_XXX(...) + ...`
     3. `ggplot(...) + layer(...)`

* We will focus only on the second.

---
# Parts of a `ggplot2` Statement <img src="images/hex/ggplot2.png" class="title-hex">

* Data  
`ggplot(myDataFrame, aes(x=x, y=y))`

* Layers  
`geom_point()`, `geom_histogram()`

* Facets  
`facet_wrap(~ cut)`, `facet_grid(~ cut)`

* Scales  
`scale_y_log10()`

* Other options  
`ggtitle('my title')`, `ylim(c(0, 10000))`, `xlab('x-axis label')`
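
---
# Parts of a `ggplot2` Statement (cont.)

Putting the parts together gives a single statement. This sketch assumes the `diamonds` dataset that ships with `ggplot2` (which has the `cut` column used in the facet examples); the title and axis label are made up.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +   # data and aesthetic mapping
  geom_point(alpha = 0.2) +                          # layer
  facet_wrap(~ cut) +                                # facets
  scale_y_log10() +                                  # scale
  ggtitle('Diamond price by carat') +                # other options
  xlab('Carat')
p   # printing the object draws the plot
```

Each `+` adds one component, so a plot can be built up and inspected one piece at a time.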

---
# Lots of geoms <img src="images/hex/ggplot2.png" class="title-hex">

```r
ls('package:ggplot2')[grep('^geom_', ls('package:ggplot2'))]
```

```
##  [1] "geom_abline"            "geom_area"              "geom_bar"               "geom_bin_2d"           
##  [5] "geom_bin2d"             "geom_blank"             "geom_boxplot"           "geom_col"              
##  [9] "geom_contour"           "geom_contour_filled"    "geom_count"             "geom_crossbar"         
## [13] "geom_curve"             "geom_density"           "geom_density_2d"        "geom_density_2d_filled"
## [17] "geom_density2d"         "geom_density2d_filled"  "geom_dotplot"           "geom_errorbar"         
## [21] "geom_errorbarh"         "geom_freqpoly"          "geom_function"          "geom_hex"              
## [25] "geom_histogram"         "geom_hline"             "geom_jitter"            "geom_label"            
## [29] "geom_line"              "geom_linerange"         "geom_map"               "geom_path"             
## [33] "geom_point"             "geom_pointrange"        "geom_polygon"           "geom_qq"               
## [37] "geom_qq_line"           "geom_quantile"          "geom_raster"            "geom_rect"             
## [41] "geom_ribbon"            "geom_rug"               "geom_segment"           "geom_sf"               
## [45] "geom_sf_label"          "geom_sf_text"           "geom_smooth"            "geom_spoke"            
## [49] "geom_step"              "geom_text"              "geom_tile"              "geom_violin"           
## [53] "geom_vline"
```

---
# Data Visualization Cheat Sheet <img src="images/hex/ggplot2.png" class="title-hex">

.center[
<a href='https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf'><img src='images/data-visualization-2.1.png' width='700' /></a>
]

---
# Scatterplot  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=pieces, y=US_retailPrice)) + geom_point()
```

---
# Scatterplot (cont.)  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, color=availability)) + geom_point()
```

---
# Scatterplot (cont.)  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs, color=availability)) + geom_point()
```

---
# Scatterplot (cont.)  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs)) + geom_point() + facet_wrap(~ availability)
```

---
# Boxplots  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x='Lego', y=US_retailPrice)) + geom_boxplot()
```

---
# Boxplots (cont.)  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=availability, y=US_retailPrice)) + geom_boxplot()
```

---
# Boxplots (cont.)  <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x=availability, y=US_retailPrice)) + geom_boxplot() + coord_flip()
```

---
# Histograms <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram()
```

---
# Histograms (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram() + scale_x_log10()
```
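
---
# Log Scales and Zeros

The `describe()` output earlier shows a minimum `US_retailPrice` of 0. A log scale cannot display zeros, because `log10(0)` is `-Inf`, and `ggplot2` drops such values with a warning. The sketch below illustrates this on invented prices and filters to positive values first.

```r
price <- c(0, 0.6, 9.99, 99.99, 799.99)   # invented; includes a zero price

log10(price)                # first element is -Inf
is.finite(log10(price))     # FALSE TRUE TRUE TRUE TRUE

# Keep strictly positive prices before plotting on a log scale
positive <- price[price > 0]
length(positive)            # 4
```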

---
# Histograms (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram() + facet_wrap(~ availability)
```

---
# Density Plots <img src="images/hex/ggplot2.png" class="title-hex">

```r
ggplot(legosets, aes(x = US_retailPrice, color = availability)) + geom_density()
```

---
# `ggplot2` aesthetics <img src="images/hex/ggplot2.png" class="title-hex">

.center[
<a href='images/ggplot_aesthetics_cheatsheet.png' target='_new'> <img src='images/ggplot_aesthetics_cheatsheet.png' height='550' /></a>
]

---
# Likert Scales <img src="images/hex/likert.png" class="title-hex">

Likert scales are questionnaire response formats in which respondents rate items on a scale, usually with four to seven levels (e.g., strongly disagree to strongly agree).

```r
library(likert)
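library(reshape)   # provides rename(x, replace), used below
data(pisaitems)
# Keep only the columns whose names start with 'ST24Q'
items24 <- pisaitems[,substr(names(pisaitems), 1,5) == 'ST24Q']
# Replace the item codes with the item wording
items24 <- reshape::rename(items24, c(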
			ST24Q01="I read only if I have to.",
			ST24Q02="Reading is one of my favorite hobbies.",
			ST24Q03="I like talking about books with other people.",
			ST24Q04="I find it hard to finish books.",
			ST24Q05="I feel happy if I receive a book as a present.",
			ST24Q06="For me, reading is a waste of time.",
			ST24Q07="I enjoy going to a bookstore or a library.",
			ST24Q08="I read only to get information that I need.",
			ST24Q09="I cannot sit still and read for more than a few minutes.",
			ST24Q10="I like to express my opinions about books I have read.",
			ST24Q11="I like to exchange books with my friends."))
```

---
# `likert` R Package <img src="images/hex/likert.png" class="title-hex">

```r
l24 <- likert(items24)
summary(l24)
```

```
##                                                        Item      low neutral     high     mean        sd
## 10   I like to express my opinions about books I have read. 41.07516       0 58.92484 2.604913 0.9009968
## 5            I feel happy if I receive a book as a present. 46.93475       0 53.06525 2.466751 0.9446590
## 8               I read only to get information that I need. 50.39874       0 49.60126 2.484616 0.9089688
## 7                I enjoy going to a bookstore or a library. 51.21231       0 48.78769 2.428508 0.9164136
## 3             I like talking about books with other people. 54.99129       0 45.00871 2.328049 0.9090326
## 11                I like to exchange books with my friends. 55.54115       0 44.45885 2.343193 0.9609234
## 2                    Reading is one of my favorite hobbies. 56.64470       0 43.35530 2.344530 0.9277495
## 1                                 I read only if I have to. 58.72868       0 41.27132 2.291811 0.9369023
## 4                           I find it hard to finish books. 65.35125       0 34.64875 2.178299 0.8991628
## 9  I cannot sit still and read for more than a few minutes. 76.24524       0 23.75476 1.974736 0.8793028
## 6                       For me, reading is a waste of time. 82.88729       0 17.11271 1.810093 0.8611554
```

---
# `likert` Plots  <img src="images/hex/likert.png" class="title-hex">

```r
plot(l24)
```

---
# `likert` Plots  <img src="images/hex/likert.png" class="title-hex">

```r
plot(l24, type='heat')
```

---
# `likert` Plots  <img src="images/hex/likert.png" class="title-hex">

```r
plot(l24, type='density')
```

---
class: font90
# Dual Scales <img src="images/hex/shiny.png" class="title-hex">

Some problems<sup>1</sup>:

* The designer has to choose the scales, and that choice can have a big impact on the viewer
* "Cross-over points" where one series crosses another are results of the design choices, not intrinsic to the data, and viewers (particularly unsophisticated viewers) may read significance into them
* They make it easier to lazily associate correlation with causation, not taking into account autocorrelation and other time-series issues
* Because of the issues above, in malicious hands they make it possible to deliberately mislead

This example looks at the relationship between the NZ dollar exchange rate and the trade-weighted index.

```r
DATA606::shiny_demo('DualScales', package='DATA606')
```

My advice:

* Avoid using them. You can usually do better with other plot types.
* When necessary (or compelled) to use them, rescale both series (e.g., to z-scores, which we'll discuss in a few weeks).
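
The rescaling idea can be sketched as follows. The two series are made-up numbers, not the NZ data used in the Shiny demo, and the variable names are hypothetical. Base R's `scale()` subtracts the mean and divides by the standard deviation, so both series end up on one unitless axis.

```r
# Hypothetical monthly series on very different scales (illustrative values only)
exchange_rate <- c(0.68, 0.70, 0.71, 0.69, 0.72, 0.74)
trade_index   <- c(71.2, 72.5, 73.1, 72.0, 74.3, 75.8)

# Convert each series to z-scores: (x - mean(x)) / sd(x)
z_exchange <- as.numeric(scale(exchange_rate))
z_index    <- as.numeric(scale(trade_index))

# Both series now share one axis, so no second scale is needed
matplot(cbind(z_exchange, z_index), type = 'l', lty = 1,
        xlab = 'Month', ylab = 'z-score')
legend('topleft', legend = c('Exchange rate', 'Trade index'),
       col = 1:2, lty = 1)
```

After rescaling, a crossing between the lines reflects where each series sits relative to its own mean rather than an arbitrary choice of axis limits.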

.font50[
<sup>1</sup> http://blog.revolutionanalytics.com/2016/08/dual-axis-time-series.html  
<sup>2</sup> http://ellisp.github.io/blog/2016/08/18/dualaxes
]

---
# Pie Charts

There is only one pie chart in *OpenIntro Statistics* (Diez, Barr, & Çetinkaya-Rundel, 2015, p. 48). Consider the following three pie charts that represent the preference of five different colors. Is there a difference between the three pie charts? This is probably difficult to answer.

---
# Pie Charts

Source: [https://en.wikipedia.org/wiki/Pie_chart](https://en.wikipedia.org/wiki/Pie_chart).

---
class: middle
# Just say NO to pie charts!

.font150[
"There is no data that can be displayed in a pie chart that cannot better be displayed in some other type of chart"]
.right[.font130[John Tukey]]
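
As one illustration of an alternative, the sketch below draws a bar chart of preference shares for five colors. The shares are hypothetical values chosen to be close together, not the data behind the Wikipedia figures; bars on a common baseline make such small differences much easier to compare than pie slices.

```r
library(ggplot2)

# Hypothetical preference shares for five colors (illustrative values only)
prefs <- data.frame(
  color = c('Red', 'Blue', 'Green', 'Yellow', 'Purple'),
  share = c(17, 18, 20, 22, 23)
)

# Order the bars by share and flip the axes so the labels are easy to read
ggplot(prefs, aes(x = reorder(color, share), y = share)) +
  geom_col() +
  coord_flip() +
  xlab('Color') + ylab('Preference (%)')
```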

---
# Additional Resources

For data wrangling:

* `dplyr` website: https://dplyr.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/wrangle-intro.html
* Wrangling penguins tutorial: https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome
* Data transformation cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf

For data visualization:

* `ggplot2` website: https://ggplot2.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/data-visualisation.html
* R Graphics Cookbook: https://r-graphics.org
* Data visualization cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf

---
class: left
# One Minute Paper

.font140[
Complete the one minute paper: 
https://forms.gle/ENFqTnDB5fJDw3kx9

1. What was the most important thing you learned during this class?
2. What important question remains unanswered for you?
]