Week 2: ggplot

ggplot2 Overview

Why ggplot2?

Advantages of ggplot2

  • consistent underlying grammar of graphics (Wilkinson, 2005)
  • very flexible
  • theme system for polishing plot appearance

Grammar of Graphics

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.

Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • faceting
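
The building blocks listed above can each be pointed to in a single plot call. The following is a minimal sketch, assuming the built-in mtcars data set (not part of this week's data) as a stand-in; the variable choices are illustrative only.

```r
library(ggplot2)

# data: mtcars is built in and used here only as a stand-in example
p <- ggplot(data = mtcars,
            aes(x = wt, y = mpg)) +              # aesthetic mapping
  geom_point() +                                 # geometric object
  stat_smooth(method = 'lm', formula = y ~ x) +  # statistical transformation
  facet_wrap(~cyl)                               # faceting
p
```

Each block is specified independently, so any one of them can be swapped out without touching the others.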

ggplot2 vs. Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not have plot methods for particular object classes (data should always be in a data.frame)
  • uses a different system for adding plot elements
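
To make the comparison concrete, here is a rough side-by-side sketch of the same scatterplot with a fitted line in both systems. It assumes the built-in mtcars data set as a stand-in and is not taken from the course materials.

```r
library(ggplot2)

# Base graphics: draw the plot, then add elements with separate function calls
plot(mtcars$wt, mtcars$mpg, xlab = 'Weight', ylab = 'MPG')
abline(lm(mpg ~ wt, data = mtcars))

# ggplot2: build a plot object and add elements with the + operator
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x) +
  labs(x = 'Weight', y = 'MPG')
p
```

For a simple plot like this the base version is shorter; the ggplot2 version tends to pay off once colors, facets, or legends are added.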

Aesthetic Mapping

Aesthetics are things that you can see. Examples include:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

Aesthetic mappings are set with the aes() function.
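
As a small illustration of aes(), the sketch below maps position, color, and size to variables. It assumes the built-in mtcars data set as a stand-in; the column choices are arbitrary.

```r
library(ggplot2)

# mtcars is a built-in data set, used here only as a stand-in
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # color and size are mapped to variables, so each gets a legend
  geom_point(aes(color = factor(cyl), size = hp))
p
```

Mappings given in ggplot() apply to every geom; mappings given inside a geom apply to that geom only.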

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point)
  • lines (geom_line)
  • boxplot (geom_boxplot)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
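
The layering idea can be sketched as follows, again assuming the built-in mtcars data set as a stand-in. Each + adds one layer to the plot object, and a plot without any geom shows only empty axes.

```r
library(ggplot2)

# Start with data and aesthetics only: this draws axes but no marks
base <- ggplot(mtcars, aes(factor(cyl), mpg))
length(base$layers)  # 0

# Each geom added with + becomes one more layer
with_box  <- base + geom_boxplot()
with_both <- with_box + geom_point()
length(with_both$layers)  # 2

with_both
```

Layers are drawn in the order they are added, so here the points sit on top of the boxplots.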

Exercise 1: NCAA Basketball data

We will use data from the NCAA basketball tournament from 2011 to 2016 to create a polished figure.

library(tidyverse)  # provides read_csv(), filter(), %>%, and ggplot2
hoops <- read_csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/TourneyDetailedResults.csv')
hoops_2011 <- hoops %>% filter(Season >= 2011)

NCAA Basketball Data Overview

glimpse(hoops_2011)
## Rows: 402
## Columns: 34
## $ Season <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 201…
## $ Daynum <dbl> 134, 134, 135, 135, 136, 136, 136, 136, 136, 136, 136, 136, 136…
## $ Wteam  <dbl> 1155, 1421, 1427, 1433, 1139, 1140, 1153, 1163, 1196, 1211, 124…
## $ Wscore <dbl> 70, 81, 70, 59, 60, 74, 78, 81, 79, 86, 73, 59, 62, 74, 69, 68,…
## $ Lteam  <dbl> 1412, 1114, 1106, 1425, 1330, 1459, 1281, 1137, 1364, 1385, 142…
## $ Lscore <dbl> 52, 77, 61, 46, 58, 66, 63, 52, 51, 71, 68, 57, 61, 51, 66, 50,…
## $ Wloc   <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
## $ Numot  <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Wfgm   <dbl> 26, 27, 23, 20, 22, 24, 29, 32, 29, 28, 22, 24, 21, 25, 26, 27,…
## $ Wfga   <dbl> 50, 54, 54, 59, 54, 61, 54, 66, 53, 52, 54, 47, 57, 59, 56, 66,…
## $ Wfgm3  <dbl> 4, 4, 4, 9, 7, 6, 4, 9, 8, 9, 5, 5, 9, 8, 12, 9, 5, 4, 5, 12, 5…
## $ Wfga3  <dbl> 13, 12, 16, 24, 26, 22, 11, 24, 23, 15, 17, 11, 19, 19, 24, 22,…
## $ Wftm   <dbl> 14, 23, 20, 10, 9, 20, 16, 8, 13, 21, 24, 6, 11, 16, 5, 5, 13, …
## $ Wfta   <dbl> 16, 28, 30, 15, 11, 24, 22, 9, 17, 26, 28, 11, 18, 25, 12, 8, 1…
## $ Wor    <dbl> 4, 6, 10, 17, 18, 10, 13, 13, 9, 12, 12, 6, 17, 22, 10, 15, 8, …
## $ Wdr    <dbl> 25, 29, 30, 23, 14, 29, 23, 36, 26, 31, 21, 20, 24, 28, 17, 30,…
## $ Wast   <dbl> 17, 17, 14, 11, 11, 14, 14, 20, 22, 20, 14, 12, 11, 18, 14, 11,…
## $ Wto    <dbl> 12, 15, 13, 9, 15, 10, 11, 6, 12, 15, 11, 9, 18, 12, 3, 4, 10, …
## $ Wstl   <dbl> 10, 6, 4, 5, 8, 9, 3, 5, 11, 6, 2, 5, 12, 4, 4, 5, 6, 7, 6, 3, …
## $ Wblk   <dbl> 2, 1, 0, 3, 1, 1, 2, 6, 2, 2, 1, 7, 1, 1, 9, 5, 5, 8, 1, 2, 2, …
## $ Wpf    <dbl> 12, 25, 14, 24, 21, 22, 15, 13, 15, 15, 17, 8, 13, 16, 18, 8, 1…
## $ Lfgm   <dbl> 18, 24, 22, 15, 16, 22, 24, 16, 17, 27, 23, 24, 22, 17, 23, 19,…
## $ Lfga   <dbl> 48, 56, 62, 38, 45, 56, 63, 51, 47, 58, 50, 52, 48, 55, 46, 56,…
## $ Lfgm3  <dbl> 12, 9, 7, 1, 5, 4, 6, 7, 5, 7, 6, 3, 10, 3, 6, 8, 8, 10, 5, 6, …
## $ Lfga3  <dbl> 24, 29, 26, 9, 15, 19, 19, 21, 19, 15, 18, 14, 25, 19, 13, 22, …
## $ Lftm   <dbl> 4, 20, 10, 15, 21, 18, 9, 13, 12, 10, 16, 6, 7, 14, 14, 4, 4, 1…
## $ Lfta   <dbl> 7, 26, 12, 25, 27, 24, 13, 15, 18, 15, 21, 7, 16, 19, 23, 8, 6,…
## $ Lor    <dbl> 7, 7, 11, 5, 13, 7, 15, 1, 8, 5, 8, 8, 6, 9, 11, 9, 8, 15, 13, …
## $ Ldr    <dbl> 22, 26, 28, 26, 16, 29, 16, 22, 18, 15, 21, 20, 23, 18, 25, 24,…
## $ Last   <dbl> 10, 19, 13, 4, 8, 12, 15, 6, 8, 9, 10, 8, 14, 9, 12, 8, 12, 20,…
## $ Lto    <dbl> 19, 16, 12, 15, 15, 13, 9, 5, 17, 10, 12, 6, 16, 10, 7, 9, 9, 1…
## $ Lstl   <dbl> 4, 6, 6, 2, 9, 4, 7, 3, 7, 10, 2, 8, 14, 7, 1, 2, 4, 9, 5, 7, 7…
## $ Lblk   <dbl> 3, 2, 3, 6, 2, 4, 3, 1, 3, 4, 2, 2, 7, 6, 2, 0, 0, 8, 3, 0, 1, …
## $ Lpf    <dbl> 10, 24, 22, 20, 17, 20, 18, 10, 17, 22, 25, 11, 18, 21, 14, 7, …

Graphical Primitives: ggplot()

graph.a <- ggplot(data = hoops_2011, aes(Lfgm, Wfgm))
graph.a

Adding Geoms: geom_point()

graph.a + geom_point()

Adding Geoms: geom_smooth()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x')

Adding Geoms: geom_rug()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  geom_rug()

Adding Geoms: geom_density2d()

graph.a + geom_point() + 
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  geom_rug() + geom_density2d()

Adding Geoms: geom_jitter()

graph.a + geom_rug() + geom_density2d() + geom_jitter()
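
Field goal counts are integers, so many games land on exactly the same point; geom_jitter() adds a small amount of random noise so that overlapping points become visible. The sketch below uses a hypothetical toy data frame (not the tournament data) and sets the width and height arguments explicitly to control how far points may move.

```r
library(ggplot2)

set.seed(408)  # jitter is random; a seed makes it reproducible

# Hypothetical integer-valued data: 100 rows but only 25 distinct positions
toy <- data.frame(x = rep(1:5, each = 20), y = rep(1:5, times = 20))

# geom_point() stacks tied observations on top of each other
p_point <- ggplot(toy, aes(x, y)) + geom_point()

# geom_jitter() moves each point by at most 0.2 units in each direction
p_jitter <- ggplot(toy, aes(x, y)) + geom_jitter(width = 0.2, height = 0.2)

p_point
p_jitter
```

Keeping the jitter well below the spacing between values (here 1 unit) keeps it clear which integer each point belongs to.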

Adding Labels: labs()

graph.a + geom_rug() + geom_density2d() +
  geom_jitter() +
  labs(x = 'Losing Team Field Goals Made',
       y = 'Winning Team Field Goals Made')

Scales: xlim() and ylim()

graph.a + geom_rug() + geom_density2d() +
  geom_jitter() +
  labs(x = 'Losing Team Field Goals Made',
       y = 'Winning Team Field Goals Made') +
  xlim(c(0, max(hoops_2011$Wfgm))) + ylim(c(0, max(hoops_2011$Wfgm)))
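
One point worth knowing about xlim() and ylim(): they set the limits of the scale, so observations outside the limits are dropped before any statistics are computed. coord_cartesian() zooms the view without dropping data. A minimal sketch of the difference, assuming the built-in mtcars data set as a stand-in:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# xlim() sets scale limits: points with wt > 4 are dropped (with a warning)
p_lim <- base + xlim(c(0, 4))

# coord_cartesian() only zooms the view: all points stay in the data
p_zoom <- base + coord_cartesian(xlim = c(0, 4))

p_lim
p_zoom
```

When the limits cover the full range of the data, as they are meant to on this slide, the two approaches give the same picture.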

Themes

A wide range of themes is available in ggplot2; see the theme overview.
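
As a rough sketch of how themes are applied, the code below adds two of the complete themes that ship with ggplot2 and then adjusts individual elements with theme(). It assumes the built-in mtcars data set as a stand-in; the specific sizes are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point()

# Complete themes replace the overall look in one step
p + theme_bw()
p + theme_minimal()

# theme() adjusts individual elements on top of a complete theme
p_custom <- p +
  theme_bw() +
  theme(axis.title = element_text(size = 14),
        legend.position = 'bottom')
p_custom
```

Order matters: a complete theme added after theme() would overwrite the individual adjustments.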

More about aes

graph.a + geom_jitter(col = 'firebrick4')

More about aes

graph.a + geom_jitter(aes(col = as.factor(Season)))

More about aes

graph.a + geom_jitter(aes(col = as.factor(Season)), size=3,alpha=.4)
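
The slides above contrast two uses of color: setting it to a fixed value outside aes() and mapping it to a variable inside aes(). The sketch below repeats that contrast and adds a common mistake, using the built-in mtcars data set as a stand-in.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg))

# Setting: a fixed color outside aes(); every point is firebrick4, no legend
p_set <- base + geom_point(colour = 'firebrick4')

# Mapping: a variable inside aes(); one color per level, with a legend
p_map <- base + geom_point(aes(colour = factor(cyl)))

# Common mistake: a constant inside aes() is treated as data, so the points
# get a default palette color (not firebrick4) and a one-entry legend
p_wrong <- base + geom_point(aes(colour = 'firebrick4'))

p_set
p_map
p_wrong
```

The same rule applies to size, alpha, and shape: constants go outside aes(), variables go inside.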

More about aes: Comment

graph.a + 
  geom_jitter(aes(shape = as.factor(Season),col=Wscore),
              size=3,alpha=.4)

Faceting

Faceting: Comment

graph.a + geom_point() + facet_wrap(~Season)

Faceting

graph.a + facet_wrap(~Season) + 
  geom_jitter(alpha=.5, aes(color=Wfgm3))
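
facet_wrap() splits the plot by one variable and wraps the panels into rows and columns; facet_grid() lays panels out by a row variable and a column variable. A short sketch of both, assuming the built-in mtcars data set as a stand-in:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# One panel per value of cyl, stacked in a single column
p_wrap <- p + facet_wrap(~cyl, ncol = 1)

# Rows defined by am, columns defined by cyl
p_grid <- p + facet_grid(am ~ cyl)

p_wrap
p_grid
```

By default all panels share the same axis limits, which makes comparisons across panels straightforward.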

Seattle Housing Data Set

Use the Seattle Housing data set (http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv) to create an interesting graphic. Include informative titles and labels, and add an annotation.

seattle_in <- read_csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv')
## Rows: 869 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfr...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Exercise: ggplot2

Now use ggplot2 to create an interesting graph using the Seattle Housing data set.

A Solution: ggplot2

# Treat zipcode as a categorical variable so each zipcode gets its own color
seattle_in$zipcode <- as.factor(seattle_in$zipcode)
ggplot(data = seattle_in, aes(sqft_living, price)) +
  geom_jitter(aes(col = zipcode)) +
  theme(plot.title = element_text(size = 8), text = element_text(size = 6)) +
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  ggtitle('Seattle Housing Sales: Price vs. Square Footage Living Space') +
  ylab('Sales Price (million dollars)') +
  xlab('Living Space (square feet)') +
  # Label the y axis in millions of dollars
  scale_y_continuous(breaks = seq(0, 7000000, by = 1000000), labels = as.character(0:7)) +
  # Draw the box first so the annotation text sits on top of it
  annotate('rect', xmin = 0, xmax = 7250, ymin = 5500000, ymax = 6500000, alpha = .6) +
  annotate('text', x = 3500, y = 6000000, label = 'Housing price depends on zipcode', size = 2) +
  # annotate() draws one arrow; geom_segment() with constants in aes() draws one per row
  annotate('segment', x = 3500, xend = 3500, y = 5500000, yend = 3000000,
           arrow = arrow(length = unit(0.5, 'cm')))
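
The solution hard-codes the y-axis labels as the characters 0 through 7, which only stays correct while the breaks are exactly 0 to 7 million. One alternative is to pass a function to the labels argument so that labels are computed from whatever breaks are used. A minimal sketch, using a hypothetical toy data frame and a hypothetical helper named in_millions (neither is from the course materials):

```r
library(ggplot2)

# Hypothetical toy data standing in for the housing prices
toy <- data.frame(sqft  = c(1000, 2000, 3000, 4000),
                  price = c(400000, 900000, 2500000, 5200000))

# Hypothetical helper: convert dollars to millions of dollars
in_millions <- function(x) x / 1e6

p <- ggplot(toy, aes(sqft, price)) +
  geom_point() +
  # labels can be a function that receives the breaks and returns the labels
  scale_y_continuous(labels = in_millions) +
  ylab('Sales Price (million dollars)')
p
```

With this approach the labels stay in sync with the breaks if the axis range changes.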