Why ggplot2?
The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want.
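Building blocks of a graph include the data, aesthetic mappings, geometric objects (geoms), statistical transformations (such as smoothers), scales, labels, and themes.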
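Compared to base graphics, ggplot2 expects the data in a data frame and builds a plot by adding components with the + operator, instead of drawing each element onto the plot with a separate function call.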
Aesthetics are things that you can see. Examples include position (x and y), color, fill, shape, size, line type, and transparency (alpha).
Aesthetic mappings are set with the aes() function.
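As a minimal sketch of the difference between mapping and setting an aesthetic, the example below uses R's built-in mtcars data (a stand-in, not one of the data sets in these notes). A variable named inside aes() is mapped, so it varies across points and gets a legend; a constant supplied outside aes() is simply set for the whole layer.

```r
library(ggplot2)

# Mapping: weight on x, miles per gallon on y, number of cylinders as color.
# factor() makes cyl discrete, so ggplot2 uses a discrete palette and a legend.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Setting: a constant given outside aes() applies to every point and adds no legend.
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

print(p_mapped)
print(p_set)
```

A common pitfall is writing aes(color = "blue"): the string is then treated as data with a single level, so the points get the default discrete color and a legend labeled "blue" appears.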
Geometric objects are the actual marks we put on a plot. Examples include:
- points (geom_point)
- lines (geom_line)
- boxplots (geom_boxplot)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
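A short sketch of layering, again using the built-in mtcars data as a stand-in: the ggplot() call supplies the data and mappings, and each + adds one more geom on top of the ones before it.

```r
library(ggplot2)

# Data and aesthetic mappings only: printing this alone would show axes but no marks.
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Each `+` adds a layer; later layers are drawn on top of earlier ones.
p <- base +
  geom_boxplot() +                        # layer 1: one box per cylinder count
  geom_point(color = "red", alpha = 0.5)  # layer 2: the individual cars

print(p)
```

Because layers are drawn in the order they are added, putting geom_point() before geom_boxplot() would hide some of the points behind the boxes.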
We will use data from the 2011 to 2016 NCAA basketball tournaments to create a polished figure.
## Rows: 402
## Columns: 34
## $ Season <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 201…
## $ Daynum <dbl> 134, 134, 135, 135, 136, 136, 136, 136, 136, 136, 136, 136, 136…
## $ Wteam <dbl> 1155, 1421, 1427, 1433, 1139, 1140, 1153, 1163, 1196, 1211, 124…
## $ Wscore <dbl> 70, 81, 70, 59, 60, 74, 78, 81, 79, 86, 73, 59, 62, 74, 69, 68,…
## $ Lteam <dbl> 1412, 1114, 1106, 1425, 1330, 1459, 1281, 1137, 1364, 1385, 142…
## $ Lscore <dbl> 52, 77, 61, 46, 58, 66, 63, 52, 51, 71, 68, 57, 61, 51, 66, 50,…
## $ Wloc <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
## $ Numot <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Wfgm <dbl> 26, 27, 23, 20, 22, 24, 29, 32, 29, 28, 22, 24, 21, 25, 26, 27,…
## $ Wfga <dbl> 50, 54, 54, 59, 54, 61, 54, 66, 53, 52, 54, 47, 57, 59, 56, 66,…
## $ Wfgm3 <dbl> 4, 4, 4, 9, 7, 6, 4, 9, 8, 9, 5, 5, 9, 8, 12, 9, 5, 4, 5, 12, 5…
## $ Wfga3 <dbl> 13, 12, 16, 24, 26, 22, 11, 24, 23, 15, 17, 11, 19, 19, 24, 22,…
## $ Wftm <dbl> 14, 23, 20, 10, 9, 20, 16, 8, 13, 21, 24, 6, 11, 16, 5, 5, 13, …
## $ Wfta <dbl> 16, 28, 30, 15, 11, 24, 22, 9, 17, 26, 28, 11, 18, 25, 12, 8, 1…
## $ Wor <dbl> 4, 6, 10, 17, 18, 10, 13, 13, 9, 12, 12, 6, 17, 22, 10, 15, 8, …
## $ Wdr <dbl> 25, 29, 30, 23, 14, 29, 23, 36, 26, 31, 21, 20, 24, 28, 17, 30,…
## $ Wast <dbl> 17, 17, 14, 11, 11, 14, 14, 20, 22, 20, 14, 12, 11, 18, 14, 11,…
## $ Wto <dbl> 12, 15, 13, 9, 15, 10, 11, 6, 12, 15, 11, 9, 18, 12, 3, 4, 10, …
## $ Wstl <dbl> 10, 6, 4, 5, 8, 9, 3, 5, 11, 6, 2, 5, 12, 4, 4, 5, 6, 7, 6, 3, …
## $ Wblk <dbl> 2, 1, 0, 3, 1, 1, 2, 6, 2, 2, 1, 7, 1, 1, 9, 5, 5, 8, 1, 2, 2, …
## $ Wpf <dbl> 12, 25, 14, 24, 21, 22, 15, 13, 15, 15, 17, 8, 13, 16, 18, 8, 1…
## $ Lfgm <dbl> 18, 24, 22, 15, 16, 22, 24, 16, 17, 27, 23, 24, 22, 17, 23, 19,…
## $ Lfga <dbl> 48, 56, 62, 38, 45, 56, 63, 51, 47, 58, 50, 52, 48, 55, 46, 56,…
## $ Lfgm3 <dbl> 12, 9, 7, 1, 5, 4, 6, 7, 5, 7, 6, 3, 10, 3, 6, 8, 8, 10, 5, 6, …
## $ Lfga3 <dbl> 24, 29, 26, 9, 15, 19, 19, 21, 19, 15, 18, 14, 25, 19, 13, 22, …
## $ Lftm <dbl> 4, 20, 10, 15, 21, 18, 9, 13, 12, 10, 16, 6, 7, 14, 14, 4, 4, 1…
## $ Lfta <dbl> 7, 26, 12, 25, 27, 24, 13, 15, 18, 15, 21, 7, 16, 19, 23, 8, 6,…
## $ Lor <dbl> 7, 7, 11, 5, 13, 7, 15, 1, 8, 5, 8, 8, 6, 9, 11, 9, 8, 15, 13, …
## $ Ldr <dbl> 22, 26, 28, 26, 16, 29, 16, 22, 18, 15, 21, 20, 23, 18, 25, 24,…
## $ Last <dbl> 10, 19, 13, 4, 8, 12, 15, 6, 8, 9, 10, 8, 14, 9, 12, 8, 12, 20,…
## $ Lto <dbl> 19, 16, 12, 15, 15, 13, 9, 5, 17, 10, 12, 6, 16, 10, 7, 9, 9, 1…
## $ Lstl <dbl> 4, 6, 6, 2, 9, 4, 7, 3, 7, 10, 2, 8, 14, 7, 1, 2, 4, 9, 5, 7, 7…
## $ Lblk <dbl> 3, 2, 3, 6, 2, 4, 3, 1, 3, 4, 2, 2, 7, 6, 2, 0, 0, 8, 3, 0, 1, …
## $ Lpf <dbl> 10, 24, 22, 20, 17, 20, 18, 10, 17, 22, 25, 11, 18, 21, 14, 7, …
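The code that reads the NCAA file is not shown in these notes, so this sketch types in only the first six games listed in the output above (columns Season, Wscore, Lscore, and Numot; the data frame name games is hypothetical) and derives a winning-margin column. With the full data, the same step would apply to all 402 rows.

```r
library(ggplot2)

# First six games from the output above (hypothetical data frame name: games).
games <- data.frame(
  Season = c(2011, 2011, 2011, 2011, 2011, 2011),
  Wscore = c(70, 81, 70, 59, 60, 74),
  Lscore = c(52, 77, 61, 46, 58, 66),
  Numot  = c(0, 1, 0, 0, 0, 0)
)

# Winning margin: the winner's points minus the loser's points.
games$margin <- games$Wscore - games$Lscore

# Every game lies below the dashed line Lscore = Wscore, since the winner scored more.
p <- ggplot(games, aes(x = Wscore, y = Lscore)) +
  geom_point(aes(size = margin)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed")

print(p)
```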
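The following functions are used to develop the figure:
- geom_point(): draws one point per observation (a scatter plot)
- geom_smooth(): adds a smoothed trend line, by default with a confidence band
- geom_rug(): adds tick marks along the axes showing the marginal distributions
- geom_density2d(): overlays contour lines of a two-dimensional density estimate
- geom_jitter(): draws points with a small amount of random noise to reduce overplotting
- labs(): sets the title and the axis and legend labels
- xlim() and ylim(): set the axis limits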
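A sketch combining several of these functions, using the built-in mtcars data as a stand-in because the NCAA plotting code is not reproduced here:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                    # one point per car
  geom_smooth(method = "loess", formula = y ~ x) +  # smoothed trend with a confidence band
  geom_rug() +                                      # marginal ticks on both axes
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  xlim(1, 6) +                                      # fixed x-axis range
  ylim(0, 40)                                       # fixed y-axis range

print(p)
```

Note that xlim() and ylim() discard observations outside the limits (with a warning) before any statistics such as the smoother are computed; to zoom in without dropping data, coord_cartesian(xlim = ..., ylim = ...) is the usual alternative.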
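There is a wide range of themes available in ggplot2, such as the default theme_grey() as well as theme_bw(), theme_minimal(), and theme_classic(); see the ggplot2 theme overview.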
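A brief sketch of switching themes, again with mtcars as a stand-in: a complete theme such as theme_bw() or theme_minimal() restyles all non-data elements at once, and theme() then adjusts individual elements.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Complete themes replace the default theme_grey() look in one step.
p_bw  <- p + theme_bw()
p_min <- p + theme_minimal(base_size = 14)   # base_size sets the base font size in points

# theme() added afterward overrides specific elements of the chosen theme.
p_custom <- p_bw +
  theme(panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"))

print(p_custom)
```

Order matters: adding a complete theme after theme() would overwrite the earlier tweaks.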
Use the Seattle Housing data set (http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv) to create an interesting graphic. Include informative titles and labels, and add an annotation.
library(readr)  # provides read_csv()
seattle_in <- read_csv('http://math.montana.edu/ahoegh/teaching/stat408/datasets/SeattleHousing.csv')
## Rows: 869 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfr...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now use ggplot2 to create an interesting graph using the Seattle Housing data set.
library(ggplot2)

# Treat zipcode as categorical so each zipcode gets its own color
seattle_in$zipcode <- as.factor(seattle_in$zipcode)

ggplot(data = seattle_in, aes(x = sqft_living, y = price)) +
  geom_jitter(aes(col = zipcode)) +
  geom_smooth(method = 'loess', formula = y ~ x) +
  ggtitle('Seattle Housing Sales: Price vs. Square Footage Living Space') +
  ylab('Sales Price (million dollars)') +
  xlab('Living Space (square feet)') +
  # Label the y-axis in millions of dollars
  scale_y_continuous(breaks = seq(0, 7000000, by = 1000000), labels = as.character(0:7)) +
  # Draw the shaded box first so the text is drawn on top of it
  annotate('rect', xmin = 0, xmax = 7250, ymin = 5500000, ymax = 6500000, alpha = .6) +
  annotate('text', x = 3500, y = 6000000, label = 'Housing price depends on zipcode', size = 2) +
  # annotate() draws a single arrow; geom_segment(aes(...)) would draw one per row of data
  annotate('segment', x = 3500, xend = 3500, y = 5500000, yend = 3000000,
           arrow = arrow(length = unit(0.5, 'cm'))) +
  theme(plot.title = element_text(size = 8), text = element_text(size = 6))