Week 6: Video

Data Viz Resources

Telling Stories with Data

One of the best ways to explore and understand a dataset is with visualization.

Data viz is more than numbers

Journalism

Spending Allocation

Art

Starry night for the color blind

Entertainment

Kobe Bryant Shot Chart

Compelling - Hans Rosling

Hans Rosling

http://www.youtube.com/embed/jbkSRLYSojo?rel=0

Exercise: Hans Rosling Discussion

  • What did you learn from this movie?
  • How did Hans Rosling use data visualization to tell a story?
  • Which principles from the visualization would you like to be able to apply yourself?

Data Viz: What to look for

Patterns

Why so many births around Sept. 25?

Relationships

Age vs. hospital visits

Questionable Data

Fox News

Design Principles

Explain Encodings

What is purple?

Explain Encodings

What is gray?

Label Axes

Calories for menu items

Keep Geometry in Check

proper scaling

Include Sources

source your data

Spotting Visualization Lies

FlowingData Guide for Spotting Visualization Lies:

Types of Graphs

Why use Graphics

  • Why do you use, or why have you used in the past, data graphics?
    • Exploratory Graphics
    • Publication Graphics
    • Presentation Graphics

Graphics in R

Exercise: Visualizing Patterns Over Time

  • What are we looking for with data over time?

Solution: Visualizing Patterns Over Time

  • What are we looking for with data over time?
    • Trends (increasing/decreasing)
    • Are seasonal cycles present?
  • Identifying these patterns requires looking beyond single points
  • We are also interested in looking at the data in more detail
    • Are there outliers?
    • Do any time periods look out of place?
    • Are there spikes or dips?
    • What causes any of these irregularities?
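
One way to check a series for the patterns listed above is sketched here in base R. The series is simulated (a linear trend, a 12-month cycle, and one spike injected at position 20), and the names time.index, month, y, fit, leftover, and suspect are illustrative rather than part of the course materials. Fitting a trend plus month effects and inspecting what is left over is one simple option among many, not a prescribed method.

```r
# Simulated monthly series (hypothetical data, not the bikeshare data)
time.index <- 1:36
month <- factor((time.index - 1) %% 12 + 1)
y <- 100 + 2 * time.index + 20 * sin(2 * pi * time.index / 12)
y[20] <- y[20] + 80  # artificial spike, added for illustration

# Line plot: the overall rise is the trend, the repeating wave is the cycle
plot(time.index, y, type = "l",
     xlab = "Month index", ylab = "Simulated value",
     main = "Simulated series with trend, cycle, and one spike")

# Remove a linear trend and month effects, then inspect what is left over
fit <- lm(y ~ time.index + month)
leftover <- residuals(fit)

# The observation with the largest leftover is a candidate irregularity
suspect <- unname(which.max(abs(leftover)))
points(time.index[suspect], y[suspect], col = "red", pch = 19)
```

The plot answers the first two questions by eye; the leftover values point to where a closer look at the cause may be worthwhile.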

Capital BikeShare

Capital Bikeshare Data

library(readr)
bike.data <- read_csv('http://www.math.montana.edu/ahoegh/teaching/stat408/datasets/Bike.csv')
## Rows: 10886 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (11): season, holiday, workingday, weather, temp, atemp, humidity, wind...
## dttm  (1): datetime
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Capital Bikeshare Data

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
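# Derive year and month factors from the timestamp
bike.data <- bike.data %>%
  mutate(year = as.factor(year(datetime)),
         month = as.factor(month(datetime)))
# Sum count within each calendar month, over all years in the data
monthly.counts <- bike.data %>%
  group_by(month) %>%
  summarize(num_bikes = sum(count)) %>%
  arrange(month)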
monthly.counts
## # A tibble: 12 × 2
##    month num_bikes
##    <fct>     <dbl>
##  1 1         79884
##  2 2         99113
##  3 3        133501
##  4 4        167402
##  5 5        200147
##  6 6        220733
##  7 7        214617
##  8 8        213516
##  9 9        212529
## 10 10       207434
## 11 11       176440
## 12 12       160160

Discrete Points: Bar Charts
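
A bar chart of the monthly totals could be drawn along these lines. This is a sketch in base R, not necessarily the code behind the original slide; the totals are retyped from the monthly.counts output so the block runs without downloading Bike.csv, and bar.mids and the label wording are illustrative choices.

```r
# Monthly totals retyped from the monthly.counts output
monthly.counts <- data.frame(
  month = factor(1:12),
  num_bikes = c(79884, 99113, 133501, 167402, 200147, 220733,
                214617, 213516, 212529, 207434, 176440, 160160)
)

# One bar per month; bars start at zero so heights stay comparable
bar.mids <- barplot(
  height = monthly.counts$num_bikes,
  names.arg = as.character(monthly.counts$month),
  xlab = "Month", ylab = "Number of bikes (sum of count)",
  main = "Capital Bikeshare: total count by month"
)

# State the data source below the plot
mtext("Source: Bike.csv", side = 1, line = 4, adj = 1, cex = 0.8)
```

The axis labels and the source line follow the design principles from earlier in the deck.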

Exercise: Visualizing Proportions

  • What to look for in proportions?

Visualizing Proportions

  • What to look for in proportions?
    • Generally looking for maximum, minimum, and overall distribution.
  • Many of the figures we have discussed are useful here as well: for example, stacked bar charts or points to look at changes in proportions over time.
  • Another possibility is the waffle plot we’ve previously seen.
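
The maximum, minimum, and overall distribution of proportions can be read off with a few lines of base R. This sketch reuses the monthly totals printed earlier (retyped here so the block stands alone); num.bikes and share are illustrative names, and the single stacked bar is just one possible display.

```r
# Monthly totals retyped from the monthly.counts output
num.bikes <- c(79884, 99113, 133501, 167402, 200147, 220733,
               214617, 213516, 212529, 207434, 176440, 160160)
names(num.bikes) <- month.abb

# Each month's proportion of the overall total
share <- num.bikes / sum(num.bikes)

# Months with the largest and smallest proportion
names(which.max(share))
names(which.min(share))

# One stacked bar: segment heights are the proportions and sum to 1
barplot(matrix(share, ncol = 1),
        legend.text = month.abb,   # explain the encoding
        xlim = c(0, 2.5),          # leave room for the legend
        ylab = "Share of total count",
        main = "Monthly share of total count")
```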

Exercise: Visualizing Relationships

  • When considering relationships between variables, what are we looking for?

Visualizing Relationships

  • When considering relationships between variables, what are we looking for?
    • When one variable goes up, do the others go up (positive relationship), go down (negative relationship), or show no relationship?
    • What is the distribution of your data? (both univariate and multivariate)
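
These questions can be explored with a scatterplot matrix, correlations, and a histogram. The sketch below uses small hypothetical data built so that one variable rises with x, one falls, and one barely changes; rel.data, rises, falls, flat, wiggle, and cors are made-up names for illustration only.

```r
# Hypothetical data, constructed so the three patterns are easy to see
x <- 1:20
wiggle <- rep(c(-1, 1), times = 10)
rel.data <- data.frame(
  x = x,
  rises = 2 * x + 3 * wiggle,   # tends to go up with x
  falls = -x + 3 * wiggle,      # tends to go down as x goes up
  flat = wiggle                 # barely changes with x
)

# Scatterplot matrix: every pairwise relationship in one figure
pairs(rel.data)

# Correlation with x: sign gives direction, size gives linear strength
cors <- cor(rel.data)[-1, "x"]
round(cors, 2)

# Histogram: the univariate distribution of a single variable
hist(rel.data$rises, xlab = "rises", main = "Distribution of rises")
```

Correlation only captures linear association, so the plots remain the primary check.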