This lab will explore basic `ggplot2` functionality. Please turn in one compiled document (PDF) per group.

We will use two datasets related to the recent XXXIII Olympiad (Olympics) held in Paris, France.
The first dataset contains medal counts for all countries earning a medal.
```{r, message = F}
library(tidyverse)
medals <- read_csv('https://raw.githubusercontent.com/stat408/Data/main/MedalCount.csv')
```
### 1. (4 points)
Create a figure that tells the story of the medal count at the Paris Olympics.
## Olympic Athlete Dataset
This set of figures will use an Olympic dataset from Kaggle. Additional information is available at <https://www.kaggle.com/datasets/willianoliveiragibin/olympics-2024?resource=download&select=athletes+new.csv>
```{r, message = F}
library(tidyverse)
athletes <- read_csv('https://raw.githubusercontent.com/stat408/Data/main/athletes%20new.csv') |>
  mutate(birth_year = year(birth_date))
```
### 2.

Using the `birth_date` variable, create a figure that visualizes the ages of the Olympians. Which sports tend to have the youngest and oldest athletes?
### 3.

Create a figure that displays the number of competing athletes from the 12 countries with the most medals.
```{r}
string_in <- medals$Country[1]
extract_country_abbr <- function(string_in){
  str_split(string_in, '\\(' )[[1]][2] |>
    str_sub(end = -2)
}

top12 <- sapply(medals$Country[1:12], extract_country_abbr)

top12_athletes <- athletes |>
  filter(country_code %in% top12)
```
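
To see what `extract_country_abbr` does, here is a minimal sketch on a made-up input. The string `'United States (USA)'` is an assumption about how entries in `medals$Country` are formatted (a country name followed by a code in parentheses), inferred from the function itself; check the actual values in the data before relying on it.

```r
library(stringr)

# Same helper as in the lab code
extract_country_abbr <- function(string_in){
  # Split on the opening parenthesis and keep the piece after it
  str_split(string_in, '\\(')[[1]][2] |>
    # Drop the final character (the closing parenthesis)
    str_sub(end = -2)
}

# Hypothetical input; the real values in medals$Country may look different
example_country <- 'United States (USA)'
abbr <- extract_country_abbr(example_country)
abbr
```

Because the helper indexes `[[1]]`, it handles one string at a time, which is presumably why the lab code applies it with `sapply` rather than passing the whole column at once.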
Note: I’ve made this process easier for you by only including athletes from these 12 countries.
### 4.

Use the `Q4_data` to visualize the relationship between the number of medals earned by a country and the number of athletes participating in the Olympics.
```{r}
medals$country_code <- sapply(medals$Country, extract_country_abbr)

Q4_data <- athletes |>
  group_by(country_code, country_full) |>
  tally() |>
  rename(num_athletes = n) |>
  left_join(medals, by = 'country_code') |>
  mutate(`Total Medals` = case_when(
    !is.na(`Total Medals`) ~ `Total Medals`,
    is.na(`Total Medals`) ~ 0
  )) |>
  select(country_code, num_athletes, country_full, `Total Medals`) |>
  rename(total_medals = `Total Medals`)
```
### 5.

The `athletes` dataset also contains the events the athletes are competing in. See the value for Montana’s Katherine Berkoff or gymnast Simone Biles.

Describe (in words or pseudocode) what you’d need to do and/or what additional information you’d need in order to create a figure that displayed the number of events competed in by athletes that won medals.