The full version of the dataset contains 61,000 rows and 36 columns, where each row corresponds to a vehicle and the columns are information pertaining to the vehicle.
We will be working with a smaller dataset with approximately 30,000 rows and 5 columns.
Rows: 30263 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): vehicleType, vehicleMake, vehicleModel, receivingDateTime, totalPaid
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
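The first goal is to determine how many vehicles were towed in each year covered by the dataset.
We don’t have a column for year, but the first observation of receivingDateTime is “10/24/2010 12:41:00 PM”, so the year can be recovered from that column, either by pulling out the characters that hold the year or by parsing the string into a date-time object.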
Option 1: str_sub()
baltimore_tow |>
  mutate(year = str_sub(receivingDateTime, 7, 10)) |>
  ggplot(aes(x = year)) +
  geom_bar() +
  theme_bw() +
  labs(title = 'Number of Vehicles Towed in Baltimore by Year') +
  xlab('') +
  ylab("number of vehicles towed") +
  annotate('text', x = 3, y = 10000, label = 'Most data is from 2015 & 2016') +
  annotate('segment', x = 3, y = 9000, xend = 6, yend = 9000, arrow = arrow())
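To see where the positions 7 and 10 come from, it can help to try str_sub() on the single value quoted earlier. This is only a small sketch: `first_obs` and `year_chr` are names made up for the illustration, and the approach assumes every value in the column has a two-digit month and day, so that the year always sits in the same place.

```r
library(stringr)

# The first receivingDateTime value quoted in the text
first_obs <- "10/24/2010 12:41:00 PM"

# Characters 1-2 are the month, 4-5 the day, and 7-10 the four-digit year.
# Assumption: months and days are zero-padded in every row.
year_chr <- str_sub(first_obs, 7, 10)
year_chr
```

Note that the result is a character string rather than a number, which is why the bar chart in Option 1 treats year as a categorical variable and places the annotation at x = 3 (the third category).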
Option 2: Create date-time object
baltimore_tow |>
  mutate(date = parse_date_time(receivingDateTime, "%m/%d/%y %I:%M:%S %p"),
         year = year(date)) |>
  ggplot(aes(x = year)) +
  geom_bar() +
  theme_bw() +
  labs(title = 'Number of Vehicles Towed in Baltimore by Year') +
  xlab('') +
  ylab("number of vehicles towed") +
  annotate('text', x = 2012, y = 10000, label = 'Most data is from 2015 & 2016') +
  annotate('segment', x = 2012, y = 9000, xend = 2015, yend = 9000, arrow = arrow())
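The same parsing call can be checked on the single value quoted earlier before applying it to the whole column. The sketch below uses made-up names (`first_obs`, `parsed`) and the same format string as Option 2. Once the string is a date-time object, lubridate accessors such as year(), hour(), and pm() can pull out individual components.

```r
library(lubridate)

# The first receivingDateTime value quoted in the text
first_obs <- "10/24/2010 12:41:00 PM"

# Parse with the same format string used in Option 2
parsed <- parse_date_time(first_obs, "%m/%d/%y %I:%M:%S %p")

year(parsed)  # numeric year, so the plot's x axis is continuous
hour(parsed)  # hour on a 24-hour clock
pm(parsed)    # TRUE when the time falls in the afternoon or evening
```

Because year() returns a number, the annotations in Option 2 are positioned with actual years (x = 2012) rather than category positions.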
Goal 2. Type of Vehicles Towed by Morning / Afternoon
Next we wish to compute how many vehicles of each type were towed in the AM and in the PM.
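One possible way to approach this, sketched here under some assumptions, is to take the last two characters of receivingDateTime as the AM/PM marker and then count rows by vehicle type and marker. The data frame `toy_tow` and its values are invented for illustration and stand in for baltimore_tow. The column name `am_pm` is also hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical toy rows standing in for baltimore_tow (values are invented)
toy_tow <- data.frame(
  vehicleType = c("Car", "Car", "Truck", "Car"),
  receivingDateTime = c("10/24/2010 12:41:00 PM", "01/05/2015 08:15:00 AM",
                        "03/17/2016 11:02:00 PM", "07/04/2016 09:30:00 PM")
)

# Negative positions in str_sub() count from the end of the string,
# so -2 to -1 selects the trailing "AM" or "PM".
am_pm_counts <- toy_tow |>
  mutate(am_pm = str_sub(receivingDateTime, -2, -1)) |>
  count(vehicleType, am_pm)

am_pm_counts
```

This assumes every value ends in "AM" or "PM". If some rows are missing or formatted differently, parsing to a date-time object first and using lubridate's am() or pm() may be more robust.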