Strings and Dates

Baltimore Towing Data

An Overview

The first example focuses on a subset of a dataset that contains information on vehicles towed in Baltimore, MD:

A larger version of this dataset along with additional descriptions can be found at: https://data.baltimorecity.gov/.
The full version of the dataset contains 61,000 rows and 36 columns, where each row corresponds to a vehicle and the columns are information pertaining to the vehicle.
We will be working with a smaller dataset with approximately 30,000 rows and 5 columns.

The dataset

First read in the data set which is available at: https://math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv.

baltimore_tow <- read_csv('https://math.montana.edu/ahoegh/teaching/stat408/datasets/BaltimoreTowing.csv')

Rows: 30263 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): vehicleType, vehicleMake, vehicleModel, receivingDateTime, totalPaid

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(baltimore_tow)

Rows: 30,263
Columns: 5
$ vehicleType       <chr> "Van", "Car", "Car", "Car", "Car", "SUV", "Car", "Ca…
$ vehicleMake       <chr> "LEXUS", "Mercedes", "Chysler", "Chevrolet", "Hyunda…
$ vehicleModel      <chr> NA, NA, "Cirrus", "Cavalier", "Tiburon", "RAV4", "32…
$ receivingDateTime <chr> "10/24/2010 12:41:00 PM", "04/28/2015 09:27:00 AM", …
$ totalPaid         <chr> "$322.00", "$130.00", "$280.00", "$1057.00", "$469.0…

Information for a few vehicles

kable(head(baltimore_tow,15))

vehicleType	vehicleMake	vehicleModel	receivingDateTime	totalPaid
Van	LEXUS	NA	10/24/2010 12:41:00 PM	$322.00
Car	Mercedes	NA	04/28/2015 09:27:00 AM	$130.00
Car	Chysler	Cirrus	07/23/2015 07:55:00 AM	$280.00
Car	Chevrolet	Cavalier	10/23/2010 11:35:00 AM	$1057.00
Car	Hyundai	Tiburon	10/25/2010 02:49:00 PM	$469.00
SUV	Toyota	RAV4	10/25/2010 11:12:00 AM	$305.00
Car	Bmw	325	10/23/2012 07:50:00 PM	$220.00
Car	Honda	Accord	10/25/2010 02:53:00 PM	$327.00
Car	Ford	Taurus	12/23/2010 04:09:00 AM	$290.00
SUV	Ford	Expo	12/23/2010 02:51:00 PM	$230.00
SUV	Lincoln	Mkx	12/23/2010 01:40:00 PM	$230.00
Car	Geo	Prizm	12/23/2010 06:45:00 AM	$570.00
Car	Kia	Spectra	12/23/2010 03:57:00 AM	$280.00
Car	Nissan	NA	12/23/2010 05:08:00 AM	$280.00
Pick-up Truck	Dodge	Dakota	12/23/2010 02:05:00 PM	$275.00

Goal 1: Plot the number of Vehicles Towed by Year

The first goal is to determine how many vehicles were towed for each year in the data set.

Given that the we don’t have a column for year and the first observation for receiving date is “10/24/2010 12:41:00 PM”.

Option 1: str_sub()

baltimore_tow |>
  mutate(year = str_sub(receivingDateTime, 7, 10)) |>
  ggplot(aes(x = year)) +
  geom_bar() +
  theme_bw() +
  labs(title = 'Number of Vehicles Towed in Baltimore by Year') +
  xlab('') +
  ylab("number of vehicles towed") +
  annotate('text', x = 3, y = 10000, label ='Most data is from 2015 & 2016') +
  annotate('segment', x = 3, y = 9000, xend = 6, yend = 9000, arrow = arrow())

Option 2: Create date-time object

baltimore_tow |>
  mutate(date = parse_date_time(receivingDateTime, "%m/%d/%y %I:%M:%S %p"),
         year = year(date)) |>
  ggplot(aes(x = year)) +
  geom_bar() +
  theme_bw() +
  labs(title = 'Number of Vehicles Towed in Baltimore by Year') +
  xlab('') +
  ylab("number of vehicles towed") +
  annotate('text', x = 2012, y = 10000, label ='Most data is from 2015 & 2016') +
  annotate('segment', x = 2012, y = 9000, xend = 2015, yend = 9000, arrow = arrow())

Goal 2. Type of Vehicles Towed by Morning / Afternoon

Next we wish to compute how many vehicles were towed in the AM and PM for each type of variables.

unique(baltimore_tow$vehicleType)

 [1] "Van"                        "Car"                       
 [3] "SUV"                        "Pick-up Truck"             
 [5] "Motor Cycle (Street Bike)"  "Dirt Bike"                 
 [7] "Commercial Truck"           "Trailer"                   
 [9] "Station Wagon"              "Truck"                     
[11] "Taxi"                       "Pickup Truck"              
[13] "Convertible"                "Tractor Trailer"           
[15] "Tow Truck"                  "All terrain - 4 wheel bike"
[17] "Mini-Bike"                  "Golf Cart"                 
[19] "Boat"                       "Tractor"                   
[21] "Construction Equipment"     "Sport Utility Vehicle"

However, we want to take a close look at the vehicle types in the data set and perhaps create more useful groups.

Suggested Grouping

First consider reasonable groups for vehicle types.

Cars - (Car, Convertible)
Large Cars - (SUV, Station Wagon, Sport Utility Vehicle, Van, Taxi)
Trucks - (Pick-up Truck, Pickup Truck)
Large Trucks - (Truck, Tractor Trailer, Tow Truck, Tractor, Construction Equipment, Commercial Truck)
Bikes - (Motor Cycle (Street Bike), Dirt Bike, All terrain - 4 wheel bike, Mini-Bike)
Misc (delete) - (Boat, Golf Cart, Trailer)

baltimore_tow |>
  filter(!vehicleType %in% c('Boat','Golf Cart', 'Trailer')) |>
  mutate(vehicleGroup = case_when(
    vehicleType %in% c('Car','Convertible') ~ 'Cars',
    vehicleType %in% c('SUV', 'Station Wagon','Sport Utility Vehicle', 'Van', 'Taxi') ~ 'Large Cars',
    vehicleType %in% c("Pick-up Truck", "Pickup Truck") ~ 'Trucks',
    vehicleType %in% c("Truck", 'Tractor Trailer', "Tow Truck", "Tractor", "Construction Equipment",'Commercial Truck') ~ 'Large Trucks',
    vehicleType %in% c('Motor Cycle (Street Bike)', "Dirt Bike", "All terrain - 4 wheel bike", "Mini-Bike") ~ 'Bikes',
    TRUE ~ 'Other'
  )) |>
  mutate(time_of_day = str_sub(receivingDateTime, start = -2)) |>
  group_by(vehicleGroup, time_of_day) |>
  tally() |>
  ggplot(aes(x = vehicleGroup, fill = vehicleGroup, y = n)) +
  geom_bar(stat = 'identity') +
  facet_wrap(.~time_of_day, nrow = 2) +
  theme_bw() +
  labs(title = "Number of vehicles towed by car type") +
  xlab('') +
  ylab('') +
  theme(legend.position = 'none') +
  scale_fill_viridis(discrete = TRUE, option = 'D')

Figure 1: Vehicles towed in Baltimore, MD. Groups are defined as Cars - (Car, Convertible), Large Cars - (SUV, Station Wagon, Sport Utility Vehicle, Van, Taxi), Trucks - (Pick-up Truck, Pickup Truck), Large Trucks - (Truck, Tractor Trailer, Tow Truck, Tractor, Construction Equipment, Commercial Truck), Bikes - (Motor Cycle (Street Bike), Dirt Bike, All terrain - 4 wheel bike, Mini-Bike)

Goal 3: Calculate Average Towing fees by vehicleMake

baltimore_tow |>
  mutate(fees = parse_number(totalPaid)) |>
  group_by(vehicleMake) |>
  summarize(`Average Towing Fee` = mean(fees)) |>
  arrange(vehicleMake) |>
  slice(1:30) |>
  kable()

Table 1: First 30 rows – Uh Oh, major spelling errors
vehicleMake	Average Towing Fee
,itsubishi	235.0000
2bp6329	250.0000
3gmc	415.0000
6an8561	220.0000
8bg7541	130.0000
?	1260.0000
ACURA	288.6667
AUDI	277.0000
Aacura	220.0000
Accura	243.0000
Acira	220.0000
Aciura	280.0000
Acrua	289.5000
Acua	242.5000
Acur	130.0000
Acura	342.0937
Acute	220.0000
Apollo	190.0000
Aqcura	162.0000
Aston Martin	355.0000
Aucra	250.0000
Audi	361.2941
Audio	385.0000
Audu	1065.0000
Aura	220.0000
Avp1	130.0000
BMW	297.0000
BUICK	272.0000
Bendron	313.0000
Bentley	595.0000

Goal 4. Data Cleaning

Spelling errors can be addressed, by reassigning vehicle makes to the correct spelling, but is tedious..