Case study: How does a bike-share navigate speedy success?
Cyclistic’s Story
We’re exploring how rider duration patterns differ between casual and member types—who rides longer, when, and where. For example, who rides longer? Rider duration patterns paired by station names? Rider day of week vs weekend lengths and also ride duration broken by time of day. The company believes that maximising the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, there is a solid opportunity to convert casual riders into members. Executives understand that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. I will unravel interesting rider patterns and provide a summary of my analysis at the end of the report.
Three questions will guide the future marketing program:
By conducting my analysis I can see there is a clear difference in how casual and member types differ. Casual riders tend to take longer trips on average than members, often using the bikes for leisure rather than commuting. Members show shorter average durations approximating 12 minutes, which aligns with commuter behavior — quick, point-to-point travel, often during peak hours.This pattern supports targeted planning around peak weekend usage and pricing strategy.
# Average Ride Duration by Rider Group
avg_by_rider <- station_data_clean %>%
group_by(rider_group) %>%
summarise(avg_duration_mins = mean(ride_length_mins_2, na.rm = TRUE)) %>%
arrange(desc(avg_duration_mins))
ggplot(avg_by_rider, aes(x = reorder(rider_group, avg_duration_mins), y = avg_duration_mins, fill = rider_group)) +
geom_col(width = 0.3) +
scale_fill_manual(values = c("casual" = "#D55E00", "member" = "#009E73")) +
labs(title = "Average Ride Duration by Rider Group",
x = "Rider Group",
y = "Average Duration (mins)") +
theme_minimal()
While casual riders tend to ride longer, members appear to use the service more frequently. This duality highlights the importance of tailoring engagement strategies to each group’s usage pattern.
ride_count_by_group <- station_data_clean %>%
group_by(rider_group) %>%
summarise(ride_count = n()) %>%
arrange(desc(ride_count))
ggplot(ride_count_by_group, aes(x = reorder(rider_group, ride_count), y = ride_count, fill = rider_group)) +
geom_col(width = 0.3) +
scale_fill_manual(values = c("casual" = "#D55E00", "member" = "#009E73")) +
geom_text(aes(label = scales::comma(ride_count)), vjust = -0.5, size = 4) +
labs(title = "Ride Count by Rider Group",
x = "Rider Group",
y = "Total Rides") +
theme_minimal()
Here’s another quick visual cue:
station_data_clean %>%
group_by(rider_group) %>%
summarise(
avg_duration = round(mean(ride_length_mins_2, na.rm = TRUE), 1),
ride_count = n()
) %>%
arrange(desc(ride_count)) %>%
knitr::kable()
rider_group | avg_duration | ride_count |
---|---|---|
member | 13.2 | 720311 |
casual | 89.5 | 67877 |
This chart reveals that members ride short and steady, while casual riders linger longer—especially after dark. To convert more casual riders, I suggest we offer them flexible membership plans especially for those who are simply visiting the city for a short period of time. We could use the following tag line to grab their attention, “Grab your 10% evening membership discount now.”
#Time of Day: average ride duration
station_data_clean <- station_data_clean %>%
mutate(
hour = lubridate::hour(started_at),
time_of_day = case_when(
hour >= 5 & hour < 12 ~ "Morning",
hour >= 12 & hour < 17 ~ "Afternoon",
hour >= 17 & hour < 21 ~ "Evening",
TRUE ~ "Night"
),
time_of_day = factor(time_of_day, levels = c("Morning", "Afternoon", "Evening", "Night"))
)
duration_by_time <- station_data_clean %>%
group_by(time_of_day, rider_group) %>%
summarise(avg_duration = mean(ride_length_mins_2, na.rm = TRUE)) %>%
arrange(match(time_of_day, c("Early Morning", "Mid Morning", "Afternoon", "Evening", "Night")))
## `summarise()` has grouped output by 'time_of_day'. You can override using the
## `.groups` argument.
station_data_clean <- station_data_clean %>%
mutate(time_of_day = factor(time_of_day,
levels = c("Early Morning", "Mid Morning", "Afternoon", "Evening", "Late Night")))
ggplot(duration_by_time, aes(x = time_of_day, y = avg_duration, fill = rider_group)) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("casual" = "#D55E00", "member" = "#009E73"))
labs(title = "Average Ride Duration by Time of Day",
subtitle = "Members ride short and steady; casual riders linger longer afte dark.\nConsider flexible evening memberships to boost conversations",
x = "Time of Day",
y = "Average Duration (mins)",
fill = "Rider Group") +
theme_minimal()
## NULL
The following is a ggplot graph that shows the top paired stations. Why this is important? Because we can use this data to advertise future membership plans in ad boards around those 10 top stations. The data lacks consistent demographic data which does not enable us to dip deeper into best ad practices. Still, these top ten stations offer a strategic opportunity for targeted promotions. We understand that folks ride longer after dark, this gives us strong indication that rider enjoy night riding along the canal.
# top paired stations.
top_pairs <- station_data_clean %>%
count(pair, sort = TRUE) %>%
slice_head(n = 10)
ggplot(top_pairs, aes(x = pair, y = n)) + geom_col(fill = "#20B2AA") +
labs(
title = "Top Station Pairs Grouped \nBy Total Number of Rides",
x = "Station Pair",
y = "Number of Rides Per Station"
) + coord_flip()
To avoid skew from high-volume rider types, we calculated weighted average ride durations by station pair:
# Weighted Average Duration by Station Pair
library(tidyverse)
# Create top_pairs from most frequent station pairings
top_pairs <- station_data_clean %>%
count(pair, sort = TRUE) %>%
slice_head(n = 10)
# Step 1: Compute totals per pair and rider type
length_counts_2 <- station_data_clean %>%
filter(pair %in% top_pairs$pair) %>%
group_by(pair, rider_group) %>%
summarise(total_length = sum(ride_length_mins_2, na.rm = TRUE),
rider_count = n(),
.groups = "drop")
# Step 2: Get total rides per pair for weighting
pair_totals_2 <- length_counts_2 %>%
group_by(pair) %>%
summarise(pair_total = sum(rider_count))
# Step 3: Join and calculate weighted contribution
weighted_stats_2 <- left_join(length_counts_2, pair_totals_2, by = "pair") %>%
mutate(weight = rider_count / pair_total,
weighted_length = weight * (total_length / rider_count))
# Step 4: Summarize weighted average per pair
final_weighted_2 <- weighted_stats_2 %>%
group_by(pair) %>%
summarise(weighted_avg_length = sum(weighted_length))
# Optional: Visualise 📊
ggplot(final_weighted_2, aes(x = reorder(pair, weighted_avg_length), y = weighted_avg_length)) +
geom_col(fill = "#009E73") +
coord_flip() +
labs(title = "Weighted Average Ride Duration \nby Station Pair",
x = "Station Pair",
y = "Weighted Avg Duration (min)")