Hello,
I'm finishing the Coursera Google Data Analytics Professional Certificate and I'm stuck on one problem in the case study.
The case study is about historical bicycle trips in Chicago. We use these four files, which can be downloaded here:
- Divvy_Trips_2020_Q1.zip
- Divvy_Trips_2019_Q4.zip
- Divvy_Trips_2019_Q3.zip
- Divvy_Trips_2019_Q2.zip
The case study includes this script by Kevin Hartman to clean and prepare the data: Divvy Exercise R Script - Google Docs
I used the script above, adding a few lines to make sure that all datasets have the same number of columns with the same names.
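A quick way to run that kind of check, covering column types as well as names, is to tabulate each data frame's column classes side by side. This is a minimal base-R sketch; `compare_columns()` and the toy data frames are hypothetical illustrations, not part of the case-study script or the real Divvy files.

```r
# Hypothetical helper (not part of the case-study script): tabulate the
# class of every column in every data frame, with NA where a column is absent.
compare_columns <- function(dfs) {
  all_cols <- sort(unique(unlist(lapply(dfs, names))))
  vapply(dfs, function(df) {
    vapply(all_cols, function(col) {
      if (col %in% names(df)) class(df[[col]])[1] else NA_character_
    }, character(1))
  }, character(length(all_cols)))
}

# Toy data frames standing in for two quarters (not the real Divvy files)
q4_toy <- data.frame(ride_id = c("1", "2"), tripduration = c(300, 420),
                     stringsAsFactors = FALSE)
q1_toy <- data.frame(ride_id = c(3L, 4L))

# Rows are column names, columns are data frames
cmp <- compare_columns(list(q4 = q4_toy, q1 = q1_toy))
print(cmp)

# Columns that are missing somewhere or whose classes disagree
mismatched <- rownames(cmp)[apply(cmp, 1, function(r) anyNA(r) || length(unique(r)) > 1)]
print(mismatched)
```

With the real data, the same helper could be called as `compare_columns(list(q2 = q2_2019, q3 = q3_2019, q4 = q4_2019, q1 = q1_2020))` after the renaming steps; any row with an NA or with differing classes points to a column worth inspecting before stacking.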
```r
library(tidyverse)  # helps wrangle data
library(lubridate)  # helps wrangle date attributes
library(ggplot2)    # helps visualize data
library(reprex)

#=====================
# STEP 1: COLLECT DATA
#=====================
# Upload Divvy datasets (csv files) here
q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
q3_2019 <- read_csv("Divvy_Trips_2019_Q3.csv")
q4_2019 <- read_csv("Divvy_Trips_2019_Q4.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")

#====================================================
# STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE
#====================================================
# Compare the column names of each file.
# While the names don't have to be in the same order, they DO need to match
# perfectly before we can use a command to join them into one file.
colnames(q3_2019)
colnames(q4_2019)
colnames(q2_2019)
colnames(q1_2020)

# Rename columns to make them consistent with q1_2020
# (as this will be the supposed going-forward table design for Divvy).
# Wrapping the assignment in parentheses also prints the result.
(q4_2019 <- rename(q4_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid
                   ,started_at = start_time
                   ,ended_at = end_time
                   ,start_station_name = from_station_name
                   ,start_station_id = from_station_id
                   ,end_station_name = to_station_name
                   ,end_station_id = to_station_id
                   ,member_casual = usertype))

(q3_2019 <- rename(q3_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid
                   ,started_at = start_time
                   ,ended_at = end_time
                   ,start_station_name = from_station_name
                   ,start_station_id = from_station_id
                   ,end_station_name = to_station_name
                   ,end_station_id = to_station_id
                   ,member_casual = usertype))

(q2_2019 <- rename(q2_2019
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID"
                   ,started_at = "01 - Rental Details Local Start Time"
                   ,ended_at = "01 - Rental Details Local End Time"
                   ,start_station_name = "03 - Rental Start Station Name"
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name"
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

# Inspect the data frames and look for incongruities
str(q1_2020)
str(q4_2019)
str(q3_2019)
str(q2_2019)

# Convert ride_id and rideable_type to character so that they can stack correctly
q4_2019 <- mutate(q4_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
q3_2019 <- mutate(q3_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
q2_2019 <- mutate(q2_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))

# Drop the columns that are not useful
q1_2020$start_lat <- NULL
q1_2020$start_lng <- NULL
q1_2020$end_lat <- NULL
q1_2020$end_lng <- NULL
q4_2019$gender <- NULL
q4_2019$birthyear <- NULL
q3_2019$gender <- NULL
q3_2019$birthyear <- NULL
q2_2019$`05 - Member Details Member Birthday Year` <- NULL
q2_2019$`Member Gender` <- NULL

# Rename the duration column so it has the same name as in the other datasets
(q2_2019 <- rename(q2_2019
                   ,tripduration = "01 - Rental Details Duration In Seconds Uncapped"))

# Stack the individual quarters' data frames into one big data frame.
# q1_2020 is not included because it has one column fewer (no tripduration).
all_trips <- bind_rows(q2_2019, q3_2019, q4_2019)
```
title: tweed-barb_reprex.R
author: r1387388
date: '2022-09-03'
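For reference, dplyr's `bind_rows()` matches columns by name and fills a column that is missing from one input with `NA`, whereas base `rbind()` on data frames expects the same set of columns. A minimal sketch with small hypothetical data frames (not the real Divvy data) illustrating both behaviors:

```r
library(dplyr)

# Toy stand-ins (hypothetical data): the 2019 quarters carry tripduration,
# while q1_2020 has no such column.
q4_toy <- data.frame(ride_id = c("a1", "a2"), tripduration = c(300, 420),
                     stringsAsFactors = FALSE)
q1_toy <- data.frame(ride_id = "b1", stringsAsFactors = FALSE)

# bind_rows() matches columns by name; tripduration is NA for the q1 row
stacked <- bind_rows(q4_toy, q1_toy)
print(stacked)

# rbind() on data frames needs the same columns, so here it stops with an error
rbind_result <- tryCatch(rbind(q4_toy, q1_toy), error = function(e) e)
print(conditionMessage(rbind_result))
```

In other words, a column present in only some quarters is filled rather than rejected by `bind_rows()`, while `rbind()` would need that column dropped or added first.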
I tried to run it many times, but it never works. I always get the error: "The previous R session was abnormally terminated due to an unexpected crash".
The files are imported and the data frames are created correctly. The problem happens when I call bind_rows(). I also tried rbind(), and that didn't work either.
I also tried RStudio Desktop instead of RStudio Cloud, and it didn't work there either.
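For comparison, when the inputs really are incompatible, `bind_rows()` stops with an ordinary R error message and the session keeps running; this is the situation the script's `as.character()` conversion is meant to prevent. A minimal sketch with hypothetical toy data, assuming `ride_id` is read as a number in one quarter and as text in another:

```r
library(dplyr)

# Hypothetical toy inputs: ride_id is numeric in one quarter
# and character in the other, as before the script's conversion step.
q4_toy <- data.frame(ride_id = c(101L, 102L))
q1_toy <- data.frame(ride_id = c("B7C1", "D3E2"), stringsAsFactors = FALSE)

# Incompatible types: bind_rows() raises an ordinary R error
type_error <- tryCatch(bind_rows(q4_toy, q1_toy), error = function(e) e)
print(conditionMessage(type_error))

# Converting first, as the script's mutate(..., as.character()) step does,
# lets the two data frames stack
fixed <- bind_rows(mutate(q4_toy, ride_id = as.character(ride_id)), q1_toy)
print(fixed)
```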