Unable to create a new column (ride_length) using two date columns(Started_at),(Ended_at).

Meteor_007 · September 22, 2022, 8:09am

Currently I'm working on Divvy dataset Google capstone project. I am a novice in this field.
This Dataset contains data about the ride. I have combined 12 files of previous year to make a huge dataset.

Masterfile <- read.csv('Masterfile.csv')

I'm struggling to form a new column 'ride_length' which i can get by subtracting data of column "Started_at" and "Ended_it".
These columns contains date "02-07-21 14:44" and "02-07-21 15:19" in this format.
I even converted both the columns in Numeric.

***Masterfile$ended_at <- as.numeric(Masterfile$ended_at, origin = "02-07-21 15:19")
Masterfile$started_at <- as.numeric(Masterfile$started_at , origin = "02-07-21 14:44") ***

When i Didn't supply origin I was getting this error-

"Error in as.POSIXct.numeric(time1) : 'origin' must be supplied"

But when I did , I got the following error-

Error in difftime(Masterfile$ended_at, Masterfile$ended_at, origin = "02-07-21 15:19") :

I might be doing silly mistake kindly help me . I am just stuck.

But I am unable to do it.

scottyd22 · September 22, 2022, 11:41am

You can leave the columns in character format and use difftime().

d = data.frame(Started_at = '02-07-21 14:44',
               Ended_at = '02-07-21 15:19')

d$ride_length = difftime(d$Ended_at, d$Started_at)

d
#>       Started_at       Ended_at ride_length
#> 1 02-07-21 14:44 02-07-21 15:19     35 mins

Created on 2022-09-22 with reprex v2.0.2.9000

andresrcs · September 22, 2022, 10:19pm

This will only be the case if all start/end pairs are within the same day since the actual date is not being parsed (the dates are being parsed as %y%m%d), I think this is a dangerous approach since it can lead to unexpected results, for example:

difftime('02-08-21 00:19', '02-07-21 23:44')
#> Time difference of 30.02431 days

^{Created on 2022-09-22 with reprex v2.0.2}

I think it is better to properly parse the character strings into date-time objects respecting the actual date format they are in (I'm assuming "%m-%d-%y).

library(dplyr)

d = data.frame(Started_at = c('02-07-21 14:44', '02-07-21 23:44'),
               Ended_at = c('02-07-21 15:19', '02-08-21 00:19'))

d %>% 
    mutate(across(c(Started_at, Ended_at), as.POSIXct, format = "%m-%d-%y %H:%M"),
           ride_length = difftime(Ended_at, Started_at))
#>            Started_at            Ended_at ride_length
#> 1 2021-02-07 14:44:00 2021-02-07 15:19:00     35 mins
#> 2 2021-02-07 23:44:00 2021-02-08 00:19:00     35 mins

^{Created on 2022-09-22 with reprex v2.0.2}

Meteor_007 · September 23, 2022, 7:59am

I have tried the solution. It is not working.

Masterfile$ridelength <- difftime(Masterfile$ended_at , Masterfile$started_at)

I'm getting the following error:

Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format

scottyd22 · September 23, 2022, 12:56pm

See @andresrcs solution, which specifically sets the inputs as date-time objects and specifies the exact format prior to using difftime().

system · October 14, 2022, 12:56pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.