Creating a column with duration from start and stop times

Hi, all.

I have a data frame with start and stop times. I want to calculate the duration from the start and stop times and generate a new column with the durations for every row (a million or so rows). Here are the first two rows of the data frame:

Ride_id Started_at ended_at
EACB19130B0 2020-01-21 20:06:59 2020-01-21 20:14:30
8FED874C809 2020-01-30 14:22:39 2020-01-30 14:26:22

I grabbed a solution from here: r - Calculating new column based on input from existing columns - Stack Overflow

diff_time <- function(start, end) {
case_when(start < end ~ end - start,
start > end ~ parse_time("23:59") - start + end + parse_time("0:01")
)
}

bike_trips %>%
mutate_all(parse_time) %>%
mutate(tripduration = diff_time(started_at, ended_at))

The Diff_time function seems logical. In cases when the ended_at time crosses over into the next day, the diff_time function should make the correction.

As I understand it, the mutate expression should create a column (tripduration) by passing the contents of the started_at and ended_at columns to the diff_time function.

However, I get an error (see below). I’m afraid I’m a newbie and troubleshooting this is over my head, although that there’s an error while trying to compute the ride_id column is a clue (I don’t want to do anything with that column!). Can anyone help with a solution for what I’m trying to do? Thanks in advance!

Warning: Problem while computing ride_id = (function (x, format = "", na = c("", "NA"), locale = default_locale(), ....
:information_source: 426887 parsing failures. row col expected actual 1 -- time like EACB19130B0CDA4A 2 -- time like 8FED874C809DC021 3 --
time like 789F3C21E472CA96 4 -- time like C9A388DAC6ABF313 5 -- time like 943BC3CBECCFD662 ... ... ..........
................ See problems(...) for more details.
Error in mutate():
! Problem while computing started_at = (function (x, format = "", na = c("", "NA"), locale = default_locale(), ....
Caused by error in parse_vector():
! is.character(x) is not TRUE
Run rlang::last_error() to see where the error occurred.

1 Like

Don't use mutate_all when you have different types. Besides there is no need for a custom function. Lubridate package understands time differences.

Try this:

library(tidyverse)
library(lubridate)

bike_trips <- tribble(
~'ride_id',~'started_at',~'ended_at',
   'EACB19130B0', '2020-01-21 20:06:59','2020-01-21 20:14:30',
   '8FED874C809', '2020-01-3014:22:39','2020-01-30 14:26:22'
)

# note the data types, all <chr>
bike_trips
#> # A tibble: 2 x 3
#>   ride_id     started_at          ended_at           
#>   <chr>       <chr>               <chr>              
#> 1 EACB19130B0 2020-01-21 20:06:59 2020-01-21 20:14:30
#> 2 8FED874C809 2020-01-3014:22:39  2020-01-30 14:26:22

# change types to what we want.
bike_trips <- bike_trips %>% 
   mutate(started_at = as_datetime(started_at)) %>% 
   mutate(ended_at = as_datetime(ended_at))
   
bike_trips
#> # A tibble: 2 x 3
#>   ride_id     started_at          ended_at           
#>   <chr>       <dttm>              <dttm>             
#> 1 EACB19130B0 2020-01-21 20:06:59 2020-01-21 20:14:30
#> 2 8FED874C809 2020-01-30 14:22:39 2020-01-30 14:26:22

# note desired data types, now we can just use subtraction
# subtraction operator method for <dttm> produces durations

bike_trips <- bike_trips %>%
   mutate(tripduration = ended_at - started_at)

bike_trips
#> # A tibble: 2 x 4
#>   ride_id     started_at          ended_at            tripduration 
#>   <chr>       <dttm>              <dttm>              <drtn>       
#> 1 EACB19130B0 2020-01-21 20:06:59 2020-01-21 20:14:30 7.516667 mins
#> 2 8FED874C809 2020-01-30 14:22:39 2020-01-30 14:26:22 3.716667 mins

Created on 2022-08-29 by the reprex package (v2.0.1)

4 Likes
# Try this
library(dplyr)
library(lubridate)
# Creating a Data Frame
df <- data.frame(Ride_id = c('EACB19130B0', '8FED874C809'), Started_at = c('2020-01-21 20:06:59','2020-01-30 14:22:39'), Ended_at=c('2020-01-21 20:14:30','2020-01-30 14:26:22'))
df$Started_at <- as.POSIXct(df$Started_at)
df$Ended_at <- as.POSIXct(df$Ended_at)

df <- df %>% mutate(Trip_Duration = Ended_at - Started_at)

df
2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.