difftime in R with inconsistent datetime

I am wondering if anyone has insight on how to approach this problem.
I have a dataframe with the columns: Startime, Endtime and ID (dataframe below) I have already grouped the IDs. I am trying to figure out the time difference in seconds of a specific ‘grouped by’ ID (specifically A)

Starttime	                Endtime		                   ID

12/18/2019 4:06:59 PM	       12/18/2019 4:07:05 PM	    A
12/18/2019 4:07:26 PM	       12/18/2019 4:07:28 PM	    A
12/17/2019 6:48:06 PM	       12/17/2019 6:48:07 PM	    A
12/17/2019 6:25:16 PM	       12/17/2019 6:25:22 PM	    A

I went about this by using the following code within R and utilizing the dplyr package:

data <-rawdata %>% group_by(ID) %>% summarize(diff = difftime(last(as.POSIXct(Endtime, format =
"%m/%d/%Y %I:%M:%S %p")),first(as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p" )), units = "secs"))

However, the code above only calculates the difference between start and end time but not the differences in between. For instance, the code above calculates:

12/18/2019 4:06:59 PM and 12/17/2019 6:25:22 PM

I see this as inaccurate, because it doesn’t account for when 12/17 changes to 12/18, its simply taking the first and last value within the grouped ID.

Is there a way to subgroup data within already grouped data? (further group by time: month day and year and then take its difference in seconds?

Is there a way to do this? Further subgroup the data so that when the day changes, it will know to calculate this difference as well, versus only calculating the first and last values.

                 Starttime		            Endtime                                    ID

                12/18/2019 4:06:59PM      12/18/2019 4:07:05 PM                             A
                12/18/2019 4:07:26PM      12/18/2019 4:07:28 PM                             A

                12/17/2019 6:48:06PM     12/17/2019 6:48:07PM                               A
                12/17/2019 6:25:16PM     12/17/2019 6:25:22PM                               A

Any suggestions will help! I feel like I am stuck on this. I will continue to research. Thank you!

EDIT: Correcting errors in my answer. Credit to @Yarnabrina for bringing them to my notice.

Hi @tanishahudson1! Welcome to RStudio Community.

Are you trying to do something like this?

library(tidyverse)
library(lubridate)

rawdata <- tibble(Starttime = c("12/18/2019 4:06:59 PM", "12/18/2019 4:07:26 PM", 
                                "12/17/2019 6:48:06 PM", "12/17/2019 6:25:16 PM"),
               Endtime = c("12/18/2019 4:07:05 PM", "12/18/2019 4:07:28 PM", 
                           "12/17/2019 6:48:07 PM", "12/17/2019 6:25:22 PM"),
               ID = rep("A", 4))

rawdata <- rawdata %>% 
  mutate_at(c("Starttime", "Endtime"), as.POSIXct, format = "%m/%d/%Y %I:%M:%S %p")

print(rawdata)
# A tibble: 4 x 3
  Starttime           Endtime             ID   
  <dttm>              <dttm>              <chr>
1 2019-12-18 16:06:59 2019-12-18 16:07:05 A    
2 2019-12-18 16:07:26 2019-12-18 16:07:28 A    
3 2019-12-17 18:48:06 2019-12-17 18:48:07 A    
4 2019-12-17 18:25:16 2019-12-17 18:25:22 A

rawdata %>% 
  group_by(ID, Date = date(Starttime)) %>% 
  summarize(diff = difftime(last(Endtime), first(Starttime), units = "secs"))
# A tibble: 2 x 3
# Groups:   ID [1]
  ID    Date       diff      
  <chr> <date>     <drtn>    
1 A     2019-12-17 -1364 secs
2 A     2019-12-18    29 secs

Note: Here I'm assuming that each observation always has the same date in the Starttime and Endtime columns.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.