Help with modifying timestamps based on a condition

Hi,

I need help modifying timestamps based on a condition.

I have two datasets, and when combining them some of the timestamps overlap. I would like to adjust these overlaps. Combining the datasets created NAs in ID, which marks rows that came from the other data frame. I want to retain the timestamps from the rows with NA IDs and adjust the other timestamps accordingly.

How do I remove rows with some kind of condition to avoid overlaps in the START_TIME & END_TIME timestamps?

# Data with overlapping timestamps, e.g., in rows 5 & 6.
library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
              START_TIME = c("2023-08-14 12:07:40","2023-08-14 12:08:48","2023-08-14 12:09:49",
                             "2023-08-14 12:11:22","2023-08-14 12:13:17",
                             "2023-08-14 12:14:24","2023-08-14 12:14:56",
                             "2023-08-14 12:15:05","2023-08-14 12:17:04",
                             "2023-08-14 12:17:04","2023-08-14 12:17:05","2023-08-14 12:17:25",
                             "2023-08-14 12:17:26","2023-08-14 12:17:27",
                             "2023-08-14 12:25:54","2023-08-14 12:34:48",
                             "2023-08-14 12:35:24","2023-08-14 12:40:19",
                             "2023-08-14 12:40:38","2023-08-14 12:46:28",
                             "2023-08-14 12:46:50","2023-08-14 12:55:44","2023-08-14 12:55:53",
                             "2023-08-14 12:57:10","2023-08-14 12:59:25",
                             "2023-08-14 13:01:04","2023-08-14 13:08:45",
                             "2023-08-14 13:10:50","2023-08-14 13:13:28",
                             "2023-08-14 13:15:50","2023-08-14 13:22:01"),
                END_TIME = c("2023-08-14 12:08:48","2023-08-14 12:09:49","2023-08-14 12:11:22",
                             "2023-08-14 12:13:17","2023-08-14 12:14:56",
                             "2023-08-14 13:10:50","2023-08-14 12:15:05",
                             "2023-08-14 12:17:04","2023-08-14 12:17:04",
                             "2023-08-14 12:17:05","2023-08-14 12:17:25","2023-08-14 12:17:26",
                             "2023-08-14 12:17:27","2023-08-14 12:25:54",
                             "2023-08-14 12:34:48","2023-08-14 12:35:24",
                             "2023-08-14 12:40:19","2023-08-14 12:40:38",
                             "2023-08-14 12:46:28","2023-08-14 12:46:50",
                             "2023-08-14 12:55:44","2023-08-14 12:55:53","2023-08-14 12:57:10",
                             "2023-08-14 12:59:25","2023-08-14 13:01:04",
                             "2023-08-14 13:08:45","2023-08-14 13:13:28",
                             "2023-08-14 13:14:54","2023-08-14 13:22:01",
                             "2023-08-14 13:18:36","2023-08-14 13:24:10"),
            ELAPSED_TIME = c(1.1326,1.0197,
                             1.5486,1.915,1.65,56.4333,0.15,1.9833,0,0.0138,
                             0.3362,0.0167,0.0167,8.455,8.893,0.6012,
                             4.9153,0.3241,5.8319,0.3597,8.9031,0.15,1.2892,
                             2.2385,1.6556,7.6833,4.7208,4.06662,8.5425,
                             2.76666,2.1566),
                 DATE_ID = c("2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00",
                             "2023-08-14 00:00:00","2023-08-14 00:00:00"),
                      ID = c(237214724L,
                             237214724L,237214724L,237214724L,235565710L,NA,
                             235565710L,232849697L,232849697L,232849697L,
                             232849697L,232849697L,232849697L,233163426L,234067478L,
                             234067478L,234067478L,234067478L,235266765L,
                             235266765L,239167461L,239167461L,239167461L,
                             234182936L,238535268L,236468967L,236468967L,NA,
                             236019152L,NA,236918867L),
                Category = c("Cat1","Cat1",
                             "Cat1","Cat7","Cat1","Cat4","Cat3","Cat3","Cat1",
                             "Cat2","Cat3","Cat1","Cat1","Cat5","Cat1",
                             "Cat1","Cat5","Cat3","Cat5","Cat3","Cat1",
                             "Cat3","Cat5","Cat1","Cat2","Cat1","Cat1","Cat6",
                             "Cat1","Cat6","Cat3")
      ) %>%
  mutate(START_TIME = as.POSIXct(START_TIME), END_TIME = as.POSIXct(END_TIME))


# I would like to retain time stamps where IDs are NAs and adjust the other rows accordingly.
# I tried to modify as follows:
mod_df <- df %>%
  mutate(END_TIME = case_when(
    END_TIME > lead(START_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lead(START_TIME),
    TRUE ~ END_TIME
  )) %>%
  mutate(START_TIME = case_when(
    START_TIME < lag(END_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lag(END_TIME),
    TRUE ~ START_TIME
  )) 

# The approach above produces the right timestamps in rows 6 and 28.
# However, I would now like to remove the extra rows in between,
# i.e. rows 7 through 27, as those timestamps are already captured by rows 6 and 28. How do I achieve this?
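For context, the lead()/lag() trimming above can be seen on a toy example (the numeric "times" here are made up, not from the real data):

```r
library(dplyr)

# Toy illustration of the lead()/lag() trimming: the first interval's end is
# pulled back to the next row's start, and the second interval's start is
# pushed forward to the previous row's end, so adjacent intervals no longer overlap.
x <- data.frame(start = c(1, 4), end = c(6, 9))
x %>%
  mutate(end = if_else(end > lead(start, default = last(end)),
                       lead(start, default = last(end)), end)) %>%
  mutate(start = if_else(start < lag(end, default = first(start)),
                         lag(end, default = first(start)), start))
#>   start end
#> 1     1   4
#> 2     4   9
```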

How do I remove rows with some kind of condition, such as: if END_TIME < lag(END_TIME), remove that row, and repeat until END_TIME >= lag(END_TIME)? I'm hoping this could remove the rows with overlapping START_TIME & END_TIME.
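Roughly, the iterative removal I have in mind could be sketched like this (a hypothetical helper; it assumes the data are sorted by START_TIME and that NA-ID rows should always be kept):

```r
library(dplyr)

# Hypothetical helper: repeatedly drop rows whose END_TIME falls before the
# previous row's END_TIME (i.e. intervals already contained in an earlier
# interval), until no such rows remain. NA-ID rows are never dropped.
drop_contained <- function(df) {
  repeat {
    bad <- !is.na(df$ID) &
      dplyr::lag(df$END_TIME, default = df$END_TIME[1] - 1) > df$END_TIME
    if (!any(bad)) break
    df <- df[!bad, , drop = FALSE]
  }
  df
}
```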

Thanks for your help!


I'm not sure I understand your goal. Does this give you what you want with respect to the END_TIME column?

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
  START_TIME = c("2023-08-14 12:07:40","2023-08-14 12:08:48","2023-08-14 12:09:49",
                 "2023-08-14 12:11:22","2023-08-14 12:13:17",
                 "2023-08-14 12:14:24","2023-08-14 12:14:56",
                 "2023-08-14 12:15:05","2023-08-14 12:17:04",
                 "2023-08-14 12:17:04","2023-08-14 12:17:05","2023-08-14 12:17:25",
                 "2023-08-14 12:17:26","2023-08-14 12:17:27",
                 "2023-08-14 12:25:54","2023-08-14 12:34:48",
                 "2023-08-14 12:35:24","2023-08-14 12:40:19",
                 "2023-08-14 12:40:38","2023-08-14 12:46:28",
                 "2023-08-14 12:46:50","2023-08-14 12:55:44","2023-08-14 12:55:53",
                 "2023-08-14 12:57:10","2023-08-14 12:59:25",
                 "2023-08-14 13:01:04","2023-08-14 13:08:45",
                 "2023-08-14 13:10:50","2023-08-14 13:13:28",
                 "2023-08-14 13:15:50","2023-08-14 13:22:01"),
  END_TIME = c("2023-08-14 12:08:48","2023-08-14 12:09:49","2023-08-14 12:11:22",
               "2023-08-14 12:13:17","2023-08-14 12:14:56",
               "2023-08-14 13:10:50","2023-08-14 12:15:05",
               "2023-08-14 12:17:04","2023-08-14 12:17:04",
               "2023-08-14 12:17:05","2023-08-14 12:17:25","2023-08-14 12:17:26",
               "2023-08-14 12:17:27","2023-08-14 12:25:54",
               "2023-08-14 12:34:48","2023-08-14 12:35:24",
               "2023-08-14 12:40:19","2023-08-14 12:40:38",
               "2023-08-14 12:46:28","2023-08-14 12:46:50",
               "2023-08-14 12:55:44","2023-08-14 12:55:53","2023-08-14 12:57:10",
               "2023-08-14 12:59:25","2023-08-14 13:01:04",
               "2023-08-14 13:08:45","2023-08-14 13:13:28",
               "2023-08-14 13:14:54","2023-08-14 13:22:01",
               "2023-08-14 13:18:36","2023-08-14 13:24:10"),
  ELAPSED_TIME = c(1.1326,1.0197,
                   1.5486,1.915,1.65,56.4333,0.15,1.9833,0,0.0138,
                   0.3362,0.0167,0.0167,8.455,8.893,0.6012,
                   4.9153,0.3241,5.8319,0.3597,8.9031,0.15,1.2892,
                   2.2385,1.6556,7.6833,4.7208,4.06662,8.5425,
                   2.76666,2.1566),
  DATE_ID = c("2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00",
              "2023-08-14 00:00:00","2023-08-14 00:00:00"),
  ID = c(237214724L,
         237214724L,237214724L,237214724L,235565710L,NA,
         235565710L,232849697L,232849697L,232849697L,
         232849697L,232849697L,232849697L,233163426L,234067478L,
         234067478L,234067478L,234067478L,235266765L,
         235266765L,239167461L,239167461L,239167461L,
         234182936L,238535268L,236468967L,236468967L,NA,
         236019152L,NA,236918867L),
  Category = c("Cat1","Cat1",
               "Cat1","Cat7","Cat1","Cat4","Cat3","Cat3","Cat1",
               "Cat2","Cat3","Cat1","Cat1","Cat5","Cat1",
               "Cat1","Cat5","Cat3","Cat5","Cat3","Cat1",
               "Cat3","Cat5","Cat1","Cat2","Cat1","Cat1","Cat6",
               "Cat1","Cat6","Cat3")
) %>%
  mutate(START_TIME = as.POSIXct(START_TIME), END_TIME = as.POSIXct(END_TIME))

df |> mutate(MaxEnd = as.POSIXct(cummax(as.numeric(END_TIME)), 
                              origin  = "1970-01-01 00:00:00"),
             Flag = MaxEnd  > END_TIME) |> 
  filter(is.na(ID) | !Flag)
#>             START_TIME            END_TIME ELAPSED_TIME             DATE_ID
#> 1  2023-08-14 12:07:40 2023-08-14 12:08:48      1.13260 2023-08-14 00:00:00
#> 2  2023-08-14 12:08:48 2023-08-14 12:09:49      1.01970 2023-08-14 00:00:00
#> 3  2023-08-14 12:09:49 2023-08-14 12:11:22      1.54860 2023-08-14 00:00:00
#> 4  2023-08-14 12:11:22 2023-08-14 12:13:17      1.91500 2023-08-14 00:00:00
#> 5  2023-08-14 12:13:17 2023-08-14 12:14:56      1.65000 2023-08-14 00:00:00
#> 6  2023-08-14 12:14:24 2023-08-14 13:10:50     56.43330 2023-08-14 00:00:00
#> 7  2023-08-14 13:08:45 2023-08-14 13:13:28      4.72080 2023-08-14 00:00:00
#> 8  2023-08-14 13:10:50 2023-08-14 13:14:54      4.06662 2023-08-14 00:00:00
#> 9  2023-08-14 13:13:28 2023-08-14 13:22:01      8.54250 2023-08-14 00:00:00
#> 10 2023-08-14 13:15:50 2023-08-14 13:18:36      2.76666 2023-08-14 00:00:00
#> 11 2023-08-14 13:22:01 2023-08-14 13:24:10      2.15660 2023-08-14 00:00:00
#>           ID Category              MaxEnd  Flag
#> 1  237214724     Cat1 2023-08-14 12:08:48 FALSE
#> 2  237214724     Cat1 2023-08-14 12:09:49 FALSE
#> 3  237214724     Cat1 2023-08-14 12:11:22 FALSE
#> 4  237214724     Cat7 2023-08-14 12:13:17 FALSE
#> 5  235565710     Cat1 2023-08-14 12:14:56 FALSE
#> 6         NA     Cat4 2023-08-14 13:10:50 FALSE
#> 7  236468967     Cat1 2023-08-14 13:13:28 FALSE
#> 8         NA     Cat6 2023-08-14 13:14:54 FALSE
#> 9  236019152     Cat1 2023-08-14 13:22:01 FALSE
#> 10        NA     Cat6 2023-08-14 13:22:01  TRUE
#> 11 236918867     Cat3 2023-08-14 13:24:10 FALSE

Created on 2024-03-20 with reprex v2.0.2
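The key trick is that cummax() operates on the numeric (seconds-since-epoch) representation of the POSIXct times; a row is flagged when the running maximum of END_TIME has already passed that row's own END_TIME. A minimal illustration with three of the END_TIME values from your data:

```r
# Running maximum of END_TIME: row 3's interval ends before the long row-6
# interval does, so it gets flagged as contained.
end <- as.POSIXct(c("2023-08-14 12:14:56",   # row 5
                    "2023-08-14 13:10:50",   # row 6 (the long NA-ID interval)
                    "2023-08-14 12:15:05"))  # row 7, contained in row 6
running_max <- as.POSIXct(cummax(as.numeric(end)), origin = "1970-01-01")
running_max > end
#> [1] FALSE FALSE  TRUE
```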

Thanks @FJCC!
It doesn't give me the rows in exactly sequential order. However, I am able to work around that by combining your recommendation with my previous attempt. The combination of the two looks like a good workaround.

df %>%
  mutate(MaxEnd = as.POSIXct(cummax(as.numeric(END_TIME)),
                             origin = "1970-01-01 00:00:00"),
         Flag = MaxEnd > END_TIME) %>%
  filter(is.na(ID) | !Flag) %>%
  mutate(END_TIME = case_when(
    END_TIME > lead(START_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lead(START_TIME),
    TRUE ~ END_TIME
  )) %>%
  mutate(START_TIME = case_when(
    START_TIME < lag(END_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lag(END_TIME),
    TRUE ~ START_TIME
  )) 

However, this workaround doesn't seem to work when I perform it on grouped data. For example, this was a reprex for one person; if I had another person, or a few more people, in the data, how would I use your recommended code? How would this Flag work with a group_by() option?

df |> mutate(MaxEnd = as.POSIXct(cummax(as.numeric(END_TIME)), 
                              origin  = "1970-01-01 00:00:00"),
             Flag = MaxEnd  > END_TIME) |> 
  filter(is.na(ID) | !Flag)

If I make two copies of the data you provided, add a Person column to each, and then run your code with a group_by() at the beginning, it seems to work. Is there a problem with that?
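For reference, the two-person test data is just two copies of your df, each tagged with a made-up Person label (this assumes the df built in the reprex above):

```r
library(dplyr)

# Two copies of the example data, stacked, with a (made-up) Person label
df <- bind_rows(
  df %>% mutate(Person = "A"),
  df %>% mutate(Person = "B")
)
```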

df %>% group_by(Person) %>%
  mutate(MaxEnd = as.POSIXct(cummax(as.numeric(END_TIME)),
                             origin = "1970-01-01 00:00:00"),
         Flag = MaxEnd > END_TIME) %>%
  filter(is.na(ID) | !Flag) %>%
  mutate(END_TIME = case_when(
    END_TIME > lead(START_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lead(START_TIME),
    TRUE ~ END_TIME
  )) %>%
  mutate(START_TIME = case_when(
    START_TIME < lag(END_TIME) & str_detect(Category, "Cat1|Cat2|Cat3|Cat5|Cat7") ~ lag(END_TIME),
    TRUE ~ START_TIME
  )) 
