Counting the joint occurrence of two consecutive Lag 1 sequences as a workaround for the Lag 2 problem of the unknown middle event

Hello there, I need some help, please. I want to run a Lag 2 analysis that looks at one specific sequence. I know that a Lag 2 calculation does not let you specify which event lies in between, e.g. behaviour A → unknown behaviour X → behaviour C. To get around this, I thought I could examine two consecutive Lag 1 transitions, behaviour A → behaviour B followed by behaviour B → behaviour C, and count how often they occur together in my data set. However, my current R code only calculates the sum of the A → B and B → C counts, not the actual joint occurrence of both sequences (A → B → C). Does anyone have an idea how to code this calculation?

Thanks

You could transform your data frame to keep only those rows where event C is directly preceded by B, which in turn is directly preceded by A:

library(dplyr)
library(purrr)

df <- tibble::tribble(
  ~event, ~time,
  "A", "00:01",
  "A", "00:02",
  "C", "00:03",
  "A", "00:04",
  "B", "00:05",
  "C", "00:06",
  "B", "00:07",
  "C", "00:08",
  "D", "00:09"
)

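df |>
  mutate(
    l1 = lag(event),         # event one step back
    l2 = lag(event, n = 2),  # event two steps back
    # paste0() is vectorised, so rowwise() and a list column are not needed
    triplet = paste0(l2, l1, event)
  ) |>
  slice(
    # for every row that completes an "ABC" run, keep it and the two rows before it
    unlist(map(which(triplet == "ABC"), ~ (.x - 2):.x))
  ) |>
  select(-c(l1, l2, triplet))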
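
If what you need is the number of times the joint sequence occurs, rather than the rows themselves, a minimal base R sketch could look like the one below. Note that `count_run` is a hypothetical helper name I made up, not a function from any package, and the sketch assumes the events are already sorted by time and stored in a plain character vector.

```r
# Same events as in the example data frame, in time order
events <- c("A", "A", "C", "A", "B", "C", "B", "C", "D")

# Hypothetical helper: count how often `pattern` appears as a run of
# directly consecutive events (overlapping matches are counted too)
count_run <- function(events, pattern) {
  n <- length(events)
  k <- length(pattern)
  if (n < k) return(0L)
  starts <- seq_len(n - k + 1)
  hits <- vapply(
    starts,
    function(i) all(events[i:(i + k - 1)] == pattern),
    logical(1)
  )
  sum(hits)
}

n_ab  <- count_run(events, c("A", "B"))       # Lag 1: A -> B
n_bc  <- count_run(events, c("B", "C"))       # Lag 1: B -> C
n_abc <- count_run(events, c("A", "B", "C"))  # joint: A -> B -> C
```

Here `n_ab + n_bc` is the sum your current code seems to produce, while `n_abc` only counts the cases where B → C directly follows A → B, which is the joint occurrence you asked about.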

Thanks, I'll try it!