Calculation of the occurrence of two consecutive sequences (Lag 1) as a workaround for the problem in Lag 2 – the unknown middle segment

StephBZ · November 19, 2025, 1:30pm

Hello there, I need some help, please. I want to run a Lag 2 analysis on a specific sequence. As far as I know, a Lag 2 calculation does not let you specify which event lies in between, e.g. behaviour A → unknown behaviour X → behaviour C. To get around this, I thought I could look at two Lag 1 transitions, behaviour A → behaviour B and behaviour B → behaviour C, and check whether they occur back to back in my data set. However, my current R code only gives me the sum of the A → B and B → C counts, not how often both transitions actually occur together as A → B → C. Does anyone have an idea how to code this calculation?

Thx

mduvekot · November 19, 2025, 4:55pm

You could filter your data frame so that it keeps only the rows that form an A → B → C run, i.e. every C that is directly preceded by B and then A, together with those two preceding rows:

library(dplyr)
library(purrr)

df <- tibble::tribble(
  ~event, ~time,
  "A", "00:01",
  "A", "00:02",
  "C", "00:03",
  "A", "00:04",
  "B", "00:05",
  "C", "00:06",
  "B", "00:07",
  "C", "00:08",
  "D", "00:09"
)

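df |>
  mutate(
    l1 = lag(event),             # event one row back
    l2 = lag(event, n = 2),      # event two rows back
    seq = paste0(l2, l1, event)  # paste0() is vectorised, so rowwise() is not needed
  ) |>
  slice(
    # for every row that completes an "ABC" run, keep that row and the two before it
    unlist(map(which(seq == "ABC"), ~ (.x - 2):.x))
  ) |>
  select(-c(l1, l2, seq))        # drop the helper columns again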
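
If what you need in the end is a count of the joint occurrence rather than the rows themselves, a minimal sketch along these lines might be a starting point. It assumes one event per row in chronological order and a single observation session (no grouping); the names `prev1`, `prev2`, `n_ab`, `n_bc` and `n_abc` are made up for illustration.

```r
library(dplyr)

# Same toy events as in the example above (time column left out for brevity)
df <- data.frame(
  event = c("A", "A", "C", "A", "B", "C", "B", "C", "D")
)

counts <- df |>
  mutate(
    prev1 = lag(event),         # event one row back
    prev2 = lag(event, n = 2)   # event two rows back
  ) |>
  summarise(
    # Lag 1 transitions, counted separately
    n_ab  = sum(prev1 == "A" & event == "B", na.rm = TRUE),
    n_bc  = sum(prev1 == "B" & event == "C", na.rm = TRUE),
    # joint occurrence: A directly followed by B directly followed by C
    n_abc = sum(prev2 == "A" & prev1 == "B" & event == "C", na.rm = TRUE)
  )

counts
```

On this toy data the two Lag 1 counts are 1 (A → B) and 2 (B → C), so their sum is 3, while the joint A → B → C count is only 1. If your real data contains several sessions or individuals, you would probably want a `group_by()` on the session identifier before the `mutate()`, so that runs are not counted across session boundaries.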

StephBZ · November 27, 2025, 10:34am

Thanks, I'll try it!
