How to enumerate intervals in a sequence, like periods when cryptids come to visit

dromano · April 13, 2024, 11:27am

Here is an attempt at a faithful translation of your solution into pure tidyverse, which is interesting to compare:

```r
visits |> 
  # Find which visits are Sasquatch
  mutate(sasquatch = visitor == "Sasquatch") |> 
  # Isolate those visits
  group_by(sasquatch) |> 
  # Add a flag (1 if a new visit, 0 if a continuation of a visit).
  mutate(new_visit = if_else(lag(day) + 1 < day, 1, 0)) |> 
  # Fix the NA in the first row (caused by lack of a previous row when lagging).
  mutate(new_visit = if_else(day == 1, 1, new_visit)) |> 
  # Add the "visit" value and drop the new_visit column (no longer needed).
  mutate(visit = cumsum(new_visit)) |> 
  select(!new_visit) |> 
  # Replace the visit values for the Sasquatch visits.
  ungroup()
```

What's nice about your original solution is that you don't have to worry about what to do with the non-Sasquatch rows. In my translation, it happens that lag takes care of them in the end because cumsum transmits the initial NA to the rest of the non-Sasquatch rows, but that's just a lucky accident.
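To see that NA propagation in isolation, here is a minimal sketch on a made-up vector of days (the values are purely illustrative and not taken from the thread's data):

```r
library(dplyr)

# Hypothetical days on which some non-Sasquatch visitor shows up.
day <- c(2, 3, 7, 8)

# lag() has no previous value for the first element, so the comparison is NA there.
new_visit <- if_else(lag(day) + 1 < day, 1, 0)
new_visit
# NA 0 1 0

# cumsum() carries that leading NA through every later element.
cumsum(new_visit)
# NA NA NA NA
```

Because the `day == 1` fix only touches the group that contains day 1, the other group keeps its leading NA, and every value of `visit` in that group ends up NA.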

The principal trade-off seems to be between the clearer focus on Sasquatch rows in your original solution and not having to make a copy of visits in the translation.

The only significant observation I would make is that your solution depends on the argument to set.seed(), which happens to have Sasquatch visit on day 1, but I didn't make that part of the question clear. (I'll make a quick edit to fix that.)
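For what it's worth, the tidyverse translation above has the same dependence on day 1. Here is one possible way to remove it, as a sketch only: the toy `visits` data and the name `result` are made up for illustration, and using `coalesce()` plus an explicit `if_else()` is my own substitution rather than anything from your original solution.

```r
library(dplyr)

# Hypothetical data: Sasquatch does not show up until day 3.
visits <- tibble(
  day     = 1:8,
  visitor = c("Yeti", "Yeti", "Sasquatch", "Sasquatch",
              "Mothman", "Sasquatch", "Yeti", "Sasquatch")
)

result <- visits |>
  mutate(sasquatch = visitor == "Sasquatch") |>
  group_by(sasquatch) |>
  # The first row of each group has no previous row; treat it as a new visit.
  mutate(new_visit = coalesce(lag(day) + 1 < day, TRUE)) |>
  mutate(visit = cumsum(new_visit)) |>
  ungroup() |>
  # Blank out non-Sasquatch rows explicitly instead of relying on NA propagation.
  mutate(visit = if_else(sasquatch, visit, NA_integer_)) |>
  select(!c(sasquatch, new_visit))

result$visit
# NA NA 1 1 NA 2 NA 3
```

With this version the first Sasquatch row always starts visit 1, whichever day it falls on, and the non-Sasquatch rows are set to NA on purpose rather than by accident.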