Here is an attempt at a faithful translation of your solution into pure tidyverse, which is interesting to compare:

    library(dplyr)  # mutate(), group_by(), lag(), if_else(), select()

    visits |>
      # Find which visits are Sasquatch.
      mutate(sasquatch = visitor == "Sasquatch") |>
      # Isolate those visits.
      group_by(sasquatch) |>
      # Add a flag (1 if a new visit, 0 if a continuation of a visit).
      mutate(new_visit = if_else(lag(day) + 1 < day, 1, 0)) |>
      # Fix the NA in the first row (caused by lack of a previous row when lagging).
      mutate(new_visit = if_else(day == 1, 1, new_visit)) |>
      # Add the "visit" value and drop the new_visit column (no longer needed).
      mutate(visit = cumsum(new_visit)) |>
      select(!new_visit) |>
      # Remove the grouping.
      ungroup()

What's nice about your original solution is that you don't have to worry about what to do with the non-Sasquatch rows. In my translation, it happens that `lag` takes care of them in the end because `cumsum` transmits the initial `NA` to the rest of the non-Sasquatch rows, but that's just a lucky accident.
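To make that lucky accident concrete, here is a small base-R sketch of how `cumsum` treats a leading `NA`. The flag vectors are made up for illustration and are not taken from `visits`.

```r
# Hypothetical flags for a group whose first row has no predecessor:
# lag() yields NA there, so the first flag is NA.
flags_with_na <- c(NA, 0, 1, 0)
na_run <- cumsum(flags_with_na)
print(na_run)  # every element is NA: the leading NA propagates

# Once the first flag is set to 1, the running count is defined everywhere.
flags_fixed <- c(1, 0, 1, 0)
ok_run <- cumsum(flags_fixed)
print(ok_run)  # 1 1 2 2
```

So the non-Sasquatch rows end up with `NA` only because nothing repairs the first flag in their group.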
The principal trade-off seems to be this: your original solution keeps a clearer focus on the Sasquatch rows, while the translation avoids having to make a copy of `visits`.
The only significant observation I would make is that your solution depends on the argument to `set.seed()`, which happens to have Sasquatch visit on day 1, but I didn't make that part of the question clear. (I'll make a quick edit to fix that.)
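For what it's worth, one way to drop the day-1 dependence might look like the sketch below. It is only a sketch under assumptions: the small `visits` tibble is a made-up stand-in (Sasquatch deliberately does not show up on day 1), the rows are assumed to be sorted by `day`, and treating "no previous row in the group" as the start of a visit is my choice rather than part of your solution.

```r
library(dplyr)

# Hypothetical stand-in for `visits`; Sasquatch first appears on day 2.
visits <- tibble(
  day     = c(1, 2, 3, 5, 6, 9),
  visitor = c("Yeti", "Sasquatch", "Sasquatch", "Sasquatch", "Yeti", "Sasquatch")
)

result <- visits |>
  mutate(sasquatch = visitor == "Sasquatch") |>
  group_by(sasquatch) |>
  mutate(
    # A row starts a new visit if it has no predecessor in its group
    # or if there is a gap of more than one day since the previous row.
    new_visit = if_else(is.na(lag(day)) | lag(day) + 1 < day, 1, 0),
    # Number the visits explicitly for Sasquatch rows only; others get NA.
    visit = if_else(sasquatch, cumsum(new_visit), NA_real_)
  ) |>
  ungroup() |>
  select(!new_visit)

print(result)
```

With this toy data the `visit` column comes out as `NA, 1, 1, 2, NA, 3`, and the non-Sasquatch rows are `NA` by construction rather than by accident.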