I am trying to compute the percent change in sales in California cities from 2012 to 2022. If every city had sales in both years and all years in-between, then I could just use lag() to go back 10 rows, but I want to go back 10 years. In my example dataset, both Oakland and LA are missing 2012 data. A city that is missing one or both years should not appear in my final result.
I actually solved the problem by adding a column with 2022 – year and then filtering on that value being 10. I was also able to solve the problem with base R techniques. But I am hoping there is an option for a tidyverse function that simplifies the process. I looked at slider. I think I need a “lag_index()” instead of slide_index().
Is there a tidy function or a function in another package that solves this problem?
https://cran.r-project.org/web/packages/slider/vignettes/slider.html
Only cities that had sales in both 2012 and 2022 should appear in the output.
# sample data
tax = data.frame(city = c(rep('SF', 22), rep('Oakland', 21), rep('LA', 3), rep('SJ', 22)),
year = c(2001:2022, 2001:2011, 2013:2022, 2020:2022, 2001:2022),
sales = c(1:22, rep(5, 11), 0, rep(6, 9), 18:20, seq(25, by = 2, length.out = 22)))
tax
# inelegant solution
tax |>
group_by(city) |>
mutate(diff12_22 = year - lag(year, 10),
sales18 = lag(sales, 4, order_by = year),
sales12 = lag(sales, 10, order_by = year),
change12_22 = (sales/sales12-1)) |>
filter(year == 2022) |>
filter(diff12_22 == 10) |>
drop_na() |> ungroup() |>
mutate(rank12_22 = min_rank(desc(change12_22))) |>
print(n = 50)