I have a table with almost a million rows. I want to apply a rolling count of strings/"event" based on Date AND Category. So far, I was able to get the counts of events within each month and category using dplyr
:
df %>% group_by(Date, Category) %>% mutate(n_distinct(event))
I want to be able to modify this code so I can perform a rolling count based on specified window. I do have a column called "Num_Months" and it is category-specific (ex: Category A would be 6 months, category B would be 3 months, Category C would be 12 months, etc). My Date column has dates ranging from as early as 2012.
I have looked into the slider
package and "slide_index" seems like a good function for me to use as it's index specific (index would be my Date column) , but how would you include it in this code? I'm open to other options/solutions as well!
Here is a sample of my data:
df<- structure(list(Date = c("01-01-2020","01-01-2020", "01-01-2020", "01-01-2020","01-01-2020","01-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "03-01-2020", "03-01-2020", "04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020"), Category = c("A", "A", "A", "B", "C", "C", "A", "B", "B", "B", "C", "B", "C", "A", "C", "A", "B" ), Num_Months = c(6,6,6,3,12,12,6,3,3,3,12,3,12,6,12,6,3), Events= c("event1a", "event2a", "event1a", "event1b", "event1c", "event2c", "event3a", "event2b", "event3b", "event3b", "event3c", "event4b", "event4c", "event4a", "event5c", "event5a", "event 5b")), row.names = c(NA, 17L), class = "data.frame")
This is what I want the results to look like:
result <- structure(list(Date = c("01-01-2020","01-01-2020", "01-01-2020","02-01-2020", "02-01-2020", "02-01-2020", "03-01-2020", "03-01-2020","04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020"), Category = c("A", "B", "C","A", "B", "C", "B", "C", "A", "C", "A", "B" ), rolling= c(2, 1,2,3,3,3,4,4,4, 5,5,4)), row.names = c(NA, 12L), class = "data.frame")