I am trying to summarise data on a rolling basis depending on what I input.
For example, I could have the following input grouping_value <- 7 which I would then hope to have the earliest Date in my data frame group by. So if my earliest Date is 2021-01-01 then I would hope that a new column would be mutated like Group_2021-01-01_to_Group_2021-01-08 and the next group would be Group_2021-01-09_to_Group_2021-01-16 and so on. It would group according to the grouping_value.
Raw data:
ID
Date
Sales
1
2021-01-01
22
1
2022-01-03
32
1
2021-01-11
42
1
2022-01-16
32
1
2022-01-13
27
2
2022-01-12
42
grouping_value <- 7
This grouping_value would take the earliest date of 2021-01-01 and add 7 to get the group until it has cycled through the whole dataframe. I could also change the group_value <- 30 to group according to 30 days theoretically .
I could then group_by this Group.
The only thing that has occurred to me is to do a case_when() to group by dates but that would lack the dynamic element. I'd appreciate any help or ideas. Thank you
Here is a solution with two differences from your request.
The Group labels are simply numbers, not the text you show.
When the grouping_value is 7, the groups have 7 days. In your example, a value of 7 leads to groups of 8 days, e.g. Jan. 1 - Jan. 8. Is that really what you want?
I did not drop the DateDelta column because I think it is easier to see how the groups are formed if that is present. It can easily be dropped.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
StartDay <- ymd("2022-01-01")
EndDay <- ymd("2022-02-10")
DF <- data.frame(Date = seq.Date(from = StartDay,
to = EndDay, by = 5))
DF
#> Date
#> 1 2022-01-01
#> 2 2022-01-06
#> 3 2022-01-11
#> 4 2022-01-16
#> 5 2022-01-21
#> 6 2022-01-26
#> 7 2022-01-31
#> 8 2022-02-05
#> 9 2022-02-10
GroupFunc <- function(DAT, GrpVal) {
DAT <- DAT |> mutate(DateDelta = as.numeric(Date - min(Date)),
Group = DateDelta %/% GrpVal)
}
NewDF <- GroupFunc(DF, 7)
NewDF
#> Date DateDelta Group
#> 1 2022-01-01 0 0
#> 2 2022-01-06 5 0
#> 3 2022-01-11 10 1
#> 4 2022-01-16 15 2
#> 5 2022-01-21 20 2
#> 6 2022-01-26 25 3
#> 7 2022-01-31 30 4
#> 8 2022-02-05 35 5
#> 9 2022-02-10 40 5
The %/% operator is for integer division. It returns just the integer part of the division calculation. So 4 %/% 5 = 0 because 4/5 = 0.8 and only the zero is kept. It is very useful for grouping things.