# Grouping dates together into periods

I'm trying to solve a problem of grouping dates together based on certain length of time from a starting date. For example, given the following data:

``````library(dplyr)

test_df <- tibble::tribble(
~id, ~date,
1,'01/01/2011',
2,'01/05/2011',
3,'01/10/2011',
4,'02/01/2011',
5,'02/10/2011',
6,'02/11/2011',
7,'02/19/2011',
8,'03/01/2011',
9,'03/02/2011',
10,'03/03/2011',
11,'03/04/2011',
12,'04/29/2011',
13,'05/01/2011',
14,'05/02/2011',
15,'05/29/2011',
16,'05/30/2011',
17,'06/01/2011',
18,'06/02/2011',
19,'06/03/2011',
20,'06/10/2011'
) %>%
mutate(date = lubridate::mdy(date))
``````

the goal would be to group all dates within 28 days (this could be varied) from the minimum date. That is, at first pass, the starting date would be `01/01/2011` and we would group all dates 28 days out. Then, we would start over at the next non-grouped date which would be `02/01/2011` and so on until all the dates were grouped.

I realize this can be solved pretty easy with a for loop, but I'm wondering if anyone can think of any simple / creative solutions to solve this in a more tidyverse / functional way.

Thanks!

Is this what you mean?

``````library(tidyverse)

tibble::tribble(
~id, ~date,
1,'01/01/2011',
2,'01/05/2011',
3,'01/10/2011',
4,'02/01/2011',
5,'02/10/2011',
6,'02/11/2011',
7,'02/19/2011',
8,'03/01/2011',
9,'03/02/2011',
10,'03/03/2011',
11,'03/04/2011',
12,'04/29/2011',
13,'05/01/2011',
14,'05/02/2011',
15,'05/29/2011',
16,'05/30/2011',
17,'06/01/2011',
18,'06/02/2011',
19,'06/03/2011',
20,'06/10/2011'
) %>%
mutate(date = lubridate::mdy(date),
group = cut(date, "28 days"))
#> # A tibble: 20 x 3
#>       id date       group
#>    <dbl> <date>     <fct>
#>  1     1 2011-01-01 2011-01-01
#>  2     2 2011-01-05 2011-01-01
#>  3     3 2011-01-10 2011-01-01
#>  4     4 2011-02-01 2011-01-29
#>  5     5 2011-02-10 2011-01-29
#>  6     6 2011-02-11 2011-01-29
#>  7     7 2011-02-19 2011-01-29
#>  8     8 2011-03-01 2011-02-26
#>  9     9 2011-03-02 2011-02-26
#> 10    10 2011-03-03 2011-02-26
#> 11    11 2011-03-04 2011-02-26
#> 12    12 2011-04-29 2011-04-23
#> 13    13 2011-05-01 2011-04-23
#> 14    14 2011-05-02 2011-04-23
#> 15    15 2011-05-29 2011-05-21
#> 16    16 2011-05-30 2011-05-21
#> 17    17 2011-06-01 2011-05-21
#> 18    18 2011-06-02 2011-05-21
#> 19    19 2011-06-03 2011-05-21
#> 20    20 2011-06-10 2011-05-21
``````

Created on 2020-05-07 by the reprex package (v0.3.0)

Thanks for the reply, I'm actually looking to create the groups based on all dates that fall within 28 days after the start date. This could either be a number (i.e. 1, 2, 3) or the minimum date of that grouping. The expected output would look like this:

``````tibble::tribble(
~id, ~date, ~date_group
1,'01/01/2011','01/01/2011',
2,'01/05/2011','01/01/2011',
3,'01/10/2011','01/01/2011',
4,'02/01/2011','02/01/2011',
5,'02/10/2011','02/01/2011',
6,'02/11/2011','02/01/2011',
7,'02/19/2011','02/01/2011',
8,'03/01/2011','02/01/2011',
9,'03/02/2011','03/02/2011',
10,'03/03/2011','03/02/2011',
11,'03/04/2011','03/02/2011',
12,'04/29/2011','04/29/2011',
13,'05/01/2011','04/29/2011',
14,'05/02/2011','04/29/2011',
15,'05/29/2011','05/29/2011',
16,'05/30/2011','05/29/2011',
17,'06/01/2011','05/29/2011',
18,'06/02/2011','05/29/2011',
19,'06/03/2011','05/29/2011',
20,'06/10/2011','05/29/2011'
)
``````

We start at `01/01/2011` + 28 days would group all dates between `01/01/2011` and `01/29/2011`, then we go to `02/01/2011` + 28 days and group all the dates in that range (`02/01/2011` to `03/01/2011`), then `03/02/2011` + 28 days and so on. Basically starting with the minimum date + 28 days and re-indexing on the next minimum date that does not fall in the prior 28 day period.

The "incidence" package will do exactly what you want, I think. I use it to convert my migraine numbers to a 28-day month basis.

Thanks! Looks like this doesn't quite get at it. Doing something like:

`incidence(dates, interval = "28 days")`

returns 28 day intervals that start from the very first date. However, after the first 28 days, the starting date needs to be the next available minimum date. For example, if we had dates 01/01/2011, 01/02/2011 and 01/01/2013 -- I wouldn't want 28 day intervals between 2011 - 2013, I would want the first 28 day interval to start on 01/01/2011 and then the next 28 day interval would start on 01/01/2013.

For example, this code with a for loop works but it gets pretty slow as the data frame grows (I left the grouping ID variable in there for testing):

``````library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

date_df <- tibble::tribble(
~id, ~date,
1,'01/01/2011',
1,'01/05/2011',
1,'01/10/2011',
1,'02/01/2011',
1,'02/10/2011',
1,'02/11/2011',
1,'02/19/2011',
1,'03/01/2011',
1,'03/02/2011',
1,'03/03/2011',
1,'03/04/2011',
2,'04/29/2011',
2,'05/01/2011',
2,'05/02/2011',
2,'05/29/2011',
3,'05/30/2011',
3,'06/01/2011',
3,'06/02/2011',
3,'06/03/2011',
3,'06/10/2011'
) %>%
mutate(date = lubridate::mdy(date))

create_date_periods <- function(dates, time_period = 28) {

# create a vector to hold the results
return_vector <- structure(integer(length(dates)), class = "Date")

for(i in seq_along(dates)) {
# if this is the first record, set to minimum
if (i == 1) min_date <- dates[i]

# if date less than minimum date + time period, use minimum date
if(dates[i] <= min_date + time_period) {

return_vector[i] <- min_date

# otherwise update minimum date to current date
} else {
min_date <- dates[i]
return_vector[i] <- min_date
}}

return(return_vector)
}

# testing
date_df %>%
# group_by(id) %>%
mutate(
period_start = create_date_periods(date, time_period = 28)
)
#> # A tibble: 20 x 3
#>       id date       period_start
#>    <dbl> <date>     <date>
#>  1     1 2011-01-01 2011-01-01
#>  2     1 2011-01-05 2011-01-01
#>  3     1 2011-01-10 2011-01-01
#>  4     1 2011-02-01 2011-02-01
#>  5     1 2011-02-10 2011-02-01
#>  6     1 2011-02-11 2011-02-01
#>  7     1 2011-02-19 2011-02-01
#>  8     1 2011-03-01 2011-02-01
#>  9     1 2011-03-02 2011-03-02
#> 10     1 2011-03-03 2011-03-02
#> 11     1 2011-03-04 2011-03-02
#> 12     2 2011-04-29 2011-04-29
#> 13     2 2011-05-01 2011-04-29
#> 14     2 2011-05-02 2011-04-29
#> 15     2 2011-05-29 2011-05-29
#> 16     3 2011-05-30 2011-05-29
#> 17     3 2011-06-01 2011-05-29
#> 18     3 2011-06-02 2011-05-29
#> 19     3 2011-06-03 2011-05-29
#> 20     3 2011-06-10 2011-05-29
``````

Created on 2020-05-08 by the reprex package (v0.3.0)

This is a little hacky but still simpler than a loop

``````library(tidyverse)

tibble::tribble(
~id, ~date,
1,'01/01/2011',
2,'01/05/2011',
3,'01/10/2011',
4,'02/01/2011',
5,'02/10/2011',
6,'02/11/2011',
7,'02/19/2011',
8,'03/01/2011',
9,'03/02/2011',
10,'03/03/2011',
11,'03/04/2011',
12,'04/29/2011',
13,'05/01/2011',
14,'05/02/2011',
15,'05/29/2011',
16,'05/30/2011',
17,'06/01/2011',
18,'06/02/2011',
19,'06/03/2011',
20,'06/10/2011'
) %>%
mutate(date = lubridate::mdy(date),
group = cut(date, "27 days")) %>%
group_by(group) %>%
mutate(date_group = min(date)) %>%
ungroup() %>%
select(-group)
#> # A tibble: 20 x 3
#>       id date       date_group
#>    <dbl> <date>     <date>
#>  1     1 2011-01-01 2011-01-01
#>  2     2 2011-01-05 2011-01-01
#>  3     3 2011-01-10 2011-01-01
#>  4     4 2011-02-01 2011-02-01
#>  5     5 2011-02-10 2011-02-01
#>  6     6 2011-02-11 2011-02-01
#>  7     7 2011-02-19 2011-02-01
#>  8     8 2011-03-01 2011-03-01
#>  9     9 2011-03-02 2011-03-01
#> 10    10 2011-03-03 2011-03-01
#> 11    11 2011-03-04 2011-03-01
#> 12    12 2011-04-29 2011-04-29
#> 13    13 2011-05-01 2011-04-29
#> 14    14 2011-05-02 2011-04-29
#> 15    15 2011-05-29 2011-05-29
#> 16    16 2011-05-30 2011-05-29
#> 17    17 2011-06-01 2011-05-29
#> 18    18 2011-06-02 2011-05-29
#> 19    19 2011-06-03 2011-05-29
#> 20    20 2011-06-10 2011-05-29
``````

Created on 2020-05-08 by the reprex package (v0.3.0)

Thanks! That's close but it still misclassified some of the dates, which I think is due to the `cut(date, "27 days")` starting relative to the first date and not the minimum of the next period. The `2011-03-01` date should be grouped with `2011-02-01` since it is 28 days after `2011-02-01`. However, when using `cut`, it belongs to a different group since the first three groups become:

`2011-01-01`
`2011-01-29`
`2011-02-26`

all of these are 28 days relative to the very first date of `2011-01-01`

Using a while loop is faster and more vectorized than a for loop but it doesn't feel right putting a while loop inside a function:

``````create_date_periods <- function(dates, time_period = 28) {
# TODO: add some error checking

# create a vector to hold the results
return_vector <- structure(rep(NA_real_, length(dates)), class = "Date")

# if any date in the vector is still missing, keep going
while(any(is.na(return_vector))) {

# set minimum date amongst the values that are missing
min_date <- min(dates[is.na(return_vector)])

# if the date falls in range of interest, set it to the minimum date
return_vector[dates >= min_date & dates <= min_date + time_period] <- min_date
}

return(return_vector)
}
``````

To do the 28 day grouping on an annualized basis, create a "year" variable, and then do incidence inside a group_by(year).

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.