Hi I’m trying to find a solution so that I can achieve the outcome below with a new column that counts the number of occurrences of each group within each id where the first occurrence is the earliest date, and second occurrence is next date of that group, within that id etc. And each group occurrence is specific to each id so each new group within each id should start at 1. I’ve tried several different ways of arrange, group_by, sequence, etc with no luck. Thank you for your help.
dat <- data.frame(
id = c(1, 1, 1, 1, 2, 2, 2, 2),
date = c(
"2020-12-24",
"2020-12-24",
"2020-12-25",
"2020-12-26",
"2020-12-24",
"2020-12-24",
"2020-12-25",
"2020-12-27"
),
group = c("A", "B", "A", "C", "C", "A", "C", "C")
)
dat
id date group
<dbl> <fct> <fct>
1 2020-12-24 A
1 2020-12-24 B
1 2020-12-25 A
1 2020-12-26 C
2 2020-12-24 C
2 2020-12-24 A
2 2020-12-25 C
2 2020-12-27 C
answer <- dat %>%
mutate(order_count = c(1, 1, 2, 1, 1, 1, 2, 3))
answer
id date group order_count
<dbl> <fct> <fct> <dbl>
1 2020-12-24 A 1
1 2020-12-24 B 1
1 2020-12-25 A 2
1 2020-12-26 C 1
2 2020-12-24 C 1
2 2020-12-24 A 1
2 2020-12-25 C 2
2 2020-12-27 C 3
dat |> group_by(id,group) |>
+ mutate(Order_count=row_number())
# A tibble: 8 x 4
# Groups: id, group [5]
id date group Order_count
<dbl> <chr> <chr> <int>
1 1 2020-12-24 A 1
2 1 2020-12-24 B 1
3 1 2020-12-25 A 2
4 1 2020-12-26 C 1
5 2 2020-12-24 C 1
6 2 2020-12-24 A 1
7 2 2020-12-25 C 2
8 2 2020-12-27 C 3
That works, except for if a date happened to be earlier later on as it would always need to align to be the first one on the first date. For example, if you add a row to this data for
id= 2
date = 2020-12-23
group= A
It makes the new order_count be a 2 where it would need to be a 1 being that 12/23 was now the new first date. In the data I’m working with, I’ve tried arrange with it for the dates but im not sure if I can just do that as I have a join step later on that gets thrown off. Thanks again for your help.
I'm not sure I understood what you need. In this version, within each id and group, I rank the dates. I added an id=2, date=2020-12-23, group=A data point as the last row of the data set.
dat
id date group
1 1 2020-12-24 A
2 1 2020-12-24 B
3 1 2020-12-25 A
4 1 2020-12-26 C
5 2 2020-12-24 C
6 2 2020-12-24 A
7 2 2020-12-25 C
8 2 2020-12-27 C
9 2 2020-12-23 A
>
> dat |> group_by(id,group) |>
+ mutate(Order_count=rank(date))
# A tibble: 9 x 4
# Groups: id, group [5]
id date group Order_count
<dbl> <chr> <chr> <dbl>
1 1 2020-12-24 A 1
2 1 2020-12-24 B 1
3 1 2020-12-25 A 2
4 1 2020-12-26 C 1
5 2 2020-12-24 C 1
6 2 2020-12-24 A 2
7 2 2020-12-25 C 2
8 2 2020-12-27 C 3
9 2 2020-12-23 A 1