Recount sequence based on multiple columns

jasongeslois · November 7, 2021, 4:33pm

Hi I’m trying to find a solution so that I can achieve the outcome below with a new column that counts the number of occurrences of each group within each id where the first occurrence is the earliest date, and second occurrence is next date of that group, within that id etc. And each group occurrence is specific to each id so each new group within each id should start at 1. I’ve tried several different ways of arrange, group_by, sequence, etc with no luck. Thank you for your help.


dat <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  date = c(
    "2020-12-24",
    "2020-12-24",
    "2020-12-25",
    "2020-12-26",
    "2020-12-24",
    "2020-12-24",
    "2020-12-25",
    "2020-12-27"
  ),
  group = c("A", "B", "A", "C", "C", "A", "C", "C")
)
dat
 

id	date	group
<dbl>	<fct>	<fct>
1	2020-12-24	A
1	2020-12-24	B
1	2020-12-25	A
1	2020-12-26	C
2	2020-12-24	C
2	2020-12-24	A
2	2020-12-25	C
2	2020-12-27	C


answer <- dat %>%
  mutate(order_count = c(1, 1, 2, 1, 1, 1, 2, 3))
answer

id	date	group	order_count
<dbl>	<fct>	<fct>	<dbl>
1	2020-12-24	A	1
1	2020-12-24	B	1
1	2020-12-25	A	2
1	2020-12-26	C	1
2	2020-12-24	C	1
2	2020-12-24	A	1
2	2020-12-25	C	2
2	2020-12-27	C	3

FJCC · November 7, 2021, 5:38pm

Like this?

dat |> group_by(id,group) |> 
+   mutate(Order_count=row_number())
# A tibble: 8 x 4
# Groups:   id, group [5]
     id date       group Order_count
  <dbl> <chr>      <chr>        <int>
1     1 2020-12-24 A                1
2     1 2020-12-24 B                1
3     1 2020-12-25 A                2
4     1 2020-12-26 C                1
5     2 2020-12-24 C                1
6     2 2020-12-24 A                1
7     2 2020-12-25 C                2
8     2 2020-12-27 C                3

jasongeslois · November 7, 2021, 9:59pm

That works, except for if a date happened to be earlier later on as it would always need to align to be the first one on the first date. For example, if you add a row to this data for
id= 2
date = 2020-12-23
group= A

It makes the new order_count be a 2 where it would need to be a 1 being that 12/23 was now the new first date. In the data I’m working with, I’ve tried arrange with it for the dates but im not sure if I can just do that as I have a join step later on that gets thrown off. Thanks again for your help.

FJCC · November 7, 2021, 11:28pm

I'm not sure I understood what you need. In this version, within each id and group, I rank the dates. I added an id=2, date=2020-12-23, group=A data point as the last row of the data set.

dat
  id       date group
1  1 2020-12-24     A
2  1 2020-12-24     B
3  1 2020-12-25     A
4  1 2020-12-26     C
5  2 2020-12-24     C
6  2 2020-12-24     A
7  2 2020-12-25     C
8  2 2020-12-27     C
9  2 2020-12-23     A
> 
> dat |> group_by(id,group) |> 
+   mutate(Order_count=rank(date))
# A tibble: 9 x 4
# Groups:   id, group [5]
     id date       group Order_count
  <dbl> <chr>      <chr>       <dbl>
1     1 2020-12-24 A               1
2     1 2020-12-24 B               1
3     1 2020-12-25 A               2
4     1 2020-12-26 C               1
5     2 2020-12-24 C               1
6     2 2020-12-24 A               2
7     2 2020-12-25 C               2
8     2 2020-12-27 C               3
9     2 2020-12-23 A               1

jasongeslois · November 8, 2021, 2:04am

No, that’s exactly what I needed, thank you so much. I knew I was just not getting something right in the correct order. Thank you again.

system · November 15, 2021, 2:05am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.