How do I generate a count in R within mutate()?

Qomisa · May 5, 2020, 12:26pm

Good day, everyone

I'm new in using R and have learnt some useful tricks. I'm trying to get a count of number in column 'P_ID' and create a new column 'count' where I look at what number appears in P_ID and count same numbers in the same column and enter the retuned vaclue in the corresponding cell in 'count' column. I can do this in excel using the 'countif' function (e.g. =COUNTIF($A$2:$A$20,A2) ) and here are results:
P_ID Count
18 1
1 1
4 3
5 2
10 2
12 1
2 2
5 2
2 2
10 2
4 3
4 3
8 1
9 2
7 2
17 1
7 2
19 1
9 2
I've tried the following in R without getting the desired results:
library(dplyr)
dt<-tibble(P_ID=c(18,1,4,5,10,12,2,5,2,10,4,4,8,9,7,17,7,19,9))
dt%>%
mutate(count=for (x in 1:length(P_ID)) {sum(P_ID==P_ID)})%>%
View()
and only get:
P_ID
1 18
2 1
3 4
4 5
5 10
6 12
7 2
8 5
9 2
10 10
11 4
12 4
13 8
14 9
15 7
16 17
17 7
18 19
19 9
Showing 1 to 11 of 19 entries, 1 total columns

I'll appreciate it if somebody can advise me what to do and where I'm going wrong.
Thank you.

mfherman · May 5, 2020, 12:45pm

Hi @Qomisa and welcome! Thanks for sharing your sample data and your expected output.

You were on the right track with using mutate() to create a new column, but there is actually a built in function in dplyr to count the number of rows per group -- it is called n(). In your example, you wanted to get the number of rows per P_ID, so you need to group by that variable and then create a new count variable.

library(dplyr)

df <- structure(list(P_ID = c(18, 1, 4, 5, 10, 12, 2, 5, 2, 10, 4, 
4, 8, 9, 7, 17, 7, 19, 9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -19L))

df %>% 
  group_by(P_ID) %>% 
  mutate(count = n())
#> # A tibble: 19 x 2
#> # Groups:   P_ID [12]
#>     P_ID count
#>    <dbl> <int>
#>  1    18     1
#>  2     1     1
#>  3     4     3
#>  4     5     2
#>  5    10     2
#>  6    12     1
#>  7     2     2
#>  8     5     2
#>  9     2     2
#> 10    10     2
#> 11     4     3
#> 12     4     3
#> 13     8     1
#> 14     9     2
#> 15     7     2
#> 16    17     1
#> 17     7     2
#> 18    19     1
#> 19     9     2

^{Created on 2020-05-05 by the reprex package (v0.3.0)}

Qomisa · May 5, 2020, 1:26pm

Hi Matt,

Thank you very much - I'm sorted.

Regards,
Mbuso

mfherman · May 5, 2020, 1:56pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

siddharthprabhu · May 5, 2020, 2:10pm

Just to add on to mfherman's excellent answer, this is such a common operation that dplyr has a dedicated verb for this task. add_count() is essentially shorthand for group_by() the variables passed to it, add a group-wise count of observations in a new column named n and and then ungroup(). You can control the new column's name with the name parameter.

So df %>% add_count(P_ID, name = "count") will get you the same result.

Qomisa · May 6, 2020, 9:57am

Hi Siddharth,

I appreciate it.

Regards,
Mbuso

system · May 13, 2020, 9:57am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.