I'm new in using R and have learnt some useful tricks. I'm trying to get a count of number in column 'P_ID' and create a new column 'count' where I look at what number appears in P_ID and count same numbers in the same column and enter the retuned vaclue in the corresponding cell in 'count' column. I can do this in excel using the 'countif' function (e.g. =COUNTIF($A$2:$A$20,A2) ) and here are results:
P_ID Count
18 1
1 1
4 3
5 2
10 2
12 1
2 2
5 2
2 2
10 2
4 3
4 3
8 1
9 2
7 2
17 1
7 2
19 1
9 2
I've tried the following in R without getting the desired results:
library(dplyr)
dt<-tibble(P_ID=c(18,1,4,5,10,12,2,5,2,10,4,4,8,9,7,17,7,19,9))
dt%>%
mutate(count=for (x in 1:length(P_ID)) {sum(P_ID==P_ID)})%>%
View()
and only get:
P_ID
1 18
2 1
3 4
4 5
5 10
6 12
7 2
8 5
9 2
10 10
11 4
12 4
13 8
14 9
15 7
16 17
17 7
18 19
19 9
Showing 1 to 11 of 19 entries, 1 total columns
I'll appreciate it if somebody can advise me what to do and where I'm going wrong.
Thank you.
Hi @Qomisa and welcome! Thanks for sharing your sample data and your expected output.
You were on the right track with using mutate() to create a new column, but there is actually a built in function in dplyr to count the number of rows per group -- it is called n(). In your example, you wanted to get the number of rows per P_ID, so you need to group by that variable and then create a new count variable.
If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:
Just to add on to mfherman's excellent answer, this is such a common operation that dplyr has a dedicated verb for this task. add_count() is essentially shorthand for group_by() the variables passed to it, add a group-wise count of observations in a new column named n and and then ungroup(). You can control the new column's name with the name parameter.
So df %>% add_count(P_ID, name = "count") will get you the same result.