Mutate count by group and condition

budugulo · November 16, 2021, 10:35pm

code	name
123	A
123	B
666	X
999	M
999	M
555	P
555	Q
555	R

I want to mutate a variable wanted, which will count the number of different names for each code. For example, for code 123 the new variable will take the value 2 but for code 999 the value will be 1 since names are the same.

Here is the desired output:

code	name	wanted
123	A	2
123	B	2
666	X	1
999	M	1
999	M	1
555	P	3
555	Q	3
555	R	3

library(dplyr)
# toy data
df <- tibble(
  code = c(123, 123, 666, 999, 999, 555, 555, 555),
  name = c("A", "B", "X", "M", "M", "P", "Q", "R"),
  wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
)

FJCC · November 16, 2021, 11:15pm

I think this is what you want, though the count of 999 is 2 not 1.

library(dplyr)
# toy data
df <- tibble(
   code = c(123, 123, 666, 999, 999, 555, 555, 555),
   name = c("A", "B", "X", "M", "M", "P", "Q", "R")#,
   #wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
 )
df
# A tibble: 8 x 2
   code name 
  <dbl> <chr>
1   123 A    
2   123 B    
3   666 X    
4   999 M    
5   999 M    
6   555 P    
7   555 Q    
8   555 R    
df <- df |> group_by(code) |>  mutate( wanted = n())
df
# A tibble: 8 x 3
# Groups:   code [4]
   code name  wanted
  <dbl> <chr>  <int>
1   123 A          2
2   123 B          2
3   666 X          1
4   999 M          2
5   999 M          2
6   555 P          3
7   555 Q          3
8   555 R          3

budugulo · November 17, 2021, 7:51am

Thanks for the reply. Unfortunately, I want the count of 999 to be 1 because it has only one distinct name - M.

mikecrobp · November 17, 2021, 10:42am

Until googling just now I would have gone through 2 group_by/summarise layers. (please excuse the change of pipes)

df <- df %>% group_by(code, name) %>% summarise() # get distinct rows
df <- df %>% group_by(code) %>% summarise(wanted = n()) # get countss

But it turns out you can have n_distinct in summarise:

df <- df %>% group_by(code) %>% summarise(wanted = n_distinct()) 
df <- df %>% group_by(code) %>% summarise() # get distinct rows

budugulo · November 17, 2021, 11:50am

Thanks for the tip on ´n_distinct()´.

The following code solves the problem

library(dplyr)
# toy data
df <- tibble(
  code = c(123, 123, 666, 999, 999, 555, 555, 555),
  name = c("A", "B", "X", "M", "M", "P", "Q", "R")#,
  #wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
)
df %>% group_by(code) %>%
  mutate(wanted = n_distinct(name)) 
#> # A tibble: 8 x 3
#> # Groups:   code [4]
#>    code name  wanted
#>   <dbl> <chr>  <int>
#> 1   123 A          2
#> 2   123 B          2
#> 3   666 X          1
#> 4   999 M          1
#> 5   999 M          1
#> 6   555 P          3
#> 7   555 Q          3
#> 8   555 R          3

^{Created on 2021-11-17 by the reprex package (v2.0.1)}

system · December 8, 2021, 11:50am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.