Mutate count by group and condition

code name
123 A
123 B
666 X
999 M
999 M
555 P
555 Q
555 R

I want to mutate a variable wanted, which will count the number of different names for each code. For example, for code 123 the new variable will take the value 2 but for code 999 the value will be 1 since names are the same.

Here is the desired output:

code name wanted
123 A 2
123 B 2
666 X 1
999 M 1
999 M 1
555 P 3
555 Q 3
555 R 3
library(dplyr)
# toy data
df <- tibble(
  code = c(123, 123, 666, 999, 999, 555, 555, 555),
  name = c("A", "B", "X", "M", "M", "P", "Q", "R"),
  wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
)

I think this is what you want, though the count of 999 is 2 not 1.

library(dplyr)
# toy data
df <- tibble(
   code = c(123, 123, 666, 999, 999, 555, 555, 555),
   name = c("A", "B", "X", "M", "M", "P", "Q", "R")#,
   #wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
 )
df
# A tibble: 8 x 2
   code name 
  <dbl> <chr>
1   123 A    
2   123 B    
3   666 X    
4   999 M    
5   999 M    
6   555 P    
7   555 Q    
8   555 R    
df <- df |> group_by(code) |>  mutate( wanted = n())
df
# A tibble: 8 x 3
# Groups:   code [4]
   code name  wanted
  <dbl> <chr>  <int>
1   123 A          2
2   123 B          2
3   666 X          1
4   999 M          2
5   999 M          2
6   555 P          3
7   555 Q          3
8   555 R          3

Thanks for the reply. Unfortunately, I want the count of 999 to be 1 because it has only one distinct name - M.

Until googling just now I would have gone through 2 group_by/summarise layers. (please excuse the change of pipes)

df <- df %>% group_by(code, name) %>% summarise() # get distinct rows
df <- df %>% group_by(code) %>% summarise(wanted = n()) # get countss

But it turns out you can have n_distinct in summarise:

df <- df %>% group_by(code) %>% summarise(wanted = n_distinct()) 
df <- df %>% group_by(code) %>% summarise() # get distinct rows

Thanks for the tip on ´n_distinct()´.

The following code solves the problem

library(dplyr)
# toy data
df <- tibble(
  code = c(123, 123, 666, 999, 999, 555, 555, 555),
  name = c("A", "B", "X", "M", "M", "P", "Q", "R")#,
  #wanted = c(2, 2, 1, 1, 1, 3, 3, 3)
)
df %>% group_by(code) %>%
  mutate(wanted = n_distinct(name)) 
#> # A tibble: 8 x 3
#> # Groups:   code [4]
#>    code name  wanted
#>   <dbl> <chr>  <int>
#> 1   123 A          2
#> 2   123 B          2
#> 3   666 X          1
#> 4   999 M          1
#> 5   999 M          1
#> 6   555 P          3
#> 7   555 Q          3
#> 8   555 R          3

Created on 2021-11-17 by the reprex package (v2.0.1)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.