Which order of group_by, count, summarize, or case_when to achieve this result?

So below is some example data, as well as the result I hope to achieve. I've tried several different versions, of dplyr but can't seem to get it to work the way I want it to come out as:

df <- data.frame(name = c("name1", "name1", "name1", "name2", "name2", "name2", "name3", "name3", "name3"), 
D1 = c(0,0,0,1,0,0,2,3,1), 
D2 = c(0,1,0,3,4,0,0,0,0))

What I'm trying to get it to is where any number greater than 1 is counted per name for each name regardless of what that number is. So zero's are not counted, and any number above zero is counted as 1.

answer <- data.frame(name = c("name1", "name2", "name3"), 
D1 = c(0,1,3), 
D2 = c(1,2,0))

Thank you for your help.

I added two new tally variables to df to correspond the D1 and D2. If the value in those corresponding columns is greater than or equal to 1, I mutated a 1 to the tally column, otherwise I mutated zero.

df <- df %>% 
  group_by(name) %>% 
  mutate(
    D1Tally = ifelse(D1 >= 1, 1, 0),
    D2Tally = ifelse(D2 >= 1, 1, 0) 
  )

Now I can summarize the tally columns to arrive at the answer:

df %>% 
  group_by(name) %>% 
  summarize(
    D1 = sum(D1Tally), 
    D2 = sum(D2Tally)
  )

The result is:

  name     D1    D2
  <fct> <dbl> <dbl>
1 name1     0     1
2 name2     1     2
3 name3     3     0
1 Like

An alternative solution with one less step. Note that df is a function in R so I personally refrain from using it as an object name.

library(tidyverse)

df_examp <- data.frame(name = c("name1", "name1", "name1", "name2", "name2", "name2", "name3", "name3", "name3"), 
                 D1 = c(0,0,0,1,0,0,2,3,1), 
                 D2 = c(0,1,0,3,4,0,0,0,0))

df_examp %>%
   group_by(name) %>%
   summarise(
      D1=sum(D1>0),
      D2=sum(D2>0),
      .groups="drop"
   )
#> # A tibble: 3 x 3
#>   name     D1    D2
#>   <chr> <int> <int>
#> 1 name1     0     1
#> 2 name2     1     2
#> 3 name3     3     0

Created on 2021-09-28 by the reprex package (v2.0.1)

1 Like

Could you please explain what does

.groups="drop"

do in summarise() ?

kind regards,
Andrzej

Sure, a snippet from the documentation:

.groups Grouping structure of the result.

  • "drop_last": dropping the last level of grouping. This was the
    only supported option before version 1.0.0.
  • "drop": All levels of grouping are dropped.
  • "keep": Same grouping structure as .data.
  • "rowwise": Each row is its own group.

By default, if you don't use this argument, the last level of grouping is removed. This makes more sense in an example with multiple groups. See an example below:

library(tidyverse)
mtcars_gp <- mtcars %>%
   as_tibble() %>%
   group_by(cyl, vs)

# default .groups="drop_last"
# see how table is still grouped by cyl
mtcars_gp %>%
   summarise(cyl_n = n()) 
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
#> # A tibble: 5 x 3
#> # Groups:   cyl [3]
#>     cyl    vs cyl_n
#>   <dbl> <dbl> <int>
#> 1     4     0     1
#> 2     4     1    10
#> 3     6     0     3
#> 4     6     1     4
#> 5     8     0    14

# .groups = "drop"
# see how table is no longer grouped
mtcars_gp %>%
   summarise(cyl_n = n(), 
             .groups="drop")
#> # A tibble: 5 x 3
#>     cyl    vs cyl_n
#>   <dbl> <dbl> <int>
#> 1     4     0     1
#> 2     4     1    10
#> 3     6     0     3
#> 4     6     1     4
#> 5     8     0    14

# .groups = "keep"
# see how table is still grouped by both cyl and vs
mtcars_gp %>%
   summarise(cyl_n = n(), 
             .groups="keep")
#> # A tibble: 5 x 3
#> # Groups:   cyl, vs [5]
#>     cyl    vs cyl_n
#>   <dbl> <dbl> <int>
#> 1     4     0     1
#> 2     4     1    10
#> 3     6     0     3
#> 4     6     1     4
#> 5     8     0    14

Created on 2021-09-28 by the reprex package (v2.0.1)

Thank you, is it something similar then to ungroup () ?

Yes, .groups="drop" within summarize would have the same behavior as using ungroup() afterwards.

Thank you very much indeed for your kind reply,
best

Thank you so much, the explanations below also help tremendously. .

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.