group_by and summarise: improve the code

Flm · March 6, 2023, 6:21pm

Suppose we need to calculate the relative abundance of positives and negatives within each year.
The approach I have been using is as follows:

library(tidyverse)

# fake data
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

# code
mydf %>% 
  group_by(year) %>% 
  mutate(count_by_year = n()) %>% # total for each year
  ungroup() %>% 
  group_by(year, result) %>%  
  summarise(count_year_res = n(), # counting of positives and negatives in each year
            perc = count_year_res/count_by_year*100) %>%  # relative abundance 
  unique()

To avoid the following warning, I can use reframe() instead, and everything works as expected:

Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame
  and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
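The warning appears because, inside summarise(), count_by_year is not a single number: it is still the per-row column within each (year, result) group, so count_year_res/count_by_year*100 returns one value per row rather than one per group. The toy example below (hypothetical data, dplyr 1.1.0 or later assumed for reframe()) shows that an unsummarised column keeps one value per row inside a group.

```r
library(dplyr)

# Hypothetical toy data: `total` is constant within each group,
# just like count_by_year in the question
toy <- tibble(
  g = c("a", "a", "a", "b"),
  total = c(3L, 3L, 3L, 1L)
)

# Inside summarise(), `total` is still one value per row of the group
lens <- toy %>%
  group_by(g) %>%
  summarise(n_values = length(total))   # a -> 3, b -> 1

# Returning `total` unsummarised therefore yields one row per input row,
# which reframe() allows and summarise() now warns about
rows <- toy %>%
  group_by(g) %>%
  reframe(total = total)
```

That is why the original pipeline needs unique() afterwards: every row of a (year, result) group carries the same repeated percentage.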

Is there a better way to get this result without having to use reframe() followed by unique()?
Which approach do you usually use to write cleaner code?
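One common alternative (not from this thread) is to count each year/result pair with count() and then compute the share within each year with a grouped mutate(). This avoids both the precomputed count_by_year column and unique(). A minimal sketch on the same fake data, loading only dplyr:

```r
library(dplyr)

# Same fake data as in the question
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

res <- mydf %>%
  count(year, result, name = "count_year_res") %>%              # one row per year/result pair
  group_by(year) %>%                                             # shares are computed within each year
  mutate(perc = count_year_res / sum(count_year_res) * 100) %>%  # relative abundance
  ungroup()
```

Because the yearly total is taken as sum(count_year_res) after counting, each (year, result) pair already has exactly one row, so neither reframe() nor unique() is needed.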

Result

> mydf %>% 
+   group_by(year) %>% 
+   mutate(count_by_year = n()) %>% 
+   ungroup() %>% 
+   group_by(year, result) %>% 
+   reframe(count_year_res = n(),
+             perc = count_year_res/count_by_year*100) %>% 
+   unique()
# A tibble: 21 × 4
    year result   count_year_res  perc
   <int> <chr>             <int> <dbl>
 1  2000 negative              6  85.7
 2  2000 positive              1  14.3
 3  2001 negative              3  60  
 4  2001 positive              2  40  
 5  2002 negative              2  40  
 6  2002 positive              3  60  
 7  2003 negative              3  50  
 8  2003 positive              3  50  
 9  2004 negative              4  57.1
10  2004 positive              3  42.9
# … with 11 more rows
# ℹ Use `print(n = ...)` to see more rows
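As a quick cross-check of these percentages, base R can build the same year-by-result shares with table() and prop.table(). This is a sketch for comparison only, using a data.frame version of the same fake data. One difference: table() keeps zero counts, so a year with only one outcome shows 0 for the other outcome instead of dropping that row.

```r
# Same fake data as in the question, as a plain data.frame
set.seed(1)
mydf <- data.frame(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

# Cross-tabulate year by result, then divide each row by its row total
tab <- table(mydf$year, mydf$result)
perc_tab <- prop.table(tab, margin = 1) * 100
round(perc_tab, 1)
```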

nirgrahamuk · March 6, 2023, 7:47pm

I think if you move unique() inside the summarise(), you can keep summarise() and get the same result; at least on this example data, I don't see the warning.

mydf %>% 
  group_by(year) %>% 
  mutate(count_by_year = n()) %>% # total for each year
  group_by(year, result) %>%  
  summarise(count_year_res = n(), # counting of positives and negatives in each year
            perc = unique(count_year_res/count_by_year*100))
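The unique() above works because count_by_year holds the same value on every row of a year. A variant that states this intent more directly takes first(count_by_year) and passes .groups = "drop", so summarise() returns an ungrouped tibble without its grouping message. This is an editorial sketch on the same fake data (dplyr 1.0.0 or later assumed for .groups), not code from the reply. As in the reply, no ungroup() is needed before the second group_by(), because group_by() replaces the existing grouping unless .add = TRUE.

```r
library(dplyr)

# Same fake data as in the question
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

res <- mydf %>%
  group_by(year) %>%
  mutate(count_by_year = n()) %>%     # total per year, repeated on every row
  group_by(year, result) %>%          # replaces the year-only grouping
  summarise(
    count_year_res = n(),
    # first() collapses the constant yearly total to a single value
    perc = count_year_res / first(count_by_year) * 100,
    .groups = "drop"                  # ungrouped result, no grouping message
  )
```

One trade-off to keep in mind: if count_by_year were ever not constant within a group, unique() would surface that as the multi-row warning, whereas first() would silently use the first value.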

system · March 27, 2023, 7:48pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
