Suppose we need to calculate the relative abundance (as a percentage) of positives and negatives within each year.
The approach I have been using is as follows:
library(tidyverse)
# fake data
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)
# code
mydf %>%
  group_by(year) %>%
  mutate(count_by_year = n()) %>% # total for each year
  ungroup() %>%
  group_by(year, result) %>%
  summarise(count_year_res = n(), # counting of positives and negatives in each year
            perc = count_year_res/count_by_year*100) %>% # relative abundance
  unique()
To avoid the following message, I can use reframe() instead of summarise(), and everything works as expected:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame
and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Is there a better way to achieve the result without having to use reframe() and then unique()?
Which method do you usually use to write cleaner code for this?
Result
> mydf %>%
+   group_by(year) %>%
+   mutate(count_by_year = n()) %>%
+   ungroup() %>%
+   group_by(year, result) %>%
+   reframe(count_year_res = n(),
+           perc = count_year_res/count_by_year*100) %>%
+   unique()
# A tibble: 21 × 4
    year result   count_year_res  perc
   <int> <chr>             <int> <dbl>
 1  2000 negative              6  85.7
 2  2000 positive              1  14.3
 3  2001 negative              3  60
 4  2001 positive              2  40
 5  2002 negative              2  40
 6  2002 positive              3  60
 7  2003 negative              3  50
 8  2003 positive              3  50
 9  2004 negative              4  57.1
10  2004 positive              3  42.9
# … with 11 more rows
# ℹ Use `print(n = ...)` to see more rows
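
For comparison, here is a minimal sketch of one possible alternative that avoids both reframe() and unique(). It assumes that counting each year/result pair first with count(), and then dividing by the per-year total inside a grouped mutate(), is acceptable for this use case. The object name res is only illustrative and does not appear in the code above.

```r
library(tidyverse)

# Same fake data as in the question
set.seed(1)
mydf <- tibble(
  id = 1:80,
  year = sample(2000:2010, 80, replace = TRUE),
  result = sample(c("positive", "negative"), 80, replace = TRUE)
)

res <- mydf %>%
  # one row per observed year/result combination, with its count
  count(year, result, name = "count_year_res") %>%
  # within each year, divide each count by the year total
  group_by(year) %>%
  mutate(perc = count_year_res / sum(count_year_res) * 100) %>%
  ungroup()

res
```

Because count() already returns exactly one row per year/result pair, no step produces duplicated rows, so there is nothing left for unique() to remove and summarise() never has to return more than one row per group.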