Group_By function not giving me the summary stats for each group, just the overall

technocrat · March 25, 2021, 9:54pm

Hi, and welcome. See the FAQ "How to do a minimal reproducible example (reprex) for beginners" for tips on how to attract more answers. The data in this case were easy enough to synthesize, but having to do so creates friction for anyone trying to help.

Two suggestions are embedded in the code:

Use short variable names (easier to type); when the time comes to present results, the headings can easily be changed.
Use whitespace freely; easier to spot inconsistencies. (And prefer spaces over tabs and never mix them.)
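
As a side note on the first suggestion, here is a minimal sketch of changing headings only at presentation time. The table `stats` and the display names are made up for illustration; they are not from the original question.

```r
library(dplyr)

# hypothetical summary table that uses short working names
stats <- tibble(
  Year = c(2000L, 2001L),
  Mean = c(7827, 7641)
)

# rename() takes new_name = old_name; backticks allow spaces in a name
presented <- stats %>%
  rename(`Academic year` = Year, `Mean fee` = Mean)

names(presented)
```

The short names stay in the working code, and only the final object handed to a report carries the longer headings.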

suppressPackageStartupMessages({
  library(dplyr)
})

# create synthetic data
set.seed(42)
year_basket <- sample(2000:2020, 100, replace = TRUE)
set.seed(137)
fee_basket <- sample(6000:9000, 100)
synthetic <- tibble(Year = year_basket, Fee = fee_basket)

# group by Year and summarize stats

synthetic %>%
  arrange(Year) %>%
  group_by(Year) %>%
  summarize(
    Count  = n(),
    Mean   = mean(Fee),
    SD     = sd(Fee),
    Median = median(Fee),
    IQR    = IQR(Fee)
  )
#> # A tibble: 21 x 6
#>     Year Count  Mean    SD Median   IQR
#>    <int> <int> <dbl> <dbl>  <dbl> <dbl>
#>  1  2000     4 7827  1033.  8144. 1186 
#>  2  2001     5 7641.  855.  8112   486 
#>  3  2002     5 7601.  997.  7714    75 
#>  4  2003     9 7160.  803.  7317  1220 
#>  5  2004    10 7806.  717.  7864  1085 
#>  6  2005     4 7480.  308.  7548.  321.
#>  7  2006     3 6692.  455.  6735   453 
#>  8  2007     6 7471.  471.  7586.  579 
#>  9  2008     5 7146. 1049.  7477  1482 
#> 10  2009     5 7190.  451.  6930   322 
#> # … with 11 more rows
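
If the same pipeline returns a single overall row on your machine, one cause worth ruling out is that another attached package (plyr is a frequent one) exports its own `summarize()` and masks dplyr's version. That is an assumption on my part, not something confirmed in this thread. A rough sketch of how to sidestep it with namespace-qualified calls, using a small made-up data set:

```r
library(dplyr)

# toy data; the values are illustrative only
toy <- tibble(
  Year = c(2000L, 2000L, 2001L, 2001L),
  Fee  = c(6000, 7000, 8000, 9000)
)

# the dplyr:: prefix forces dplyr's verbs even if a package attached
# later exports functions with the same names
by_year <- toy %>%
  dplyr::group_by(Year) %>%
  dplyr::summarize(Count = dplyr::n(), Mean = mean(Fee))

by_year
```

This should give one row per `Year`. Running `conflicts()` in the session lists names that are defined in more than one attached package, which is a quick way to check whether `summarize` is among them.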