Hi, and welcome. See the FAQ: How to do a minimal reproducible example (reprex) for beginners
for tips on how to attract more answers. The data in this case were easy enough to synthesize, but having to do so creates friction for anyone trying to help.
Two suggestions are embedded in the code:
- Use short variable names (easier to type); when the time comes to present results, the headings can easily be changed.
- Use whitespace freely; it makes inconsistencies easier to spot. (Prefer spaces over tabs, and never mix them.)
```r
suppressPackageStartupMessages({
  library(dplyr)
})

# create synthetic data
set.seed(42)
year_basket <- sample(2000:2020, 100, replace = TRUE)
set.seed(137)
fee_basket <- sample(6000:9000, 100)
synthetic <- tibble(Year = year_basket, Fee = fee_basket)

# group by Year and summarize stats
synthetic %>%
  arrange(Year) %>%
  group_by(Year) %>%
  summarize(
    Count  = n(),
    Mean   = mean(Fee),
    SD     = sd(Fee),
    Median = median(Fee),
    IQR    = IQR(Fee)
  )
#> # A tibble: 21 x 6
#>     Year Count  Mean    SD Median   IQR
#>    <int> <int> <dbl> <dbl>  <dbl> <dbl>
#>  1  2000     4 7827  1033.  8144. 1186
#>  2  2001     5 7641.  855.  8112   486
#>  3  2002     5 7601.  997.  7714    75
#>  4  2003     9 7160.  803.  7317  1220
#>  5  2004    10 7806.  717.  7864  1085
#>  6  2005     4 7480.  308.  7548.  321.
#>  7  2006     3 6692.  455.  6735   453
#>  8  2007     6 7471.  471.  7586.  579
#>  9  2008     5 7146. 1049.  7477  1482
#> 10  2009     5 7190.  451.  6930   322
#> # … with 11 more rows
```
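
On the first suggestion, here is a minimal sketch of how short working names might be swapped for presentation headings at the end. The `toy` data and the long heading names are made up for illustration; `rename()` is the standard dplyr verb and takes `new_name = old_name`.

```r
suppressPackageStartupMessages({
  library(dplyr)
})

# toy data, made up for illustration only
toy <- tibble(
  Year = c(2000L, 2000L, 2001L, 2001L),
  Fee  = c(6000, 7000, 8000, 9000)
)

# work with short names while computing
stats <- toy %>%
  group_by(Year) %>%
  summarize(
    Count = n(),
    Mean  = mean(Fee)
  )

# switch to descriptive headings only when presenting;
# backticks allow names that contain spaces
presentable <- stats %>%
  rename(
    `Number of fees` = Count,
    `Mean fee`       = Mean
  )

names(presentable)
```

Keeping the rename as the last step means the analysis code never has to deal with backticked names.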