I've attach()ed my data set so I'm able to refer to my variables directly by name.
The data set contains people from all over the world and their 100 m score (time).
The variables are 'country', 'age' and 'time'.
Say I wanted to output the mean 'time' of individuals from 'Canada' & 'US' who are under 20 as its own variable, i.e., I want to be able to just call mean() to get this result, if that's possible.
Is filter() the function I'm looking for? I've been playing around with it, but it seems to just output the entire subsetted rows, and I can't figure out how to do this.
I've attached some code that does what I think you're looking for!
The first chunk just generates some data as you haven't provided any in your question.
The second does a group_by / summarise workflow, calculating the mean and standard deviation for each combination of country and age.
The third chunk uses filter() to keep US & Canada athletes who are 20 or under, pulls the time column with pull() (which turns the column into a vector), then calculates the mean. If you want strictly under 20, as in your question, change age <= 20 to age < 20.
library(dplyr, warn.conflicts = FALSE)
# generate data
set.seed(123)
dat <- tidyr::crossing(country = c("UK", "US", "AU", "CA", "NZ"),
                       age = c(10, 20, 30, 40, 50),
                       runner_id = LETTERS) %>%
  mutate(time = rnorm(n = 650, mean = 200, sd = 15))
# get average times for all categories
avg_times <- dat %>%
  group_by(country, age) %>%
  summarise(time_mean = mean(time),
            time_sd = sd(time))
#> `summarise()` has grouped output by 'country'. You can override using the
#> `.groups` argument.
head(avg_times)
#> # A tibble: 6 x 4
#> # Groups:   country [2]
#>   country   age time_mean time_sd
#>   <chr>   <dbl>     <dbl>   <dbl>
#> 1 AU         10      199.    14.7
#> 2 AU         20      203.    12.4
#> 3 AU         30      200.    15.1
#> 4 AU         40      204.    11.6
#> 5 AU         50      196.    11.7
#> 6 CA         10      198.    17.8
# get 20 and under from CA and US
na_20_mean_time <- dat %>%
  filter(country %in% c("CA", "US"),
         age <= 20) %>%
  pull(time) %>%
  mean()
na_20_mean_time
#> [1] 200.9996
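In case it's useful, here is a rough sketch of two variations on the third chunk. The first swaps pull() and mean() for summarise(), which keeps the result in a one-row data frame instead of a bare number. The second is a base R equivalent that uses with() and logical indexing, so it works without attach() (which is generally discouraged because it can mask other objects). The `toy` data frame and its values are made up purely for illustration and are not your data; the country codes "CA" and "US" are also assumptions about how your 'country' column is coded.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data, only for illustration (not the asker's data set)
toy <- data.frame(
  country = c("CA", "US", "UK", "CA", "US", "AU"),
  age     = c(10, 20, 20, 30, 10, 10),
  time    = c(11, 12, 13, 14, 13, 15)
)

# dplyr: filter the rows, then summarise() to a one-row data frame
dplyr_mean <- toy %>%
  filter(country %in% c("CA", "US"), age <= 20) %>%
  summarise(time_mean = mean(time))
dplyr_mean
#>   time_mean
#> 1        12

# base R: logical indexing on the time column, no attach() needed
base_mean <- with(toy, mean(time[country %in% c("CA", "US") & age <= 20]))
base_mean
#> [1] 12
```

Both give the same number here; which one you pick mostly depends on whether you want a plain numeric value or a data frame you can keep piping.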