> df %>% group_by(prot_id) %>%
+   mutate(mean_s = mean(c(S1,S2,S3))) ->
+   out_df
>
> as.data.frame( out_df )
   prot_id       S1       S2       S3   mean_s
1        A 29.46594 28.92563 30.61345 29.98498
2        B 29.47469 32.36655 30.33432 29.98498
3        C 31.43226 30.52121 30.34889 29.98498
4        D 30.30061 28.39447 30.93854 29.98498
5        E 29.62858 29.72010 29.11866 29.98498
6        F 30.50273 30.91271 29.03644 29.98498
7        G 28.77819 31.73520 29.47392 29.98498
8        H 28.61218 30.34214 30.82327 29.98498
9        I 28.83311 30.00083 30.07306 29.98498
10       J 30.33777 29.50011 29.00388 29.98498
Probably irrelevant, but the only difference between today and other days is that I'm running another process in the background, outside of R, that is using 125 GB of my 128 GB of RAM.
I am not sure if `mean(c(S1, S2, S3))` is correct `dplyr` (I would have thought it would be `mean(S1, S2, S3)`), but no error is being thrown, so the notation seems to be allowed. That being said, I get a bunch of weird results with this code pattern.
Sorry about the incorrect note above; I did not mean to mislead. I'm leaving it up to avoid further confusion. Obviously only `mean(c(S1, S2, S3))` is the correct notation (in `mean(S1, S2, S3)`, S2 and S3 are lost in the `...`).
`mean()` is a base R function that takes a single vector `x`, with `...` being passed to methods. I'm not sure why you think it would have a different syntax within dplyr.
Your code simply returns the mean of a single number; you'll notice that mean_s is identical to S1 in your code.
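To make the difference concrete, here is a minimal base R sketch using made-up numbers (not the values from this thread). As far as I understand `mean.default()`, extra positional arguments are matched to its other parameters (`trim` and `na.rm`) rather than being averaged, which is why only the first value survives.

```r
# Made-up values for illustration only.
s1 <- 1
s2 <- 2
s3 <- 3

# One vector argument: all three values are averaged.
combined <- mean(c(s1, s2, s3))

# Three separate arguments: only `s1` is treated as the data `x`;
# `s2` and `s3` are matched to other arguments of the default method.
first_only <- mean(s1, s2, s3)

print(combined)    # 2
print(first_only)  # 1
```

Note that with vectors longer than one element, the second form would typically fail instead, because `trim` must have length one. It only runs silently here because each value is a single number, as it is when every group contains one row.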
Thanks John and James, that was really helpful. I think I must have had a conflicting package loaded that caused the problem, because when I started a new session to run your code, John, everything started working.
The take-home lessons are to have the `conflicted` package loaded and that I don't need `c()` in my grouped means.
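One possible guard against this kind of masking is to call the verbs with an explicit `dplyr::` prefix. This is only a sketch, and it rests on an assumption the thread does not confirm: that another attached package (for example `plyr`, which also exports `mutate()`) was shadowing the dplyr functions. The identical `mean_s` in every row of the original output is at least consistent with the grouping having been ignored. The data frame below is made up for illustration.

```r
library(dplyr)

# Made-up data: one row per prot_id.
df <- data.frame(
  prot_id = c("A", "B"),
  S1 = c(1, 4),
  S2 = c(2, 5),
  S3 = c(3, 6)
)

# Explicit namespaces ensure the dplyr versions are used even if another
# attached package defines functions with the same names.
out_df <- df %>%
  dplyr::group_by(prot_id) %>%
  dplyr::mutate(mean_s = mean(c(S1, S2, S3))) %>%
  dplyr::ungroup()

as.data.frame(out_df)
```

With the grouping respected, each row gets its own mean (2 for A and 5 for B here) rather than one overall value repeated down the column.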
This particular specification of the problem in your reprex is an example of "average of each row" rather than a "grouped mean," since the groups are already fully specified by their row. For this set of data, where each prot_id exists in only one row, the group_by(prot_id) part is redundant.
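For the "average of each row" reading, a base R sketch with `rowMeans()` avoids grouping altogether. The small data frame is made up for illustration; the column names simply mirror the ones in the question.

```r
# Made-up data: one row per prot_id.
df <- data.frame(
  prot_id = c("A", "B"),
  S1 = c(1, 4),
  S2 = c(2, 5),
  S3 = c(3, 6)
)

# Average across the three sample columns within each row.
df$mean_s <- rowMeans(df[, c("S1", "S2", "S3")])

df
```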
It's possible to get the same result by using a grouped mean if you first gather your data into "tidy data" form, and then summarize for the mean.
df %>%
  gather(S_type, value, S1:S3) %>%
  group_by(prot_id) %>%
  summarise(mean_s = mean(value))
#> # A tibble: 10 x 2
#>    prot_id mean_s
#>    <chr>    <dbl>
#>  1 A         29.7
#>  2 B         30.7
#>  3 C         30.8
#>  4 D         29.9
#>  5 E         29.5
#>  6 F         30.2
#>  7 G         30.0
#>  8 H         29.9
#>  9 I         29.6
#> 10 J         29.6
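In more recent versions of tidyr, `gather()` is superseded by `pivot_longer()`, so the same reshape-then-summarise idea could be sketched as follows. This assumes tidyr 1.0.0 or later is installed, and the data frame is made up for illustration rather than taken from the thread.

```r
library(dplyr)
library(tidyr)

# Made-up data: one row per prot_id.
df <- data.frame(
  prot_id = c("A", "B"),
  S1 = c(1, 4),
  S2 = c(2, 5),
  S3 = c(3, 6)
)

# Reshape to one row per (prot_id, sample) pair, then average per prot_id.
long_means <- df %>%
  pivot_longer(S1:S3, names_to = "S_type", values_to = "value") %>%
  group_by(prot_id) %>%
  summarise(mean_s = mean(value))

long_means
```

In the long form, `group_by(prot_id)` is no longer redundant, because each `prot_id` now spans three rows.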