Calculating a mean from a specific population within a data set

l.harper · December 17, 2018, 1:23pm

Hello, I am very new to R. I would like to calculate the mean of one variable, restricted to the individuals that have a particular value of another variable in the data set. How could I do this?

For example: the mean of "head_circ" for the individuals in "exp_group" 3 only.

Thanks for any advice

andresrcs · December 17, 2018, 1:29pm

Please share a reproducible example so we can help you better.

reprex: What’s a reproducible example (reprex for short) and how do I do one?
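As a minimal sketch of sharing data: build a tiny data frame with the same column names as your real data (the values below are made up for illustration) and print it with `dput()`, which outputs R code that others can paste to recreate the object exactly.

```r
# A tiny, made-up data set using the column names from the question
# (the values are hypothetical, for illustration only)
your_data <- data.frame(
  head_circ = c(34.1, 35.0, NA, 33.8, 36.2, 34.9),
  exp_group = c(1, 3, 3, 2, 3, 1)
)

# dput() prints R code that recreates the object;
# paste its output into your post so others can run it
dput(your_data)

# For a large data set, share only the first few rows
dput(head(your_data, 3))
```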

stkrog · December 17, 2018, 1:50pm

mean(your_data$head_circ[your_data$exp_group == 3])
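To see how this works, here is a sketch on a small made-up data frame (the values are hypothetical): `your_data$exp_group == 3` is a logical vector that keeps only the matching elements of `head_circ`. `tapply()` applies the same idea to every group at once.

```r
# Hypothetical example data (no missing values yet)
your_data <- data.frame(
  head_circ = c(34.0, 35.0, 36.0, 33.0, 37.0, 35.0),
  exp_group = c(1, 3, 3, 2, 3, 1)
)

# Logical vector: TRUE for rows in experimental group 3
in_group_3 <- your_data$exp_group == 3

# Mean head circumference for group 3 only
mean_group_3 <- mean(your_data$head_circ[in_group_3])
print(mean_group_3)  # 36

# The same calculation for every group at once
group_means <- tapply(your_data$head_circ, your_data$exp_group, mean)
print(group_means)
```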

l.harper · December 17, 2018, 2:24pm

That's great, thanks. Is there also a way to do the calculation when some individuals have missing data, taking the mean of the available values only?

andresrcs · December 17, 2018, 2:55pm

mean(your_data$head_circ[your_data$exp_group == 3], na.rm = TRUE)
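A sketch of what `na.rm = TRUE` changes, on made-up data (the values are hypothetical). It also shows a subtlety: if `exp_group` itself has missing values, the comparison returns `NA` for those rows and indexing adds an extra `NA` element, which `na.rm = TRUE` then drops. Wrapping the condition in `which()` avoids creating those extra elements.

```r
# Hypothetical data with missing values in both columns
your_data <- data.frame(
  head_circ = c(34.0, 35.0, NA, 33.0, 37.0, 35.0),
  exp_group = c(1, 3, 3, 2, 3, NA)
)

x <- your_data$head_circ[your_data$exp_group == 3]
# The NA in exp_group makes the comparison NA, which adds an extra NA to x
print(x)  # 35 NA 37 NA

m_all <- mean(x)                   # NA: any missing value propagates
m_avail <- mean(x, na.rm = TRUE)   # 36: mean of the available values only

# which() drops positions where the comparison is NA
x2 <- your_data$head_circ[which(your_data$exp_group == 3)]
m_avail2 <- mean(x2, na.rm = TRUE)
print(c(m_all, m_avail, m_avail2))
```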

l.harper · December 17, 2018, 9:11pm

I then tried to use the same approach to get a 95% CI:

confint(your_data$head_circ[your_data$exp_group == 3], level = 0.95, na.rm = TRUE)

This gave the following error:
Error: $ operator is invalid for atomic vectors
How should I edit the code?

andresrcs · December 17, 2018, 10:48pm

The confint() function applies to a fitted model object, not to a numeric vector. I think you are trying to get a confidence interval for your mean; for a simple approach you can use a normal approximation, something like this:

filtered_data <- your_data$head_circ[your_data$exp_group == 3]
m <- mean(filtered_data, na.rm = TRUE)
s <- sd(filtered_data, na.rm = TRUE)
n <- sum(!is.na(filtered_data))  # count only the non-missing values
error <- qnorm(0.975) * s / sqrt(n)  # half-width of the 95% interval
left <- m - error
right <- m + error
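Since `confint()` expects a fitted model, one option is to fit an intercept-only model with `lm()`, whose single coefficient is the sample mean, and pass that to `confint()`. This gives a t-based interval, slightly wider than the normal approximation above for small samples; `t.test()` reports the same interval. A sketch with made-up values (hypothetical, for illustration only):

```r
# Hypothetical group-3 values, including a missing one
filtered_data <- c(35.2, 36.1, NA, 34.8, 35.9, 36.4, 35.5)

# Intercept-only model: the fitted intercept equals the sample mean.
# With default options, lm() drops the NA (na.action = na.omit).
fit <- lm(filtered_data ~ 1)
ci_lm <- confint(fit, level = 0.95)
print(ci_lm)

# t.test() also removes missing values and reports the same t-based interval
ci_t <- t.test(filtered_data, conf.level = 0.95)$conf.int
print(ci_t)
```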

l.harper · December 18, 2018, 7:45am

Thanks, that's excellent. When I then search for outliers outside this confidence interval with the subset calls below, I find exactly 10 individuals on either side of the interval for each of the 8 different parameters I have analysed. Can this be correct?

subset(data_set, variable < left)
subset(data_set, variable > right)

andresrcs · December 18, 2018, 12:41pm

Are you recalculating the left and right limits for each parameter? Remember that they were calculated for the mean of head_circ where exp_group == 3 only.

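On the other hand, this approach might not be theoretically sound: a confidence interval for the mean describes the uncertainty in the estimated mean, not the spread of individual values, so it is not designed for flagging outliers. If you are just doing exploratory analysis this is fine, but be careful if your goal is to draw conclusions from it.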
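If you do want limits for several parameters, they have to be recomputed from each parameter's own values. A minimal sketch, assuming a hypothetical helper `ci_limits()` that wraps the normal-approximation calculation from earlier in the thread, and a made-up data set with hypothetical columns `head_circ` and `weight`:

```r
# Hypothetical helper: normal-approximation CI for the mean of x,
# ignoring missing values (same calculation as earlier in the thread)
ci_limits <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  error <- qnorm(1 - (1 - level) / 2) * sd(x) / sqrt(length(x))
  c(left = mean(x) - error, right = mean(x) + error)
}

# Hypothetical data with two parameters
data_set <- data.frame(
  exp_group = c(3, 3, 3, 3, 1, 1),
  head_circ = c(34.0, 35.0, 36.0, 37.0, 33.0, 38.0),
  weight    = c(3.1, 3.4, NA, 3.6, 2.9, 3.8)
)

group_3 <- data_set[data_set$exp_group == 3, ]

# Recompute the limits separately for each parameter (one column each)
params <- c("head_circ", "weight")
limits <- sapply(params, function(p) ci_limits(group_3[[p]]))
print(limits)

# Count group-3 values below and above each parameter's own limits
counts <- sapply(params, function(p) {
  x <- group_3[[p]]
  c(below = sum(x < limits["left", p], na.rm = TRUE),
    above = sum(x > limits["right", p], na.rm = TRUE))
})
print(counts)
```

Because the interval narrows as the number of non-missing values grows, in a reasonably large sample most individual values will usually fall outside it.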

system · December 25, 2018, 12:44pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.