How to find a conditional mean?

GenM · March 19, 2019, 4:59pm

Hi all. Since yesterday I have been trying to find the conditional mean of a variable. In my case, I have two variables. One is continuous (positive), and the second one is binomial (yes = 1, no = 0). I need to find the mean of the first (continuous) variable when the second variable equals 1 (yes), and then repeat the operation for the same variable when the second variable is 0 (no). I also need to include na.rm = TRUE so that the gaps in the table (NA) don't break the calculation. I have tried some commands, but they seem to be totally incorrect. Here are some of my attempts (mydata is the data name; it was subsetted from the main data, because I needed only one year out of all the given years; x1 is the continuous variable, x2 is the binomial variable).

Part 1

if(mydata$x2 == 1) w <- mydata$x1
mean(w)
error: the condition has length > 1 and only the first element will be used

Part 2

mean(mydata[mydata$x2>0, "x1"])
Answer: [1] NA.

I also don't know how to integrate the na.rm = TRUE argument here.
Please help. Thanks.

Yarnabrina · March 19, 2019, 5:08pm

Why you're getting errors

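In Part 1, mydata$x2 == 1 is a logical vector with one element per row, but if expects a single TRUE or FALSE, so it can't be used for the comparison this way.
In Part 2, the subset still contains missing values, so mean returns NA. Try mean(mydata[mydata$x2>0, "x1"], na.rm = TRUE)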
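To make both points concrete, here is a minimal sketch on made-up data. The data frame below is hypothetical: the names mydata, x1 and x2 follow the question, but the values are invented for illustration. It shows what the comparison produces, why the plain mean comes back as NA, and where na.rm = TRUE goes.

```r
# Hypothetical stand-in for the asker's data; values are invented for illustration.
mydata <- data.frame(
  x1 = c(2.5, NA, 4.0, 1.5, 3.0, 6.0),
  x2 = c(1,   1,  0,   NA,  1,   0)
)

# The comparison yields one logical value per row, not a single TRUE/FALSE,
# which is why it cannot be used directly as an if() condition.
mydata$x2 == 1
#> [1]  TRUE  TRUE FALSE    NA  TRUE FALSE

# Without na.rm, any missing value in the subset makes the mean NA.
mean(mydata[mydata$x2 == 1, "x1"])
#> [1] NA

# na.rm = TRUE drops the missing values before averaging.
# which() additionally skips rows where x2 itself is NA; it is optional here,
# because those rows would yield NA and be removed by na.rm anyway.
mean_yes <- mean(mydata[which(mydata$x2 == 1), "x1"], na.rm = TRUE)
mean_no  <- mean(mydata[which(mydata$x2 == 0), "x1"], na.rm = TRUE)
mean_yes
#> [1] 2.75
mean_no
#> [1] 5
```

na.rm is an argument of mean, so it goes inside the mean() call, after the subsetting expression.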

Alternative

Use the by function.

Let me illustrate by an example:

dataset <- data.frame(continuous = rnorm(n = 10),
                      binary = sample(x = 0:1, size = 10, replace = TRUE))

dataset
#>     continuous binary
#> 1  -0.01978487      0
#> 2  -1.14185292      0
#> 3   0.20931787      0
#> 4  -0.63720730      0
#> 5   1.07750407      1
#> 6  -1.59274225      0
#> 7  -0.48722740      1
#> 8  -0.64151044      0
#> 9  -0.64111755      0
#> 10  0.99598287      1

# your method
mean(dataset[dataset$binary == 1, 1])
#> [1] 0.5287532
mean(dataset[dataset$binary == 0, 1])
#> [1] -0.6378425

# using by
by(data = dataset$continuous, INDICES = dataset$binary, FUN = mean)
#> dataset$binary: 0
#> [1] -0.6378425
#> -------------------------------------------------------- 
#> dataset$binary: 1
#> [1] 0.5287532

Created on 2019-03-19 by the reprex package (v0.2.1)
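The example above has no missing values, so it does not show where na.rm = TRUE would go. As a small sketch under that assumption, the data below is invented and contains NA on purpose. Both by and tapply forward any extra arguments to FUN, so na.rm = TRUE can simply be appended to the call.

```r
# Hypothetical data with gaps, invented for illustration.
dataset <- data.frame(
  continuous = c(1.0, NA, 3.0, 2.0, 4.0, NA),
  binary     = c(0,   0,  0,   1,   1,   1)
)

# Arguments after FUN are passed on to it, so na.rm = TRUE reaches mean().
by(data = dataset$continuous, INDICES = dataset$binary, FUN = mean, na.rm = TRUE)

# tapply() performs the same split-and-apply step and returns the group means
# indexed by group label, which is easier to reuse in later code.
group_means <- tapply(dataset$continuous, dataset$binary, mean, na.rm = TRUE)
group_means
#> 0 1 
#> 2 3 
```

tapply is handy when you want to pick out a single result, for example group_means[["1"]] for the mean where the binary variable is 1.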

Hope this helps.

PS

Please ask your future questions in the form of a reproducible example. In this case, it was not too difficult to understand what was going wrong, but more often than not that's not the case. You can go through this great post to learn how to make a reprex:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

andresrcs · March 19, 2019, 5:15pm

If you are not constrained to use base R, another approach would be to use a tidyverse-based solution like this one:

set.seed(123)
library(dplyr)

dataset <- data.frame(continuous = rnorm(n = 10),
                      binary = sample(x = 0:1, size = 10, replace = TRUE))
dataset %>% 
    group_by(binary) %>% 
    summarise(continuous_mean = mean(continuous, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   binary continuous_mean
#>    <int>           <dbl>
#> 1      0          -0.566
#> 2      1           0.235

^{Created on 2019-03-19 by the reprex package (v0.2.1)}
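One detail worth noting when the grouping variable itself has gaps: group_by treats NA as a group of its own. The sketch below is an illustration on invented data (the names mydata, x1 and x2 follow the question; x1_mean and n_used are hypothetical column names). It filters those rows out first and also counts how many values each mean is based on.

```r
library(dplyr)

# Hypothetical data; column names follow the question, values are invented.
mydata <- data.frame(
  x1 = c(2.5, NA, 4.0, 1.5, 3.0, 6.0),
  x2 = c(1,   1,  0,   NA,  1,   0)
)

result <- mydata %>%
    filter(!is.na(x2)) %>%   # otherwise rows with NA in x2 form their own group
    group_by(x2) %>%
    summarise(x1_mean = mean(x1, na.rm = TRUE),   # conditional mean, ignoring NA
              n_used  = sum(!is.na(x1)))          # number of non-missing values used
result
```

Keeping the count next to the mean makes it easier to see how much data each group mean actually rests on after the missing values are dropped.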

Here you can find a free online book that teaches how to use the tidyverse tools.

GenM · March 20, 2019, 8:33am

Thanks, dear friend. This is the second time you have helped me with R. I will take your advice into account. Thanks very much again.
