Slicing and Extracting Data

I currently have a data frame with ten columns labeled a through j that hold numeric values, plus an 11th column labeled "new". The first 50 values in column "new" are 'x' and the last 50 values are 'y'.
Here is a mini made-up example of the data frame format:

a b c d e f g h i j new
1 1 1 1 1 1 1 1 1 x
2 2 2 2 2 2 2 2 2 x
3 3 3 3 3 3 3 3 3 y
4 4 4 4 4 4 4 4 4 y

I need the average of the numeric values in column "d", but only for the rows where column "new" has the value 'x'.

I assume the correct code for this task must name the column "new" and the value 'x' (or the negation of 'y') somewhere in the line, so my approach has to be wrong. My line of code so far leaves this out and simply uses the row positions where 'x' happens to appear in column "new". In a messier data frame, I would need to be able to specify the condition itself.

My line of code:

mean(x_dfs$d[1:50])
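
To illustrate the concern about a messier data frame, here is a minimal sketch. It assumes a hypothetical stand-in for `x_dfs` built from the description above (the real data is not shown in the post), and the seed value is arbitrary. It compares the positional version with one that names the column "new" and the value 'x'.

```r
# Hypothetical stand-in for x_dfs: 100 rows, first 50 "x", last 50 "y"
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x_dfs <- data.frame(
  d   = rnorm(100),
  new = rep(c("x", "y"), each = 50)
)

# While the rows are in their original order, both approaches agree
by_position  <- mean(x_dfs$d[1:50])
by_condition <- mean(x_dfs$d[x_dfs$new == "x"])
all.equal(by_position, by_condition)

# After shuffling the rows, rows 1:50 are a mix of "x" and "y",
# but the condition still picks out exactly the "x" rows
shuffled <- x_dfs[sample(nrow(x_dfs)), ]
shuffled_by_condition <- mean(shuffled$d[shuffled$new == "x"])
all.equal(shuffled_by_condition, by_condition)
```

Both comparisons should return TRUE, which is the point: the condition-based line does not depend on where the 'x' rows sit.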

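# Toy data: this example uses "y"/"n" in place of the "x"/"y" from the question
a <- 1:10
set.seed(137)   # fix the random seed so the d values are reproducible
d <- rnorm(10)  # 10 draws from a standard normal
new <- sample(c("y","n"), 10, replace = TRUE)
dat <- data.frame(a = a, d = d, new = new)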
dat
#>     a          d new
#> 1   1  0.3835199   y
#> 2   2  1.3669469   n
#> 3   3 -0.3452020   n
#> 4   4  1.3491541   y
#> 5   5  0.3029958   n
#> 6   6  0.5207242   n
#> 7   7  1.1434793   n
#> 8   8  0.2162364   n
#> 9   9  1.1301026   n
#> 10 10 -0.6002803   y
mean(dat[which(dat$new == "y"),2])
#> [1] 0.3774646

Created on 2020-09-29 by the reprex package (v0.3.0.9001)
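
For comparison, here is a sketch of a few equivalent ways to write the same conditional mean in base R. It rebuilds the toy `dat` from the reprex above; the variable names `m_which`, `m_logical`, `m_subset`, and `m_groups` are only illustrative.

```r
# Rebuild the toy data from the reprex above
set.seed(137)
d   <- rnorm(10)
new <- sample(c("y", "n"), 10, replace = TRUE)
dat <- data.frame(a = 1:10, d = d, new = new)

m_which   <- mean(dat[which(dat$new == "y"), 2])  # version from the reprex
m_logical <- mean(dat$d[dat$new == "y"])          # logical index, column by name
m_subset  <- mean(subset(dat, new == "y")$d)      # subset() on the data frame
m_groups  <- tapply(dat$d, dat$new, mean)         # one mean per level of new

# All four routes give the same mean for the "y" rows
all.equal(m_which, m_logical)
all.equal(m_which, m_subset)
all.equal(m_which, as.numeric(m_groups["y"]))
```

Referring to the column as `dat$d` rather than by position `2` keeps the line working if columns are reordered. If `new` could contain `NA`, the `which()` and `subset()` forms drop those rows, while the plain logical index would carry an `NA` into the mean.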

I made some adjustments to your final line of code and that worked out well. Thanks! The instructions said not to store the result in a value, which I left out to keep the post short. As for the initial lines of code you entered, set.seed(137) has me confused the most. It looks like you are essentially moving the data around into an easier format for extracting key elements. Is that the logic behind it?

set.seed() was used for reproducibility of the d values in the toy data set. It plays no other role.
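
A small sketch of what reproducibility means here, using the same seed as the toy data (the names `first_draw`, `second_draw`, and `third_draw` are only illustrative):

```r
set.seed(137)
first_draw <- rnorm(3)

set.seed(137)            # resetting the seed restarts the same random sequence
second_draw <- rnorm(3)

third_draw <- rnorm(3)   # no reset: the generator continues with new values

identical(first_draw, second_draw)  # TRUE
identical(first_draw, third_draw)   # FALSE
```

Anyone who runs the toy example with the same seed should therefore see the same d values, which is why the printed output can be checked against the code.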
