Manipulating stacked datasets

For the five stacked datasets below, how can I get the average of y for each common combination of x1 and x2? Note that the size of each dataset varies (either 4 or 5 rows). This is like merging the five datasets by x1 and x2 and computing the mean of y.
Thanks

Data x1 x2 y
1 0 0 1
1 1 0 0.96580995
1 4 1 0.9488297
1 6 1 0.93190954
2 0 0 1
2 2 0 0.96580995
2 4 1 0.9488297
2 5 1 0.93190954
2 6 1 0.85664971
3 0 0 0.96580995
3 2 0 0.79060542
3 4 0 0.77424697
3 5 1 0.76613367
3 7 10 0.75801814
4 0 0 1
4 2 0 0.9488297
4 4 1 0.93190954
4 5 1 0.70995093
5 0 0 1
5 2 0 0.67061919
5 4 1 0.66277152
5 5 1 0.65492144
5 6 1 0.63914986

Assuming that your data is in a data frame named df, you can load the dplyr library and run the following:

library(dplyr)
df2 <- df |> group_by(x1, x2) |> summarize(y_mean = mean(y))

The resulting data frame df2 looks like this:

# A tibble: 8 × 3
# Groups:   x1 [7]
     x1    x2 y_mean
  <dbl> <dbl>  <dbl>
1     0     0  0.993
2     1     0  0.966
3     2     0  0.844
4     4     0  0.774
5     4     1  0.873
6     5     1  0.766
7     6     1  0.809
8     7    10  0.758

Thanks so much for the fast response. This does exactly what I wanted.

As a follow-up to my previous question, how can I get the minimum of y within groups of x1 and x2? For example, the results should look as follows:

     x1          x2        min_y
    0-2          0          x.xxx
    3-4          0          x.xxx
    0-2          1          x.xxx
   etc           etc

First create a new column with the x1 grouping (let's call it x1g). Then repeat the previous solution but group by x1g and x2 (or x1g and x2g if you do groupings on both variables) and change mean(y) to min(y).
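
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the data frame is rebuilt here from the posted values so the example runs on its own, the bins are made with base R's cut(), and since the question only shows the bins 0-2 and 3-4, the third bin ("5+") and the name df3 are my own guesses, so adjust the breaks and labels to whatever grouping you actually need.

```r
library(dplyr)

# The stacked data from the question (Data = dataset id).
df <- data.frame(
  Data = c(1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4, 5,5,5,5,5),
  x1   = c(0,1,4,6, 0,2,4,5,6, 0,2,4,5,7, 0,2,4,5, 0,2,4,5,6),
  x2   = c(0,0,1,1, 0,0,1,1,1, 0,0,0,1,10, 0,0,1,1, 0,0,1,1,1),
  y    = c(1, 0.96580995, 0.9488297, 0.93190954,
           1, 0.96580995, 0.9488297, 0.93190954, 0.85664971,
           0.96580995, 0.79060542, 0.77424697, 0.76613367, 0.75801814,
           1, 0.9488297, 0.93190954, 0.70995093,
           1, 0.67061919, 0.66277152, 0.65492144, 0.63914986)
)

df3 <- df |>
  # Bin x1 into groups; the breaks and the "5+" label are assumptions.
  # With right-closed intervals this gives (-Inf, 2], (2, 4], (4, Inf).
  mutate(x1g = cut(x1, breaks = c(-Inf, 2, 4, Inf),
                   labels = c("0-2", "3-4", "5+"))) |>
  # Same pattern as before, but grouped by the binned column.
  group_by(x1g, x2) |>
  summarize(min_y = min(y), .groups = "drop")

print(df3)
```

Only combinations of x1g and x2 that actually occur in the data show up in the result. If you also want to bin x2, add a second cut() call in the mutate() step and group by that column instead.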
