How can I group by variable in one dataframe and use a different variable to look into another dataframe and perform an operation?

I have two dataframes. One is called y25.29 , where head(y25.29) gives:

    pidp       quantile
1 68575973        61
2 69180553        10
3 70003965        43
4 70069925        13
5 70432365        18
6 70510565        35

with # of rows = 310.

The other is called data.07 , where head(data.07) gives:

    pidp        fiyrl 
1 68002045     2000.000
2 68035365     26000.000
3 68058485     0.000
4 68058489     0.000
5 68064605     24795.568
6 68064609     8868.146
      

and # of rows = 5,883.

What I'm looking to do is group all rows in y25.29 by quantile, and, using pidp as an identifier, create a dataframe that returns the average of fiyrl from data.07 of those individuals that have the same quantile in y25.29 . So, for example, all individuals that have quantile = 34 will have a value of fiyrl . The new dataframe will hopefully have a column with all the quantiles and a column with the average of fiyrl that corresponds to all individuals in that quantile.

I have no idea on where to start. Thanks!!

I think you want to do this.

library(dplyr)
AvgFIYRL <- inner_join(data.07, y25.29, by = "pidp") %>%
   group_by(quantile) %>%
   summarize(Avg = mean(fiyrl)

Does that work? I could not test the code, so I hope I did not make silly mistakes.

1 Like

That worked perfectly. I had to use merge instead of inner join because I got an error message. Same result though.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.