Making a lag variable based on another grouping variable

Ruben222 · August 25, 2020, 2:18pm

Hello, I'm looking for some help with constructing lag variables on my dataset. I have data which is structured like the following

data.frame(
  id = c(1,1,1,2,2,2,3,3,3),
  Prestige = c(1,1,1,0,0,0,0,0,0),
  Y = c(5,6,7,4,3,5,2,6,7)
)

What I am looking to do is create a lag variable which contains the Y value for the Prestige ID from the previous round. This is the desired output.

data.frame(
  id = c(1,1,1,2,2,2,3,3,3),
  Prestige = c(1,1,1,0,0,0,0,0,0),
  Round = c(1,2,3,1,2,3,1,2,3) , 
  Y = c(5,6,7,4,3,5,2,6,7) , 
  Prev_Prestige = c(NA,5,6,NA,5,6,NA,5,6)
)

I have been trying to use the lag function from tidyverse but I am unsure how to specify in mutate that I want the Prev_Prestige column to be the same for every id and always be equal to the previous behaviour of the Prestige id. For example, is it possible to use filter or something similar within mutate or make some use of case_when?

Also, the solution needs to be based on the Prestige column as I will have many cases of this within my data. IE many different id values will have Prestige = 1.

Many thanks

siddharthprabhu · August 25, 2020, 2:52pm

The logic for populating Prev_Prestige isn't clear to me. Would you mind providing a more detailed explanation? Also, is the Round variable present in your data or does it need to be generated?

Ruben222 · August 25, 2020, 3:57pm

Thanks for your reply.

Sure, no problem. Concerning Round, that's my bad, I forgot to place it in my first data frame (sorry about that!). The correct initial data frame would be this.

data.frame(
  id = c(1,1,1,2,2,2,3,3,3),
  Prestige = c(1,1,1,0,0,0,0,0,0),
  Y = c(5,6,7,4,3,5,2,6,7) , 
  Round = c(1,2,3,1,2,3,1,2,3)
)

This data comes from a pilot experiment in which 1 participant per group is assigned as the "prestigious" participant. Y in this case is their responses in the experiment across rounds. What I am looking to do is create a new variable (Prev_Prestige) which simply records what the "Prestigious" participant answered in the previous Round. For example, in Round 1 the list would be NA because it was the first round. Then, the Round 2 value for Prev_Prestige would be whatever the prestigious participant responded in Round 1. Round 3 would be equal to whatever they responded in Round 2 and so on. Therefore, the Prev_Prestige list would be identical for all 3 of these participants.

I hope that makes sense? Please do let me know if anything still isn't clear and thank you for taking the time to help me.

siddharthprabhu · August 25, 2020, 4:22pm

That helps. And what constitutes a group? Is it all id's in a given round? I presume that in each of these groups, only 1 id will have a value of 1 for Prestige?

siddharthprabhu · August 25, 2020, 4:36pm

Here's my first attempt at the puzzle. But I suspect this will not be robust enough to handle cases where more than one id has Prestige == 1.

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Prestige = c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  Y = c(5, 6, 7, 4, 3, 5, 2, 6, 7),
  Round = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

prestige_participants <- df %>% 
  filter(Prestige == 1) %>% 
  mutate(Prev_Prestige = lag(Y)) %>% 
  select(Round, Prev_Prestige)

left_join(df, prestige_participants)
#> Joining, by = "Round"
#>   id Prestige Y Round Prev_Prestige
#> 1  1        1 5     1            NA
#> 2  1        1 6     2             5
#> 3  1        1 7     3             6
#> 4  2        0 4     1            NA
#> 5  2        0 3     2             5
#> 6  2        0 5     3             6
#> 7  3        0 2     1            NA
#> 8  3        0 6     2             5
#> 9  3        0 7     3             6

^{Created on 2020-08-25 by the reprex package (v0.3.0)}

Let me know where it fails and we can improve it.

Ruben222 · August 25, 2020, 5:35pm

Thanks for this, I had also wondered about just making a separate data frame and attempting to left join them. I will give this a try tomorrow and let you know if this will work.

In regards to your other questions, a group is 4 participants that played the experiment together and id will be unique for every participant. In other words, every value of Group is unique and within each group there are 4 unique id's. Within each group one of these will have a value of 1 for Prestige.

In the interest of keeping the reprex as simple as possible I omitted the Group variable I also have for my data. Though using it should be simple enough as I can just add group_by(Group) to your suggestion.

Cheers.

Ruben222 · August 26, 2020, 7:43am

Hi, thanks again for your replies. Your solution worked great. For clarity, all I needed to do was just add a group_by(Group) in the function and it did exactly what I needed it to

system · September 2, 2020, 7:43am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.