For each group get another group by comparing two other columns

robinkohrs · May 3, 2021, 2:34pm

Hi everyone,

I have a little problem which I simply cannot get solved. I have a dataframe where I want to compare to columns for each group. When the two columns match, I want to extract all the rows from a fourth colum that are one group as well. This sounds super strange, so I tried to create an example. It is not the best, but I think it shows the point more or less:)

set.seed(2)
dr = seq(as.Date("2020-01-01"),as.Date("2021-01-01"), by = "day")
df = data.frame(
  date.x = dr,
  date.y = sample(dr, size = length(dr), replace = T)
)
g1 = rep(seq(1,length(dr), by = 10), each = 10)
g2 = rep(seq(1,length(dr), by = 5), each = 5)
df["g1"] = g1[1:nrow(df)]
df["g2"] = g2[1:nrow(df)]

       date.x     date.y g1 g2
1  2020-01-01 2020-12-06  1  1
2  2020-01-02 2020-07-16  1  1
3  2020-01-03 2020-09-18  1  1
4  2020-01-04 2020-09-29  1  1
5  2020-01-05 2020-12-14  1  1
6  2020-01-06 2020-07-22  1  6
7  2020-01-07 2020-10-23  1  6
8  2020-01-08 2020-06-26  1  6
9  2020-01-09 2020-03-15  1  6
10 2020-01-10 2020-05-10  1  6

So I want to comple date.x and date.yfor each group of g1. When they match I want to look at the column g2and extract all the same values.
So I received the tip about doing it like this:

df %>% 
  group_by(g1, g2) %>% 
  filter(
    g2 == g2[date.x == date.y]
  )

But this does not work. Maybe someone has a tip here:) Thanks a lot already!

robinkohrs · May 3, 2021, 3:08pm

I am sorry, I think my quesion is really bad. g1and g2do not need to be the same. I just want to check if date.x is the same as date.yfor each group of g1. And the problem is also that the filter(date.x == date.y)will always make that it only returns one row. But for example if date.x == date.yis TRUE than I want to return all the rows which have the same values in g2for the row where date.xand date.ymatch.

I am really sorry, I think it did not really make things clearer...

Yarnabrina · May 4, 2021, 6:48am

This is my understanding:

you want to get g2 value of that row where date.x matches date.y
then you want all rows where g2 has that same value.

If it's right, it'll do the job:

df[df["g2"] == df[df["date.x"] == df["date.y"], "g2"],]

But to be frank, I'm not confident I understood you. May be someone else will help you out.

robinkohrs · May 4, 2021, 8:13am

Thank you so much for your help! And sorry about my bad problem description...
I kind of figured it out. This does more or less what I want it to do:)

df %>% 
  group_by(g1, g2) %>% 
  mutate(
   new_var = ifelse(any(date.x == date.y), g2, NA) 
  ) %>% 
  filter(
    !is.na(new_var)
  )

system · May 25, 2021, 8:14am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.