Hi everybody
I would like to find matches between two groups of variables and create a binary variable that indicates whether such matches exist or not.
Example dataset:
df <- data.frame(groupA_1 = c(1, 2, 1, 2, 1),
groupA_2 = c(2, 3, 2, 3, 2),
groupA_3 = c(3, 4, 5, 6, 7),
groupB_1 = c(1, 8, 3, 2, 1),
groupB_2 = c(10, 11, 12, 13, 14),
groupB_3 = c(15, 16, 17, 18, 19))
If any of the groupA-variables have a match in any of the groupB-Variables, the new column match is assigned the value 1, otherwise the value is 0.
The final dataset should look like this:
groupA_1 groupA_2 groupA_3 groupB_1 groupB_2 groupB_3 match
1 1 2 3 1 10 15 1
2 2 3 4 8 11 16 0
3 1 2 5 3 12 17 0
4 2 3 6 2 13 18 1
5 1 2 7 1 14 19 1
My attempt to achieve this looks something like this:
df <- df %>%
mutate(match = case_when((groupA_1 == groupB_1 |
groupA_1 == groupB_2 |
groupA_1 == groupB_3) |
(groupA_2 == groupB_1 |
groupA_2 == groupB_2 |
groupA_2 == groupB_3) |
(groupA_3 == groupB_1 |
groupA_3 == groupB_2 |
groupA_3 == groupB_3)) ~ 1,
TRUE ~ 0)
Apart from this not working (I have not looked too much into why not, since I do not intend to keep the code in anyway close to this if at all possible), its also extremely susceptible to mistakes and terribly inefficient.
My actual dataset has a lot more variable groups and I would have to do this repeatedly for all of them. I feel like there should be an easier and more efficient way to achieve this. Does anyone have a solution?
Thank you in advance.