I need to subset a dataframe which has common combination of values of a certain select number of columns in another dataframe.
So lets say:
df1<- data.frame(
key1= c('a', 'b', 'c', 'd'),
key2=c(1,2,3,4)
var3=........
var4=.......
)
df2<-data.frame(
key1= c('a', 'b', 'e', 'f'),
key2=c(1,2,4,5)
var3=........
var4=.......
)
So i need only those values observations of df2 which have common combinations of key1 and key2 in df1, so like common_df<-data.frame(
key1=c('a','b'),
key2=c(1,2),
var3=....,
var4=......
)
How do i do that.
merge/join your sets based on the common key.
with an inner join which preserves only rows that match in both sets your issue is solved in one step, plus some cleanup to say where they have variables in common like var3 , you only want to keep what came from one side
library(tidyverse)
df1 <- data.frame(
key1 = c("a", "b", "c", "d"),
key2 = c(1, 2, 3, 4),
var3 = c(1, 2, 3, 4)
)
df2 <- data.frame(
key1 = c("a", "b", "e", "f"),
key2 = c(1, 2, 4, 5),
var3 = c(1, 2, 5, 6)
)
# So i need only those values observations of df2 which have common combinations of key1 and key2 in df1, so like
common_df <- data.frame(
key1 = c("a", "b"),
key2 = c(1, 2),
var3 = c(1, 2)
)
result <- inner_join(df1,
df2,
by = join_by(key1, key2),
suffix = c("", "_y")
) |>
select(-ends_with("_y"))
identical(common_df,result)
# TRUE
For this I often concatenate multiple key values to a single key value with paste0 in both tables, and use that to find the matching rows. The join approach also is good, though this would be a little less code, and does not copy any data in memory. This just subsets the rows of df2 in which that row's key value also exists in the df1 key values.
Base R:
common_df <- df2[which(paste0(df2$key1,df2$key2) %in% paste0(df1$key1,df1$key2)), ]