I recently started using RStudio, coming from SPSS, and I would appreciate some help.
I am working with two data frames (x and y), each containing around 300 columns/variables. I learned how to merge them on the subjects' ID code (merge(x, y, by = "ID", all = TRUE)), but the result combines both data frames into a total of 600 columns. Since the two data frames share many of the same variables, I was wondering:
Is there a way to identify the columns/variables shared by both data frames?
How can I merge the columns/variables that both data frames have in common?
Typically, when you merge data frames, you want to bring together different columns.
In this case, you probably want to isolate the common column names and check whether they hold the same information. Columns that are identical in both data frames only need to be kept once, so take them from x alone to avoid redundant information.
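To see where the extra columns come from, here is a minimal illustration with two small hypothetical data frames (the names age, score and income are made up for the example). merge() keeps every non-key column that exists in both inputs twice, distinguishing the copies with its default suffixes .x and .y.

```r
# Hypothetical toy data: "age" is in both data frames, "score" and "income" in only one
x <- data.frame(ID = 1:3, age = c(30, 41, 25), score = c(5, 7, 6))
y <- data.frame(ID = 1:3, age = c(30, 41, 25), income = c(100, 200, 150))

# Join on ID; the shared non-key column "age" appears twice, as age.x and age.y
m <- merge(x, y, by = "ID", all = TRUE)
print(names(m))
```

With about 300 columns in each input and many of them shared, this duplication is what inflates the result to roughly 600 columns.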
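Comparing columns with identical() only makes sense if the rows of x and y refer to the same subjects in the same order. A minimal sketch of the check, assuming each ID occurs exactly once in each data frame and both contain the same set of IDs, aligns y to x's ID order with match() before comparing (the toy data and the name y_aligned are hypothetical):

```r
# Hypothetical toy data: same subjects, but y is stored in a different row order
x <- data.frame(ID = c(1, 2, 3), age = c(30, 41, 25), sex = c("f", "m", "f"), score = c(5, 7, 6))
y <- data.frame(ID = c(3, 1, 2), age = c(25, 30, 41), sex = c("f", "m", "m"), income = c(150, 100, 200))

# Variables present in both data frames, excluding the ID key
common <- setdiff(intersect(names(x), names(y)), "ID")

# Reorder y's rows so that row i of y_aligned describes the same subject as row i of x
y_aligned <- y[match(x$ID, y$ID), ]

# TRUE for shared columns whose values are exactly equal for every subject
iden <- sapply(common, function(col) identical(x[[col]], y_aligned[[col]]))
print(iden)
```

Here age matches once the rows are aligned, while sex differs for subject 1, so only age would be safe to keep from x alone.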
# shared variable names, excluding the ID key
common <- setdiff(intersect(names(x), names(y)), "ID")
# TRUE where a shared column holds exactly the same values in x and y
# (assumes x and y contain the same subjects in the same row order)
iden <- sapply(common, function(mycol) identical(x[[mycol]], y[[mycol]]))
z <- merge(
  # merge of the columns that are not identical in x and y
  merge(
    x[, setdiff(names(x), common[iden])],
    y[, setdiff(names(y), common[iden])],
    by = "ID"
  ),
  # add the identical shared columns once, taken from x
  x[, c("ID", common[iden])],
  by = "ID"
)
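The merge above uses merge()'s default inner join, so subjects present in only one data frame are dropped, whereas the question used all = TRUE. As an alternative sketch (not part of the original answer), you could keep every subject with a full outer join and then fold each .x/.y pair into a single column whenever the two versions agree for all subjects observed in both. The toy data and the filled variable are hypothetical, and the sketch assumes the shared columns are plain numeric or character vectors (factors with different level sets would need converting first).

```r
# Hypothetical toy data: subject 4 is only in x, subject 5 only in y
x <- data.frame(ID = c(1, 2, 4), age = c(30, 41, 52), score = c(5, 7, 9))
y <- data.frame(ID = c(1, 2, 5), age = c(30, 41, 19), income = c(100, 200, 300))

common <- setdiff(intersect(names(x), names(y)), "ID")

# Full outer join: keeps every subject from both data frames
m <- merge(x, y, by = "ID", all = TRUE)

for (col in common) {
  cx <- m[[paste0(col, ".x")]]
  cy <- m[[paste0(col, ".y")]]
  # Compare only rows where both versions are observed
  both <- !is.na(cx) & !is.na(cy)
  if (all(cx[both] == cy[both])) {
    # Take x's value and fall back to y's where x is missing
    filled <- cx
    filled[is.na(cx)] <- cy[is.na(cx)]
    m[[col]] <- filled
    # Drop the now-redundant suffixed copies
    m[[paste0(col, ".x")]] <- NULL
    m[[paste0(col, ".y")]] <- NULL
  }
}
print(m)
```

Shared columns whose values disagree for some subject are left as their .x and .y pair, which makes them easy to inspect afterwards.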