The left and right screenshots each show two columns: the first is apparently an identifier, and the second is an integer representing some variable. The visible IDs appear identical. If they truly are, all the way down, merge isn't necessary. For two data frames with identical dim() and with the ID variable of one identical to that of the other, you would just need Left[3] <- Right[2].
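A minimal sketch of that shortcut, guarded by a check that the two ID columns really do match. The data frames and column names here are made up for illustration and are not taken from the screenshots.

```r
# Made-up data; Left, Right and the column names are illustrative only
Left  <- data.frame(ID = paste0("ID", 1:5), var1 = c(10L, 20L, 30L, 40L, 50L))
Right <- data.frame(ID = paste0("ID", 1:5), var2 = c(5L, 4L, 3L, 2L, 1L))

# Copy the column across only if the dimensions and the ID columns match exactly
if (identical(dim(Left), dim(Right)) && identical(Left[[1]], Right[[1]])) {
  Left[3] <- Right[2]
}

str(Left)
```

If the check fails, Left is left untouched, which is the signal that a join is needed instead.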
However, some identifiers present in Left[1] may be missing in Right[1] and vice versa, and direct assignment also requires the first columns to be sorted in the same order.
So, doing this with a joining operation is the way to go. If you are sure that no IDs are missing, you can use merge. Otherwise, one of the {dplyr} joins is easier.
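As a rough illustration of the missing-ID case, the sketch below uses made-up data in which one ID is absent from each side. It first lists the unmatched IDs and then shows how two of the joins treat them.

```r
library(dplyr)

# Made-up data: ID3 is absent from Right and ID4 is absent from Left
Left  <- data.frame(ID = c("ID1", "ID2", "ID3"), var1 = c(11L, 12L, 13L))
Right <- data.frame(ID = c("ID1", "ID2", "ID4"), var2 = c(21L, 22L, 24L))

# IDs found on one side only
setdiff(Left$ID, Right$ID)   # in Left but not in Right
setdiff(Right$ID, Left$ID)   # in Right but not in Left

# inner_join() keeps only the IDs present in both data frames
both <- inner_join(Left, Right, by = "ID")

# full_join() keeps every ID and fills the gaps with NA
all_ids <- full_join(Left, Right, by = "ID")
```

Here both has two rows (ID1 and ID2), while all_ids has four, with NA wherever a value had no match.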
dplyr’s inner_join(), left_join(), right_join(), and full_join() add new columns from y to x, matching rows based on a set of “keys”, and differ only in how missing matches are handled. They are equivalent to calls to merge() with various settings of the all, all.x, and all.y arguments. The main difference is the order of the rows:
- dplyr preserves the order of the x data frame.
- merge() sorts the key columns.
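The correspondence described above can be sketched as follows. The data frames x and y are made up, and the IDs in x are deliberately unsorted so that the row-order difference is visible.

```r
library(dplyr)

# Made-up data; the IDs in x are deliberately not in sorted order
x <- data.frame(ID = c("c", "a", "b"), var1 = 1:3)
y <- data.frame(ID = c("a", "b", "d"), var2 = 4:6)

m_inner <- merge(x, y, by = "ID")                # like inner_join()
m_left  <- merge(x, y, by = "ID", all.x = TRUE)  # like left_join()
m_right <- merge(x, y, by = "ID", all.y = TRUE)  # like right_join()
m_full  <- merge(x, y, by = "ID", all = TRUE)    # like full_join()

d_left <- left_join(x, y, by = "ID")

m_left$ID  # merge() returns the rows sorted by the key
d_left$ID  # left_join() returns the rows in the order of x
```

Both m_left and d_left hold the same three rows; only the order in which they appear differs.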
Reference
See the FAQ: How to do a minimal reproducible example (reprex) for beginners, which explains how to include representative data and the code used. Sometimes the source of an error message like yours is obvious, especially when the error message is as clear as this one. If merge is truly being given what's shown in the screenshot, there is no third column.
Here's a reprex with made-up data to illustrate how joins work.
IDs <- paste0("ID",1:100)
var1 <- sample(1:1000,100)
var2 <- sample(1:1000,100)
Left <- data.frame(ID = IDs, var1 = var1)
Right <- data.frame(ID = IDs, var2 = var2)
Combined <- merge(Left,Right) # resorts ID column
suppressPackageStartupMessages({
library(dplyr)
})
# these keep ID columns in order
lj <- left_join(Left,Right)
#> Joining, by = "ID"
rj <- right_join(Left,Right)
#> Joining, by = "ID"
ij <- inner_join(Left,Right)
#> Joining, by = "ID"
fj <- full_join(Left,Right)
#> Joining, by = "ID"
identical(Combined,lj)
#> [1] FALSE
identical(lj,rj)
#> [1] TRUE
identical(rj,ij)
#> [1] TRUE
identical(ij,fj)
#> [1] TRUE
Created on 2022-12-01 by the reprex package (v2.0.1)
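The FALSE from identical(Combined, lj) comes from row order, not from different contents. The sketch below rebuilds similar made-up data, passes by = "ID" explicitly (which also avoids the "Joining" message), and then reorders the join result to match merge(). The fixed seed and the name lj_sorted are assumptions added for illustration.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the made-up values are repeatable
IDs   <- paste0("ID", 1:100)
Left  <- data.frame(ID = IDs, var1 = sample(1:1000, 100))
Right <- data.frame(ID = IDs, var2 = sample(1:1000, 100))

Combined <- merge(Left, Right, by = "ID")
lj <- left_join(Left, Right, by = "ID")  # explicit key, so no "Joining" message

# Put the join result into the sorted ID order that merge() produces
lj_sorted <- lj[order(lj$ID), ]
rownames(lj_sorted) <- NULL

# Compare column by column once the rows are aligned
all(Combined$ID == lj_sorted$ID,
    Combined$var1 == lj_sorted$var1,
    Combined$var2 == lj_sorted$var2)
```

Note that the IDs are character strings, so the sorted order is alphabetical (ID1, ID10, ID100, ID11, ...) and not numeric.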