Merging two data.frames


I have two data.frames, each with an index column, but the index values do not match exactly across them (i.e. some values are in one but not the other, and vice versa). Each also has a list column. I am trying to merge them using all=T (i.e. a full merge). However, the output is inconsistent: if an index value is missing from the first data.frame, the corresponding cell in its list column is NULL, but if it is missing from the second data.frame, the cell is NA.

tbl1 <- data.frame(
  x = c(1,2)
)
tbl1$y <- list(c("a","b"),c("d"))

tbl2 <- data.frame(
  x = c(1,3)
)
tbl2$z <- list(c("e"),c("f","g","h"))

> tbl1
  x    y
1 1 a, b
2 2    d
> tbl2
  x       z
1 1       e
2 3 f, g, h

> merge(tbl1,tbl2,by="x",all=T)
  x    y       z
1 1 a, b       e
2 2    d      NA
3 3 NULL f, g, h

Is there a way to control this explicitly within merge()? It would also be easier if the missing values were NA in both columns, since then I could simply use is.na() to find them instead of combining vapply() with is.null().
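As far as I can tell, none of merge()'s arguments control how unmatched cells in list columns are filled. A minimal base-R sketch, assuming it is acceptable to post-process the merged result, is to replace NULL elements of every list column with a single NA once, right after merging, so that later code can rely on is.na(). The helper name fill_null_with_na() is hypothetical and not part of base R.

```r
# Example data from the question
tbl1 <- data.frame(x = c(1, 2))
tbl1$y <- list(c("a", "b"), c("d"))

tbl2 <- data.frame(x = c(1, 3))
tbl2$z <- list(c("e"), c("f", "g", "h"))

# Hypothetical helper (not part of base R): in every list column,
# replace NULL elements with a length-one NA so that is.na() flags them
fill_null_with_na <- function(df) {
  for (col in names(df)) {
    if (is.list(df[[col]])) {
      is_null <- vapply(df[[col]], is.null, logical(1))
      df[[col]][is_null] <- list(NA)
    }
  }
  df
}

merged <- fill_null_with_na(merge(tbl1, tbl2, by = "x", all = TRUE))

# is.na() on a list is TRUE for elements that are a single NA
is.na(merged$y)  # TRUE only for the row with x = 3 (absent from tbl1)
is.na(merged$z)  # TRUE only for the row with x = 2 (absent from tbl2)
```

vapply() and is.null() are still used, but only once inside the helper. Checking lengths(df[[col]]) == 0L instead would also catch zero-length vectors such as character(0), which may or may not be what you want for genuine empty entries.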

(PS: I am aware that I can use dplyr::full_join(), but I am writing a package and would like to minimise dependencies.)
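Staying within base R, another option, if the goal is only to detect which rows had no match, is to add an atomic marker column to each input before merging. merge() fills unmatched atomic columns with NA, so is.na() on the markers works without inspecting the list columns at all. This is a sketch under that assumption; the column names in_tbl1 and in_tbl2 are hypothetical.

```r
# Example data from the question
tbl1 <- data.frame(x = c(1, 2))
tbl1$y <- list(c("a", "b"), c("d"))

tbl2 <- data.frame(x = c(1, 3))
tbl2$z <- list(c("e"), c("f", "g", "h"))

# Hypothetical marker columns recording which input each row came from
tbl1$in_tbl1 <- TRUE
tbl2$in_tbl2 <- TRUE

merged <- merge(tbl1, tbl2, by = "x", all = TRUE)

# Atomic columns are filled with NA for unmatched rows, so is.na() is enough
missing_from_tbl1 <- is.na(merged$in_tbl1)
missing_from_tbl2 <- is.na(merged$in_tbl2)

# Drop the markers once they have been used
merged$in_tbl1 <- NULL
merged$in_tbl2 <- NULL
```

This leaves the NULL/NA inconsistency in the list columns themselves untouched, so it only helps when the aim is to locate the unmatched rows rather than to clean up the list values.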
