Combine columns with same names while leaving columns with different names present.

aiyer1217 · May 16, 2022, 1:01pm

Hi everyone,

Supposing I have two data frames, such as this:

a <- c(10,20,30,40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5, 8, 10, 7)
e <- c(2.4, 5, 10, 7)

df1<-data.frame(a,b,c)
df2<-data.frame(b,c,d,e)

Is there a function that will create a new data frame combining df1 and df2, where it automatically detects the columns that overlap and combines them, but leaves the unique ones (in this case, just e) in the data frame as well?

Important! I know you can do this via the merge function by passing the overlapping column names into the "by" argument, but it can be tiresome and error-prone to type in all the overlapping columns by hand when there are many of them!

Thanks in advance :).

FJCC · May 16, 2022, 2:19pm

Is this what you want?

library(dplyr)
a <- c(10,20,30,40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5, 8, 10, 7)
e <- c(2.4, 5, 10, 7)
df1<-data.frame(a,b,c)
df2<-data.frame(b,c,d,e)
inner_join(df1, df2)
Joining, by = c("b", "c")
   a           b     c    d    e
1 10        book  TRUE  2.5  2.4
2 20         pen FALSE  8.0  5.0
3 30    textbook  TRUE 10.0 10.0
4 40 pencil_case FALSE  7.0  7.0

I do not understand why you say that only e is a unique column. Column a only appears in df1 and columns d and e only appear in df2. Thus the join is done on the shared columns b and c.
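
If you would rather stay in base R, here is a minimal sketch of the same idea. By default, merge() joins on the column names the two data frames share, that is, intersect(names(x), names(y)), so nothing has to be typed in by hand. The variable names shared and merged are only for illustration.

```r
# Same example data as in the question
a <- c(10, 20, 30, 40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE, FALSE, TRUE, FALSE)
d <- c(2.5, 8, 10, 7)
e <- c(2.4, 5, 10, 7)
df1 <- data.frame(a, b, c)
df2 <- data.frame(b, c, d, e)

# Column names present in both data frames (here "b" and "c")
shared <- intersect(names(df1), names(df2))

# With no "by" argument, merge() joins on exactly these shared columns
merged <- merge(df1, df2)

# Passing the computed names explicitly gives the same result and
# documents the join keys in the code
merged_explicit <- merge(df1, df2, by = shared)

# The key columns come first, followed by the remaining columns of
# df1 and then df2
names(merged)
```

The same shared vector can be passed to dplyr as inner_join(df1, df2, by = shared), which also stops the "Joining, by = ..." message from being printed. Note that merge() sorts the result on the key columns by default, so the row order may differ from the inner_join output above.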
