I have two dataframes with same number of columns (number of rows can differ). Both dataframe contains an unique identifier column. I want to merge these dataframe as such that unique identifier matched column are binded in one row together and if the unique identifier is not in any one of these then append at the end of that specific dataframe.
I tried many non-tidyverse solution and nothing worked out. Please help if this can be done using tidyverse.
I’m afraid I’m having trouble visualizing your desired output here. It’s much easier to understand coding questions by looking at code, rather than reading a verbal description. I think you’ll be more likely to get helpers to take an interest in this question if you can make a small reproducible example demonstrating what you mean.
No need to include your actual data frames — small example data frames with a similar structure will probably be easier to understand. It would also be a big help if you can include an example of what the desired output should look like.
Take a look at dplyr::full_join or data.table::merge. You'll need to convert NA's afterwards to 0, but that shouldn't be too difficult. Also, you probably don't need 2 columns D and D2 since it's a key you merge with meaning that you'll only have one of them in the result.
You know your code better, of course, but keep in mind that both dplyr::full_join and data.table::merge won't return you a duplicated key column. Also, both of the columns contain identical (not similar) information, so once you've checked it with one column, you've checked it with second as well.
full_join()
return all rows and all columns from both x and y. Where there are not matching values, returns NA for the one missing.
So it'll keep track of those things by nature of having NA's in some of the columns.
I would show you how it looks in code, but it's not possible right now to copy-paste your example into my session. Reprex link @jcblum shared above can help you create one.
data_1 <- read.table(text = "1 2 3 A
1 2 3 B
1 2 3 C
1 2 3 D
1 2 3 E", stringsAsFactors = FALSE)
colnames(data_1) <- c('A','B','C','D')
data_2 <- read.table(text = "
1 2 3 D
1 2 3 E
1 2 3 F
1 2 3 A
1 2 3 B
1 2 3 P", stringsAsFactors = FALSE)
colnames(data_2) <- c('A1','B1','C1','D1')
dplyr::full_join(data_1, data_2, by = c("D" = "D1"))
#> A B C D A1 B1 C1
#> 1 1 2 3 A 1 2 3
#> 2 1 2 3 B 1 2 3
#> 3 1 2 3 C NA NA NA
#> 4 1 2 3 D 1 2 3
#> 5 1 2 3 E 1 2 3
#> 6 NA NA NA F 1 2 3
#> 7 NA NA NA P 1 2 3