Merge and Full Join Confusion

I am stumped with this. I have two objects that I had to split due to their structure. I am familiar with SQL and the UNION function where you can combine two small datasets and combine them without having duplicate columns. I have a problem where I have two data sets and I am trying to combine them but I get duplicate columns with empty or NAs rows. I assumed that the rows of the data would be find . The number of rows are correct but not the columns. I placed the two solutions I have tried below.

full_join <- merge(dt4.1, dt4.2, by = c("MDRLOGD_PROCESS","UPDATE_NAME","MDRLOGD_RUN_DATE2"), all = TRUE)
print(full_join)

full_join2 <- full_join(dt4.1, dt4.2, by = c("MDRLOGD_PROCESS","UPDATE_NAME","MDRLOGD_RUN_DATE2"))
print(full_join2)

left_join should work. Haven’t tested because no reprex

To vertically stack two frames use bind_rows(), whereas
using merge or full_join is a horizontal merge, which it doesnt seem you want.

full_join()

The full join includes all rows from the x dataset or the y dataset and takes in the following arguments. In your case, it would include all the rows from dt4.1 or dt4.2.

The full_join() takes in the following arguments:

full_join(x dataset, y dataset,  by = c("character vector of variables to join by"))

full_join() Example:

Here are two datasets I made up: my_pets and friends_pets

my_pets
#> # A tibble: 6 x 3
#>   pet_name species met_friends_pets
#>   <chr>    <chr>   <chr>           
#> 1 Smoochie cat     yes             
#> 2 Salem    cat     yes             
#> 3 Samson   cat     no              
#> 4 Brutus   cat     no              
#> 5 Gus      cat     no              
#> 6 Sage     dog     yes

friends_pets
#> # A tibble: 3 x 3
#>   pet_name species met_friends_pets
#>   <chr>    <chr>   <chr>           
#> 1 Nala     dog     yes             
#> 2 Frankie  dog     yes             
#> 3 Kila     dog     no

Created on 2023-10-10 by the reprex package (v2.0.1)

Now I will join these datasets together by the column they have in common, which is met_friends_pets, by using full_join():

full_join(my_pets,friends_pets, met_friends_pets = "yes")
#> Joining, by = c("pet_name", "species", "met_friends_pets")
#> # A tibble: 9 x 3
#>   pet_name species met_friends_pets
#>   <chr>    <chr>   <chr>           
#> 1 Smoochie cat     yes             
#> 2 Salem    cat     yes             
#> 3 Samson   cat     no              
#> 4 Brutus   cat     no              
#> 5 Gus      cat     no              
#> 6 Sage     dog     yes             
#> 7 Nala     dog     yes             
#> 8 Frankie  dog     yes             
#> 9 Kila     dog     no

Created on 2023-10-10 by the reprex package (v2.0.1)
The new table includes every row from each dataset. Notice I specified in the full_join() argument to join based on met_friends_pets = "yes". Not every pet has a "yes" in the met_friends_pets column, but they are still included in the full join.

1 Like

full_join <- rbind(dt4.1, dt4.2)

This actually worked

Thank you, this is what helped me understand this concept

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.