dplyr::full_join drops unused values with multiple "groups" in y

I'm trying to retain all columns in x, even tho they are missing in y.

This works when I join with y having only a single "group", adding NA entries as expected.

However when I add another "group" to y values are silently dropped.

What am I missing?

Thanks!

library(magrittr)

base <- data.frame(
  "key" = c(1,2,3,4),
  "v_a" = c("asdf","fdsa","qwert","trewq")
)

join1 <- data.frame(
"key" = c(1,2,3),
"group" = c("g_a","g_a","g_a"),
"value" = c(111,222,333)
)

join2 <- data.frame(
  "key" = c(1,2,3,4),
  "group" = c("g_b","g_b","g_b","g_b"),
  "value" = c(444,555,666,777)
)

joincombined <- dplyr::bind_rows(join1, join2)


# joining a single group retains all keys in base
# NAs are correctly assigned for missing values in "group" and "value"

res_singleJoin <- base %>% 
  dplyr::full_join(join1, by = "key")

# joining the combined df drops missing entries "group" and "value"
res_combinedJoin <- base %>% 
  dplyr::full_join(joincombined, by = "key")

This works as expected for the definition of full join, try to join everything you can on the left without discard to everything on the right without discard, and it does that.
the 4th key no longer matches with nothing , it now can match with something.
if you want to full join base to join1 and join2 independently and then join the result you can of course do that, here is a way

library(tidyverse)
map_df(list(join1,join2),~full_join(base,.x,by = "key"))
1 Like

Thanks!

Two different solutions found on stackoverflow/reddit:

Using tidyr::complete

base %>% full_join(joincombined, by = "key") %>% complete(key, group)

Using dplyr::full/cross_join

all_key_groups <- base %>% 
select(key) %>% 
 dplyr::full_join(joincombined %>%
 select(group) %>%
 distinct(), by = character())

res_combinedJoin <- base %>% 
 dplyr::full_join(all_keys, by = "key") %>% 
 dplyr::left_join(joincombined, by = c('key', 'group'))

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.