The multiple
argument of dplyr
joins has four options for handling multiple matches: "all", "any", "first", "last". Why not "none", i.e., when a unique match exists, use it, and when there are multiple, fill whatever columns we were getting with NA
s? Is there a straightforward way to get this behaviour? For complex join criteria, it can be hard to determine in advance whether there will be multiple matches for a given row or not.
(Sorry, I misread your requirements and posted an irrelevant answer previously... then panicked and deleted it instead of just editing it)
I can't comment on why there is no "none" option, but you can get this behaviour with a summarise following the join. You can join using the default "all" and then collapse groups with more than 1 row. Targetting the specific columns you want replaced with NAs may be a bit trickier, depending on your data, but something like the following should get you started:
library(dplyr)
df1 <- data.frame(a = 1:5)
df2 <- data.frame(a = c(1, 2, 2, 3, 4, 5, 5),
b = letters[1:7])
# I split it into two sections for comparison purposes
df3_all <- left_join(df1, df2,
by = "a")
df3_none <- df3_all %>%
summarise(across(.cols = everything(),
.fns = ~ifelse(n() == 1, .x, NA)),
.by = a)
I don't know how complex your joins are, but for simply joining on a few variables, this should work.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.