I currently have two datasets that I would like to merge:
```r
year <- c("2002", "2002", "1999", "1999", "1997", "2002", "2005", "2008")
state <- c("TN", "TN", "AL", "AL", "CA", "TN", "NY", "NC")
name <- c("Brass; Smith", "Joe", "Christopher", "Bob; Holland", "Wayne", NA, "Joseph A. Freeland", "")
df1 <- data.frame(year, state, name)

year <- c("2002", "2002", "2005")
state <- c("TN", "TN", "NY")
versus <- c("Carl Brass", "Joe", "Freeland")
color <- c("Blue", "Red", "Yellow")
df2 <- data.frame(year, state, versus, color)
```
I want to join on `year` and `state` exactly. Then I want to fuzzy-match `name` in `df1` against `versus` in `df2`, using only the part of `name` that comes before the `;`.
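Reading "before the `;`" as "use only the text up to the first semicolon", that part of the matching can be done up front, vectorised over `name`. Below is a minimal base-R sketch; the variable names `first_part` and `words` are illustrative, not from the original. Like `f()` in the attempt, it drops one-character tokens such as the initial `A`:

```r
name <- c("Brass; Smith", "Joe", "Christopher", "Bob; Holland", "Wayne", NA,
          "Joseph A. Freeland", "")

# Keep only the text before the first ";" (NA stays NA, "" stays "")
first_part <- sub(";.*$", "", name)
first_part
# "Brass" "Joe" "Christopher" "Bob" "Wayne" NA "Joseph A. Freeland" ""

# Split each name into words and drop one-character tokens such as "A";
# a missing name yields no words at all
words <- lapply(first_part, function(n) {
  if (is.na(n)) return(character(0))
  w <- regmatches(n, gregexpr("\\w+", n))[[1]]
  w[nchar(w) > 1]
})
words[[7]]
# "Joseph" "Freeland"
```

These words could then be tested against `versus` with `grepl(word, versus, ignore.case = TRUE)`. That is case-insensitive substring matching. If "fuzzy" should also tolerate typos, base R's `agrepl()` is one possible substitute.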
I want an output that looks like this:
```r
year <- c("2002", "2002", "1999", "1999", "1997", "2002", "2005", "2008")
state <- c("TN", "TN", "AL", "AL", "CA", "TN", "NY", "NC")
name <- c("Brass; Smith", "Joe", "Christopher", "Bob; Holland", "Wayne", NA, "Joseph A. Freeland", "")
versus <- c("Carl Brass", "Joe", NA, NA, NA, NA, "Freeland", NA)
color <- c("Blue", "Red", NA, NA, NA, NA, "Yellow", NA)
df3 <- data.frame(year, state, name, versus, color)
```
I tried the following:
```r
library(dplyr)

# TRUE if any word (longer than one character) of `n` is found in `v`
f <- function(n, v) {
  wrds = stringr::str_extract_all(n, "\\b\\w*\\b")[[1]]
  sum(sapply(wrds[which(nchar(wrds) > 1)], grepl, x = v, ignore.case = TRUE)) > 0
}

df4 <- left_join(df1, df2, by = c("year", "state")) %>%
  rowwise() %>%
  mutate(versus := if_else(f(name, versus), name, NA_character_))
```
But I keep getting this error message:
```
Error in `mutate()`:
! Problem while computing `versus = if_else(f(name, versus), name, NA_character_)`.
ℹ The error occurred in row 8.
Caused by error in `sum()`:
! invalid 'type' (list) of argument
```
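The message can be reproduced outside `mutate()`. `2002`/`TN` occurs three times in `df1` and twice in `df2`, so the `left_join()` returns 11 rows, and row 8 of that result comes from the `df1` row whose `name` is `NA`. For a missing string `nchar()` returns `NA`, so no words survive the filter. `sapply()` over an empty vector returns an empty list rather than a logical vector, and `sum()` rejects a list. The base-R snippet below hard-codes `wrds` as a single `NA`, on the assumption that this is what `str_extract_all()` gives `f()` for a missing `name`. It shows the failure and the `vapply()` alternative, which always returns a vector of the declared type:

```r
# Assumed value of `wrds` inside f() for the row where `name` is NA
wrds <- NA_character_

kept <- wrds[which(nchar(wrds) > 1)]  # nchar(NA_character_) is NA, so nothing is kept
length(kept)                          # 0

# sapply() cannot simplify an empty result, so it returns an empty (named) list
res <- sapply(kept, grepl, x = "Carl Brass", ignore.case = TRUE)
is.list(res)                          # TRUE

# sum() rejects a list, which is the error reported above
msg <- tryCatch(sum(res), error = conditionMessage)
msg                                   # "invalid 'type' (list) of argument"

# vapply() always returns a logical vector (here logical(0)), so the test is FALSE
res2 <- vapply(kept, grepl, logical(1), x = "Carl Brass", ignore.case = TRUE)
sum(res2) > 0                         # FALSE
```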
I'm not sure what is going on and any help would be appreciated!
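Below is a sketch of one way to get `df3`. It is written in base R so it runs without extra packages. It assumes that "fuzzy" means case-insensitive substring matching of any word (two or more characters) from the part of `name` before `;`. It also assumes that each `df1` row should keep at most its first matching `df2` row. The helper `name_matches()` and the `row_id` column are hypothetical additions.

The approach differs from the attempt in two ways. First, the match is used to pick rows *before* collapsing back to one row per `df1` row, instead of filtering the 11-row join result. Second, `vapply()` is used so an empty word list gives `FALSE` rather than a list.

```r
df1 <- data.frame(
  year  = c("2002", "2002", "1999", "1999", "1997", "2002", "2005", "2008"),
  state = c("TN", "TN", "AL", "AL", "CA", "TN", "NY", "NC"),
  name  = c("Brass; Smith", "Joe", "Christopher", "Bob; Holland", "Wayne", NA,
            "Joseph A. Freeland", "")
)
df2 <- data.frame(
  year   = c("2002", "2002", "2005"),
  state  = c("TN", "TN", "NY"),
  versus = c("Carl Brass", "Joe", "Freeland"),
  color  = c("Blue", "Red", "Yellow")
)

# Hypothetical helper: TRUE when any word (2+ characters) from the part of
# `name` before the first ";" occurs in `versus`, ignoring case.
# NA or empty inputs give FALSE rather than an error.
name_matches <- function(name, versus) {
  first_part <- sub(";.*$", "", name)
  vapply(seq_along(first_part), function(i) {
    n <- first_part[i]
    v <- versus[i]
    if (is.na(n) || is.na(v)) return(FALSE)
    words <- regmatches(n, gregexpr("\\w+", n))[[1]]
    words <- words[nchar(words) > 1]
    # vapply() yields logical(0), not list(), when `words` is empty
    any(vapply(words, grepl, logical(1), x = v, ignore.case = TRUE))
  }, logical(1))
}

df1$row_id <- seq_len(nrow(df1))

# 1. Exact match on year and state: every candidate (df1 row, df2 row) pair
cand <- merge(df1, df2, by = c("year", "state"))
# 2. Fuzzy match on name/versus; keep the first hit per df1 row
cand <- cand[name_matches(cand$name, cand$versus), c("row_id", "versus", "color")]
cand <- cand[!duplicated(cand$row_id), ]
# 3. Attach the hits back to df1; rows without a hit get NA
df3 <- merge(df1, cand, by = "row_id", all.x = TRUE)
df3 <- df3[order(df3$row_id), c("year", "state", "name", "versus", "color")]
rownames(df3) <- NULL
df3
```

Step 1 is an inner join, so `df1` rows without a `year`/`state` partner drop out of `cand`. Step 3's `all.x = TRUE` brings them back with `NA`. In dplyr, the same three steps map onto `inner_join()`, `filter()` plus `distinct(row_id, .keep_all = TRUE)`, and `left_join()`.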