Hi guys,
I have 2 data frames. I need to add a new column to data frame x, using the most similar values from data frame y.
x<- data.frame(customer=c("2BOSA7A6T"))
y<- data.frame(supplier=c("2BOS A7A6T;SC4","2BOS A7A6T;SL4",
"2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))
### the result should be:
x<- data.frame(customer=c("2BOSA7A6T"),
supplier=c("2BOS A7A6T;SC4",
"2BOS A7A6T;SL4"))
### I think the sapply function could help

Hi again,
Thank you for the quick response.
I keep having issues because it does not run for the following example:
x <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9","SHBUF7;WF2,ZIE,Z22,1T8",
"SHBUF7;WF2,ZIE,Z22,1T", "SHB UF7;WF2,ZIE,Z22",
"SHBUF7;WF2,ZIE,Z22,1T9999999"))
#### The expected result would be
x<- data.frame(customer=c("SHBUF7WF2ZIEZ221T"),
supplier=c("SHBUF7;WF2,ZIE,Z22,1T9",
"SHBUF7;WF2,ZIE,Z22,1T8","SHBUF7;WF2,ZIE,Z22,1T",
"SHBUF7;WF2,ZIE,Z22,1T9999999"))

It won't, because the join I designed earlier assumes that the entire customer ID precedes the semicolon, which it doesn't in your second example. It's even more problematic because in your second example the customer ID itself appears to be split by a semicolon, whereas in the first example the semicolon is what separates the customer ID from the supplier ID.
If your data contains both of these cases, it'll be hard to create a generalized solution.
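
That said, one thing that might be worth trying is to drop the separators altogether and compare prefixes. This is only a rough sketch, and it assumes that a supplier matches whenever its ID, with every non-alphanumeric character removed, starts with the customer ID. The helper names normalize_id and match_suppliers are made up for illustration; startsWith needs R 3.3.0 or later. It gives the expected rows for the two examples posted above, but I can't say whether the assumption holds for the rest of your data.

```r
# Remove everything except letters and digits, so spaces, semicolons
# and commas no longer affect the comparison.
normalize_id <- function(id) gsub("[^[:alnum:]]", "", as.character(id))

# Hypothetical helper: for each customer, keep the suppliers whose
# normalized ID starts with the normalized customer ID.
match_suppliers <- function(x, y) {
  suppliers <- as.character(y$supplier)
  supplier_key <- normalize_id(suppliers)
  matches <- lapply(as.character(x$customer), function(cust) {
    hits <- suppliers[startsWith(supplier_key, normalize_id(cust))]
    # Keep customers without any match, with NA as the supplier.
    if (length(hits) == 0) hits <- NA_character_
    data.frame(customer = cust, supplier = hits, stringsAsFactors = FALSE)
  })
  do.call(rbind, matches)
}

# Second example from the thread.
x <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9", "SHBUF7;WF2,ZIE,Z22,1T8",
                             "SHBUF7;WF2,ZIE,Z22,1T", "SHB UF7;WF2,ZIE,Z22",
                             "SHBUF7;WF2,ZIE,Z22,1T9999999"))
result <- match_suppliers(x, y)
print(result)
```

Note that a short customer ID will also match every longer supplier ID that happens to begin with the same characters, so you may want to check the results on a larger sample before relying on this.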