Hi there, I want to find the best match in variable "p" for each string in variable "o".
The logic should tolerate strings that were not typed properly, so that, for example, "benit" is treated as a match for "benito".
In addition, the logic should give priority to the observations in "p" that are a full match (all, or most, of the strings in "o" were found among the strings in "p") and that contain the fewest extra strings. For example:
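One way to pin down "not typed properly" is an edit distance between individual words. The sketch below uses base R's `adist()` (generalized Levenshtein distance). The helper name `token_match`, the threshold `max_dist = 2`, and the rule that numbers must match exactly are assumptions for illustration only:

```r
# Hypothetical rule (an assumption, not an established method): two tokens match
# if they are identical, or if both are words (not pure digits) and their
# Levenshtein distance is at most max_dist.
token_match <- function(a, b, max_dist = 2) {
  is_num <- grepl("^[0-9]+$", a) | grepl("^[0-9]+$", b)
  d <- as.vector(adist(a, b))  # utils::adist, edit distance between a and b
  if (is_num) d == 0 else d <= max_dist
}

token_match("benit", "benito")  # TRUE  (one deletion)
token_match("ros", "rosa")      # TRUE  (one insertion)
token_match("18", "17")         # FALSE (numbers must match exactly)
```

Requiring exact equality for numbers matters here: "18" and "17" are only one edit apart, so a plain distance threshold would treat them as the same token.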
o <- c("rosa benito 1 2 3")
p <- c("rosa benit 1 2 3 5 8 4 6", "ros benito 1 2 3 4", "rosa benito 1 2 3 4 5 6 7")
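In this case the best match would be "ros benito 1 2 3 4": like the other two candidates, it contains every string of "o" (allowing "ros" for "rosa"), but it has only one extra string ("4").
Can someone help me with this? Below is a better, reproducible example. Many thanks!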
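A rough sketch of that ranking, under stated assumptions: split both strings on whitespace, pair each word of "o" with at most one word of the candidate (numbers exactly, words within an edit distance of 2 via base R's `adist()`), then rank candidates by most words matched, then fewest extra words, then smallest total edit distance. The helper names `match_score` and `best_match` and the threshold are hypothetical:

```r
# Hypothetical helper: scores how well the candidate string `cand` covers `target`.
match_score <- function(target, cand, max_dist = 2) {
  t_tok <- strsplit(target, "\\s+")[[1]]
  c_tok <- strsplit(cand, "\\s+")[[1]]
  used <- rep(FALSE, length(c_tok))  # each candidate token is paired at most once
  matched <- 0
  total_dist <- 0
  for (tok in t_tok) {
    d <- as.vector(adist(tok, c_tok))  # edit distance to every candidate token
    # numeric tokens must match exactly; word tokens may differ by up to max_dist edits
    is_num <- grepl("^[0-9]+$", tok) | grepl("^[0-9]+$", c_tok)
    ok <- !used & d <= ifelse(is_num, 0, max_dist)
    if (any(ok)) {
      best <- which(ok)[which.min(d[ok])]  # closest unused candidate token
      used[best] <- TRUE
      matched <- matched + 1
      total_dist <- total_dist + d[best]
    }
  }
  c(matched = matched, extra = sum(!used), dist = total_dist)
}

# Hypothetical helper: index of the best candidate, ranked by most target tokens
# matched, then fewest extra tokens, then smallest total edit distance.
best_match <- function(target, candidates, max_dist = 2) {
  scores <- t(vapply(candidates, match_score, c(matched = 0, extra = 0, dist = 0),
                     target = target, max_dist = max_dist))
  order(-scores[, "matched"], scores[, "extra"], scores[, "dist"])[1]
}

o <- "rosa benito 1 2 3"
p <- c("rosa benit 1 2 3 5 8 4 6", "ros benito 1 2 3 4", "rosa benito 1 2 3 4 5 6 7")
p[best_match(o, p)]  # "ros benito 1 2 3 4"
```

All three candidates match all five words of "o", so the tie is broken by the number of extra words (4, 1 and 4), which selects the second one. The matching is greedy, so it is a heuristic rather than an optimal assignment.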
o <- data.frame(id_o = c(1, 2), description_o = c("adam carla bryan 19 18 17", "rosa benito 1 2 3"))
p <- data.frame(id_p = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), description_p = c("adam bryan carla 18 17 19",
"adam carla bryan 19 18 17 16",
"adam carla bryan 19 18 17 16 15 14 13",
"adam carla bryan 19 18 17 16 15 14 13",
"adam carla 19 18 17 16 15",
"adam car bry 19 18 17 16 15",
"rosa benito 1 3",
"rosa benito 2 3",
"rosa benit 1 2 3",
"rosaaa benito 1 2 3"))
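The expected output pairs each row of "o" with its best match in "p". Note that the order of the strings does not matter ("adam bryan carla 18 17 19" is the best match for "adam carla bryan 19 18 17"):
q <- data.frame(id_o = c(1, 2), description_o = c("adam carla bryan 19 18 17", "rosa benito 1 2 3"),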
id_p = c(1, 9),
description_p = c("adam bryan carla 18 17 19",
"rosa benit 1 2 3"))
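As a sketch, the same hypothetical scoring can be applied row by row to the data frames (the helpers are repeated so the snippet runs on its own; `match_score`, `best_match` and `max_dist = 2` are still assumptions). Under these rules it reproduces the expected `q`. "rosa benit 1 2 3" beats "rosaaa benito 1 2 3" only on total edit distance (1 vs 2):

```r
match_score <- function(target, cand, max_dist = 2) {
  t_tok <- strsplit(target, "\\s+")[[1]]
  c_tok <- strsplit(cand, "\\s+")[[1]]
  used <- rep(FALSE, length(c_tok))  # each candidate token is paired at most once
  matched <- 0
  total_dist <- 0
  for (tok in t_tok) {
    d <- as.vector(adist(tok, c_tok))
    # numbers must match exactly; words may differ by up to max_dist edits
    is_num <- grepl("^[0-9]+$", tok) | grepl("^[0-9]+$", c_tok)
    ok <- !used & d <= ifelse(is_num, 0, max_dist)
    if (any(ok)) {
      best <- which(ok)[which.min(d[ok])]
      used[best] <- TRUE
      matched <- matched + 1
      total_dist <- total_dist + d[best]
    }
  }
  c(matched = matched, extra = sum(!used), dist = total_dist)
}

# rank: most tokens matched, then fewest extra tokens, then fewest edits
best_match <- function(target, candidates, max_dist = 2) {
  scores <- t(vapply(candidates, match_score, c(matched = 0, extra = 0, dist = 0),
                     target = target, max_dist = max_dist))
  order(-scores[, "matched"], scores[, "extra"], scores[, "dist"])[1]
}

o <- data.frame(id_o = c(1, 2),
                description_o = c("adam carla bryan 19 18 17", "rosa benito 1 2 3"),
                stringsAsFactors = FALSE)
p <- data.frame(id_p = 1:10,
                description_p = c("adam bryan carla 18 17 19", "adam carla bryan 19 18 17 16",
                                  "adam carla bryan 19 18 17 16 15 14 13",
                                  "adam carla bryan 19 18 17 16 15 14 13",
                                  "adam carla 19 18 17 16 15", "adam car bry 19 18 17 16 15",
                                  "rosa benito 1 3", "rosa benito 2 3",
                                  "rosa benit 1 2 3", "rosaaa benito 1 2 3"),
                stringsAsFactors = FALSE)

# row index in p of the best match for each row of o
idx <- vapply(o$description_o, best_match, integer(1),
              candidates = p$description_p, USE.NAMES = FALSE)
q <- cbind(o, p[idx, ])
rownames(q) <- NULL
q  # id_p is 1 and 9, as in the expected output
```

Every row of "o" is compared with every row of "p", so this scales as rows(o) × rows(p). For large tables, some pre-filtering (for example on shared numbers) would probably be needed.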