@andresrcs Thanks a lot Andresrcs for taking the time to solve this problem! The code seems to work perfectly with the sample data. But I don't understand the logic of the arguments inside the filter and gather functions. What do 'soundex==0' and 'starts_with("V")' mean?
When I run this code on my original dataset, the output includes pairs of names that are only slightly similar and otherwise very different, e.g., 'Aadiyta' and 'Aaram Techserve'. Also, which method does the code use to calculate string distance?
I had written the following code for my original dataset:
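library(tidystringdist)
# build all pairwise combinations of the names
comb <- tidy_comb_all(df$`Name 1`)
out <- tidy_stringdist(comb, method = "lcs")
# keep pairs whose lcs distance is between 1 and 10
out <- subset(out, lcs >= 1 & lcs <= 10)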
This worked pretty well at identifying the differences, but it has two problems. First, it compares 'Antila, Thomas' with 'ANTILA, THOMAS' and then again compares 'ANTILA, THOMAS' with 'Antila, Thomas', so the output contains duplicates. Second, the output only contains the 'Name' column, whereas I want all the other columns of my original dataset as well. How can I achieve that?
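To make the two requirements concrete, here is a minimal base R sketch of the kind of result I am after. It is only an illustration under assumptions: the sample data and the `City` column are made up, the `_a`/`_b` suffixes are hypothetical, and the distance comes from base R's `adist()` (generalized Levenshtein) as a stand-in rather than the "lcs" method. Pairing row indices with `combn()` should give each unordered pair only once, and indexing `df` by those row numbers keeps every column.

```r
# Made-up sample data; the real `df` has more rows and columns.
df <- data.frame(
  `Name 1` = c("Antila, Thomas", "ANTILA, THOMAS", "Aadiyta"),
  City = c("Helsinki", "Espoo", "Pune"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every unordered pair of row indices, each listed once (first index < second).
idx <- t(combn(nrow(df), 2))

# Stand-in distance: Levenshtein via base R adist(), NOT the "lcs" method.
nm <- df$`Name 1`
dist <- mapply(function(a, b) adist(a, b)[1, 1],
               nm[idx[, 1]], nm[idx[, 2]], USE.NAMES = FALSE)

# Keep pairs whose distance is between 1 and 10.
keep <- dist >= 1 & dist <= 10

# Pull all original columns for both sides of each kept pair.
left <- df[idx[keep, 1], , drop = FALSE]
right <- df[idx[keep, 2], , drop = FALSE]
names(left) <- paste0(names(left), "_a")    # hypothetical suffixes
names(right) <- paste0(names(right), "_b")

out <- cbind(left, right, dist = dist[keep])
rownames(out) <- NULL
print(out)
```

If something like this is possible with tidy_comb_all and tidy_stringdist directly, that would be even better, since I would like to stay with the "lcs" method.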