I have a very large dataset with two types of data frames, which look like this:
- my reference data.frame
ref=c("cake","brownies")
and my experimental data.frame
expr=c("cak","cakee","cake", "rownies","browwnies")
I want to match the ref and expr data.frames and find the Levenshtein distance between each pair of strings. The output could look like this:
ref expr distance
cake cak 1
cake cakee 1
cake cake 0
cake rownies ...
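A rough sketch of how such a table might be built, assuming `ref` and `expr` are plain character vectors as in the example and that base R's `adist()` (generalized Levenshtein distance, from the `utils` package) is an acceptable distance function. The name `pairs` is only illustrative.

```r
ref  <- c("cake", "brownies")
expr <- c("cak", "cakee", "cake", "rownies", "browwnies")

# adist() returns a matrix of generalized Levenshtein distances:
# one row per element of ref, one column per element of expr.
d <- adist(ref, expr)

# Long format with every ref/expr pair. expand.grid() varies ref fastest,
# which matches the column-major order of as.vector(d).
pairs <- expand.grid(ref = ref, expr = expr, stringsAsFactors = FALSE)
pairs$distance <- as.vector(d)

# Group the rows by reference string, keeping the original expr order.
pairs <- pairs[order(match(pairs$ref, ref)), ]
print(pairs)
```

For a very large dataset the full distance matrix can become big, so this is only meant to show the shape of the result.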
After I have measured the Levenshtein distances, I want to put every string whose distance to a reference is less than 3 into the same cluster as that reference, so that my data might look like this:
ref expr distance cluster
cake cak 1 1
cake cakee 1 1
cake cake 0 1
brownies rownies 1 2
brownies browwnies 1 2
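One possible reading of this step, sketched under the assumption that each `expr` string is assigned to its nearest `ref` string and that the cluster number is simply the position of that reference in `ref`. It again relies on base R's `adist()`; the names `nearest`, `best`, and `out` are illustrative.

```r
ref  <- c("cake", "brownies")
expr <- c("cak", "cakee", "cake", "rownies", "browwnies")

# Rows are expr strings, columns are ref strings.
d <- adist(expr, ref)

# Index of the closest reference for each expr string.
nearest <- unname(apply(d, 1, which.min))

# Distance from each expr string to its closest reference.
best <- d[cbind(seq_along(expr), nearest)]

out <- data.frame(
  ref      = ref[nearest],
  expr     = expr,
  distance = best,
  cluster  = nearest,
  stringsAsFactors = FALSE
)

# Keep only matches with a distance below the threshold of 3.
out <- out[out$distance < 3, ]
print(out)
```

Strings that are not within distance 3 of any reference are dropped here; whether they should instead form their own cluster is left open.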
Any help or advice on how to proceed is appreciated. At the moment I am trying many
R packages to find the distance between data.frames, such as
library("DescTools")
but they do not seem to work well.