identifying exact or near duplicate names in a dataset

In past projects I've used stringdist with good success. Colin Fay created a few vignettes on this method here:

For a longer exploration, Colin has a blog post below that works up to string distance on the Game of Thrones dataset (the string distance discussion starts about halfway down).
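
To give a rough idea of what this looks like in practice, here is a minimal sketch using `stringdistmatrix()` from stringdist. The company names, the cleaning step, the choice of Jaro-Winkler, and the 0.15 cutoff are all my own assumptions for illustration, so treat them as a starting point to tune rather than a recommendation.

```r
library(stringdist)

# Hypothetical company names, made up for this example
company_names <- c("Acme Corp", "Acme Corp.", "ACME Corporation",
                   "Globex Inc", "Globex, Inc.", "Initech")

# Lower-case and strip punctuation so trivial differences don't count
clean <- tolower(gsub("[[:punct:]]", "", company_names))

# Pairwise Jaro-Winkler distance (0 = identical, 1 = nothing in common)
d <- stringdistmatrix(clean, clean, method = "jw")

# Keep each unordered pair once, below an assumed cutoff
threshold <- 0.15
idx <- which(d < threshold & upper.tri(d), arr.ind = TRUE)

pairs <- data.frame(
  a    = company_names[idx[, "row"]],
  b    = company_names[idx[, "col"]],
  dist = d[idx]
)
pairs[order(pairs$dist), ]
```

Pairs with a distance of 0 are exact duplicates after cleaning; the rest are candidates to review by hand.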

String distance can be problematic if many distinct companies have similar names, since genuinely different companies may score as close to each other as true duplicates do.
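
Here is a small sketch of that problem, again with names I made up: two different banks that differ only in the state, next to a simple typo of one of them. It uses plain Levenshtein distance (`method = "lv"`) via `stringdist()`.

```r
library(stringdist)

# Hypothetical names: two distinct companies and one misspelled duplicate
bank_ohio  <- "First National Bank of Ohio"
bank_iowa  <- "First National Bank of Iowa"
bank_typo  <- "First Natonal Bank of Ohio"

# Levenshtein distance: number of single-character edits between strings
d_distinct <- stringdist(bank_ohio, bank_iowa, method = "lv")
d_typo     <- stringdist(bank_ohio, bank_typo, method = "lv")

# Both distances are small relative to the length of the names
c(distinct = d_distinct, typo = d_typo, name_length = nchar(bank_ohio))
```

Both distances are only a few edits on names of this length, so a cutoff loose enough to catch messier duplicates can also pull in distinct companies. Blocking on another field first (city, state, ID) is one way to cut down on those false matches.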

Stack Overflow has a nice discussion:

