I have a large data set (48 thousand rows) and I want to review rows that are identical or nearly identical, but I can't find a way to do it.
Here is a small example data set:
library(tibble)

tribble(
  ~id, ~name, ~lastname, ~pasport, ~State,
  1, "Peter", "Gomez", "1234", "Texas",
  2, "Maria", "Perez", "4567", "Texas",
  3, "Peterr", "Gomes", "1234", "Texas",
  4, "Maria", "Perez", "4567", "Texas",
  5, "Lucy", "Batista", "5784", "California",
  6, "Peter", "Gomez", "1234", "Texaas",
  7, "Maria", "Perezz", "4567", "Texas",
  8, "John", "Mark", "9423", "California",
  9, "Ben", "Aro", "3201", "Washington",
  10, "Jennifer", "Cruz", "3456", "Ohio"
)
This is the result:
id    name     lastname pasport State
<dbl> <chr>    <chr>    <chr>   <chr>
1     Peter    Gomez    1234    Texas
2     Maria    Perez    4567    Texas
3     Peterr   Gomes    1234    Texas
4     Maria    Perez    4567    Texas
5     Lucy     Batista  5784    California
6     Peter    Gomez    1234    Texaas
7     Maria    Perezz   4567    Texas
8     John     Mark     9423    California
9     Ben      Aro      3201    Washington
10    Jennifer Cruz     3456    Ohio
I want a way to identify rows that are identical or similar to each other and show them grouped together. The output I want would be this (or something similar):
id    name     lastname pasport State
1     Peter    Gomez    1234    Texas
3     Peterr   Gomes    1234    Texas
6     Peter    Gomez    1234    Texaas
2     Maria    Perez    4567    Texas
4     Maria    Perez    4567    Texas
7     Maria    Perezz   4567    Texas
That way I can tell which observations have problems or were entered incorrectly.
I have not found a function or package that does this. Does anyone know a way? Thanks a lot!
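For reference, here is a minimal sketch of one possible direction, not a definitive solution. It uses only base R: `adist()` for the edit distance between rows collapsed into single strings, and `hclust()`/`cutree()` to chain rows that lie within a threshold of each other. The example data is rebuilt as a plain `data.frame` so no packages are needed. The names `key`, `max_dist`, `group`, and `suspects` and the threshold value of 3 edits are assumptions made for illustration.

```r
# Example data from above, rebuilt as a base data.frame (no packages needed)
df <- data.frame(
  id = 1:10,
  name = c("Peter", "Maria", "Peterr", "Maria", "Lucy",
           "Peter", "Maria", "John", "Ben", "Jennifer"),
  lastname = c("Gomez", "Perez", "Gomes", "Perez", "Batista",
               "Gomez", "Perezz", "Mark", "Aro", "Cruz"),
  pasport = c("1234", "4567", "1234", "4567", "5784",
              "1234", "4567", "9423", "3201", "3456"),
  State = c("Texas", "Texas", "Texas", "Texas", "California",
            "Texaas", "Texas", "California", "Washington", "Ohio"),
  stringsAsFactors = FALSE
)

# Collapse every column except id into one string per row
key <- do.call(paste, c(df[c("name", "lastname", "pasport", "State")], sep = "|"))

# Pairwise edit (Levenshtein) distance between the row strings
d <- adist(key)

# Assumed threshold: rows linked by a chain of pairs at most 3 edits apart
# end up in the same group (single-linkage clustering cut at that height)
max_dist <- 3
df$group <- cutree(hclust(as.dist(d), method = "single"), h = max_dist)

# Keep only groups with more than one row, ordered so similar rows sit together
suspects <- df[ave(df$group, df$group, FUN = length) > 1, ]
suspects <- suspects[order(suspects$group, suspects$id), ]
print(suspects)
```

Note that the full distance matrix grows quadratically with the number of rows, so on 48 thousand rows this would probably only be practical if the comparison is restricted to smaller blocks first, for example rows that share the same `pasport` value. Whether that field is reliable enough to block on is an assumption about the data.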