Hi everyone,
So I'm fairly new to R/RStudio and am having some trouble comparing rows of a data set based on several columns at once. Say I have an Excel table (imported into R) that looks like this:
```
# A tibble: 6 x 4
  `Fruit Number` Fruit  Length `Color complexion`
1              1 Apple       2               0.34
2              2 Banana      4               0.23
3              3 Orange      2               0.68
4              4 Peach       3               0.11
5              5 Guava       4               0.47
6              6 Banana      4               0.25
```
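To make the example reproducible, the table above can be rebuilt in plain base R. This is a sketch: I'm assuming 'Fruit Number' holds whole numbers and that the other columns have the types the printout suggests. `check.names = FALSE` keeps the spaces in the column names.

```r
# Rebuild the example table in base R so the question is reproducible.
# check.names = FALSE keeps the spaces in "Fruit Number" and "Color complexion".
fruits <- data.frame(
  `Fruit Number`     = 1:6,
  Fruit              = c("Apple", "Banana", "Orange", "Peach", "Guava", "Banana"),
  Length             = c(2, 4, 2, 3, 4, 4),
  `Color complexion` = c(0.34, 0.23, 0.68, 0.11, 0.47, 0.25),
  check.names        = FALSE,
  stringsAsFactors   = FALSE
)

# Columns whose names contain spaces need [[ ]] or backticks to access
fruits[["Color complexion"]]
fruits$`Fruit Number`
```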
In my analysis, I want to loop through the rows so that each row is compared with every other row (row 1 with all other rows, row 2 with all other rows, and so on). For each pair, I want to compare 'Fruit', 'Length', and 'Color complexion': if two rows have the same 'Fruit' and the same 'Length', and their 'Color complexion' values are within 25% of each other, I want to remove both rows from the data set. In the example above, that would mean removing both Banana rows (rows 2 and 6). I know this could be done fairly easily with a loop and a few if statements, but I'm not sure how to write that in R.
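A minimal base-R sketch of the loop-and-if approach might look like this. It assumes that "within 25%" means the absolute difference between the two 'Color complexion' values is at most 25% of the larger one. That rule, and the hypothetical helper `is_similar()` with its `tol` argument, are my own assumptions, so swap in whatever definition you actually need. Because the comparison is symmetric, the inner loop only visits pairs with `j > i`, which still compares every row with every other row exactly once.

```r
# Example data (same values as the table in the question)
fruits <- data.frame(
  `Fruit Number`     = 1:6,
  Fruit              = c("Apple", "Banana", "Orange", "Peach", "Guava", "Banana"),
  Length             = c(2, 4, 2, 3, 4, 4),
  `Color complexion` = c(0.34, 0.23, 0.68, 0.11, 0.47, 0.25),
  check.names        = FALSE,
  stringsAsFactors   = FALSE
)

# ASSUMPTION: "within 25%" means |a - b| is at most tol (25%) of the
# larger absolute value. is_similar() is a hypothetical helper.
is_similar <- function(a, b, tol = 0.25) {
  abs(a - b) <= tol * max(abs(a), abs(b))
}

n <- nrow(fruits)
drop <- rep(FALSE, n)  # TRUE marks a row to remove

# Compare every row with every other row; j > i visits each pair once
if (n >= 2) {
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (fruits$Fruit[i] == fruits$Fruit[j] &&
          fruits$Length[i] == fruits$Length[j] &&
          is_similar(fruits[["Color complexion"]][i],
                     fruits[["Color complexion"]][j])) {
        drop[c(i, j)] <- TRUE  # flag both rows of the matching pair
      }
    }
  }
}

# Remove all flagged rows at once, after the loop has finished
result <- fruits[!drop, ]
print(result)
```

Flagging rows in `drop` and removing them only after the loop means a row that matches more than one other row is still removed exactly once, and the loop never has to deal with the data frame shrinking while it is being iterated over.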
This is also my first post, so I do apologize if it is a little unconventional. Thanks!