Are they all numbers? I think it might be more efficient to use matrices than data frames. You can loop through the rows and then use matrix operations to calculate the mismatches.
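A minimal sketch of the matrix idea, assuming every column is numeric. The data frame `df1` and the name `mismatch` are made up for illustration; they are not from the original question.

```r
# Hypothetical example data: four rows, four integer columns
df1 <- data.frame(
  A = c(1L, 3L, 5L, 5L),
  B = c(4L, 1L, 2L, 1L),
  C = c(3L, 5L, 4L, 4L),
  D = c(2L, 3L, 3L, 4L)
)

# Convert once; indexing rows of a matrix is cheaper than indexing a data frame
m <- as.matrix(df1)

# Transpose so that each original row becomes a column of `tm`
tm <- t(m)

# For each row i, `tm != m[i, ]` recycles row i down every column of `tm`,
# so colSums() counts the mismatches between row i and each row of `m`
mismatch <- vapply(
  seq_len(nrow(m)),
  function(i) colSums(tm != m[i, ]),
  numeric(nrow(m))
)

# mismatch[i, j] is the number of columns in which rows i and j differ
mismatch
```

The result is a symmetric matrix with zeros on the diagonal, so the upper triangle holds one count per unordered pair of rows.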
This is really quick to do using base R. I added an extra row to your example data frame.
This will return a vector with the number of non-matches for each pair of rows. If you need the whole matrix of 1s and 0s, you could replace the sum function with as.integer.
# create data
df1 <- structure(list(A = c(1L, 3L, 5L, 5L), B = c(4L, 1L, 2L, 1L),
    C = c(3L, 5L, 4L, 4L), D = c(2L, 3L, 3L, 4L)), class = "data.frame",
    row.names = c(NA, -4L))
# Matrix of all combinations of rows
com <- combn(nrow(df1), 2)
# Loop through all the row combos and count the columns that do not match
apply(com, 2, function(i) sum(df1[i[1], ] != df1[i[2], ]))
#> [1] 4 4 4 3 3 2
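The as.integer variant mentioned above might look like the sketch below. The names `flags` and the `dimnames` labels are my own additions for readability, not part of the answer.

```r
# Same data as in the answer above
df1 <- data.frame(A = c(1L, 3L, 5L, 5L), B = c(4L, 1L, 2L, 1L),
                  C = c(3L, 5L, 4L, 4L), D = c(2L, 3L, 3L, 4L))
com <- combn(nrow(df1), 2)

# as.integer() keeps one 0/1 flag per column instead of collapsing to a count
flags <- apply(com, 2, function(i) as.integer(df1[i[1], ] != df1[i[2], ]))

# Label rows by data frame column and columns by the pair of rows compared
dimnames(flags) <- list(names(df1), paste(com[1, ], com[2, ], sep = "_"))
flags

# Summing each column recovers the counts from the sum() version
colSums(flags)
#> 1_2 1_3 1_4 2_3 2_4 3_4 
#>   4   4   4   3   3   2 
```

Each column of `flags` is one pair of rows, and a 1 marks a column of `df1` where the two rows differ.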
Hi, thank you very much. It is very fast. But what if I wanted to select the first one and then compare it with the rest of the data points? From your example:
-select A
-compare: A and B, A and C, A and D.
-Sum mismatches for each pair: 4, 4, 3
-Sum them all: 11
-then do it with the next object B and so on
Hi @StephanieBR, I hope you have already managed to find a solution for this yourself.
I am not sure I quite understand what you're looking for, but maybe this, using gtools::permutations, is what you need?
# Data
df1 <- structure(list(A = c(1L, 3L, 5L, 2L), B = c(4L, 1L, 2L, 3L), C = c(3L, 5L, 4L, 3L), D = c(2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, -4L))
# Get ALL ordered pairs of rows using gtools::permutations
combs <- gtools::permutations(nrow(df1), 2)
# Loop through all the row combos and flag the columns that do not match (1 = mismatch, 0 = match)
# Note that we use `1` here instead of `2` as in the previous answer, because each row of `combs` holds one pair - you can compare them to see the difference
result <- apply(combs, 1, function(i) as.integer(df1[i[1], ] != df1[i[2], ]))
# Identify the results if needed
colnames(result) <- paste(combs[, 1], combs[, 2], sep = '_')
# Sum the mismatches
colSums(result)
#> 1_2 1_3 1_4 2_1 2_3 2_4 3_1 3_2 3_4 4_1 4_2 4_3 
#>   4   4   3   4   3   4   4   3   4   3   4   4 
# Or view the whole matrix of results. I have transposed the results here with `t()` because I think it is easier to view
t(result)
#>     [,1] [,2] [,3] [,4]
#> 1_2    1    1    1    1
#> 1_3    1    1    1    1
#> 1_4    1    1    0    1
#> 2_1    1    1    1    1
#> 2_3    1    1    1    0
#> 2_4    1    1    1    1
#> 3_1    1    1    1    1
#> 3_2    1    1    1    0
#> 3_4    1    1    1    1
#> 4_1    1    1    0    1
#> 4_2    1    1    1    1
#> 4_3    1    1    1    1
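If the goal is the per-object total asked about earlier in the thread (4 + 4 + 3 = 11 for the first row), one possible way to get it is to group the pairwise counts by the first row of each pair. This is only a sketch: it uses base R's expand.grid in place of gtools::permutations so it runs without extra packages, and the names `pairs`, `counts`, and `totals` are hypothetical.

```r
# Same data as in the answer above
df1 <- data.frame(A = c(1L, 3L, 5L, 2L), B = c(4L, 1L, 2L, 3L),
                  C = c(3L, 5L, 4L, 3L), D = c(2L, 3L, 3L, 4L))

# All ordered pairs of distinct rows (base R stand-in for gtools::permutations;
# the pairs are the same, although their order may differ)
pairs <- expand.grid(first = seq_len(nrow(df1)), second = seq_len(nrow(df1)))
pairs <- pairs[pairs$first != pairs$second, ]

# Number of mismatching columns for each ordered pair
counts <- mapply(function(i, j) sum(df1[i, ] != df1[j, ]),
                 pairs$first, pairs$second)

# Total mismatches of each row against all the other rows
totals <- tapply(counts, pairs$first, sum)
totals
```

For this particular data every row happens to total 11, which agrees with the 4 + 4 + 3 worked out by hand for the first row.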