# Compare objects in a dataframe

I have a data frame with X columns and Y rows. I want to compare every pair of rows, column by column, and sum the mismatches. For example:

. A B C D
1 1 4 3 2
2 3 1 5 3
3 5 2 4 3

I want to compare rows 1 and 2, rows 1 and 3, and rows 2 and 3.

For each column, record 1 if the two values differ and 0 if they are equal, then sum the mismatches for each pair:
1 1 1 1 = 4
1 1 1 1 = 4
1 1 1 0 = 3

Then return the object with the smallest number of mismatches.

Are they all numbers? If so, it might be more efficient to use matrices than data frames. You can loop through the rows and use matrix operations to calculate the mismatches.
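
A minimal sketch of that matrix idea, assuming the data are all integers as in the question. The helper name `count_mismatches` and the result matrix `mismatch` are made up for illustration; they are not from any package.

```r
# Example data from the question, stored as an integer matrix (one object per row)
m <- matrix(c(1L, 4L, 3L, 2L,
              3L, 1L, 5L, 3L,
              5L, 2L, 4L, 3L),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C", "D")))

# Hypothetical helper: number of columns in which rows i and j differ
count_mismatches <- function(mat, i, j) sum(mat[i, ] != mat[j, ])

# Loop over every pair of rows and fill a symmetric mismatch matrix
n <- nrow(m)
mismatch <- matrix(0L, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    mismatch[i, j] <- mismatch[j, i] <- count_mismatches(m, i, j)
  }
}
mismatch
```

Entry `[i, j]` of `mismatch` holds the number of mismatching columns between rows `i` and `j`, so for the question's data `mismatch[2, 3]` is 3.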

```r
# Example data from the question (a tibble with three rows)
(mydf <- structure(list(A = c(1L, 3L, 5L), B = c(4L, 1L, 2L), C = c(
  3L,
  5L, 4L
), D = c(2L, 3L, 3L)), row.names = c(NA, -3L), class = c(
  "tbl_df",
  "tbl", "data.frame"
)))

# All pairs of row indices: (1, 2), (1, 3), (2, 3)
(index_combs <- combn(seq_len(nrow(mydf)), 2, simplify = FALSE))

# 1 where the two rows differ, 0 where they are equal
distfunc <- function(x, y) {
  as.integer(x != y)
}

(raw_evals <- purrr::map(
  index_combs,
  ~ distfunc(mydf[.x[1], ], mydf[.x[2], ])
))
# Number of mismatches per pair
(sum_evals <- purrr::map_int(raw_evals, sum))

# which is the min ?
(theminis <- which.min(sum_evals))

raw_evals[[theminis]]
```

This is really quick to do using base R. I added an extra row to your example data frame.
This returns a vector with the number of non-matches for each pair of rows. If you need the whole matrix of 1s and 0s, you could replace the `sum` function with `as.integer`.

```r
# create data
df1 <- structure(list(A = c(1L, 3L, 5L, 5L), B = c(4L, 1L, 2L, 1L),
    C = c(3L, 5L, 4L, 4L), D = c(2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA,
-4L))

# Matrix of all combinations of rows (one pair per column)
com <- combn(nrow(df1), 2)

# Loop through all the row combos and count the values that do not match
apply(com, 2, function(i) sum(df1[i[1], ] != df1[i[2], ]))

#> [1] 4 4 4 3 3 2
```
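
To get back to the original request (return the comparison with the smallest number), one possible follow-up is to label each count with its row pair and keep the minimum. This is only a sketch in base R on the same four-row example; the names `mismatches` and `closest` are made up for illustration.

```r
# The four-row example data from this answer
df1 <- data.frame(A = c(1L, 3L, 5L, 5L), B = c(4L, 1L, 2L, 1L),
                  C = c(3L, 5L, 4L, 4L), D = c(2L, 3L, 3L, 4L))

# One row pair per column of `com`
com <- combn(nrow(df1), 2)

# Count mismatching columns for each pair and label it "row_row"
mismatches <- apply(com, 2, function(i) sum(df1[i[1], ] != df1[i[2], ]))
names(mismatches) <- paste(com[1, ], com[2, ], sep = "_")

# Pair(s) with the fewest mismatches; this keeps ties, unlike which.min()
closest <- mismatches[mismatches == min(mismatches)]
closest
#> 3_4
#>   2
```

Here rows 3 and 4 are the closest pair, differing in only two columns.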

I am using numbers. Thank you very much. It is easier with matrices.

Hi, thank you very much. It is very fast. But what if I wanted to select the first object and then compare it with the rest of the data points? From your example:
- select A
- compare: A and B, A and C, A and D
- sum the mismatches for each pair: 4, 4, 3
- sum them all: 11
- then do the same with the next object, B, and so on

Hi @StephanieBR, I hope you have already managed to find a solution for this yourself.

I am not sure I quite understand what you're looking for, but maybe this, using `gtools::permutations`?

```r
# Data
df1 <- structure(list(A = c(1L, 3L, 5L, 2L), B = c(4L, 1L, 2L, 3L), C = c(3L, 5L, 4L, 3L), D = c(2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, -4L))

# Get ALL ordered pairs of rows using gtools::permutations (one pair per row of `combs`)
combs <- gtools::permutations(nrow(df1), 2)

# Loop through all the row pairs and flag the values that do not match (1 = mismatch, 0 = match)
# Note that we use `1` here instead of `2` as in the previous answer - you can compare them to see the difference
result <- apply(combs, 1, function(i) as.integer(df1[i[1], ] != df1[i[2], ]))

# Identify the results if needed
colnames(result) <- paste(combs[, 1], combs[, 2], sep = '_')

# Sum the mismatches
colSums(result)
#> 1_2 1_3 1_4 2_1 2_3 2_4 3_1 3_2 3_4 4_1 4_2 4_3
#>   4   4   3   4   3   4   4   3   4   3   4   4

# Or view the whole matrix of results. I have transposed the results here with `t()` because I think it is easier to view
t(result)

#>     [,1] [,2] [,3] [,4]
#> 1_2    1    1    1    1
#> 1_3    1    1    1    1
#> 1_4    1    1    0    1
#> 2_1    1    1    1    1
#> 2_3    1    1    1    0
#> 2_4    1    1    1    1
#> 3_1    1    1    1    1
#> 3_2    1    1    1    0
#> 3_4    1    1    1    1
#> 4_1    1    1    0    1
#> 4_2    1    1    1    1
#> 4_3    1    1    1    1

```
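
If the goal is one total per object (compare an object with all the others and add up the mismatches, as asked in the follow-up), a possible base R sketch is shown here. It assumes the same four-row data and does not need gtools; the name `totals` is made up for illustration.

```r
# Same data as in this answer, converted to a matrix (one object per row)
df1 <- data.frame(A = c(1L, 3L, 5L, 2L), B = c(4L, 1L, 2L, 3L),
                  C = c(3L, 5L, 4L, 3L), D = c(2L, 3L, 3L, 4L))
m <- as.matrix(df1)
n <- nrow(m)

# totals[i] = mismatches between row i and every other row, summed over all columns
totals <- vapply(seq_len(n), function(i) {
  others <- m[-i, , drop = FALSE]
  # Repeat row i so it has the same shape as `others`
  ref <- matrix(m[i, ], nrow = nrow(others), ncol = ncol(m), byrow = TRUE)
  sum(others != ref)
}, integer(1))
names(totals) <- seq_len(n)
totals
#>  1  2  3  4
#> 11 11 11 11

# Object(s) with the smallest total; ties are all returned
which(totals == min(totals))
```

For this particular data every object totals 11, so all four tie. With other data, `which(totals == min(totals))` would return only the closest object or objects.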
