Comparison of column pairs

Hi there,

I am struggling to compare all possible column combinations. More precisely, I would like to perform an if/else comparison on each column pair of the table below: "ab", "bc" and "ac". The comparison should check whether both columns have an entry >0 in the same row, return 1 if so and 0 if not, and save the result in a new column.

As I am not an expert in R, I do not know which methodology is best for this issue. A for loop?

You'd really help me out. Thx in advance,
Felix

t <- c("t1", "t2", "t3", "t4", "t5", "t6")
a <- c(10,4,8,0,0,0)
b <- c(0,5,4,0,0,5)
c <- c(5,0,5,1,4,0)

data.frame(t,a,b,c)

   t  a b c
1 t1 10 0 5
2 t2  4 5 0
3 t3  8 4 5
4 t4  0 0 1
5 t5  0 0 4
6 t6  0 5 0

Hi there,

Here is some example code to derive all possible combinations of pairs with combn. As you will see I have added in a loop for you to print the specific configuration. There are several ways in which you'd be able to call up the column based on the labels you now have access to.

letters[1:4]
#> [1] "a" "b" "c" "d"

df1 <- combn(letters[1:4], 2)


for(i in 1:ncol(df1)){
  
  print(df1[,i])
}
#> [1] "a" "b"
#> [1] "a" "c"
#> [1] "a" "d"
#> [1] "b" "c"
#> [1] "b" "d"
#> [1] "c" "d"

Created on 2022-05-26 by the reprex package (v2.0.1)


Hi @GreyMerchant ,

Thank you for the quick reply! I didn't know that function, which is why I have been writing out the combinations manually until now.

One follow-up question: Can you help me to formulate the loop that would do the following? As I am a beginner in R I haven't yet understood how to call up the columns in this loop.

data$ab <- ifelse(data$a>0 & data$b>0, 1, 0)
data$bc <- ifelse(data$b>0 & data$c>0, 1, 0)
data$ac <- ifelse(data$a>0 & data$c>0, 1, 0)


In words, I need to create a column for each pair and print "1" if column i and j are both >0 and "0" else.

I took your example and wrote the steps to run all three comparisons in a loop. You'll see below how it works. There are cleaner ways to write this, but it should give you an idea of how to do it. If you have any questions, let me know.

t <- c("t1", "t2", "t3", "t4", "t5", "t6")
a <- c(10,4,8,0,0,0)
b <- c(0,5,4,0,0,5)
c <- c(5,0,5,1,4,0)

df <- data.frame(t,a,b,c)

comb_x <- combn(letters[1:3], 2)

result_list <- list()

for(i in 1:ncol(comb_x)){
  
  result_list[[i]] <-
  ifelse(  df[[comb_x[,i][1]]] >0 & df[[comb_x[,i][2]]] >0, 1, 0) 
  
}

result_list
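The loop above collects the results in an unnamed list, while the original question asked for one new column per pair. A minimal sketch of that last step, assuming the pair label is simply the two column names pasted together (`pair_names` is a name introduced here for illustration, not part of the answer above):

```r
# Rebuild the example data from the post above.
df <- data.frame(
  t = c("t1", "t2", "t3", "t4", "t5", "t6"),
  a = c(10, 4, 8, 0, 0, 0),
  b = c(0, 5, 4, 0, 0, 5),
  c = c(5, 0, 5, 1, 4, 0)
)

# One column of comb_x per pair: row 1 is the first name, row 2 the second.
comb_x <- combn(c("a", "b", "c"), 2)

# Hypothetical labels for the new columns: "ab", "ac", "bc".
pair_names <- paste0(comb_x[1, ], comb_x[2, ])

for (i in seq_len(ncol(comb_x))) {
  first  <- df[[comb_x[1, i]]]
  second <- df[[comb_x[2, i]]]
  # Write the 1/0 flag straight into a new, named column of df.
  df[[pair_names[i]]] <- ifelse(first > 0 & second > 0, 1, 0)
}

df
```

Using `seq_len(ncol(comb_x))` instead of `1:ncol(comb_x)` is a small safety choice: it yields an empty sequence rather than `1, 0` if there are no pairs.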

Thanks for getting back to me! Understood what you did there and the loop does the job, thank you!

However, I would like to apply this code to another data set whose column headers are strings rather than single letters, so I was wondering whether there is a way to extract the column header combinations. I tried to replace "letters" with the data set, but then combn returns combinations of the entire columns, not just of the headers.

Do you have an idea?

Best,
Felix

This is a minor modification to draw out the connection between the column names and the pairing of those names.

t <- c("t1", "t2", "t3", "t4", "t5", "t6")
a1 <- c(10,4,8,0,0,0)
b2 <- c(0,5,4,0,0,5)
c3 <- c(5,0,5,1,4,0)

df <- data.frame(t,a1,b2,c3)

(cols_of_interest <- setdiff(names(df),"t"))

comb_x <- combn(cols_of_interest, 2)

result_list <- list()

for(i in 1:ncol(comb_x)){
  
  result_list[[i]] <-
    ifelse(  df[[comb_x[,i][1]]] >0 & df[[comb_x[,i][2]]] >0, 1, 0) 
  
}

result_list

# can also do the loop differently, using map() from the purrr package

library(purrr)

comb_x2 <- combn(cols_of_interest, 2, simplify = FALSE)

rl2 <- map(comb_x2,
           ~ (df[.x][[1]] > 0 & df[.x][[2]] > 0) * 1)

#same result, for what I think is a more pleasant syntax.
identical(result_list,rl2)
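For readers who would rather stay in base R, `combn()` also accepts a `FUN` argument that is applied to each pair directly, so no explicit loop or `map()` call is needed. The following is a rough sketch under that assumption; `both_positive`, `flags`, and the underscore-separated column labels are names chosen here for illustration only.

```r
# Same example data as above, with non-letter column names.
df <- data.frame(
  t  = c("t1", "t2", "t3", "t4", "t5", "t6"),
  a1 = c(10, 4, 8, 0, 0, 0),
  b2 = c(0, 5, 4, 0, 0, 5),
  c3 = c(5, 0, 5, 1, 4, 0)
)

cols_of_interest <- setdiff(names(df), "t")

# Hypothetical helper: takes one pair of column names, returns a 1/0 vector.
both_positive <- function(pair) {
  as.integer(df[[pair[1]]] > 0 & df[[pair[2]]] > 0)
}

# combn() applies FUN to every pair and binds the results column-wise.
flags <- combn(cols_of_interest, 2, FUN = both_positive)

# Label each column after its pair; extra arguments are passed on to paste().
colnames(flags) <- combn(cols_of_interest, 2, FUN = paste, collapse = "_")

cbind(df, flags)
```

The separator in the labels is a matter of taste; with longer column names an explicit separator tends to keep the pair readable.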

Hello,

All the above answers are great, but I thought I'd add my approach as well (just for the fun of it :) ). It is very similar to @nirgrahamuk's, but I don't use the map function.

t <- c("t1", "t2", "t3", "t4", "t5", "t6")
a <- c(10,4,8,0,0,0)
b <- c(0,5,4,0,0,5)
c <- c(5,0,5,1,4,0)

myData = data.frame(t,a,b,c)

colComb = combn(colnames(myData[,-1]), 2)
check = myData[,colComb[1,]] > 0 & myData[,colComb[2,]] > 0
colnames(check) = paste0(colComb[1,],colComb[2,])

cbind(myData, ifelse(check, 1, 0))
#>    t  a b c ab ac bc
#> 1 t1 10 0 5  0  1  0
#> 2 t2  4 5 0  1  0  0
#> 3 t3  8 4 5  1  1  1
#> 4 t4  0 0 1  0  0  0
#> 5 t5  0 0 4  0  0  0
#> 6 t6  0 5 0  0  0  0

Created on 2022-05-27 by the reprex package (v2.0.1)

PJ
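One caveat worth checking with the vectorised approach: when the data has only two value columns there is a single pair, and `myData[, "a"]` then drops to a plain vector, so setting `colnames()` on the result fails. A hedged sketch of one way around this, assuming `drop = FALSE` is acceptable in your code (the three-row data and the `result` name are made up for illustration):

```r
# Small illustrative data set with only two value columns, so one pair.
myData <- data.frame(
  t = c("t1", "t2", "t3"),
  a = c(10, 4, 0),
  b = c(0, 5, 4)
)

colComb <- combn(colnames(myData)[-1], 2)

# drop = FALSE keeps a one-column data frame, so the comparison
# still returns a matrix and colnames() can be assigned.
check <- myData[, colComb[1, ], drop = FALSE] > 0 &
  myData[, colComb[2, ], drop = FALSE] > 0
colnames(check) <- paste0(colComb[1, ], colComb[2, ])

# Multiplying the logical matrix by 1 converts TRUE/FALSE to 1/0.
result <- cbind(myData, check * 1)
result
```

With three or more value columns the same code behaves like the version above, since `drop = FALSE` has no effect when several columns are selected.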

