Multiple comparisons (loop) matching a condition from another variable

  df <- data.frame(gene1 = c("A", "B", "C", "D", "E"), 
                         gene2 = c("F", "B", "G", "H", "I"), 
                         gene3 = c("J", "B", "K", "L", "E"), 
                         gene4 = c("N", "O", "P", "Q", "R"), 
                         gene5 = c("S", "T", "U", "V", "W"),
                         gene6 = c("S", "T", "U", "V", "W"),
                         result_gene1 = c(“positive”, "negativo", "negativo", "negativo", “positive”), 
                         result_gene2 = c("negativo", “positive”, "negativo", "negativo", "negativo"), 
                         result_gene3 = c("negativo", “positive”, "negativo", "negativo", "negativo"), 
                         result_gene4 = c("negativo", "negativo", "negativo", "negativo", “positive”), 
                         result_gene5 = c("negativo", "negativo", "negativo", "negativo", "negativo"),
                         result_gene6 = c("negativo", "negativo", "negativo", "negativo", "negativo"),
                         A1 = c(“positive”, "negativo", "negativo", "negativo", “positive”), 
                         A2 = c("negativo", “positive”, "negativo", "negativo", "negativo"), 
                         A3 = c("negativo", “positive”, "negativo", "negativo", "negativo"), 
                         A4 = c("negativo", "negativo", "negativo", "negativo", “positive”), 
                         A5 = c("negativo", "negativo", "negativo", "negativo", "negativo"),
                         A6 = c("negativo", "negativo", "negativo", "negativo", "negativo"))

In this example data frame, I want to compare gene1 where result_gene1 is "positive" with gene2 where result_gene2 is "positive". If they are equal, I want to note this as "ok" in a new column. The gene columns form pairs with the result columns: gene3 with result_gene3, and so forth. I need a script that compares gene1 (matching the condition) to gene2, gene3, gene4, gene5 and gene6, one by one. In other words, I need to compare with each other all genes that are "positive" in their respective "result_gene" columns. I tried a verbose approach, but it did not work properly. I also tried a loop suggested by ChatGPT, but it was incorrect. Do I need to select the columns I will use? Can some human help me with this issue?

You have a problem with your data. Thanks, BTW, for supplying it. Unfortunately, you have some curly quotes (“positive”) mixed in with straight ones ("negativo").

I think this is what you intended. Note that I changed the name of the data.frame to df1 because df() is an R function.

df1  <- structure(list(gene1 = c("A", "B", "C", "D", "E"), gene2 = c("F", 
"B", "G", "H", "I"), gene3 = c("J", "B", "K", "L", "E"), gene4 = c("N", 
"O", "P", "Q", "R"), gene5 = c("S", "T", "U", "V", "W"), gene6 = c("S", 
"T", "U", "V", "W"), result_gene1 = c("positive", "negativo", 
"negativo", "negativo", "positive"), result_gene2 = c("negativo", 
"positive", "negativo", "negativo", "negativo"), result_gene3 = c("negativo", 
"positive", "negativo", "negativo", "negativo"), result_gene4 = c("negativo", 
"negativo", "negativo", "negativo", "positive"), result_gene5 = c("negativo", 
"negativo", "negativo", "negativo", "negativo"), result_gene6 = c("negativo", 
"negativo", "negativo", "negativo", "negativo"), A1 = c("positive", 
"negativo", "negativo", "negativo", "positive"), A2 = c("negativo", 
"positive", "negativo", "negativo", "negativo"), A3 = c("negativo", 
"positive", "negativo", "negativo", "negativo"), A4 = c("negativo", 
"negativo", "negativo", "negativo", "positive"), A5 = c("negativo", 
"negativo", "negativo", "negativo", "negativo"), A6 = c("negativo", 
"negativo", "negativo", "negativo", "negativo")), class = "data.frame", row.names = c(NA, 
-5L))
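
As a side note on the curly quotes mentioned above: if code has already been pasted somewhere that converted the quotes, they can be swapped back programmatically instead of by hand. This is only a minimal sketch; the variable `pasted` is a hypothetical stand-in for one line of the affected code held as text.

```r
# Hypothetical example: one line of code as it looks after a word processor
# or web form has turned straight quotes into curly ones (U+201C, U+201D).
pasted <- "result_gene1 = c(\u201cpositive\u201d, \"negativo\")"

# Replace either curly double quote with a straight double quote.
fixed <- gsub("[\u201c\u201d]", "\"", pasted)

cat(fixed, "\n")
```

The repaired text can then be pasted into the console or a script and should parse without the "unexpected input" error.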

Thanks. I didn't know this would be a problem. I was actually getting the same df structure as the one you updated.

Very strange, I was getting

df <- data.frame(gene1 = c("A", "B", "C", "D", "E"), 
+                  gene2 = c("F", "B", "G", "H", "I"), 
+                  gene3 = c("J", "B", "K", "L", "E"), 
+                  gene4 = c("N", "O", "P", "Q", "R"), 
+                  gene5 = c("S", "T", "U", "V", "W"),
+                  gene6 = c("S", "T", "U", "V", "W"),
+                  result_gene1 = c(“positive”, "negativo", "negativo", "negativo", “positive”), 
Error: unexpected input in:
"                 gene6 = c("S", "T", "U", "V", "W"),
                 result_gene1 = c(“"

when trying to load the data.

Sorry, you're right. In my code I was using the correct syntax, but when I pasted it here, some problems may have occurred. Anyway, do you have any ideas on how I could run a loop for the comparison? Do I need to select the columns I want to compare beforehand?

I have not thought about it yet, but I will have a look. BTW, in R a lot of the things you would do with a loop in a "conventional" programming language can be done faster and with less code in other ways, so I would not limit myself to a loop here.
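
To illustrate the point about loops, here is a small sketch on made-up vectors (not the data from this thread). It assumes a row-by-row reading of the question: mark "ok" where both results are "positive" and the two gene values match. The names `gene_a`, `gene_b`, `res_a` and `res_b` are invented for the example.

```r
# Made-up vectors standing in for two gene columns and their result columns.
gene_a <- c("A", "B", "C")
gene_b <- c("F", "B", "G")
res_a  <- c("positive", "positive", "negativo")
res_b  <- c("negativo", "positive", "negativo")

# Loop version: test one row at a time.
out_loop <- character(length(gene_a))
for (i in seq_along(gene_a)) {
  both_pos <- res_a[i] == "positive" && res_b[i] == "positive"
  out_loop[i] <- if (both_pos && gene_a[i] == gene_b[i]) "ok" else ""
}

# Vectorised version: the same test applied to whole columns at once.
out_vec <- ifelse(
  res_a == "positive" & res_b == "positive" & gene_a == gene_b,
  "ok", ""
)

identical(out_loop, out_vec)
```

Both versions give the same answer; the vectorised one is shorter and is the form that drops straight into something like mutate().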

library(tidyverse)
library(glue)
df1 <- tribble(
  ~gene1, ~gene2, ~gene3, ~gene4, ~gene5, ~gene6, ~result_gene1, ~result_gene2, ~result_gene3, ~result_gene4, ~result_gene5, ~result_gene6, ~A1, ~A2, ~A3, ~A4, ~A5, ~A6,
  "A", "F", "J", "N", "S", "S", "positive", "negativo", "negativo", "negativo", "negativo", "negativo", "positive", "negativo", "negativo", "negativo", "negativo", "negativo",
  "B", "B", "B", "O", "T", "T", "negativo", "positive", "positive", "negativo", "negativo", "negativo", "negativo", "positive", "positive", "negativo", "negativo", "negativo",
  "C", "G", "K", "P", "U", "U", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo",
  "D", "H", "L", "Q", "V", "V", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo", "negativo",
  "E", "I", "E", "R", "W", "W", "positive", "negativo", "negativo", "positive", "negativo", "negativo", "positive", "negativo", "negativo", "positive", "negativo", "negativo"
)


(comparisons_to_make <-
  combn(
    x = paste0("gene", 1:6),
    m = 2, simplify = FALSE
  ))

(nms_of_compares <- map_chr(comparisons_to_make, paste0, collapse = "_"))

names(comparisons_to_make) <- nms_of_compares

(compare_results <- map_lgl(comparisons_to_make, \(x_){
  gname_left <- x_[[1]]
  gname_right <- x_[[2]]
  rname_left <- paste0("result_", gname_left)
  rname_right <- paste0("result_", gname_right)
  print(glue("Processing {gname_left} & {gname_right}"))
  v_left <- filter(
    df1,
    !!sym(rname_left) == "positive"
  ) |> pull(gname_left)
  v_right <- filter(
    df1,
    !!sym(rname_right) == "positive"
  ) |> pull(gname_right)


  res <- identical(v_left, v_right) && !identical(v_left, character(0))
  print(glue("comparing {toString(v_left)} to  {toString(v_right)} \t eval: {res} "))
  res
}))

results_as_a_frame <- enframe(compare_results) |>
  pivot_wider() |>
  mutate(across(
    .cols = everything(),
    \(x)ifelse(x, "ok", "")
  ))

# bashed together
(df2 <- df1 |> bind_cols(results_as_a_frame))

glimpse(df2)

Thanks.

I got this error message: "Error in map_chr(comparisons_to_make, paste0, collapse = "_") :
could not find function "map_chr""

Just to provide feedback, I ended up with this code. I had to compare each gene one by one, which was verbose, but it worked. Thank goodness there were not many genes. lol

library(dplyr)

# Compare gene1 and gene2 only in rows where both results are "positive"
df1 <- df1 %>%
  mutate(g1_g2 = ifelse(result_gene1 == "positive" & result_gene2 == "positive",
                        ifelse(gene1 == gene2, "equal", "not equal"),
                        "not positive"))
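
For anyone who wants to avoid writing that block once per pair, the same row-wise test can be repeated over every pair of genes with combn(). This is a rough base R sketch under some assumptions: the `toy` data frame and its values are made up, and the new columns are named in the `g1_g2` style used above.

```r
# Toy data using the same naming scheme (gene1..3, result_gene1..3);
# the values are invented for illustration.
toy <- data.frame(
  gene1 = c("A", "B", "E"),
  gene2 = c("F", "B", "I"),
  gene3 = c("J", "B", "E"),
  result_gene1 = c("positive", "negativo", "positive"),
  result_gene2 = c("positive", "positive", "negativo"),
  result_gene3 = c("negativo", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Every unordered pair of gene indices: (1,2), (1,3), (2,3).
pairs <- combn(1:3, 2, simplify = FALSE)

for (p in pairs) {
  i <- p[1]
  j <- p[2]
  both_positive <- toy[[paste0("result_gene", i)]] == "positive" &
    toy[[paste0("result_gene", j)]] == "positive"
  same_gene <- toy[[paste0("gene", i)]] == toy[[paste0("gene", j)]]
  # Same three-way outcome as the hand-written version, one new column per pair.
  toy[[paste0("g", i, "_g", j)]] <- ifelse(
    both_positive,
    ifelse(same_gene, "equal", "not equal"),
    "not positive"
  )
}

toy[, c("g1_g2", "g1_g3", "g2_g3")]
```

With six genes, replacing `1:3` with `1:6` would produce the 15 pairwise columns without any further typing.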

nirgrahamuk's solution worked for me. It may take me months to understand a couple of things that were done.
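
In case it helps with understanding that solution, here is my reading of its core steps rewritten in base R on a made-up two-gene data frame. It is only a sketch: `toy` is hypothetical, and the base R subsetting is my stand-in for the filter() and pull() calls, not code from the original answer.

```r
# Made-up two-gene example, not the data from this thread.
toy <- data.frame(
  gene1 = c("A", "B"),
  gene2 = c("F", "B"),
  result_gene1 = c("negativo", "positive"),
  result_gene2 = c("negativo", "positive"),
  stringsAsFactors = FALSE
)

# combn() lists every unordered pair of column names (just one pair here).
pairs <- combn(c("gene1", "gene2"), 2, simplify = FALSE)
gname_left <- pairs[[1]][1]
gname_right <- pairs[[1]][2]

# Base R counterpart of filter(...) |> pull(...): keep the gene values from
# rows whose matching result column says "positive".
v_left <- toy[[gname_left]][toy[[paste0("result_", gname_left)]] == "positive"]
v_right <- toy[[gname_right]][toy[[paste0("result_", gname_right)]] == "positive"]

# TRUE only when both sets of positive genes are the same and non-empty.
res <- identical(v_left, v_right) && length(v_left) > 0
print(res)
```

Note that this compares the whole set of positive genes in one column with the whole set in the other, so each pair yields a single TRUE or FALSE instead of one value per row.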

"map_chr" is in the purrr package, and I think it loads automatically with tidyverse, but it may not have loaded correctly for you. An explicit library(purrr) might do it.
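
A quick way to check what is going on, sketched with base R functions only: search() shows which packages are attached, and requireNamespace() reports whether a package is installed without raising an error.

```r
# Is purrr on the search path, i.e. has it been attached with library()?
purrr_attached <- "package:purrr" %in% search()
print(purrr_attached)

# Is purrr installed at all? requireNamespace() returns FALSE instead of erroring.
if (requireNamespace("purrr", quietly = TRUE)) {
  # The pkg::fun form works even when the package has not been attached.
  pair_name <- purrr::map_chr(list(c("gene1", "gene2")), paste0, collapse = "_")
  print(pair_name)
} else {
  message("purrr is not installed; install.packages(\"purrr\") would be the usual fix.")
}
```

If the first line prints FALSE after library(tidyverse), the startup messages from that call are the place to look for what failed to load.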

Thank you both! It worked after loading purrr. It will take me several months to fully understand the code too.
