Double Condition

PalomaLlorente · December 10, 2019, 7:37pm

I need to look up information from one data.frame in another data.frame and label each row as FOUND or NOT FOUND.
Let's call the 1st data.frame "conflict" and the 2nd "segment". Each data.frame has 2 columns: ID1 and ID2. When a row from "conflict" is searched for in "segment", both ID1 and ID2 from that row have to be found for the row to be labelled FOUND in the "result" data.frame; otherwise, the row is labelled NOT FOUND.
Here is a short reprex that shows my issue:

segment <- data.frame(stringsAsFactors=FALSE,
                      ID1_seg = c("AAAA_AAAB", "AAAB_AAAC", "AAAG_AAAH"),
                      ID2_seg = c("AAAC_AAAD", "AAAD_AAAF", "AAAE_AAAF")
)
conflict <- data.frame(stringsAsFactors=FALSE,
                       ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                       ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF")
)
result <- data.frame(stringsAsFactors=FALSE,
                     ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                     ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF"),
                     Result = c("FOUND", "NOT FOUND", "NOT FOUND")
)

As can be seen, only the 1st row is FOUND: in the 2nd row, ID2_conf is not in the "segment" data.frame, and in the 3rd row neither ID1_conf nor ID2_conf is in it.

Thanks in advance!

raytong · December 11, 2019, 5:42am

Hi @PalomaLlorente. You may use pmap and pass all columns in a list to check the condition.

library(tidyverse)

segment<- data.frame(stringsAsFactors=FALSE,
                     ID1_seg = c("AAAA_AAAB", "AAAB_AAAC", "AAAG_AAAH"),
                     ID2_seg = c("AAAC_AAAD", "AAAD_AAAF", "AAAE_AAAF")
)

conflict<- data.frame(stringsAsFactors=FALSE,
                      ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                      ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF")
)

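conflict %>%
  # Compare each conflict row with the segment row at the same position;
  # pmap_chr() returns a character vector rather than a list column
  mutate(Result = pmap_chr(list(ID1_conf, ID2_conf, segment$ID1_seg, segment$ID2_seg),
                           function(a, b, c, d) {
                             if (a == c && b == d) "FOUND" else "NOT FOUND"
                           }))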
#>    ID1_conf  ID2_conf    Result
#> 1 AAAA_AAAB AAAC_AAAD     FOUND
#> 2 AAAB_AAAC BBBD_BBBF NOT FOUND
#> 3 BBBG_BBBH BBBE_BBBF NOT FOUND

Created on 2019-12-11 by the reprex package (v0.3.0)

PalomaLlorente · December 11, 2019, 6:58pm

Hi @raytong, thank you for your quick response. Unfortunately, your answer didn't solve my problem, perhaps because I haven't explained myself well. The IDs from the "conflict" data.frame have to be found in the "segment" data.frame, regardless of the row or of which ID_seg column (1 or 2) they appear in; an ID just needs to be somewhere in the "segment" data.frame to count as found. For example, ID2_conf from row 1 could be found in ID1_seg at row 35. What I would like to stress is that both ID1_conf and ID2_conf have to be found in the "segment" data.frame for the row to be categorized as FOUND. The two IDs are linked and should be categorized as a set.

Here is another example (I have removed column "ID2_seg" to keep things simpler, so all the data to be searched is in one column):

segment <- data.frame(stringsAsFactors=FALSE,
                      ID1_seg = c("AAAA_AAAB", "AAAC_AAAD", "AAAG_AAAH", "AAAD_AAAD", "AAAD_AAAF", "AAAE_AAAF")
)

conflict <- data.frame(stringsAsFactors=FALSE,
                       ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                       ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF")
)

result <- data.frame(stringsAsFactors=FALSE,
                     ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                     ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF"),
                     Result = c("FOUND", "NOT FOUND", "NOT FOUND")
)

As you can see, the expected result is the same as in the other example, as the data in the "segment" data.frame has not changed (only the row positions have changed).

I hope I have explained my issue properly. Thanks anyway.

raytong · December 11, 2019, 11:15pm

@PalomaLlorente. I am confused. This contradicts your first post, because the result will be "FOUND", "FOUND" and "NOT FOUND".

library(tidyverse)

segment<- data.frame(stringsAsFactors=FALSE,
                     ID1_seg = c("AAAA_AAAB", "AAAB_AAAC", "AAAG_AAAH"),
                     ID2_seg = c("AAAC_AAAD", "AAAD_AAAF", "AAAE_AAAF")
)

conflict<- data.frame(stringsAsFactors=FALSE,
                      ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                      ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF")
)

conflict %>%
  mutate(Result = map2_chr(ID1_conf, ID2_conf, ~{
    ifelse(any(c(.x, .y) %in% c(segment$ID1_seg, segment$ID2_seg)), "FOUND", "NOT FOUND")
  }))
#>    ID1_conf  ID2_conf    Result
#> 1 AAAA_AAAB AAAC_AAAD     FOUND
#> 2 AAAB_AAAC BBBD_BBBF     FOUND
#> 3 BBBG_BBBH BBBE_BBBF NOT FOUND

Created on 2019-12-12 by the reprex package (v0.3.0)

PalomaLlorente · December 12, 2019, 5:32pm

No, the second row is NOT FOUND. As I explained in my post, BOTH IDs (ID1_conf and ID2_conf) have to be in the "segment" data.frame for the row to be categorized as FOUND. AAAB_AAAC is in it, but BBBD_BBBF is not, so that row is identified as NOT FOUND.

The idea is: if "conflict$ID1_conf" is in "segment$ID_seg" && "conflict$ID2_conf" is also in "segment$ID_seg", then "Result == FOUND", else "Result == NOT FOUND". Have I explained myself better? Hope so, haha. Thank you so much for your help!
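
The rule stated above can also be written without any packages. The following is only a sketch in base R, assuming that all segment IDs can be pooled into a single vector regardless of row or column; the names seg_ids and both_found are made up for this illustration.

```r
segment <- data.frame(stringsAsFactors = FALSE,
                      ID1_seg = c("AAAA_AAAB", "AAAB_AAAC", "AAAG_AAAH"),
                      ID2_seg = c("AAAC_AAAD", "AAAD_AAAF", "AAAE_AAAF"))

conflict <- data.frame(stringsAsFactors = FALSE,
                       ID1_conf = c("AAAA_AAAB", "AAAB_AAAC", "BBBG_BBBH"),
                       ID2_conf = c("AAAC_AAAD", "BBBD_BBBF", "BBBE_BBBF"))

# Pool every segment ID into one character vector (hypothetical helper name)
seg_ids <- unlist(segment, use.names = FALSE)

# %in% is vectorized, so both conditions are checked for all rows at once;
# the element-wise & keeps a row only when BOTH IDs are present
both_found <- conflict$ID1_conf %in% seg_ids & conflict$ID2_conf %in% seg_ids

conflict$Result <- ifelse(both_found, "FOUND", "NOT FOUND")
conflict
```

Note the single & here: it combines the two logical vectors row by row, whereas && is meant for single TRUE/FALSE values.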

raytong · December 12, 2019, 10:50pm

@PalomaLlorente. So, change any to all in ifelse.

conflict %>%
  mutate(Result = map2_chr(ID1_conf, ID2_conf, ~{
    ifelse(all(c(.x, .y) %in% c(segment$ID1_seg, segment$ID2_seg)), "FOUND", "NOT FOUND")
  }))
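
To see why swapping any for all changes the second row, here is a small check on that row alone. It is a sketch using the IDs from the posts above; seg_ids, row2 and hits are names made up for the illustration.

```r
# All segment IDs from the two-column example, pooled into one vector
seg_ids <- c("AAAA_AAAB", "AAAB_AAAC", "AAAG_AAAH",
             "AAAC_AAAD", "AAAD_AAAF", "AAAE_AAAF")

# The two IDs of the second conflict row
row2 <- c("AAAB_AAAC", "BBBD_BBBF")

# One logical value per ID: the first is in seg_ids, the second is not
hits <- row2 %in% seg_ids

any(hits)  # TRUE: at least one ID is present, so any() labels the row FOUND
all(hits)  # FALSE: not both IDs are present, so all() labels it NOT FOUND
```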
