Fill the column based on multiple conditions

Hello everybody!

I've been stuck on this for a while and my usual workflow fails miserably for the increased number of columns. The problem is that I'm trying to create a column that would follow the rule:
if there's a single match for N columns then it should be coded as the match name of the column( or any other specific string). If there is more than a single match then it should code variable using the specific scheme: condition 1> condition 2 > condition 3 (e.g if C1,C2,C3 all evaluate as T then it should output condition 1).

Previously, I've been "solving" the same problem with case_when and just wrote every combination of conditions, obviously, this has gotten way out of hand when the number of columns has gotten 4+, and my brain breaks writing numerous case_whens. This is a truncated version of case_when for a couple of possible cases. Thank you for your help!

library(tidyverse)

df<- tribble(~A,~B,~C,~D,
            "I123","I121","I1908","I129",
            "I128","I123","I124","I109",
            "I126","I1855","I129","I183",
            "I121","I163","F121","I8773",
            "I123","I129","I1563","I121",
            "I129","I1665","I128", "F843" )


df1<- df%>%
mutate(Condtion_X1_present = if_any(A:D, ~ str_detect(., pattern = "I123|I128")))%>%
mutate(Condtion_X2_present = if_any(A:D, ~ str_detect(., pattern = "I121"))) %>%
mutate(Condtion_X3_present = if_any(A:D, ~ str_detect(., pattern = "I129"))) %>%
mutate(column_for_multiple_condtions = case_when(Condtion_X1_present == TRUE & Condtion_X2_present == TRUE &
                                                 Condtion_X3_present == TRUE ~ "Condtion one",
                                                 Condtion_X1_present == TRUE & Condtion_X2_present == FALSE &
                                                 Condtion_X3_present == FALSE ~ "Condtion one"))

Does this get you what you want? The column WHICH shows which condition is met.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
df<- tribble(~A,~B,~C,~D,
             "I123","I121","I1908","I129",
             "I128","I123","I124","I109",
             "I126","I1855","I129","I183",
             "I121","I163","F121","I8773",
             "I123","I129","I1563","I121",
             "I129","I1665","I128", "F843",
             "X","Y","Z","ZZ")


df <- df |> mutate(Row=row_number()) 
Long <- df |> pivot_longer(cols = A:D,names_to = "name")
Long <-  Long |> group_by(Row) |> 
  summarize(C1 = any(str_detect(value,"I123|I128")),
            C2 = any(str_detect(value,"I121")),
            C3 = any(str_detect(value,"I129"))) |> 
  rowwise() |> 
  mutate(WHICH=which(c_across(C1:C3))[1])
FINAL <- inner_join(df,Long,by="Row")
FINAL
#> # A tibble: 7 x 9
#>   A     B     C     D       Row C1    C2    C3    WHICH
#>   <chr> <chr> <chr> <chr> <int> <lgl> <lgl> <lgl> <int>
#> 1 I123  I121  I1908 I129      1 TRUE  TRUE  TRUE      1
#> 2 I128  I123  I124  I109      2 TRUE  FALSE FALSE     1
#> 3 I126  I1855 I129  I183      3 FALSE FALSE TRUE      3
#> 4 I121  I163  F121  I8773     4 FALSE TRUE  FALSE     2
#> 5 I123  I129  I1563 I121      5 TRUE  TRUE  TRUE      1
#> 6 I129  I1665 I128  F843      6 TRUE  FALSE TRUE      1
#> 7 X     Y     Z     ZZ        7 FALSE FALSE FALSE    NA

Created on 2022-06-15 by the reprex package (v2.0.1)

2 Likes

Works like magic, way better than writing for hours case_whens. I feel like I should start paying you money @FJCC
Thank you again!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.