If Many Columns Contains String then enter value for that row with conditional

I have a dataframe called bd that looks like that:

The dataframe is this Link Basedatos.

  FAC_1                                       FAC_2                     FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11
   <chr>                                       <chr>                     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> 
 1 Substrato de mala calidad (muy meteorizado) "  Naturaleza del suelo ~ "  M~  NA    NA    NA   NA    NA    NA    NA     NA    
 2 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  N~ "  M~ "  M~  NA   NA    NA    NA    NA     NA    
 3 NA                                           NA                        NA    NA    NA    NA   NA    NA    NA    NA     NA    
 4 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  O~ "  P~ "  M~ NA    NA    NA    NA     NA    
 5 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  O~ "  N~ "  P~ NA    NA    NA    NA     NA    
 6 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  P~ "  M~  NA   NA    NA    NA    NA     NA    
 7 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  N~ "  P~ "  M~ NA    NA    NA    NA     NA    
 8 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  M~ "  P~ "  M~  NA   NA    NA    NA    NA     NA    
 9 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  N~ "  P~ "  M~ NA    NA    NA    NA     NA    
10 Substrato de mala calidad (muy meteorizado) "  Rocas muy fracturadas~ "  O~ "  N~ "  M~ "  P~ NA    NA    NA    NA     NA   

So I need to create 11 new columns, the first column have to search all the values that contains "Substrato" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string; the second column have to search all the values that contains "Alternancia" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string, and the same for all the rest of the columns.

To achieve this my code is the following:

The vector containers is the string that I have to find in the dataframe bd .

vect <- 1:11 #index vector
variables <- paste("FAC", vect, sep = "_") # variables names
containers <- c("Substrato","Alternancia", "Presencia", "fracturadas","desfavorable",
                "Naturaleza", "Material", "Pendiente", "MorfologĂ­a", "escacez", "Otro") # strings to find

bd$var1 <- character() #empty new column

# Create the first new column with "Substrato" like string:

for (i in 1:length(vect)){
  out <- if_else(grepl(containers[1], bd[ ,i]), "1", "0") 
  bd$var1<- c(var1, out)
}

#For the next columns I changed containers[1] by containers[2],[3],[4],....[11] and var1 by var2,3,4,..11.

But I have a problem in the code that not create the variables, I checked many times but I can not solve the problem .

I am going to feel so happy is someone can help me. Thank so much!!

Do you want the first new column to contain a 1 if any of the FAC_X columns contains Substrato?

Yes, and the second column for Alternancia, and so on. Thank so much.

This seems to work but is VERY slow. I expect there is a better way.

library(tidyverse)
library(rlang)
BD <- read.csv("bd.csv")
containers <- c("Substrato","Alternancia", "Presencia", "fracturadas","desfavorable",
                      "Naturaleza", "Material", "Pendiente", "MorfologĂ­a", "escacez", "Otro")
Vars <- paste("Var",1:11,sep = "_")

MyFunc <- function(Word, VarNm) {
  BD |> rowwise() |> 
  mutate({{VarNm}} := ifelse(any(str_detect(c_across(FAC_1:FAC_11),Word),na.rm=TRUE), 1, 0)) |> 
    select(all_of(VarNm))
}
OUT <- map2_dfc(containers, Vars, MyFunc)
BD <- bind_cols(BD, OUT)
1 Like

Solve the problem that is the most important.
I found another way:

for (i in seq_along(containers)){
  bd <- bd %>%
    mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}

Also solve with less time.

I have a little final question: How can I do to instead of "1" get the cell value. Thank so much!!

Try this version of your loop to get the cell in the text that matched.

for (i in seq_along(containers)){
  BD <- BD %>%
    mutate(!!sym(paste0("var",i)) := apply(BD[,1:11], 1, function(vec) ifelse(length(str_which(vec, containers[i]) > 0),
                                                                              vec[str_which(vec, containers[i])], "N/A" )))
}
1 Like

Thank so much for all. Have I nice day!!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.