Merging string variables

Hi,
I am trying to merge all string variables that contain comments longer than 3 characters, so URN, Q6, Score and Year would be excluded, in this data frame:

source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("H732585", "H933818", "H902416", "H793061", "H592160",
          "H972119", "H945230", "H955322", "H814977", "H972992"),
  QN3a = c("dsafs", NA, "aaa", "djdjd fdfj", "fff",
           "f  d ffsa j dsf aa sd", "dffg ddjfcj mj",
           "dkvvf ffk vjf fj", "ttt", "fddd"),
  QN3b = c(NA, NA, "nil", NA, NA, "no comments", NA, NA, "all good", NA),
  QN5a = c("xxxxx", "Nothing at all", "I did not have any", "Non", NA,
           "Nothing", "N/A", "None", NA, "Nothing really"),
  QN5b = c("All good", NA, NA, NA, "nothing", NA, NA, "na",
           "daa ffss fssfsff sfasfa", NA),
  Q6 = c("Yes", "No", NA, NA, "Yes", NA, NA, "No", NA, NA),
  Score = c(100, 90, 35, 20, 50, 90, 100, 100, 90, 80),
  Year = c(2021, 2021, 2020, 2020, 2021, 2021, 2021, 2021, 2020, 2020)
)
library(dplyr)
library(stringr)
library(tidyr)

result <-  source %>% 
  mutate_at(vars(matches("QN5$|QN3$")), ~str_remove_all(.x, "^.{1,5}$")) %>% # Remove sentences with less than 5 characters
  mutate_at(vars(matches("QN5$|QN3$")), ~str_remove_all(.x, "^(All//sgood|No\\scomments|N.?A|Nothing|None|Nil)$")) %>% # Remove sentences with no comments
  mutate(all_comments = paste(QN3a,QN3b,QN5a,QN5b, sep="/"),
         all_comments = str_remove_all(all_comments, "NA"), # Removes NAs
         all_comments = str_remove_all(all_comments, "[:cntrl:]"), # Removes control characters like \n and \r
         all_comments = str_replace_all(all_comments, "\\s\\s+", " "),  # Collapses repeated whitespace into one space
         all_comments = str_replace_all(all_comments, "//+", "/"), # Collapses repeated / into one
         all_comments = str_remove (all_comments, "/$"), # Removes / in the end
         all_comments = str_remove (all_comments, "^/"))  %>% # Removes / in the beginning
  mutate(All_len = nchar(all_comments),
         All_wcount = str_count(all_comments,'\\w+'))

Unfortunately:

  1. My str_remove_all rules do not work
  2. I don't know how to write code that picks up the string variables containing comments longer than 10 characters, rather than naming them explicitly in my code (QN3a, QN3b, QN5a, QN5b)

Can you help?

matches("QN5$|QN3$") won't select any variable in source: none of the names end with QN5 or QN3. Perhaps anchor the pattern at the start of the name with the regex symbol ^ instead, or simply drop the $ end-of-string anchor.
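A quick way to check which names a pattern would select is sketched below. It uses base R's grepl() as a stand-in for matches() (an assumption for illustration: matches() ignores case by default, which makes no difference for these names), and the vector cols is simply the column names copied from source.

```r
# Column names copied from the `source` data frame in the question
cols <- c("URN", "QN3a", "QN3b", "QN5a", "QN5b", "Q6", "Score", "Year")

# "$" anchors the pattern to the END of the name: no name ends in QN5 or QN3
end_anchored <- cols[grepl("QN5$|QN3$", cols)]
print(end_anchored)
# character(0)

# "^" anchors the pattern to the START of the name
start_anchored <- cols[grepl("^QN5|^QN3", cols)]
print(start_anchored)
# "QN3a" "QN3b" "QN5a" "QN5b"
```

Because the end-anchored pattern selects no columns, the two mutate_at() calls in the question have nothing to modify, so the str_remove_all() rules are never applied.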

Thank you but I was looking for something like this:

paste where(~ is.character(.) & any(. > 10 , na.rm = TRUE))

That is, taking into account character variables with text longer than 10 characters.
I used a shortcut to take into account all variables starting with QN3 or QN5.

Also, I don't know why my str_remove_all commands do not work...

But I'm saying you didn't.

So is the cause not the one described above?

Evidence:

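library(tidyverse)

(qn5_is_involved <- data.frame(AQN5 = 1:3,
                               QN5A = 1:3))

# "QN5$" matches names that END with QN5: only AQN5 becomes "X"
qn5_is_involved %>%
  mutate(across(matches("QN5$"), ~"X"))

# "^QN5" matches names that START with QN5: only QN5A becomes "X"
qn5_is_involved %>%
  mutate(across(matches("^QN5"), ~"X"))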
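For the second part of the question (picking the comment columns by their content instead of by name), one possible approach is sketched here. Several things are assumptions for illustration only: df is a small hypothetical stand-in for source, the helpers is_long_text() and clean_comment() are made-up names, the threshold of 10 characters comes from the follow-up post, and the placeholder list is adapted from the pattern in the question, matched case-insensitively because the data contains lower-case entries such as "no comments" and "nil".

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `source`, with the same kinds of columns
df <- data.frame(
  URN   = c("H732585", "H933818", "H902416"),
  QN3a  = c("a fairly long comment", NA, "third comment text"),
  QN5a  = c("N/A", "another long comment here", "more text to merge"),
  QN5b  = c("no comments", NA, "nil"),
  Q6    = c("Yes", "No", NA),
  Score = c(100, 90, 35),
  stringsAsFactors = FALSE
)

# TRUE for a character column with at least one value longer than min_chars
is_long_text <- function(x, min_chars = 10) {
  is.character(x) && any(nchar(x) > min_chars, na.rm = TRUE)
}

# Placeholder answers to discard (adapted from the question's pattern)
placeholder <- "^(all\\s+good|no\\s+comments|n.?a|nothing|none|nil)$"

# Turn short values and placeholder answers into NA
clean_comment <- function(x, min_chars = 10) {
  drop <- nchar(x) <= min_chars | grepl(placeholder, x, ignore.case = TRUE)
  ifelse(is.na(x) | drop, NA_character_, x)
}

# Decide once, on the raw data, which columns count as comment columns
comment_cols <- names(df)[vapply(df, is_long_text, logical(1))]

result <- df %>%
  mutate(across(all_of(comment_cols), clean_comment)) %>%
  # na.rm = TRUE skips missing values, so no stray "NA" or "/" is pasted in
  unite("all_comments", all_of(comment_cols),
        sep = "/", na.rm = TRUE, remove = FALSE)

print(result$all_comments)
```

Setting unwanted values to NA and letting unite() skip them avoids the later clean-up steps in the question; in particular, str_remove_all(all_comments, "NA") would also delete the letters "NA" wherever they occur inside a real comment.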
