Merging string variables with some exclusions

Hi,
I'm trying to merge all string variables containing "Com" in their names.

I have prepared the code below but mutate_at is not working properly.

library(dplyr)
library(stringr)

TM.data <- data.frame(
  stringsAsFactors = FALSE,
  DF.URN = c("fds", "xdgx", "gvx", "ryh", "jhgjf", "df", "fg", "jgg", "gjg"),
  Rec = c(10, 10, 8, 10, 8, 5, 10, 8, 7),
  SatCom = c("Nothing", "Great service.", NA, "NA", "xxxxxx", "No comment", NA, NA, NA),
  AltTrCom = c("NA", "NA", "NA", "NA", "NA", "…....", "NA", "NA", NA),
  EnvCom = c("NA", "NA", "NA", "NA", "NA", "no complaints.", "NA", "NA", "Car park"),
  StaffCom = c("NA", "NA", "bla bla blda", "NA", "NA", "NA", "NA", NA, NA),
  ValueCom = c("NA", "NA", "NA", "NA", "Extend the service.", "NA", "NA", "NA", "NA"),
  WaitingCom = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", NA, "Not applicable"),
  WorkComplCom = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", NA, "Not applicable"),
  ContactCom = c("xxx", "no complaints", "NA", "something weird", "NA", "NA", "NA", NA, "Not applicable")
)

TM.data <- TM.data %>%
  mutate_at(vars(matches("com$")), ~str_remove_all(.x, "^.{1,5}$"), ~str_remove_all(.x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"), ~str_remove_all(.x, "^(NA)$")) %>% # Remove blanks
  mutate(all_comment = paste(SatCom, AltTrCom, EnvCom, StaffCom, ValueCom, WaitingCom, WorkComplCom, ContactCom, sep="/"), # Merges comment variables
         all_comment = str_remove_all(all_comment, "(.)\\1{2,}"), # Removes repeated characters
         all_comment = str_remove_all(all_comment, "NA"), # Removes NAs
         all_comment = str_remove_all(all_comment, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"), # Removes blanks 2
         all_comment = str_remove_all(all_comment, "[:cntrl:]"), # Removes control characters like \n and \r
         all_comment = str_replace_all(all_comment, "\\s\\s+", " "), # Removes extra spaces
         all_comment = str_replace_all(all_comment, "//+", "/")) # Removes duplicated /

TM.data$all_comment <- str_remove(TM.data$all_comment, "/$") # Removes / in the end


TM.data

I still get merged comments with "Not applicable" and "No comment".
Can you help please?

If you want to pass multiple functions to mutate_at(), you need to wrap them inside a list(). However, this is designed for cases where you want to run multiple independent functions on the same set of columns, not multiple successive functions as you are doing here. This detail gets a bit lost in the main text of the mutate_at() documentation, but it's a lot clearer if you read through the examples.
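To illustrate what "multiple independent functions" means here, below is a minimal sketch on a made-up two-column data frame (the data and the function names `upper` and `n_chars` are only for illustration, not from the original post). Each named function in the list is applied separately to the original columns, and the results land in new columns rather than being chained one after another:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the TM.data from the question
df <- data.frame(
  SatCom = c("Great service.", "xxx"),
  EnvCom = c("Car park", "NA"),
  stringsAsFactors = FALSE
)

# A named list of functions: each one runs independently on the
# original columns, and new columns are added for every combination
out <- df %>%
  mutate_at(vars(matches("com$")), list(upper = str_to_upper, n_chars = str_length))

names(out)
```

The original SatCom and EnvCom columns are left untouched, which is why passing several functions this way does not give you a step-by-step cleanup.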

I can think of a few different options for doing what you want. Here are two:

  1. Absolute simplest: call mutate_at() multiple times in a row, and make sure to wrap your single, unnamed function in a list() so that the columns are modified in place instead of new columns being created (see the final example in the documentation). So something like:
    mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^.{1,5}$"))) %>% 
    mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"))) %>%
    mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^(NA)$"))) %>%
    
  2. Less repetitive: Write a small function that applies all of your successive transformations, and call that inside mutate_at(). For example:
    remove_spaces <- function(x) {
      str_remove_all(x, "^.{1,5}$") %>%
      str_remove_all(x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
      str_remove_all(x, "^(NA)$")
    }
    
    TM.data %>%
      mutate_at(vars(matches("com$")), list(remove_spaces)) %>%
      # etc
    

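A small correction to this: pipe x in once at the start instead of passing it again at each step, because %>% already supplies the previous result as the first argument.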

remove_spaces <- function(x) {
    x %>% 
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
    str_remove_all("^(NA)$")
}
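Here is a small sketch of why that matters, using a made-up vector of comments (the names `remove_spaces_buggy`, `remove_spaces_fixed` and `comments` are only for this illustration). When x is passed again on the right-hand side of %>%, the piped value becomes the first argument and x is pushed into the pattern position, so str_remove_all() ends up with one argument too many:

```r
library(stringr)
library(magrittr)

# Version that passes x again at each step of the pipe
remove_spaces_buggy <- function(x) {
  str_remove_all(x, "^.{1,5}$") %>%
    str_remove_all(x, "^(NA)$")   # becomes str_remove_all(., x, "^(NA)$")
}

# Version that pipes x in once
remove_spaces_fixed <- function(x) {
  x %>%
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(NA)$")
}

comments <- c("NA", "Great service.")

# Capture the error instead of stopping the script
buggy_result <- tryCatch(remove_spaces_buggy(comments), error = function(e) e)
fixed_result <- remove_spaces_fixed(comments)

fixed_result
```

In the fixed version the short "NA" entry is emptied and the longer comment is kept as it is.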

Thank you, but nothing has changed after using that:

remove_spaces <- function(x) {
    x %>% 
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
    str_remove_all("^(NA)$")
}
TM.data %>%
  mutate_at(vars(matches("com$")), list(remove_spaces)) %>%

I still can see "Not applicable" and "No comment" in my merged comments...

That is because you haven't made the regex case-insensitive. You already know how to do that from your previous topics, so give it a try. The idea is that you learn from our answers, not that you simply copy and paste the coding solutions.

I've tried multiple options:

  str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE) %>%
    str_remove_all("^(NA)$", ignore_case = TRUE)
---
   str_remove_all(ignore.case("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$")) %>%
    str_remove_all(ignore.case("^(NA)$"))
---
       str_remove_all(grepl("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"),ignore.case=TRUE) %>%
    str_remove_all(grepl("^(NA)$"),ignore.case=TRUE)

And I cannot find any help in documentation :sob:

You have already asked this before (several times, in fact). You just have to check your previous topics, for example, this one


I know, andresrcs, and thank you for being so patient. I've gone through all your previous responses, but I can see that

, ignore_case = TRUE)

works well with str_detect or with regex but not with str_remove_all :confused:

This doesn't make any sense. str_remove_all() (like any other stringr function) also accepts patterns constructed with regex(), so this is as simple as

str_remove_all(regex("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)) %>%
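As a quick check of the difference, here is a minimal sketch on a made-up vector (the `comments` values are picked from the phrases in this thread; the variable names are only for illustration). str_remove_all() takes just a string and a pattern, so the ignore_case option has to go inside regex() rather than being passed to str_remove_all() itself:

```r
library(stringr)

comments <- c("Not applicable", "No comment", "Great service.")
pattern  <- "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"

# Plain string pattern: matching is case-sensitive, so
# "Not applicable" and "No comment" are left untouched
case_sensitive <- str_remove_all(comments, pattern)

# Same pattern wrapped in regex() with ignore_case = TRUE
case_insensitive <- str_remove_all(comments, regex(pattern, ignore_case = TRUE))

case_sensitive
case_insensitive
```

With the plain pattern nothing is removed, because "Not\\sApplicable" has a capital A and "no\\scomment?" starts with a lowercase n. With regex(..., ignore_case = TRUE) both phrases are emptied and the real comment is kept.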

Aaaaa, sure!!! Silly me!

Thank you!

Now I have a question, as I want to understand the functionality of this specific "remove_spaces" function.

Are the unwanted phrases removed thanks to the function, or thanks to these lines?

         all_comment = str_remove_all(all_comment, "NA"), # Removes NAs
         all_comment = str_remove_all(all_comment, regex("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)), # Removes blanks 2

Is it a repetition of the same thing?
I would like to remove these elements before merging string variables...

Sorry, but I don't understand what you mean, and it seems like you are going off-topic in relation to your original question. Remember that you have to narrow down the scope of your topic; this is not supposed to be a support chat or a consultancy.

I fully understand. I just don't want to bother you with further questions for these advanced (at least for me) codes as it's not easy to find answers in documentation or R help for something more than basic codes.
My understanding is that mutate_at statements work for all individual string variables whereas mutate statements for final, merged comments (all_comment). Is that correct?

Sorry, but I still don't understand what you mean. Try to exemplify your question with code.

That is all right. I have used enough of your time, and your help is significantly better than going through other (not always working) solutions found in R documentation or on other websites.

I simply want to make sure my "try and go" codes are not too complicated and not overwritten if they can be simplified. I just have a feeling that my code is too long and mutate_at and mutate do the same thing as they both include identical elements like this one:

regex("^(no\\scomment?|n/a|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)

That is all. Sorry if this question is silly. I simply prefer to ask this type of question to experts like you rather than getting false, misleading information from other sources.

Yes, you are duplicating some actions. I think the best way to check for this is to execute your code command by command and check the intermediate outputs.
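As a rough sketch of what checking command by command could look like, the example below uses a made-up two-column data frame and stores each stage in its own object (the names `df`, `junk`, `step1` and `step2` are only for illustration). The first stage cleans the individual comment columns and the second one merges them, so you can print each object and see what is left for the later all_comment steps to do:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the TM.data from the question
df <- data.frame(
  SatCom = c("Nothing", "Great service.", NA),
  EnvCom = c("NA", "no complaints.", "Not applicable"),
  stringsAsFactors = FALSE
)

junk <- regex("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$",
              ignore_case = TRUE)

# Step 1: clean each comment column in place
step1 <- df %>%
  mutate_at(vars(matches("com$")), list(~ str_remove_all(.x, junk)))
step1

# Step 2: merge the cleaned columns; paste() turns a missing value into the text "NA"
step2 <- step1 %>%
  mutate(all_comment = paste(SatCom, EnvCom, sep = "/"))
step2
```

Looking at step1 shows that the unwanted phrases are already gone before merging. Looking at step2 shows that a real missing value comes back as the text "NA" after paste(), so that part of the later cleanup is not a duplicate of the column-level step.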
