Merging string variables with some exclusions

Slavek · October 16, 2019, 1:43pm

Hi,
I'm trying to merge all string variables containing "Com" in their names.

I have prepared the code below but mutate_at is not working properly.

library(dplyr)
library(stringr)

TM.data <- data.frame(stringsAsFactors=FALSE,
                                                      DF.URN = c("fds", "xdgx", "gvx", "ryh", "jhgjf", "df", "fg", "jgg",
                                                                 "gjg"),
                                                         Rec = c(10, 10, 8, 10, 8, 5, 10, 8, 7),
                                                      SatCom = c("Nothing", "Great service.", NA, "NA", "xxxxxx",
                                                                 "No comment",
                                                                 NA, NA, NA),
                                                    AltTrCom = c("NA", "NA", "NA", "NA", "NA", "…....", "NA", "NA", NA),
                                                      EnvCom = c("NA", "NA", "NA", "NA", "NA", "no complaints.", "NA", "NA",
                                                                 "Car park"),
                                                    StaffCom = c("NA", "NA", "bla bla blda", "NA", "NA", "NA", "NA", NA, NA),
                                                    ValueCom = c("NA", "NA", "NA", "NA", "Extend the service.", "NA", "NA",
                                                                 "NA", "NA"),
                                                  WaitingCom = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", NA,
                                                                 "Not applicable"),
                                                WorkComplCom = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", NA,
                                                                 "Not applicable"),
                                                  ContactCom = c("xxx", "no complaints", "NA", "something weird", "NA",
                                                                 "NA", "NA",
                                                                 NA,
                                                                 "Not applicable")
                                             )

TM.data <- TM.data %>%
  mutate_at(vars(matches("com$")), ~str_remove_all(.x, "^.{1,5}$"), ~str_remove_all(.x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"), ~str_remove_all(.x, "^(NA)$")) %>% # Remove blanks
  mutate(all_comment = paste(SatCom, AltTrCom, EnvCom, StaffCom, ValueCom, WaitingCom, WorkComplCom, ContactCom, sep="/"), # Merges comment variables
         all_comment = str_remove_all(all_comment, "(.)\\1{2,}"), # Removes repeated characters
         all_comment = str_remove_all(all_comment, "NA"), # Removes NAs
         all_comment = str_remove_all(all_comment, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"), # Removes blanks 2
         all_comment = str_remove_all(all_comment, "[:cntrl:]"), # Removes control characters like \n and \r
         all_comment = str_replace_all(all_comment, "\\s\\s+", " "), # Removes extra spaces
         all_comment = str_replace_all(all_comment, "//+", "/")) # Removes duplicated /

TM.data$all_comment <- str_remove(TM.data$all_comment, "/$") # Removes / in the end


TM.data

I still get merged comments with "Not applicable" and "No comment".
Can you help please?

jcblum · October 16, 2019, 2:18pm

Slavek:

mutate_at(
  vars(matches("com$")), 
  ~str_remove_all(.x, "^.{1,5}$"), 
  ~str_remove_all(.x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"), 
  ~str_remove_all(.x, "^(NA)$")
) %>% # Remove blanks

If you want to pass multiple functions to mutate_at(), you need to wrap them inside a list() — however, this is designed for cases where you want to run multiple independent functions on the same set of columns, not multiple successive functions like what you are doing. This detail gets a bit lost in the main text of the mutate_at() documentation, but it’s a lot clearer if you read through the examples.
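
To illustrate the "multiple independent functions" case, here is a minimal sketch on a made-up two-column data frame (the data and the `upper`/`len` names are only for illustration). Each function is applied to the original columns and the results land in new, suffixed columns, which is why this form does not chain successive clean-up steps:

```r
library(dplyr)

# Toy data, not the poster's full data set
df <- data.frame(
  SatCom = c("Great service.", "NA"),
  EnvCom = c("Car park", "no complaints."),
  stringsAsFactors = FALSE
)

# A named list of independent functions: each one sees the ORIGINAL column,
# and its result is stored in a new column named <column>_<function name>
out <- df %>%
  mutate_at(vars(matches("com$")), list(upper = toupper, len = nchar))

names(out)
```

The original `SatCom` and `EnvCom` columns are left untouched; the output gains `_upper` and `_len` columns next to them.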

I can think of a few different options for doing what you want. Here are two:

Absolute simplest: call mutate_at() multiple times in a row, and make sure to wrap your single, unnamed function in a list() so that the columns are modified in place instead of new columns being created (see the final example in the documentation). So something like:

mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^.{1,5}$"))) %>% 
mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"))) %>%
mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^(NA)$"))) %>%

Less repetitive: Write a small function that applies all of your successive transformations, and call that inside mutate_at(). For example:

remove_spaces <- function(x) {
  str_remove_all(x, "^.{1,5}$") %>%
  str_remove_all(x, "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
  str_remove_all(x, "^(NA)$")
}

TM.data %>%
  mutate_at(vars(matches("com$")), list(remove_spaces)) %>%
  # etc
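
As a rough check of the first option (chained mutate_at() calls), here is a small sketch on toy data, assuming a dplyr version where mutate_at() is still available. With a single unnamed function inside list(), the matched columns should be modified in place and no new columns created:

```r
library(dplyr)
library(stringr)

# Toy data for illustration only
df <- data.frame(
  SatCom = c("Great service.", "NA", "xxx"),
  Rec = c(10, 8, 5),
  stringsAsFactors = FALSE
)

cleaned <- df %>%
  # Step 1: blank out any value that is only 1 to 5 characters long
  mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^.{1,5}$"))) %>%
  # Step 2: blank out values that are exactly the string "NA"
  mutate_at(vars(matches("com$")), list(~str_remove_all(.x, "^(NA)$")))

names(cleaned)   # still only SatCom and Rec
cleaned$SatCom
```

Note that matches() ignores case by default, so "com$" does pick up SatCom.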

andresrcs · October 16, 2019, 2:28pm

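A small correction for this: since %>% already passes the previous result as the first argument, x should be piped in once and not repeated in each call.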

remove_spaces <- function(x) {
    x %>% 
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
    str_remove_all("^(NA)$")
}
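
For anyone wondering why the correction matters, here is a short sketch (the function names `remove_short` and `broken` are made up for the example). It relies on `x %>% f(y)` being evaluated as `f(x, y)`, and on str_remove_all() taking just a string and a pattern:

```r
library(stringr)
library(magrittr)  # provides %>%

# Piping x in once: each step receives the previous result as its string
remove_short <- function(x) {
  x %>%
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(NA)$")
}
remove_short(c("Great service.", "NA", "xxx"))

# Passing x again after the pipe: the second call becomes
# str_remove_all(<previous result>, x, "^(NA)$"), i.e. one argument too many
broken <- function(x) {
  str_remove_all(x, "^.{1,5}$") %>%
    str_remove_all(x, "^(NA)$")
}
result <- tryCatch(
  broken(c("Great service.", "NA")),
  error = function(e) "error"
)
result
```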

Slavek · October 16, 2019, 2:41pm

Thank you, but nothing has changed after using that:

remove_spaces <- function(x) {
    x %>% 
    str_remove_all("^.{1,5}$") %>%
    str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$") %>%
    str_remove_all("^(NA)$")
}
TM.data %>%
  mutate_at(vars(matches("com$")), list(remove_spaces)) %>%

I still can see "Not applicable" and "No comment" in my merged comments...

andresrcs · October 16, 2019, 3:30pm

That is because you haven't made the regex case-insensitive. You already know how to do that from your previous topics, so give it a try. The idea is that you learn from our answers, not that you simply copy/paste the coding solutions.

Slavek · October 17, 2019, 9:58am

I've tried multiple options:

  str_remove_all("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE) %>%
    str_remove_all("^(NA)$", ignore_case = TRUE)
---
   str_remove_all(ignore.case("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$")) %>%
    str_remove_all(ignore.case("^(NA)$"))
---
       str_remove_all(grepl("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"),ignore.case=TRUE) %>%
    str_remove_all(grepl("^(NA)$"),ignore.case=TRUE)

And I cannot find any help in the documentation.

andresrcs · October 17, 2019, 12:25pm

You have already asked this before (several times, in fact). You just have to check your previous topics, for example, this one

Slavek · October 17, 2019, 1:26pm

I know, andresrcs, and thank you for being so patient. I've gone through all your previous responses, but I can see that

, ignore_case = TRUE)

works well with str_detect or with regex but not with str_remove_all

andresrcs · October 17, 2019, 2:00pm

This doesn't make any sense. str_remove_all() (like any other stringr function) also accepts patterns constructed with regex(), so this is as simple as

str_remove_all(regex("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)) %>%
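
A quick way to see the difference, sketched on a few sample strings taken from the data above (the `comments` and `pattern` names are just for this example):

```r
library(stringr)

comments <- c("Not applicable", "No comment", "Nothing", "Great service.")
pattern <- "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$"

# Case-sensitive (the default): none of the sample strings match,
# because their capitalisation differs from the pattern
str_remove_all(comments, pattern)

# Case-insensitive: wrap the pattern in regex() and set ignore_case there,
# not as an argument of str_remove_all() itself
str_remove_all(comments, regex(pattern, ignore_case = TRUE))
```

The first call returns the strings unchanged; the second blanks out the first three and keeps "Great service.".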

Slavek · October 17, 2019, 2:28pm

Aaaaa, sure!!! Silly me!

Thank you!

Slavek · October 17, 2019, 2:45pm

Now I have a question, as I want to understand what this specific "remove_spaces" function does.

Are the unwanted phrases removed thanks to the function, or thanks to these lines?

         all_comment = str_remove_all(all_comment, "NA"), # Removes NAs
         all_comment = str_remove_all(all_comment, regex("^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)), # Removes blanks 2

Is it a repetition of the same thing?
I would like to remove these elements before merging string variables...

andresrcs · October 17, 2019, 2:54pm

Sorry, but I don't understand what you mean, and it seems like you are going off-topic in relation to your original question. Remember that you have to narrow down the scope of your topic; this is not supposed to be a support chat or a consultancy.

Slavek · October 17, 2019, 3:20pm

I fully understand. I just don't want to bother you with further questions about this advanced (at least for me) code, as it's not easy to find answers in the documentation or R help for anything beyond basic code.
My understanding is that the mutate_at statements work on each individual string variable, whereas the mutate statements work on the final, merged comments (all_comment). Is that correct?

andresrcs · October 17, 2019, 4:27pm

Sorry, but I still don't understand what you mean. Try to illustrate your question with code.

Slavek · October 22, 2019, 8:49am

That is all right. I have used enough of your time, and your help is significantly better than going through other (not always working) solutions found in the R documentation or on other websites.

I simply want to make sure my "try and go" code is not more complicated or longer than it needs to be if it can be simplified. I just have a feeling that my code is too long and that mutate_at and mutate do the same thing, as they both include identical elements like this one:

regex("^(no\\scomment?|n/a|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$", ignore_case = TRUE)

That is all. Sorry if this question is silly. I simply prefer to ask this type of question to experts like you rather than getting false, misleading information from other sources.

andresrcs · October 22, 2019, 12:37pm

Yes, you are duplicating some actions. I think the best way to check for this is to execute your code command by command and check the intermediate outputs.
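
As a sketch of what that step-by-step check could look like, here is a reduced example on toy data (the `junk`, `step1`, `step2` and `step3` names are made up for illustration). It suggests that once the individual columns have been cleaned, re-running the same anchored pattern on the merged string changes nothing, at least in this example:

```r
library(dplyr)
library(stringr)

# Toy data for illustration only
df <- data.frame(
  SatCom = c("No comment", "Great service."),
  EnvCom = c("NA", "Car park"),
  stringsAsFactors = FALSE
)

junk <- regex(
  "^(no\\scomment?|Not\\sApplicable|nothing|^\\s*n.?a.?\\s*$)$",
  ignore_case = TRUE
)

# Step 1: clean each comment column, then inspect the result
step1 <- df %>%
  mutate_at(vars(matches("com$")), list(~str_remove_all(.x, junk)))
step1

# Step 2: merge the cleaned columns, then inspect again
step2 <- step1 %>%
  mutate(all_comment = paste(SatCom, EnvCom, sep = "/"))
step2$all_comment

# Step 3: apply the same anchored pattern to the merged string
step3 <- step2 %>%
  mutate(all_comment = str_remove_all(all_comment, junk))

# TRUE here means step 3 did not change anything
identical(step2$all_comment, step3$all_comment)
```

Because the pattern is anchored with ^ and $, it can only match a value that consists entirely of one of the unwanted phrases, which a merged string containing "/" separators will rarely be.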
