This example shows how to create the first two variables using dplyr and regular expressions; you should be able to complete the rest in a similar way.

df <- data.frame(stringsAsFactors=FALSE,
                 Unique.respondent.number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 comment = c("I have seen many various charges in my life,
                                but I don’t like your saving rates",
                             "I like R Studio", "No comment",
                             "Main benefit is having low charges", NA, "Charge could be an issue",
                             "Issues with saving rates", "Good saving rates",
                             "Many benefits like reasonable charges", "NA"))
library(dplyr)
library(stringr)

df %>% 
    mutate(Charges_Fees = if_else(str_detect(comment, "charges?") & !str_detect(comment, "benefits?"), 1, 0),
           Poor_Rates = if_else(str_detect(comment, "don.?t\\slike|issue") & str_detect(comment, "saving\\srates"), 1, 0)) %>% 
    select(-comment) # I have deselected this long variable just for printing purposes

#>    Unique.respondent.number Charges_Fees Poor_Rates
#> 1                         1            1          1
#> 2                         2            0          0
#> 3                         3            0          0
#> 4                         4            0          0
#> 5                         5           NA         NA
#> 6                         6            0          0
#> 7                         7            0          0
#> 8                         8            0          0
#> 9                         9            0          0
#> 10                       10            0          0

Created on 2019-07-04 by the reprex package (v0.3.0)
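Row 5 shows NA rather than 0 because that comment is a genuine missing value. Here is a small sketch, on two strings taken from the sample data, of how `str_detect()` and `if_else()` treat it. The base `grepl()` call is included only for contrast:

```r
library(stringr)
library(dplyr)

comments <- c("Main benefit is having low charges", NA)

# str_detect() propagates a missing input instead of returning FALSE
hits <- str_detect(comments, "charges?")    # c(TRUE, NA)

# if_else() keeps that NA rather than choosing either branch
flags <- if_else(hits, 1, 0)                # c(1, NA)

# For contrast, base grepl() reports FALSE for a missing input
base_hits <- grepl("charges?", comments)    # c(TRUE, FALSE)

print(hits)
print(flags)
print(base_hits)
```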

Thank you very much, but:

  1. I get the following error:
Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"
  2. The results are incorrect (respondents 6 and 7); we should get the following (only integers 0 or 1 are allowed):
#>    Unique.respondent.number Charges_Fees Poor_Rates
#> 1                         1            1          1
#> 2                         2            0          0
#> 3                         3            0          0
#> 4                         4            0          0
#> 5                         5            0          0
#> 6                         6            1          0
#> 7                         7            0          1
#> 8                         8            0          0
#> 9                         9            0          0
#> 10                       10            0          0

Also, can we define a list of words that characterise negative sentiment (such as "don't like", "issue", etc.) and a separate list of words that describe an empty response (such as "no comment", "nothing to say", "NA", etc.), and then refer to them in the code?

I really appreciate your help. I can create more rules after “Charges/Fees” and “Poor Rates”, but "Other" is conditional: if a comment does not meet any of the previous requirements and is not blank, it should become "Other". Is that a simple condition?

Thank you,
Slavek

For your first point, I can't tell why you are getting that error without a reproducible example (the code works for me in a clean environment with the sample data provided).

For the second point, the results are wrong because I forgot to make the matching case insensitive; this fixes it:

df <- data.frame(stringsAsFactors=FALSE,
                 Unique.respondent.number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 comment = c("I have seen many various charges in my life,
                                but I don’t like your saving rates",
                             "I like R Studio", "No comment",
                             "Main benefit is having low charges", NA, "Charge could be an issue",
                             "Issues with saving rates", "Good saving rates",
                             "Many benefits like reasonable charges", "NA"))
library(dplyr)
library(stringr)

df %>% 
    mutate(Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                      !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
           Poor_Rates = if_else(str_detect(comment, regex("don.?t\\slike|issue", ignore_case = TRUE)) &
                                    str_detect(comment, regex("saving\\srates", ignore_case = TRUE)), 1, 0)) %>% 
    select(-comment) %>% 
    mutate_all(~if_else(is.na(.), 0, .))
#>    Unique.respondent.number Charges_Fees Poor_Rates
#> 1                         1            1          1
#> 2                         2            0          0
#> 3                         3            0          0
#> 4                         4            0          0
#> 5                         5            0          0
#> 6                         6            1          0
#> 7                         7            0          1
#> 8                         8            0          0
#> 9                         9            0          0
#> 10                       10            0          0
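The fix works because stringr matches case-sensitively by default. Without the fix, "charges?" never matches "Charge" and "issue" never matches "Issues". This minimal sketch compares the default with `regex(..., ignore_case = TRUE)` and with ICU's inline `(?i)` flag, which should behave the same way:

```r
library(stringr)

x <- c("Charge could be an issue", "Issues with saving rates")

# Default: case sensitive, so the capitalised words are missed
charge_default <- str_detect(x, "charges?")                                 # c(FALSE, FALSE)
issue_default  <- str_detect(x, "issue")                                    # c(TRUE, FALSE)

# Case-insensitive matching via regex()
charge_ci <- str_detect(x, regex("charges?", ignore_case = TRUE))           # c(TRUE, FALSE)
issue_ci  <- str_detect(x, regex("issue", ignore_case = TRUE))              # c(TRUE, TRUE)

# The same effect with the inline (?i) flag inside the pattern
issue_inline <- str_detect(x, "(?i)issue")                                  # c(TRUE, TRUE)
```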

About the "only integers 0 or 1 are allowed" part: NA is not a character string; it is how R represents missing values ("Not Available"). You can replace it with 0 if you want, as shown in the example above.
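The following base R sketch illustrates the distinction. It uses a hypothetical vector that mixes a real missing value with the literal text "NA", just as rows 5 and 10 of the sample data do. Only the first counts as missing, so a rule for empty responses has to match the text "NA" explicitly:

```r
comment <- c("Good saving rates", NA, "NA")

# Only the genuine missing value is NA; "NA" is an ordinary string
missing <- is.na(comment)      # c(FALSE, TRUE, FALSE)

# Replacing missing flags with 0 in base R
# (dplyr::coalesce(flag, 0) would do the same)
flag <- c(1, NA, 0)
flag[is.na(flag)] <- 0         # c(1, 0, 0)

print(missing)
print(flag)
```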

Yes, you can create the regular expression separately and reference it later:

negative_sentiments <- regex("don.?t\\slike|issue|other words", ignore_case = TRUE)
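Building on that, you can keep a word list as an ordinary character vector and collapse it into a single alternation pattern with `paste(collapse = "|")`. This is only a sketch: the `negative_words` vector and its extra entry "problem" are hypothetical. The entries are also treated as regular expressions, so characters such as `.` or `?` keep their regex meaning unless you escape them:

```r
library(stringr)

# Hypothetical list of negative phrases; each entry is a regex fragment
negative_words <- c("don.?t\\slike", "issue", "problem")

# Collapse into one pattern: "don.?t\\slike|issue|problem"
negative_sentiments <- regex(paste(negative_words, collapse = "|"), ignore_case = TRUE)

matches <- str_detect(
  c("I don't like it", "Issues with saving rates", "Good saving rates"),
  negative_sentiments
)                                # c(TRUE, TRUE, FALSE)
print(matches)
```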

You just have to find the right logical statement. As a hint: once you have created a variable with mutate(), you can reference its value, so you could check whether any of the previous variables has the value 1.
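Here is one way the hint could play out, sketched on a hypothetical data frame of flags that have already been computed. `Blank` and the helper column `Any_Rule` are illustrative names, not names from the thread. Within a single `mutate()` call, later arguments can use columns created by earlier ones:

```r
library(dplyr)

# Hypothetical flags, as they might look after the earlier mutate() steps
flags <- data.frame(
  Charges_Fees = c(1, 0, 0, 0),
  Poor_Rates   = c(1, 0, 1, 0),
  Blank        = c(0, 1, 0, 0)
)

result <- flags %>%
  mutate(
    # How many earlier rules fired for each comment
    Any_Rule = Charges_Fees + Poor_Rates + Blank,
    # "Other" only when no earlier rule (including Blank) fired;
    # this refers to Any_Rule, created just above in the same call
    Other = if_else(Any_Rule == 0, 1, 0)
  )
print(result)
```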

I have a new version of R installed (3.6.0). I refreshed everything and reinstalled dplyr, but I get this error:

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

I don't really know why this code works for you but not for me...
Still the same error:

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"

Do you know how I could fix that?

Slavek

Also, what should I include in the "Blank" code to indicate blank fields (if blank or "NA" or "No comments", then 1, otherwise 0)?

Slavek
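Here is a possible sketch for the Blank flag. It assumes that empty responses are NA, whitespace-only text, "NA"/"N/A", "no comment(s)", or anything containing "nothing". The phrase list and the `source_data` name are hypothetical; the name is chosen to avoid masking base R's `source()` function. A broad entry like "nothing" will also flag comments such as "nothing wrong with the rates", so the list needs checking against the real data:

```r
library(dplyr)
library(stringr)

# Hypothetical sample comments
source_data <- data.frame(
  stringsAsFactors = FALSE,
  comment = c("No comment", NA, "NA", "N/A", "   ", "I like R Studio", "nothing to say")
)

# Hypothetical list of patterns meaning "empty response"
blank_words <- c("^\\s*$", "^n/?a$", "no comments?", "nothing")
blank_pattern <- regex(paste(blank_words, collapse = "|"), ignore_case = TRUE)

result <- source_data %>%
  # is.na() catches real missing values; TRUE | NA evaluates to TRUE
  mutate(Blank = if_else(is.na(comment) | str_detect(comment, blank_pattern), 1, 0))
print(result)
```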

Are you running the exact same code in a clean R session? Try restarting your R session with Ctrl+Shift+F10.

Yes, I restarted using your key combination, and source displays correctly:

datapasta::df_paste(source)
data.frame(stringsAsFactors=FALSE,
   Unique.respondent.number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                    comment = c("I have seen many various charges in my life,
                                but I don’t like your saving rates",
                                "I like R Studio", "No comment",
                                "Main benefit is having low charges", NA, "Charge could be an issue",
                                "Issues with saving rates", "Good saving rates",
                                "Many benefits like reasonable charges", "N/A"))

# A tibble: 10 x 2
   `Unique respondent number` comment                                                                        
                        <dbl> <chr>                                                                          
 1                          1 I have seen many various charges in my life, but I don’t like your saving rates
 2                          2 I like R Studio                                                                
 3                          3 No comment                                                                     
 4                          4 Main benefit is having low charges                                             
 5                          5 NA                                                                             
 6                          6 Charge could be an issue                                                       
 7                          7 Issues with saving rates                                                       
 8                          8 Good saving rates                                                              
 9                          9 Many benefits like reasonable charges                                          
10                         10 N/A            

but there are still issues with dplyr (maybe a special installation is required?):

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

and with your code...:

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"

:frowning_face:

Nope, I'm just using the CRAN version of dplyr, nothing special about it.

Ok, I have run the same code in the main R console (R x64 3.6.0) and it's working. What is the solution? Fresh R installation?

Actually, it worked once. Now there is an error in R (not RStudio):

Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
  argument `str` should be a character vector (or an object coercible to)

Also, a silly question, but I cannot fix it myself:

# I have deselected this long variable just for printing purposes

How can I keep the entire df (URN, comment and the new variables)?

I don't know how to solve the problem with running the code. Maybe Andres or someone from RStudio can help you with that. I can only say that it works perfectly for me.

select() is a function in the dplyr package that selects a subset of the columns. Here, Andres selected all columns except comment. If you want the whole data frame, just comment out that line.
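The following minimal sketch shows the difference on a hypothetical two-row data frame (`example_df` is an illustrative name). Dropping a column, keeping everything, and reordering are all done with `select()`:

```r
library(dplyr)

example_df <- data.frame(
  stringsAsFactors = FALSE,
  Unique.respondent.number = c(1, 2),
  comment = c("I like R Studio", "No comment"),
  Charges_Fees = c(0, 0)
)

dropped   <- select(example_df, -comment)      # every column except comment
kept_all  <- select(example_df, everything())  # same as not calling select()
reordered <- select(example_df, comment, everything())  # comment first, rest after

print(names(dropped))
print(names(kept_all))
print(names(reordered))
```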

If you are not familiar with these functions, you can check out Chapter 5 of R for Data Science (R4DS), a free online book.

Thank you for your responses. I'm trying to find solutions with my limited R knowledge.
I cannot find any reference to the data source (called "source") in the code. Is that normal? The code refers to "comment", which is source$comment.
I have also tried running part of the code without pipes:

result <- mutate(Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                          !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
                 Poor_Rates = if_else(str_detect(comment, regex("don.?t\\slike|issue", ignore_case = TRUE)) &
                                        str_detect(comment, regex("saving\\srates", ignore_case = TRUE)), 1, 0))

and the error is following:

Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
  argument `str` should be a character vector (or an object coercible to)

When I change 'comment' to 'source$comment':

result <- mutate(Charges_Fees = if_else(str_detect(source$comment, regex("charges?", ignore_case = TRUE)) &
                                          !str_detect(source$comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
                 Poor_Rates = if_else(str_detect(source$comment, regex("don.?t\\slike|issue", ignore_case = TRUE)) &
                                        str_detect(source$comment, regex("saving\\srates", ignore_case = TRUE)), 1, 0))

the error is different:

Error in mutate_(.data, .dots = compat_as_lazy_dots(...)) : 
  argument ".data" is missing, with no default

I don't want to give up so quickly!

In my answer I'm using your sample data, which I called "df"; you have to replace this with the name of your own dataset, i.e. "source".
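This also suggests a likely cause of the earlier error "no applicable method for 'mutate_' applied to an object of class "function"", although the thread does not confirm it. In a session where no data frame called `df` has been created, the name `df` still exists, because it refers to `stats::df()`, the density function of the F distribution. This base R sketch shows the effect:

```r
# Run in a fresh R session, before any object called `df` has been created.
was_function   <- is.function(df)     # TRUE: `df` is stats::df()
was_data_frame <- is.data.frame(df)   # FALSE
print(c(was_function = was_function, was_data_frame = was_data_frame))

# Piping this function into mutate() makes dplyr dispatch on class "function",
# for which no method exists. Creating the data frame first fixes the lookup:
df <- data.frame(stringsAsFactors = FALSE,
                 comment = c("I like R Studio", "No comment"))
print(is.data.frame(df))              # TRUE
```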

If you run mutate() without pipes, you have to pass the data frame inside the function as the .data = source argument.
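The sketch below shows the two equivalent forms, using a hypothetical two-row stand-in for the user's data (`source_data`). The pipe simply supplies the left-hand data frame as the first argument, `.data`:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the user's data set
source_data <- data.frame(
  stringsAsFactors = FALSE,
  comment = c("Charge could be an issue", "Many benefits like reasonable charges")
)

# Piped form: source_data becomes .data automatically
piped <- source_data %>%
  mutate(Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                  !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0))

# Same call without the pipe: the data frame is passed explicitly
direct <- mutate(.data = source_data,
                 Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                          !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0))

print(identical(piped, direct))
```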

The solution for this is reading the book Yarnabrina pointed out in his response.

Hurray!!!! Thank you for being so patient.
I knew it must have been a silly error! My little data example is called "source" so I used this name :slight_smile:

Final question please, please.

When I remove this bit

  select(-comment) %>% 

from the code (as suggested by Yarnabrina "If you want to have the whole data frame, just comment out that line.") and use this code:

source %>% 
  mutate(Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                  !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
         Poor_Rates = if_else(str_detect(comment, regex("don.?t\\slike|issue", ignore_case = TRUE)) &
                                str_detect(comment, regex("saving\\srates", ignore_case = TRUE)), 1, 0)) %>% 
  mutate_all(~if_else(is.na(.), 0, .))

my error is:

Error: `false` must be a double vector, not a character vector

What am I doing wrong? I'm just removing one step from the chain of pipes...

Yarnabrina is right; try this:

source %>% 
    mutate(Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                      !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
           Poor_Rates = if_else(str_detect(comment, regex("don.?t\\slike|issue", ignore_case = TRUE)) &
                                    str_detect(comment, regex("saving\\srates", ignore_case = TRUE)), 1, 0)) %>% 
    mutate_if(is.numeric, ~if_else(is.na(.), 0, .))
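`mutate_all()` fails because it also applies the replacement to the character `comment` column, and `if_else()` refuses to combine a double `0` with character values. `mutate_if(is.numeric, ...)` avoids this by limiting the replacement to numeric columns. The sketch below uses hypothetical data and assumes dplyr 1.0.0 or later. In those versions, `across(where(is.numeric), ...)` supersedes `mutate_if()`, and `coalesce()` is a shorter way to replace NA with 0:

```r
library(dplyr)

# Hypothetical result: one character column plus numeric flags with NAs
result <- data.frame(
  stringsAsFactors = FALSE,
  comment = c("No comment", NA),
  Charges_Fees = c(0, NA),
  Poor_Rates = c(1, NA)
)

# Reproduces the failure: the replacement also hits the character column
failed <- inherits(
  try(result %>% mutate_all(~ if_else(is.na(.), 0, .)), silent = TRUE),
  "try-error"
)

# Replace NA with 0 in numeric columns only; comment is left untouched
cleaned <- result %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)))
print(cleaned)
```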

You are my Master!

Thank you!!!

Sorry, my final, final question related to my initial request (not resolved yet).

I would like to create a Blank variable if a comment contains "n/a", "nothing" or "no comments", or is left blank.
How can I do that?

Sorry, but I'm more inclined to give you pointers to help you solve your problems than to do your work for you. I believe you already have all the examples you need to solve this on your own; you just have to try to understand the code and investigate a little.
