Adding sentiment variables to the existing responses

Dear R Studio masters,

I have already spent so much time on this challenging task (probably easy for you) and I am giving up :sob:
I have this simple file:

source <- data.frame(stringsAsFactors=FALSE,
   Unique.respondent.number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
                    comment = c("I have seen many various charges in my life, but I don't like your saving rates",
            "I like R Studio", "No comment",
            "Main benefit is having low charges",
            NA,
            "Charge could be an issue",
            "Issues with saving rates", "Good saving rates",
            "Many benefits like reasonable charges", "N/A", "-",
            "Nothing", "zzzz",
            "I like saving rates but charges are poor")
)

My task is to create sentiment categories based on keywords detected in the "comment" field.

Now I would like to create 3 new variables: Positive_Com, Negative_Com and Neutral_Com, with 1 or 0 values based on the sentiment in DictionaryGI. I need to add these 3 new variables to the existing df (so 3 values for each response). I need 3 separate variables (instead of a standard sentiment statement or value) because the first two may overlap (for example, for the response "I like saving rates but charges are poor" we should get Positive_Com=1, Negative_Com=1, Neutral_Com=1).

I think I am close. I can run this:

library(SentimentAnalysis)
tmresult<-analyzeSentiment(source$comment)

tmresult
  WordCount SentimentGI NegativityGI PositivityGI SentimentHE NegativityHE PositivityHE SentimentLM NegativityLM PositivityLM RatioUncertaintyLM SentimentQDAP NegativityQDAP PositivityQDAP
1          9   0.1111111    0.1111111    0.2222222   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.2222222      0.0000000      0.2222222
2          2   0.5000000    0.0000000    0.5000000   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.5000000      0.0000000      0.5000000
3          1   0.0000000    0.0000000    0.0000000   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.0000000      0.0000000      0.0000000
4          4   0.0000000    0.5000000    0.5000000  -0.2500000         0.25    0.0000000   0.2500000          0.0    0.2500000                  0     0.2500000      0.0000000      0.2500000
5          0         NaN          NaN          NaN         NaN          NaN          NaN         NaN          NaN          NaN                NaN           NaN            NaN            NaN
6          2  -0.5000000    0.5000000    0.0000000   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0    -0.5000000      0.5000000      0.0000000
7          3   0.3333333    0.0000000    0.3333333   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.0000000      0.3333333      0.3333333
8          3   0.6666667    0.0000000    0.6666667   0.3333333         0.00    0.3333333   0.3333333          0.0    0.3333333                  0     0.6666667      0.0000000      0.6666667
9          5   0.4000000    0.2000000    0.6000000   0.0000000         0.00    0.0000000   0.2000000          0.0    0.2000000                  0     0.6000000      0.0000000      0.6000000
10         0         NaN          NaN          NaN         NaN          NaN          NaN         NaN          NaN          NaN                NaN           NaN            NaN            NaN
11         0         NaN          NaN          NaN         NaN          NaN          NaN         NaN          NaN          NaN                NaN           NaN            NaN            NaN
12         1   0.0000000    0.0000000    0.0000000   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.0000000      0.0000000      0.0000000
13         1   0.0000000    0.0000000    0.0000000   0.0000000         0.00    0.0000000   0.0000000          0.0    0.0000000                  0     0.0000000      0.0000000      0.0000000
14         5   0.0000000    0.4000000    0.4000000   0.0000000         0.00    0.0000000  -0.2000000          0.2    0.0000000                  0     0.2000000      0.2000000      0.4000000

And then say:
if PositivityGI>0 Positive_Com=1 otherwise 0
if NegativityGI>0 Negative_Com=1 otherwise 0
if SentimentGI=0 Neutral_Com=1 otherwise 0
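
A minimal sketch of those three rules, using a few score values copied from the output above rather than a live call to analyzeSentiment(). The `scores` data frame and the `flag()` helper are illustrative names, not part of any package, and treating NaN rows (empty comments) as 0 is an assumption:

```r
# Scores copied from rows 1, 3, 5, 6 and 14 of the analyzeSentiment() output above
scores <- data.frame(
  SentimentGI  = c(0.1111111, 0, NaN, -0.5, 0),
  NegativityGI = c(0.1111111, 0, NaN,  0.5, 0.4),
  PositivityGI = c(0.2222222, 0, NaN,  0.0, 0.4)
)

# Hypothetical helper: 1 when the condition holds, 0 otherwise.
# Comparisons against NaN give NA; those are mapped to 0 here (an assumption).
flag <- function(condition) as.integer(!is.na(condition) & condition)

scores$Positive_Com <- flag(scores$PositivityGI > 0)
scores$Negative_Com <- flag(scores$NegativityGI > 0)
scores$Neutral_Com  <- flag(scores$SentimentGI == 0)

scores[, c("Positive_Com", "Negative_Com", "Neutral_Com")]
```

With the real data, the same three assignments could presumably be made straight onto the original data frame from the tmresult columns, provided tmresult has one row per comment in the same order, as the output above suggests.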

Obviously I'm open to any technique or package. I simply need to add 3 sentiment variables to the existing data frame.

Can you help?

I have found a not very elegant but working workaround for missing comments or comments shorter than 3 characters. I can add this straight after importing the data from Excel:

library(stringr)  # needed for str_length()
source$comment[ is.na(source$comment) ] <- "blank"
source$comment[ str_length(source$comment)<3 ] <- "blank"

I believe you're more clever than I am and you can include the above in the "blank_statements" code...
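
One possible way to fold the two replacements into a single step, sketched in base R on a toy vector. The `comments` vector stands in for source$comment and `is_short_or_missing` is just an illustrative name:

```r
# Toy vector standing in for source$comment (values taken from the sample data)
comments <- c("I like R Studio", NA, "-", "N/A", "Nothing")

# TRUE for missing comments and for comments shorter than 3 characters.
# For an NA entry the length test is NA, but TRUE | NA evaluates to TRUE,
# so missing values are flagged as well.
is_short_or_missing <- is.na(comments) | nchar(comments) < 3

comments[is_short_or_missing] <- "blank"
comments
```

"N/A" is exactly 3 characters long, so it is left as it is here and would still need to be caught by the blank_statements pattern.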

Hurray! I've got that:

# Creating blank variable if no comment or comment shorter than 3 characters

source$comment[ is.na(source$comment) ] <- "blank"
library(stringr)
source$comment[ str_length(source$comment)<3 ] <- "blank"
source

# Specifying blank statements

library(dplyr)
blank_statements <- regex("no\\scomment?|nothing|\\sno\\s|blank|^\\s*n.?a.?\\s*$", ignore_case = TRUE)


#  Sentiment set up

library(SentimentAnalysis)
tmresult<-analyzeSentiment(source$comment)
tmresult

library(dplyr)
negative_sentiments <- if_else(((tmresult$NegativityGI>0 & tmresult$NegativityHE>0 & tmresult$NegativityLM>0)
                                | str_detect(source$comment, regex("don.?t\\slike|issue|bad", ignore_case = TRUE))) & !str_detect(source$comment, regex("reasonable", ignore_case = TRUE)),1,0)
positive_sentiments <- if_else(((tmresult$PositivityGI>0 & tmresult$PositivityHE>0 & tmresult$PositivityLM>0)
                                | str_detect(source$comment, regex("benefit|like", ignore_case = TRUE))) & !str_detect(source$comment, regex("bad", ignore_case = TRUE)),1,0)
neutral_sentiments <- if_else(((tmresult$SentimentGI==0 & tmresult$SentimentHE==0 & tmresult$SentimentLM==0)), 1, 0)

negative_sentiments
neutral_sentiments
positive_sentiments



# Categories (without Other)

library(dplyr)
library(stringr)
result <- source %>% 
  mutate(Positive_Com = if_else(positive_sentiments==1, 1, 0),
         Negative_Com = if_else(negative_sentiments==1, 1, 0),
         Neutral_Com = if_else(neutral_sentiments==1, 1, 0),
         Charges_Fees = if_else(str_detect(comment, regex("charges?", ignore_case = TRUE)) &
                                  !str_detect(comment, regex("benefits?", ignore_case = TRUE)), 1, 0),
         Good_Rates = if_else(str_detect(comment, regex("saving\\srates", ignore_case = TRUE)) &
                                (positive_sentiments==1), 1, 0),
         Poor_Rates = if_else(str_detect(comment, regex("saving\\srates", ignore_case = TRUE)) &
                                (negative_sentiments==1), 1, 0),
         Blank = if_else(str_detect(comment, blank_statements)|(str_detect(comment, "(.)\\1{4,}")), 1, 0)) %>% 
  mutate_if(is.numeric, ~if_else(is.na(.), 0, .))

result


# Creating "Other" category

result$Other <-result$Charges_Fees+result$Good_Rates+result$Poor_Rates+result$Blank
result
result$Other <- ifelse(result$Other==0, 1,0)
result

# Removing Sentiment from Blanks

result$Neutral_Com <- ifelse(result$Blank==1, 0,result$Neutral_Com)
result
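
A quick way to see what the Blank rule in the script above actually flags is to try both patterns on a handful of strings. This is only a sketch: it uses base R grepl() with perl = TRUE as a stand-in for str_detect(), on the assumption that both regex engines treat these particular patterns the same way, and the test strings are chosen for illustration:

```r
# Illustrative inputs, including a 4-letter and a 5-letter run of "z"
comments <- c("No comment", "Nothing", "N/A", "blank",
              "zzzz", "zzzzz", "Good saving rates")

# Same pattern as blank_statements, matched case-insensitively
blank_pattern <- "no\\scomment?|nothing|\\sno\\s|blank|^\\s*n.?a.?\\s*$"
is_blank_phrase <- grepl(blank_pattern, comments, ignore.case = TRUE, perl = TRUE)

# (.)\\1{4,} means one character followed by 4 or more repeats of it,
# i.e. at least 5 identical characters in a row
is_repeated <- grepl("(.)\\1{4,}", comments, perl = TRUE)

data.frame(comments, is_blank_phrase, is_repeated)
```

On these inputs the four-letter "zzzz" is not flagged by either pattern, because {4,} needs five identical characters in a row. If the intention is to catch "zzzz" as well, {3,} would be the quantifier to try.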

Just a little feedback on your code: there is no need to repeat the library calls in each section. You only need to load each package once per R session, and it is good practice to keep all the library calls at the beginning of your script.
