Dear R masters,
I have completed many text-related projects with your help, but once again I cannot solve this simple example on my own:
library(tidyverse)
library(stringr)
sample_data <- data.frame(stringsAsFactors=FALSE,
  InterviewID = c(94, 59, 100, 86, 60, 101, 61, 7),
  AComm_1 = c("None", "neen", "xxxxx.",
              "None of products", "geen speciale", "geen commentaren",
              "Goood!!!!", "aa"),
  ModelLong = c("A", "A", "A",
                "B", "B",
                "B", "xxx", "xxx")
)
sample_data
# List of full sentences which should be excluded (exclude a comment only if it consists solely of this element, not if the element is part of a longer sentence)
blank_statements <- regex("none|geen\\sspeciale\\scommentaar|neen",
ignore_case = TRUE)
results <- sample_data %>%
  mutate(TMC.Blank = ifelse(test = (is.na(x = sample_data$AComm_1)), yes = 1,
    no = ifelse(test = ((str_length(string = sample_data$AComm_1) < 4) | # Remove sentences with fewer than 4 characters
      (str_detect(string = AComm_1, pattern = blank_statements)) | # Remove sentences containing ONLY phrases listed in blank_statements
      (str_length(string = sample_data$AComm_1) < 10) & (str_detect(AComm_1, "(.)\\1{3,}")) | # Remove sentences shorter than 10 characters containing repeated characters (like xxx, aaaaa)
      (str_length(string = sample_data$AComm_1) < 10) & (str_detect(AComm_1, regex("none|neen", ignore_case = TRUE))) # Remove sentences shorter than 10 characters containing specific words
    ), yes = 1, no = 0)))
results
First of all, I think my syntax is overcomplicated. I don't think I should need to refer to my data source inside a pipe, but the code does not work without the references to sample_data.
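To illustrate the first point, here is a minimal sketch, assuming that mutate() looks up bare column names in the piped data frame, so that the sample_data$ prefix can be dropped. The small `comments` data frame is a hypothetical stand-in for sample_data, and the flag uses only two of the conditions above.

```r
library(dplyr)
library(stringr)

# Hypothetical miniature of sample_data, for illustration only
comments <- data.frame(
  InterviewID = c(94, 7),
  AComm_1 = c("None of products", "aa"),
  stringsAsFactors = FALSE
)

flagged <- comments %>%
  mutate(
    # Bare column name: AComm_1 is resolved inside the piped data frame
    TMC.Blank = as.integer(is.na(AComm_1) | str_length(AComm_1) < 4)
  )

flagged$TMC.Blank
```

Because the conditions are already logical vectors, the nested ifelse() calls may not be needed at all; as.integer() turns TRUE/FALSE into 1/0.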
Secondly, my code excludes "none" from the comments in two different ways, and I think one of them is unnecessary: I simply want to replace comments consisting only of "None" with a blank, but keep other comments that contain "none" (e.g. "None of products").
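One possible way to express "only this phrase" is to anchor the pattern to the start and end of the string. The sketch below assumes that a comment counts as blank only when the whole string (apart from surrounding whitespace) is one of the listed phrases; the name `blank_only` is hypothetical.

```r
library(stringr)

# ^ and $ anchor the match to the whole string; the group keeps the
# alternation inside the anchors. \\s* tolerates stray whitespace.
blank_only <- regex("^\\s*(none|neen|geen\\sspeciale\\scommentaar)\\s*$",
                    ignore_case = TRUE)

is_blank <- str_detect(c("None", "neen", "None of products"), blank_only)
is_blank
```

With an anchored pattern like this, the separate "shorter than 10 characters and contains none|neen" condition would probably become redundant.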
Thirdly, I don't know why comments with repeated characters are not replaced by blanks in my results data frame.
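A possible explanation, assuming the flag itself is computed as intended, is that the mutate() call only adds the TMC.Blank column and never modifies AComm_1. The sketch below shows one way the replacement step could look; the `flagged` data frame is a hypothetical stand-in for the results, and "" could be swapped for NA_character_ if a missing value is preferred.

```r
library(dplyr)

# Hypothetical input: comments plus a 0/1 flag such as TMC.Blank
flagged <- data.frame(
  AComm_1 = c("xxxxx.", "None of products"),
  TMC.Blank = c(1, 0),
  stringsAsFactors = FALSE
)

cleaned <- flagged %>%
  # Overwrite the comment with an empty string wherever the flag is set
  mutate(AComm_1 = if_else(TMC.Blank == 1, "", AComm_1))

cleaned$AComm_1
```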
Lastly, I don't know how to add exceptions to the repeated-characters rule. I want to replace short comments containing repeated characters such as xxx or aaaa with blanks, but keep comments with !!!! ("Goood!!!!").
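One way to build such an exception, assuming that only repeated letters or digits should count as filler, is to restrict the captured character to [[:alnum:]] so that repeated punctuation no longer matches. This is a sketch rather than a definitive rule; `repeated_alnum` is a hypothetical name.

```r
library(stringr)

# "([[:alnum:]])\\1{3,}" matches the same letter or digit four or more
# times in a row; punctuation such as "!!!!" is not captured.
repeated_alnum <- "([[:alnum:]])\\1{3,}"

comments <- c("xxxxx.", "aaaa", "Goood!!!!")
is_filler <- str_length(comments) < 10 & str_detect(comments, repeated_alnum)
is_filler
```

Note that {3,} requires four identical characters, so a three-character comment such as "xxx" is not caught by this pattern and would still rely on the "fewer than 4 characters" rule.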
Can you help?