ifelse statement only returns else values (when combined with mutate() and %in%)

I have a series of files in my directory that look like this:

Title A.txt
Title A.txt
Title A.txt
Title A.txt
Title A.txt
Title B.txt
Title B.txt
Title B.txt

My code for preprocessing looks like this:

library(readtext)
library(tidytext)
library(dplyr)

all <- readtext("*.txt")

tidy.all <- all %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  mutate(party = ifelse(doc_id %in% "A", "a", "b"))

The problem arises in the mutate() call, where I try to add a new column: it returns only the else value for the entire data frame, and I don't know why. I've applied the same function to austen_books() and there it works. I've even converted the doc_id variable to a factor, but to no avail.

library(janeaustenr)

books <- austen_books() %>%
  mutate(party = ifelse(book %in% "Sense & Sensibility", "a", "b"))

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Thank you for wanting to help.

Here's a reprex that has the same issue:

doc_id <- c("Title A.txt",
            "Title A.txt",
            "Title B.txt",
            "Title B.txt",
            "Title B.txt")

df <- as.data.frame(doc_id)

df$category <- ifelse(doc_id %in% "A", "A", "B")

Result:

df
       doc_id category
1 Title A.txt        B
2 Title A.txt        B
3 Title B.txt        B
4 Title B.txt        B
5 Title B.txt        B

What's weird is that if I use the entire doc_id string in ifelse(), like so:

df$category <- ifelse(doc_id %in% "Title A.txt", "A", "B")

then it does return what I want. The problem is that the A/B part of the file name is the only thing that lets me tell the documents apart.

That's not how the %in% operator works. Is this what you are trying to do?

doc_id <- c("Title A.txt",
            "Title A.txt",
            "Title B.txt",
            "Title B.txt",
            "Title B.txt")

df <- as.data.frame(doc_id)

df$category <- ifelse(grepl("A", doc_id), "A", "B")

print(df)
#>        doc_id category
#> 1 Title A.txt        A
#> 2 Title A.txt        A
#> 3 Title B.txt        B
#> 4 Title B.txt        B
#> 5 Title B.txt        B

Created on 2020-04-11 by the reprex package (v0.3.0)
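
The same grepl() test should also drop into the mutate() call from the original question. Here is a minimal sketch, assuming dplyr is installed; the `docs` data frame is a hypothetical stand-in for the readtext() output, and the anchored pattern "A\\.txt$" is my own choice (not from the thread) so that a capital "A" elsewhere in a file name is not picked up by accident:

```r
library(dplyr)

# Hypothetical stand-in for the readtext() output; only doc_id matters here.
docs <- data.frame(
  doc_id = c("Title A.txt", "Title A.txt", "Title B.txt"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE wherever the pattern occurs inside the string.
# "A\\.txt$" (an assumption about the naming scheme) matches only names
# that end in "A.txt".
tagged <- docs %>%
  mutate(party = ifelse(grepl("A\\.txt$", doc_id), "a", "b"))

print(tagged)
```

If the file names follow a different scheme, the pattern would need adjusting; the plain grepl("A", doc_id) from the reprex above is enough for the names shown in this thread.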

Yes, that does it, thank you very much!
May I ask why the %in% operator wouldn't work in this instance?

Because you're trying to do a partial match. %in% returns TRUE only when an element on the left-hand side is exactly equal to an element of the right-hand vector; it never looks inside a string. For partial matches, you have to go with grepl() or a similar pattern-matching function.

# This works.
doc_id %in% "Title A.txt"
#> [1]  TRUE  TRUE FALSE FALSE FALSE

# This doesn't.
doc_id %in% "A"
#> [1] FALSE FALSE FALSE FALSE FALSE
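
To make the difference concrete, here is a small base R sketch. It relies on the documented fact that %in% is a wrapper around match(), and it uses grepl() with fixed = TRUE, which treats the pattern as literal text rather than a regular expression. The second entry in `targets` is a made-up file name, included only to show that the right-hand side can hold several whole strings:

```r
doc_id <- c("Title A.txt", "Title A.txt",
            "Title B.txt", "Title B.txt", "Title B.txt")

# Hypothetical lookup vector: every entry must be a complete file name.
targets <- c("Title A.txt", "Some other file.txt")

# %in% compares whole elements; it is equivalent to a match() lookup.
whole <- doc_id %in% targets
same  <- match(doc_id, targets, nomatch = 0L) > 0L
identical(whole, same)
#> [1] TRUE

# grepl() searches inside each string, so a fragment is enough.
partial <- grepl("A.txt", doc_id, fixed = TRUE)
partial
#> [1]  TRUE  TRUE FALSE FALSE FALSE
```

So %in% is the right tool when you have a list of complete values to look up, and grepl() is the one to reach for when only part of the string identifies the group.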
