Hello RStudio Community folks!
I am trying to count the number of word occurrences in a tibble (after tokenization from the tidytext::unnest_tokens() function), but can't seem to figure out how to do this with stringr::str_count():
WordOccurrenceTest <- tibble::tribble(
  ~word, ~text,
  "abnormal", "if this was not abnormal, consider changing from abnormal.",
  "abnormal", "if this was not abnormal, consider changing from abnormal."
)
I want to count the number of times 'abnormal' occurs (or whatever word exists in word) in text, so I thought it would be:
library(dplyr)   # mutate() and %>%
library(stringr) # str_count()

WordOccurrenceTest %>%
  mutate(
    word_occurrence = sum(str_count(text, as.character(word)))
  )
But this gives me 4 in word_occurrence.
# A tibble: 2 × 3
  word     text                                                       word_occurrence
  <chr>    <chr>                                                                <int>
1 abnormal if this was not abnormal, consider changing from abnormal.               4
2 abnormal if this was not abnormal, consider changing from abnormal.               4
I can do this with base R, but it gives a warning:
WordOccurrenceTest %>%
  mutate(
    word_occurrence = lengths(regmatches(text, gregexpr(as.character(word), text)))
  )
# A tibble: 2 × 3
  word     text                                                       word_occurrence
  <chr>    <chr>                                                                <int>
1 abnormal if this was not abnormal, consider changing from abnormal.               2
2 abnormal if this was not abnormal, consider changing from abnormal.               2
Warning message:
Problem with `mutate()` column `word_occurrence`.
ℹ `word_occurrence = lengths(regmatches(text, gregexpr(as.character(word), text)))`.
ℹ argument 'pattern' has length > 1 and only the first element will be used
Any help on how to get the output from str_count() to produce the rowwise count of each word occurrence from the word column would be great!
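For reference, here is a minimal sketch of what I think the rowwise version might look like. It rests on two assumptions: that the 4 comes from sum() adding up the per-row counts across the whole column (2 + 2), and that str_count() is vectorised over both its string and pattern arguments, so the sum() wrapper can simply be dropped. The name `result` is only there for illustration. Note that each value in word is treated as a regular expression here, so a word containing regex metacharacters would need escaping.

```r
library(dplyr)   # mutate() and %>%
library(stringr) # str_count()

WordOccurrenceTest <- tibble::tribble(
  ~word, ~text,
  "abnormal", "if this was not abnormal, consider changing from abnormal.",
  "abnormal", "if this was not abnormal, consider changing from abnormal."
)

# No sum(): str_count() pairs each element of `text` with the matching
# element of `word`, so the result already has one count per row.
result <- WordOccurrenceTest %>%
  mutate(
    word_occurrence = str_count(text, word)
  )

print(result)
```

This would also sidestep the base R warning, since gregexpr() only uses the first element of its pattern argument, whereas str_count() uses the pattern from the same row.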
Thank you so much for your time!