I streamed tweets from Twitter for about 2 hours and have them in a data frame. Now I want to keep only the tweets that contain certain keywords, like 'coronavirus', but R returns a 0 x 3 tibble.
```r
# sample data frame
tweet <- c("I was tested for coronavirus today", "my covid-19 test came out negative")
is_retweet <- c("TRUE", "FALSE")
is_quote <- c("FALSE", "FALSE")
df <- data.frame(tweet, is_retweet, is_quote)
```
And here is how I am trying to filter the rows where the "tweet" column contains keywords like covid or corona:
The %in% operator looks for an exact match: it checks whether the whole tweet equals one of the keywords, so it only catches tweets that consist of exactly that word. I would suggest something like the following, which uses str_detect() from stringr. I also added a tweet that contains none of the keywords; as you can see, it is not included in the result.
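To see why the %in% attempt returns zero rows, here is a small sketch comparing it with str_detect() on the two sample tweets. The question's actual filtering code is not shown, so the %in% call below is an assumed reconstruction, and the keywords vector is hypothetical.

```r
library(stringr)

tweet <- c("I was tested for coronavirus today", "my covid-19 test came out negative")
# Hypothetical keyword vector; the question's real %in% call is not shown
keywords <- c("covid", "corona", "coronavirus")

# %in% compares each whole tweet with each whole keyword, so nothing matches
in_match <- tweet %in% keywords
in_match
#> [1] FALSE FALSE

# str_detect() looks for the regex "covid|corona|coronavirus" anywhere in each tweet
pattern <- str_c(keywords, collapse = "|")
detect_match <- str_detect(tweet, pattern)
detect_match
#> [1] TRUE TRUE
```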
```r
library(tidyverse)

# Sample data, plus one tweet that contains none of the keywords
tweet <- c("I was tested for coronavirus today", "my covid-19 test came out negative", "no key words")
is_retweet <- c("TRUE", "FALSE", "FALSE")
is_quote <- c("FALSE", "FALSE", "FALSE")
df <- data.frame(tweet, is_retweet, is_quote)

# str_detect() is TRUE when the regex matches anywhere in the tweet;
# "|" separates the alternative keywords
filter_df <- df %>%
  select(tweet, is_quote, is_retweet) %>%
  filter(str_detect(tweet, "covid|covid-19|face mask|pandemic|coronavirus|virus"))
filter_df
#>                                 tweet is_quote is_retweet
#> 1 I was tested for coronavirus today    FALSE       TRUE
#> 2 my covid-19 test came out negative    FALSE      FALSE
```
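If the keyword list grows, or tweets use different capitalisation such as "COVID", a variant of the same approach may be easier to maintain. This is a sketch assuming stringr's regex() modifier with ignore_case = TRUE; the keywords vector is hypothetical, "covid-19" is dropped because "covid" already matches it, and the second sample tweet is changed to uppercase "COVID-19" purely to show the case-insensitive match.

```r
library(dplyr)
library(stringr)

# Sample data; the second tweet is uppercased here only to demonstrate ignore_case
tweet <- c("I was tested for coronavirus today",
           "my COVID-19 test came out negative",
           "no key words")
is_retweet <- c("TRUE", "FALSE", "FALSE")
is_quote <- c("FALSE", "FALSE", "FALSE")
df <- data.frame(tweet, is_retweet, is_quote)

# Hypothetical keyword vector, kept separate so the list is easy to extend
keywords <- c("covid", "face mask", "pandemic", "coronavirus", "virus")

# Join the keywords into one alternation and match regardless of case
pattern <- regex(str_c(keywords, collapse = "|"), ignore_case = TRUE)

filter_df <- df %>%
  filter(str_detect(tweet, pattern))
filter_df
```

The plain string pattern in the answer above is case-sensitive by default, so a tweet mentioning "COVID" would be dropped; wrapping the pattern in regex(..., ignore_case = TRUE) avoids that without listing every capitalisation.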