Hi everybody!
I am working with long newspaper texts and I want
to create new variables that code the topics of the news,
for example whether the title refers to a labor issue or an educational issue.
I want to code every news item with an 'issue' variable that has 'labor' or 'education' as categories.
The reprex:
library(tidyverse)
news_DF <- tibble(newspaper = c('New York Times', 'Washington Post', 'The Times', 'The Times'),
                  title = c('Workers are striking all over the world',
                            'Workers are not striking in March 2009',
                            'The scholarship students in America are not well paid',
                            'The US employees are not part of the working class'))
The words referring to the 'labor' type of issue can be:
labor_vector <- c('workers', 'teachers', 'employees', 'unions', 'AFL-CIO')
How can I do that using vectors like 'labor_vector', without writing out
every single element of a long list of words as in the code below?
Here is an example of a sort of function factory, in this case for labor. The same approach could be repeated for a few other categories, and if there are too many categories it may even be possible to write a function factory factory.
library(tidyverse)
news_DF <- tibble(newspaper = c('New York Times', 'Washington Post', 'The Times', 'The Times'),
                  title = c('Workers are striking all over the world',
                            'Workers are not striking in March 2009',
                            'The scholarship students in America are not well paid',
                            'The US employees are not part of the working class'))
labor_vector <- c('workers', 'teachers', 'employees', 'unions', 'AFL-CIO')
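# One detector function per keyword; matching is case-insensitive
labor_funcs <- map(labor_vector,
                   ~ function(x) str_detect(tolower(x),
                                            pattern = tolower(.x)))
# TRUE if any of the keywords is found in x
labor_eval <- function(x) {
  any(map_lgl(labor_funcs, ~ .x(x)))
}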
# test
> labor_eval("workers")
[1] TRUE
> labor_eval("workdrs")
[1] FALSE
# rowwise() is needed because labor_eval() collapses its result with any()
news_DF |>
  rowwise() |>
  mutate(issue = case_when(labor_eval(title) ~ 'labor')) |>
  ungroup()
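
As a possible alternative to the function factory, here is a minimal sketch that collapses each keyword vector into a single case-insensitive regular expression, so that str_detect() can work on the whole column without rowwise(). This is not the approach above, just a different technique. The names issue_keywords, issue_patterns and news_coded, as well as the 'education' keywords, are made up for illustration. It also assumes the keywords contain no regex metacharacters that would need escaping (the hyphen in 'AFL-CIO' is harmless outside brackets).

```r
library(dplyr)
library(stringr)
library(tibble)

news_DF <- tibble(
  newspaper = c("New York Times", "Washington Post", "The Times", "The Times"),
  title = c("Workers are striking all over the world",
            "Workers are not striking in March 2009",
            "The scholarship students in America are not well paid",
            "The US employees are not part of the working class")
)

# One keyword vector per issue.
# The 'education' words are hypothetical, added only to show a second category.
issue_keywords <- list(
  labor     = c("workers", "teachers", "employees", "unions", "AFL-CIO"),
  education = c("students", "scholarship", "schools")
)

# Collapse each vector into one alternation pattern, e.g. "workers|teachers|...",
# and make the match case-insensitive.
issue_patterns <- lapply(issue_keywords, function(words) {
  regex(str_c(words, collapse = "|"), ignore_case = TRUE)
})

# str_detect() is vectorised over title, so no rowwise() is needed.
# case_when() keeps the first matching issue; titles with no match get NA.
news_coded <- news_DF |>
  mutate(issue = case_when(
    str_detect(title, issue_patterns$labor)     ~ "labor",
    str_detect(title, issue_patterns$education) ~ "education"
  ))

news_coded
```

Adding another category then means adding one entry to issue_keywords and one line to case_when(). Note that the order of the case_when() lines decides which issue wins when a title matches more than one.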