How to create new values based on an original column and rename them in a new column

Thank you in advance for any help!

I have a data frame about book bans with a column titled "reason", but its values are written inconsistently. My code creates a new column, "specific reason", that groups the "reason" values containing certain words into categories (e.g., "race" and "racial" could both be renamed "racism" in the new column). How can I add more words/categories to the new column? The code I'm working with is below, but I'm not sure how to extend it. I'd also like to convert all the blank values in the column to "NA".

add_specific_reason <- function(messy_reason) {
  case_when(
    # most specific condition first: both sex and race terms present
    str_detect(messy_reason, "nude|sex|nudity") &
      str_detect(messy_reason, "\\brace\\b|racial") ~ "sex; race",
    str_detect(messy_reason, "nude|sex|nudity") ~ "sex",
    # backslashes must be doubled inside an R string for \b (word boundary)
    str_detect(messy_reason, "\\brace\\b|racial") ~ "race",
    str_detect(messy_reason, "violent|violence") ~ "violence"
  )
}

Here is one solution. I've added some sample data and am not working inside a function, but the method is the same, so you could move it into a function if you wish.

# package library
library(tidyverse)
library(janitor)

# sample data 
sam_dat <- tibble(
  reason_messy = sample(
    x = c("nude", "sex", "nudity", "race", "racial", "violent", "violence", "", "blue"),
    size = 25,
    replace = TRUE
  ),
  book = as.character(seq(1, 25, 1))
)

# create new variable reason_specific
sam_dat <- sam_dat %>%
  mutate(
    reason_specific = factor(case_when(
      # sex
      str_detect( 
        string = reason_messy,
        pattern = "nude|sex|nudity"
      ) ~ "sex",
      # race
      str_detect( 
        string = reason_messy,
        # \\b keeps "race" from matching inside words such as "embrace"
        pattern = "\\brace\\b|racial"
      ) ~ "race",
      # violence
      str_detect( 
        string = reason_messy,
        pattern = "violent|violence"
      ) ~ "violence",
      # blank values become NA
      reason_messy == "" ~ NA_character_,
      # other
      TRUE ~ "other" 
    ))
  ) 

# frequency table
sam_dat %>%
  tabyl(reason_specific)
#>  reason_specific  n percent valid_percent
#>            other  3    0.12     0.1304348
#>             race  4    0.16     0.1739130
#>              sex 12    0.48     0.5217391
#>         violence  4    0.16     0.1739130
#>             <NA>  2    0.08            NA

Created on 2023-05-24 with reprex v2.0.2
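
If the list of categories keeps growing, one option is to keep the patterns in a lookup table and let a function collect every category that matches, so adding a category means adding one line. This is only a sketch under some assumptions: `reason_patterns` and its `language` entry are made up for illustration, matching is done on lower-cased text, and reasons that hit several categories are joined with "; " (as in the "sex; race" case from the question).

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical lookup table: category name -> regular expression.
# To add a category, add a line here.
reason_patterns <- c(
  sex      = "nude|sex|nudity",
  race     = "\\brace\\b|racial",
  violence = "violent|violence",
  language = "profan|language" # illustrative extra category, not from the original post
)

add_specific_reason <- function(messy_reason, patterns = reason_patterns) {
  map_chr(messy_reason, function(reason) {
    # blank or missing reasons become NA
    if (is.na(reason) || str_trim(reason) == "") {
      return(NA_character_)
    }
    # test the (lower-cased) reason against every pattern at once
    matched <- str_detect(str_to_lower(reason), unname(patterns))
    hits <- names(patterns)[matched]
    # no match -> "other"; one or more matches -> joined with "; "
    if (length(hits) == 0) "other" else paste(hits, collapse = "; ")
  })
}

# usage on a small made-up example
example_dat <- tibble(
  reason_messy = c("Nudity and racial slurs", "violent scenes", "", "blue")
)

example_dat %>%
  mutate(reason_specific = add_specific_reason(reason_messy))
```

The order of the entries in `reason_patterns` only affects the order of the joined labels, so unlike `case_when()` there is no need to list the combined cases first.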
