Help me write this script for case_when inside dplyr::mutate and I'll acknowledge you by name in my article :)

Hi all,

I'm a newbie here and have spent days trying to figure out how to write this script. My data frame contains 50,000+ rows, so I definitely don't want to do this analysis manually. :slight_smile: If you can help me write this script successfully, I'd be happy to list you by name in the acknowledgments section of the journal article I'm writing. Here's what I'm trying to accomplish:

My data frame is called followers_df. I want to write a script that will create a new column vector called ismedia in the data frame that would contain the word TRUE if any of the following conditions are satisfied:

• If the column vector called UserName contains the string news
• If the column vector called UserName contains the string tv
• If the column vector called Bio contains the string news
• If the column vector called Bio contains the string reporter
• If the column vector called Bio contains the string journalist
• If the column vector called Bio contains the string radio
• If the column vector called Bio contains the string tv
• If the column vector called Bio contains the string television

This is what I wrote. It was an epic fail:

I tried to write a case_when statement inside mutate to create a new variable called ismedia that is TRUE when any one of several conditions on two different variables holds:

mutate (ismedia = case_when (followers_df$UserName == "news | tv" , followers_df$Bio == "news|reporter|journalist|radio|tv|television")) %>%

Thank you in advance for your help! :slight_smile:
Kim

You could try this:

library(dplyr)
library(stringr)

followers_df %>% 
  mutate(
    ismedia = case_when(
      str_detect(UserName, "news") ~ TRUE, 
      str_detect(UserName, "tv") ~ TRUE, 
      str_detect(Bio, "news") ~ TRUE, 
      str_detect(Bio, "reporter") ~ TRUE, 
      str_detect(Bio, "journalist") ~ TRUE, 
      str_detect(Bio, "radio") ~ TRUE, 
      str_detect(Bio, "tv") ~ TRUE, 
      str_detect(Bio, "television") ~ TRUE, 
      TRUE ~ FALSE
    )
  )

There is probably a more concise version possible using regex.

The final condition, TRUE ~ FALSE, sets ismedia to FALSE for every row that matches none of the conditions above it.
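To see how the fall-through behaves, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions, not the real followers_df, and the eight conditions are condensed into two alternation patterns for brevity. Two things worth noticing: str_detect() is case-sensitive by default, and it returns NA for a missing value, which case_when() treats as "not matched", so those rows also fall through to FALSE.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for followers_df
toy <- tibble::tibble(
  UserName = c("bbcnews", "apple", "pear", "plum"),
  Bio      = c("hello", "radio host", NA, "News anchor")
)

toy_flagged <- toy %>%
  mutate(
    ismedia = case_when(
      str_detect(UserName, "news|tv") ~ TRUE,
      str_detect(Bio, "news|reporter|journalist|radio|tv|television") ~ TRUE,
      TRUE ~ FALSE  # everything else, including rows where Bio is NA
    )
  )

toy_flagged$ismedia
#> [1]  TRUE  TRUE FALSE FALSE
```

The last row is FALSE because "News" with a capital N does not match the lowercase pattern. If your bios mix upper and lower case, you may want a case-insensitive match instead.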

Thank you SO much, John! This runs without error but the new column "ismedia" doesn't appear in the data frame. I must be doing something wrong. I'll reload and see if it works. Thanks!
Kim

A prose description of your input and what you are trying to do is not a good way to present a question like this. You should provide a reprex.

When looking for text you have to be really precise. By "word", do you mean a whole word or an embedded word? For example, would the text "tvs" be a match for "tv" in your test?
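As a quick illustration of that difference, here is a small sketch with invented strings (the vector x is hypothetical). It compares a plain pattern, which matches "tv" anywhere, with the same pattern wrapped in \\b word boundaries, which matches "tv" only as a separate word.

```r
library(stringr)

# Hypothetical example strings
x <- c("tv", "tvs", "local tv host", "crafts by katvee")

embedded   <- str_detect(x, "tv")        # "tv" anywhere in the string
whole_word <- str_detect(x, "\\btv\\b")  # "tv" only as a separate word

embedded
#> [1] TRUE TRUE TRUE TRUE
whole_word
#> [1]  TRUE FALSE  TRUE FALSE
```

Which one is right depends on the data. Usernames often run words together (a hypothetical "bbcnews", say), so whole-word matching may miss handles that the embedded pattern would catch.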

Here is some info about reprexes and how to use them.

A prose description isn't sufficient; you also need to make a simple reprex that includes:

• Code that builds the input data you are using.
• The function you are trying to write, even if it doesn't work.
• Usage of the function you are trying to write, even if it doesn't work.
• The output data you want the function to produce.

You can learn more about reprexes here:


Right now there is an issue with the version of reprex that is on CRAN, so you should install it directly from GitHub.

Until CRAN catches up with the latest version, install reprex with

devtools::install_github("tidyverse/reprex")

If this gives you an error saying that devtools is not available, then use

install.packages("devtools")

and try again.

In any case, the reprex below shows one way to use regular expressions to create the ismedia column.

suppressPackageStartupMessages(library(tidyverse))
df <- tribble(
  ~UserName, ~Bio,
  "apple",   "tv",
  "radio",   "television",
  "orange",  "blue"
)

Bio_pat <- "\\b(news|reporter|journalist|radio|tv|television)\\b"
UserName_pat <- "\\b(news|tv)\\b"

mutate(df, ismedia = str_detect(UserName, UserName_pat) | str_detect(Bio, Bio_pat))
#> # A tibble: 3 x 3
#>   UserName Bio        ismedia
#>   <chr>    <chr>      <lgl>  
#> 1 apple    tv         TRUE   
#> 2 radio    television TRUE   
#> 3 orange   blue       FALSE

Created on 2018-03-26 by the reprex package (v0.2.0).

Sorry, martin.R - I thought you were John Martin. Thanks for your help! :slight_smile:
Kim

You need to assign the output to an object, because mutate() returns a new data frame rather than modifying followers_df in place. For example:

library(dplyr)
library(stringr)

followers_df <- 
  followers_df %>% 
  mutate(
    ismedia = case_when(
      str_detect(UserName, "news") ~ TRUE, 
      str_detect(UserName, "tv") ~ TRUE, 
      str_detect(Bio, "news") ~ TRUE, 
      str_detect(Bio, "reporter") ~ TRUE, 
      str_detect(Bio, "journalist") ~ TRUE, 
      str_detect(Bio, "radio") ~ TRUE, 
      str_detect(Bio, "tv") ~ TRUE, 
      str_detect(Bio, "television") ~ TRUE, 
      TRUE ~ FALSE
    )
  )
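A minimal sketch of why the column seemed to vanish, using an invented one-column data frame (toy is hypothetical, and grepl() stands in for the full set of conditions):

```r
library(dplyr)

# Hypothetical one-column data frame
toy <- tibble::tibble(UserName = c("bbcnews", "apple"))

# Without assignment, the result is computed and then discarded; toy is unchanged
toy %>% mutate(ismedia = grepl("news", UserName))
before <- "ismedia" %in% names(toy)

# With assignment, the new column is kept
toy <- toy %>% mutate(ismedia = grepl("news", UserName))
after <- "ismedia" %in% names(toy)

c(before = before, after = after)
#> before  after 
#>  FALSE   TRUE
```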

(not John)

Thank you, danr. My apologies for the newbie faux pas. I've never written a reprex before but I'll make sure to learn moving forward. Thanks for your patience and assistance! :slight_smile:
Kim

Holy cannoli, Not John - that worked! Thank you so much. What name should I list in the acknowledgment section of my article?

No need for any acknowledgement. :grinning:

@danr's answer, which uses regular expressions, is more elegant, so you might prefer it unless you want to include more intricate criteria within the case_when sequence.

I'd like to acknowledge both of you. You were both really helpful! :slight_smile:
Kim

LOL, so that wasn't even me.

This is what I would do, but without dplyr:

match.UserName <- c("news", "tv")
match.Bio <- c("news", "reporter", "journalist", "radio", "tv", "television")

# Collapse each keyword vector into one alternation pattern, e.g. "news|tv"
followers_df$ismedia <-
  grepl(paste(match.UserName, collapse = "|"), followers_df$UserName, ignore.case = TRUE) |
  grepl(paste(match.Bio, collapse = "|"), followers_df$Bio, ignore.case = TRUE)

That should create two vectors of keywords, match them against the specified columns (ignoring case), and mark the rows that match as TRUE...

I think.
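To check that on something small, here is a sketch with made-up rows (the toy data frame is an assumption, not the real followers_df). It shows the pattern that paste() builds and what ignore.case = TRUE changes.

```r
# Hypothetical toy data standing in for followers_df
toy <- data.frame(
  UserName = c("BBCNews", "apple", "pear"),
  Bio      = c("hello", "Radio host", "gardener"),
  stringsAsFactors = FALSE
)

match.UserName <- c("news", "tv")
match.Bio <- c("news", "reporter", "journalist", "radio", "tv", "television")

pattern.UserName <- paste(match.UserName, collapse = "|")  # "news|tv"
pattern.Bio <- paste(match.Bio, collapse = "|")

# Case-sensitive matching misses "BBCNews"; ignore.case = TRUE catches it
case_sensitive   <- grepl(pattern.UserName, toy$UserName)
case_insensitive <- grepl(pattern.UserName, toy$UserName, ignore.case = TRUE)

toy$ismedia <- case_insensitive | grepl(pattern.Bio, toy$Bio, ignore.case = TRUE)
toy$ismedia
#> [1]  TRUE  TRUE FALSE
```

Assigning with followers_df$ismedia <- ... modifies the data frame directly, so no separate reassignment step is needed in this version.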

This is fabulous! Thank you so much. I'm going to acknowledge all three of you - you're heroes to me! :slight_smile:
Kim
