Need to create a new variable with conditions from multiple variables

First, I've really tried to avoid asking this and I sincerely want to teach myself how to do this, but after hitting my head against a wall for a few days, I've only wasted time and have nothing to show for my work. I can't even put out a reprex (I read how to do that, I'm so confused I can't even guess what code to start with).

My goal is to create a new variable in an existing dataset. I will call this new variable ha_rescue. Options for this will be "yes" or "no". I have 5 other columns, called "new_drug_1" up to "new_drug_5". If any of 6 words are in any of these columns, then I would want "yes" in my new variable "ha_rescue". If there is nothing or any other word in those columns, I would want "no" in variable ha_rescue.

My dataset is called "ha".

From what I have gathered, I should be using the dplyr package and mutate? I've seen examples of people creating new variables from numbers, but not from characters. The examples I see on youtube or other help sites have a lot of code and without starting their project from scratch I get pretty confused. I am new to R Studio and used it for my first semester in a clinical research masters program so I'm still a novice but I want to become proficient.

TLDR: Is dplyr the best way to create a new variable with specific character requirements from 5 other character variables?

I think this will help you. I first made an example dataset with 5 columns, new_drug_1 through new_drug_5, and saved it as ha; you already have something that looks like this. Then I make a vector of the words I'm searching for. Then I create the ha_rescue variable by checking, row by row, whether the value in any of the five columns is in the word list. The | symbol is OR in R.

library(tidyverse)

# Make an example dataset
allwords <- c("apple", "banana", "carrot", "cucumber", "lettuce", "tomato")

ha <- tibble(new_drug_1=sample(allwords, 200, replace=TRUE),
             new_drug_2=sample(allwords, 200, replace=TRUE),
             new_drug_3=sample(allwords, 200, replace=TRUE),
             new_drug_4=sample(allwords, 200, replace=TRUE),
             new_drug_5=sample(allwords, 200, replace=TRUE))

ha
#> # A tibble: 200 x 5
#>    new_drug_1 new_drug_2 new_drug_3 new_drug_4 new_drug_5
#>    <chr>      <chr>      <chr>      <chr>      <chr>     
#>  1 carrot     banana     banana     banana     tomato    
#>  2 tomato     lettuce    apple      carrot     banana    
#>  3 carrot     cucumber   lettuce    tomato     cucumber  
#>  4 lettuce    carrot     banana     carrot     apple     
#>  5 banana     cucumber   banana     apple      cucumber  
#>  6 apple      apple      cucumber   banana     banana    
#>  7 carrot     cucumber   lettuce    cucumber   apple     
#>  8 lettuce    cucumber   apple      carrot     cucumber  
#>  9 apple      lettuce    cucumber   lettuce    cucumber  
#> 10 lettuce    tomato     apple      lettuce    tomato    
#> # ... with 190 more rows

#This would be a vector of your 6 words you are searching for
#In my case, it just has 2 words
wordlist <- c("apple", "banana") 

ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist |
      new_drug_2 %in% wordlist |
      new_drug_3 %in% wordlist |
      new_drug_4 %in% wordlist |
      new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))
#> # A tibble: 200 x 6
#>    new_drug_1 new_drug_2 new_drug_3 new_drug_4 new_drug_5 ha_rescue
#>    <chr>      <chr>      <chr>      <chr>      <chr>      <chr>    
#>  1 carrot     banana     banana     banana     tomato     yes      
#>  2 tomato     lettuce    apple      carrot     banana     yes      
#>  3 carrot     cucumber   lettuce    tomato     cucumber   no       
#>  4 lettuce    carrot     banana     carrot     apple      yes      
#>  5 banana     cucumber   banana     apple      cucumber   yes      
#>  6 apple      apple      cucumber   banana     banana     yes      
#>  7 carrot     cucumber   lettuce    cucumber   apple      yes      
#>  8 lettuce    cucumber   apple      carrot     cucumber   yes      
#>  9 apple      lettuce    cucumber   lettuce    cucumber   yes      
#> 10 lettuce    tomato     apple      lettuce    tomato     yes      
#> # ... with 190 more rows

Created on 2019-12-23 by the reprex package (v0.3.0)
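
A possible variation, not part of the reply above: if typing out all five columns gets tedious, dplyr's if_any() can apply the same %in% test to every column whose name starts with new_drug_. This assumes dplyr 1.0.4 or later, which is newer than this thread. The three-row ha below is made up purely for illustration.

```r
library(dplyr)

wordlist <- c("apple", "banana")

# Made-up stand-in for the real data (assumption, not the poster's dataset)
ha <- tibble(
  new_drug_1 = c("carrot", "apple", "tomato"),
  new_drug_2 = c("lettuce", "carrot", "banana"),
  new_drug_3 = c("cucumber", "tomato", "")
)

# if_any() is TRUE for a row when at least one selected column matches
ha <- ha %>%
  mutate(ha_rescue = if_else(
    if_any(starts_with("new_drug_"), ~ .x %in% wordlist),
    "yes",
    "no"
  ))

ha
```

The logic is the same as the chain of | conditions; the only difference is that the column list is selected by name pattern instead of being written out.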


To use reprex, do the following steps:

  1. load the reprex package by using the command library(reprex)
  2. Copy the code you want to run to the clipboard; don't copy what went to the console. It should be just the code, without the > at the beginning of lines. Note that you need to include everything, including reading in the data, because the reprex runs in a standalone session and won't use what is in your environment
  3. Run the command reprex(). This will run whatever you have copied.

This video might be helpful: [Video] Reproducible Examples and the `reprex` package

@StatSteph Thanks again, I watched the video and tried to recreate her example, even using the basic code in the first 2 minutes, and it didn't work with that either. Here is a screen shot

You only need to copy the code, not paste it

Then you will have in your clipboard, the following:

# Only copy code below, don't paste

x <- factor("a")
y <- factor("b")
c(x, y)
#> [1] 1 1

Created on 2019-12-24 by the reprex package (v0.3.0)

Alternatively, you can put the code within curly braces as seen below:

reprex(x={
  # Only copy code below, don't paste
  
  x <- factor("a")
  y <- factor("b")
  c(x, y)
  
})

Gotcha, here we go

ha %>%
mutate(ha_rescue=if_else(
new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |new_drug_4 %in% wordlist |new_drug_5 %in% wordlist,
"yes",
"no"
))
#> Error in ha %>% mutate(ha_rescue = if_else(new_drug_1 %in% wordlist | : could not find function "%>%"

I was expecting to see a new variable added on the far right, didn't happen

Thanks, the error is informative. You don't have the function %>% which comes in the package magrittr. You didn't copy your entire reprex into the window so I can't tell which packages you did load, but I would suggest loading the entire tidyverse package. You'll see my code in my first response does that.
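
As a small illustration of that error (a sketch, not taken from the poster's session): %>% only exists once a package that provides it has been attached. Attaching dplyr, or the whole tidyverse, is enough, since dplyr re-exports the pipe from magrittr. The numbers below are arbitrary.

```r
# In a fresh session, running the pipe line before library(dplyr)
# fails with: could not find function "%>%"
library(dplyr)

roots <- c(1, 4, 9) %>% sqrt()
roots
```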

I reinstalled reprex and tidyverse, here is the reprex w/ that information in it.

library(reprex)
#> Warning: package 'reprex' was built under R version 3.6.2
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.2
wordlist <- c("DROPERIDOL", "METOCLOPRAMIDE", "OLANZAPINE", "PROCHLORPERAZINE", "ELETRIPTAN", "SUMATRIPTAN")
ha %>%
mutate(ha_rescue=if_else(
new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |new_drug_4 %in% wordlist |new_drug_5 %in% wordlist,
"yes",
"no"
))
#> Error in eval(lhs, parent, parent): object 'ha' not found

A reprex (short for reproducible example) has to be, by definition, reproducible, and we can't reproduce your code because you are not providing sample data (i.e. the ha data frame) in a copy/paste-friendly format. That leads to an endless back and forth.
Please try to follow this guide and make a proper reproducible example illustrating your issue.

That error means you don't have an object named ha in your workspace. That data frame must be read in somehow. I suggest you take a look at some intro to R materials, maybe the R for Data Science book: https://r4ds.had.co.nz/

Yeah, I've been reading that today. I know how to load a data frame; the top right pane shows that it's loaded. What's curious is that I read the tutorial recommended to me by @andresrcs and realized that most reprexes should have some data in them. My dataset has 102 columns, which wouldn't work for a reprex, so I created a new df by selecting just the columns new_drug_1 to new_drug_5. Running that actually worked, but that wasn't reflected in the reprex I wanted to share with you all.

Reprex below

ha_reprex_df %>%
mutate(ha_rescue=if_else(
new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |new_drug_4 %in% wordlist |new_drug_5 %in% wordlist,
"yes",
"no"
))
#> Error in ha_reprex_df %>% mutate(ha_rescue = if_else(new_drug_1 %in% wordlist | : could not find function "%>%"

Screenshot of it Working

Read the guide more carefully. A reprex must be self-contained, so the code itself has to include the library calls and the creation of the sample data, regardless of whether the data exists in your current environment (the code being reprexed is run in an independent, clean R session).
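
One common way to get sample data into the code itself is dput(), which prints an object as R code that can be pasted at the top of a reprex. The sketch below assumes a made-up ha with an extra patient_id column standing in for the real 102-column dataset; ha_small is a hypothetical name.

```r
library(dplyr)

# Made-up stand-in for the real dataset (assumption for illustration)
ha <- tibble(
  patient_id = 1:3,
  new_drug_1 = c("DROPERIDOL", "ONDANSETRON", ""),
  new_drug_2 = c("", "SUMATRIPTAN", "ACETAMINOPHEN")
)

# Keep only the relevant columns and a handful of rows
ha_small <- ha %>%
  select(starts_with("new_drug_")) %>%
  head(5)

# Prints a structure(...) call; paste it into the reprex as
# ha_small <- structure(...)
dput(ha_small)
```

With real patient data, check the printed values before posting and replace anything sensitive with made-up entries.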

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.2
wordlist <- c("DROPERIDOL", "METOCLOPRAMIDE", "OLANZAPINE", "PROCHLORPERAZINE", "ELETRIPTAN", "SUMATRIPTAN")
ha_reprexdf2 <- structure(list(
  new_drug_1 = c("DROPERIDOL", "PROCHLORPERAZINE", "PROCHLORPERAZINE",
                 "METOCLOPRAMIDE", "PROCHLORPERAZINE"),
  new_drug_2 = c("ONDANSETRON", "DIPHENHYDRAMINE", "ONDANSETRON",
                 "DROPERIDOL", "DROPERIDOL"),
  new_drug_3 = c("ONDANSETRON", "", "ONDANSETRON", "", ""),
  new_drug_4 = c("PROCHLORPERAZINE", "", "ACETAMINOPHEN", "", ""),
  new_drug_5 = c("", "", "ACETAMINOPHEN", "", "")
), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
ha_reprexdf2 %>%
  mutate(ha_rescue=if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |new_drug_4 %in% wordlist |new_drug_5 %in% wordlist,
    "yes",
    "no"
  ))
#> # A tibble: 5 x 6
#>   new_drug_1     new_drug_2     new_drug_3 new_drug_4     new_drug_5   ha_rescue
#>   <chr>          <chr>          <chr>      <chr>          <chr>        <chr>    
#> 1 DROPERIDOL     ONDANSETRON    ONDANSETR… PROCHLORPERAZ… ""           yes      
#> 2 PROCHLORPERAZ… DIPHENHYDRAMI… ""         ""             ""           yes      
#> 3 PROCHLORPERAZ… ONDANSETRON    ONDANSETR… ACETAMINOPHEN  ACETAMINOPH… yes      
#> 4 METOCLOPRAMIDE DROPERIDOL     ""         ""             ""           yes      
#> 5 PROCHLORPERAZ… DROPERIDOL     ""         ""             ""           yes

So the point of this is to find the error in my coding, but what I did in the reprex works. When I convert my reprex friendly code to what I was trying to do before (on the original dataset with all the variables I will need), still no success, but I can't share that reprex since the original dataset has 102 columns and 6000 rows. I'm proud of myself for getting my reprex to work but I feel like I'm where I started?

Try to find a subset of your data that reproduces the issue, or you could share a link (Dropbox, Google Drive, Box, etc) to your actual dataset so we can take a look.

I'll keep at it, it has patient sensitive information in it so I can't share it, even if I delete the sensitive stuff (knowing me, I'd still find a way to screw that up). While I need to get this done, can't lose my job over it...

@StatSteph

So I figured it out! I was expecting a new variable to pop up on top and it did not. So I assigned a new name to the code you gave me and can pull it up that way. Thank you all for your help, especially on Christmas Eve / Christmas Day :).

ha_rescue <- ha %>%
mutate(ha_rescue=if_else(
new_drug_1 %in% wordlist | new_drug_2 %in% wordlist | new_drug_3 %in% wordlist |new_drug_4 %in% wordlist |new_drug_5 %in% wordlist,
"yes",
"no"
))
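
The reason the assignment makes the difference: mutate() returns a new data frame and leaves the original untouched, so without <- the result is only printed to the console. A minimal sketch, using a made-up two-row ha, a shortened word list, and two columns instead of five:

```r
library(dplyr)

wordlist <- c("DROPERIDOL", "SUMATRIPTAN")  # shortened list for illustration

# Made-up stand-in for the real data
ha <- tibble(
  new_drug_1 = c("DROPERIDOL", "ONDANSETRON"),
  new_drug_2 = c("", "ACETAMINOPHEN")
)

# Without assignment: the result is printed, but ha is unchanged
ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist, "yes", "no"
  ))
had_column_before <- "ha_rescue" %in% names(ha)

# With assignment: the new column is stored in ha
ha <- ha %>%
  mutate(ha_rescue = if_else(
    new_drug_1 %in% wordlist | new_drug_2 %in% wordlist, "yes", "no"
  ))
has_column_after <- "ha_rescue" %in% names(ha)
```

Assigning back to ha overwrites the data frame in place; assigning to a new name, as in the post above, keeps the original alongside the modified copy.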
