Creating a new variable from existing column values

Hi everyone,

I am new to R and am trying to replicate something like SAS's if x = 1 / if x = 2 statements to create a new variable.

Could someone please help?

I'm both a SAS and an R user, and I think you need to look into the dplyr package. You'd use the mutate() function, most likely combined with case_when(), to accomplish what you want. The following does the same thing in SAS and in R. Without more details, this is about as good an example as I can think of.

Check out Section 5 Data transformation of this book: https://r4ds.had.co.nz/transform.html

/* SAS */
data b;
   set a;
   if x = 1 then y = "Adult";
   else if x = 2 then y = "Minor";
run;

# R (dplyr, loaded via tidyverse)
library(tidyverse)
b <- a %>%
  mutate(
    y = case_when(
      x == 1 ~ "Adult",
      x == 2 ~ "Minor"
    )
  )
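One difference worth knowing: in the SAS step, rows where x is neither 1 nor 2 end up with a blank y, while case_when() returns NA for rows that match no condition. A minimal sketch with a made-up data frame a (hypothetical data, and the "Other" label is likewise invented for illustration) shows this, and adds a final TRUE ~ condition that plays the role of a closing SAS else:

```r
library(dplyr)

# Hypothetical example data: x takes the values 1, 2 and 3
a <- tibble(x = c(1, 2, 3))

b <- a %>%
  mutate(
    # No condition matches x == 3, so case_when() gives NA for that row
    y = case_when(
      x == 1 ~ "Adult",
      x == 2 ~ "Minor"
    ),
    # A final TRUE ~ ... condition catches every remaining row, like a SAS "else"
    y2 = case_when(
      x == 1 ~ "Adult",
      x == 2 ~ "Minor",
      TRUE   ~ "Other"
    )
  )

print(b)
```

Conditions are checked in order and the first match wins, so the TRUE ~ line has to come last.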

1 Like

Thanks very much. I'll try this and get back to you.

Thanks once again.

Good morning,

Thank you for your previous example.

I tried using mutate() but got error messages.

What I am trying to create in R is a variable with particular codes.

In SAS, I would write my code like this:

data y;
   set x;
   if x = "b1" or x = "b2" or x = "b3" then a = "Yes";
   else a = "No";
run;
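SAS's or corresponds to R's vectorised | operator, and dplyr's if_else() supplies the else branch. A fairly literal translation of this SAS step might look like the sketch below; the data frame df and its values are hypothetical stand-ins for the SAS data set x:

```r
library(dplyr)

# Hypothetical data standing in for the SAS data set x
df <- tibble(x = c("b1", "b4", "b3"))

# SAS "or" becomes |; if_else(condition, value_if_true, value_if_false)
y <- df %>%
  mutate(a = if_else(x == "b1" | x == "b2" | x == "b3", "Yes", "No"))

print(y)
```

One caveat: if x is missing (NA), == returns NA and so does if_else(), whereas the SAS step would assign "No" to a blank value.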

I don't know what the SAS code is doing, but you might be after case_when().

Or perhaps something using %in% c("b1", "b2", "b3").

Hard to tell without a reproducible example.
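A small sketch of the %in% idea in base R, on a hypothetical vector of codes: %in% checks each element against the whole set at once and always returns TRUE or FALSE (a missing value simply counts as "not in the set"), so ifelse() can turn it straight into "Yes"/"No":

```r
# Hypothetical example values, including a missing one
x <- c("b1", "b4", "b3", NA)

# TRUE where x is one of the listed codes, FALSE otherwise (including NA)
a <- ifelse(x %in% c("b1", "b2", "b3"), "Yes", "No")

print(a)
```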

1 Like

Thanks for your reply.

What I am trying to do is create a variable called Uber where certain codes will generate a "Yes" and any other codes will generate a "No".

Base_Number
B02869
B02617
B02872
B02788
B02765
B02788
B02872
B02877
B02864
B02875
B02902
B02872
B00248
B02875
B02872
B02865
B02888
B02765

Something along the lines of: Uber = "Yes" if Base_Number is one of B02869, B02617, B02872, B02788, B02765, B02788, B02872, B02877, B02864, B02875, B02902, B02872, B00248, B02875, B02872, B02865, B02888, B02765; otherwise Uber = "No".

> current_uber <- Active_current %>%
+     mutate(
+         Uber=case_when(
+             Base_Number=="B02395","B02404","B02510","B02512","B02617","B02682","B02764",
+             "B02765","B02774","B02788","B02800","B02835","B02836","B02844",
+             "B02864","B02865","B02866","B02867","B02869","B02870","B02871",
+             "B02872","B02875","B02876","B02877","B02878","B02879","B02880",
+             "B02882","B02883","B02884","B02887","B02888","B02889","B03125",
+             "B03126","B03136","B03144","B03223","B03234","B03235","B03252"
+             
+             ~"Yes",
+             ifelse~"No"
+         )
+     )
Error: Problem with `mutate()` input `Uber`.
x Case 1 (`Base_Number == "B02395"`) must be a two-sided formula, not a logical vector.
i Input `Uber` is `case_when(...)`.
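The error says that every argument to case_when() must be a two-sided formula of the form condition ~ value. In the attempt above the codes are passed as separate comma-separated arguments, so case_when() treats Base_Number == "B02395" on its own as the first case, and == only compares against one value at a time anyway. A sketch of the valid shape, using a short hypothetical vector of codes:

```r
library(dplyr)

# Hypothetical codes for illustration
base_number <- c("B02395", "B02404", "B00248")

uber <- case_when(
  # Left of ~ : one logical condition (membership in a set of codes)
  base_number %in% c("B02395", "B02404") ~ "Yes",
  # TRUE ~ ... is the catch-all that replaces the ifelse~"No" line
  TRUE ~ "No"
)

print(uber)
```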

Like this:

library(tidyverse)

# create data
Active_current <- tibble(
       Base_Number = c("B02869","B02617","B02872",
                       "B02788","B02765","B02788","B02872","B02877","B02864",
                       "B02875","B02902","B02872","B00248","B02875",
                       "B02872","B02865","B02888","B02765"))

# using if_else
current_uber <- Active_current %>% 
  mutate(Uber = if_else(Base_Number %in% c("B02395","B02404","B02510","B02512","B02617","B02682","B02764",
                                           "B02765","B02774","B02788","B02800","B02835","B02836","B02844",
                                           "B02864","B02865","B02866","B02867","B02869","B02870","B02871",
                                           "B02872","B02875","B02876","B02877","B02878","B02879","B02880",
                                           "B02882","B02883","B02884","B02887","B02888","B02889","B03125",
                                           "B03126","B03136","B03144","B03223","B03234","B03235","B03252"),
                        "Yes",
                        "No"))
  
  
# using case_when
current_uber2 <- Active_current %>% 
  mutate(Uber = case_when(Base_Number %in% c("B02395","B02404","B02510","B02512","B02617","B02682","B02764",
                                           "B02765","B02774","B02788","B02800","B02835","B02836","B02844",
                                           "B02864","B02865","B02866","B02867","B02869","B02870","B02871",
                                           "B02872","B02875","B02876","B02877","B02878","B02879","B02880",
                                           "B02882","B02883","B02884","B02887","B02888","B02889","B03125",
                                           "B03126","B03136","B03144","B03223","B03234","B03235","B03252") ~ "Yes",
                          TRUE ~"No"))

# A tibble: 18 x 2
   Base_Number Uber 
   <chr>       <chr>
 1 B02869      Yes  
 2 B02617      Yes  
 3 B02872      Yes  
 4 B02788      Yes  
 5 B02765      Yes  
 6 B02788      Yes  
 7 B02872      Yes  
 8 B02877      Yes  
 9 B02864      Yes  
10 B02875      Yes  
11 B02902      No   
12 B02872      Yes  
13 B00248      No   
14 B02875      Yes  
15 B02872      Yes  
16 B02865      Yes  
17 B02888      Yes  
18 B02765      Yes
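Since the same list of 42 codes appears in both versions, it may be easier to maintain if it is stored once in a character vector and reused. A sketch of the if_else() version written that way (uber_codes is a name chosen for this example, not something from dplyr):

```r
library(dplyr)

# Sample data from the thread
Active_current <- tibble(
  Base_Number = c("B02869", "B02617", "B02872", "B02788", "B02765", "B02788",
                  "B02872", "B02877", "B02864", "B02875", "B02902", "B02872",
                  "B00248", "B02875", "B02872", "B02865", "B02888", "B02765")
)

# Codes that should be flagged as Uber, defined once
uber_codes <- c("B02395", "B02404", "B02510", "B02512", "B02617", "B02682", "B02764",
                "B02765", "B02774", "B02788", "B02800", "B02835", "B02836", "B02844",
                "B02864", "B02865", "B02866", "B02867", "B02869", "B02870", "B02871",
                "B02872", "B02875", "B02876", "B02877", "B02878", "B02879", "B02880",
                "B02882", "B02883", "B02884", "B02887", "B02888", "B02889", "B03125",
                "B03126", "B03136", "B03144", "B03223", "B03234", "B03235", "B03252")

current_uber <- Active_current %>%
  mutate(Uber = if_else(Base_Number %in% uber_codes, "Yes", "No"))

# Number of rows in each group
count(current_uber, Uber)
```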
1 Like

Thanks very much for your help!

1 Like

I am not sure how up to date this is, but Bob Muenchen has a book out entitled R for SAS and SPSS Users that you might find useful in general: http://r4stats.com/books/free-version/

I have not used SAS in years, but I remember it was a real cultural and conceptual shock when I hit R.

1 Like

Thank you very much for the book suggestion; I will have a look at it. Yes, learning R has been a real conceptual shock, as I have been using SAS for over 10 years.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.