Long time SAS User - New to R

Hello All,

I have been using SAS for a very long time but recently got interested in learning R. My plan is to analyze data and produce the same tables in R that I produced in SAS, learning R along the way.
Much of the work in clinical trials involves producing well-formatted tables to present the data after analyzing it with some statistical method. Many tables (e.g. demographics, AE) do not need much high-level statistics but must be presented in a visually attractive way. SAS PROC REPORT is a fantastic procedure for exactly that. R does not have a directly comparable procedure, but a few developers (e.g. David Bosak) have recently written packages to close that gap. I am reading many books and articles on R but thought I would join this community to get some help.

Anyway, here is my question. I have a few lines of SAS code below for a very simple task.

libname vis2 "/xxx/xxxx/xxxx/xxxx/xxxx/xxx" access=readonly;

data adslvis;
set vis2.adsl;
if actarmcd="xxxx" then trtn=1;
else if actarmcd="yyyy" then trtn=2;
else delete;
keep usubjid trtn;
run;

The first line creates a library named vis2 (basically a link to the folder where the SAS data is located).

The second block creates a dataset 'adslvis' by reading the dataset 'adsl' from the library vis2, creating a new numeric variable trtn, dropping every record that matches neither arm, and keeping just usubjid and trtn in 'adslvis'.

R-version.

I can read sas7bdat data from the folder (after installing the packages haven and sas7bdat) using the function read_sas.

vis <- read_sas("J:xxxx/xxxx/xxxx/xxxx/")

How can I accomplish the second part and still keep the dataset name the same? I am sure there are many ways of doing it with different packages and functions, but what is the simplest? It may be trivial for R programmers, but it is not yet for me.

Thanks a lot for reading up to this.

Sharif Uddin

Here is one version using the dplyr library. Obviously, I haven't tested it since I don't have data.

library(dplyr)
vis <- vis |> filter(actarmcd %in% c("xxxx", "yyyy")) |> 
  mutate(trtn = case_when(
    actarmcd == "xxxx" ~ 1,
    actarmcd == "yyyy" ~ 2,
    TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
  )) |> 
  select(usubjid, trtn)

Posit recently published a blog post on this topic: How to learn R as a SAS user - Posit

It does not seem to be working, although if I run vis$actarmcd the variable is there. I get this message:

Error in filter():
! Problem while computing ..1 = actarmcd %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC").
Caused by error in actarmcd %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC"):
! object 'actarmcd' not found

Did you have trouble loading dplyr?
I would expect your error if dplyr wasn't loaded, as there is a stats::filter that would fail like this, whereas you want dplyr::filter.

It does not seem to be a dplyr issue.

Check that the actarmcd column was actually loaded.

Found the issue - actarmcd should have been capitalized ACTARMCD. Didn't know these two are different in R.
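
R names are case-sensitive, unlike SAS variable names, so ACTARMCD and actarmcd refer to different things. A minimal sketch of how to check the spelling and, optionally, lowercase all column names once after reading. The data frame vis_demo and its values are made up for illustration and stand in for the real data.

```r
# Hypothetical stand-in for the data returned by read_sas()
vis_demo <- data.frame(
  USUBJID  = c("S-001", "S-002", "S-003"),
  ACTARMCD = c("xxxx", "yyyy", "zzzz"),
  stringsAsFactors = FALSE
)

# See exactly how the columns are spelled
names(vis_demo)

# Names are case-sensitive: the lowercase spelling is not a column here
"actarmcd" %in% names(vis_demo)

# Optional: lowercase every column name once, so SAS-style
# lowercase references work afterwards
vis_lower <- vis_demo
names(vis_lower) <- tolower(names(vis_lower))
names(vis_lower)
```

Whether to lowercase the names or simply type them in upper case is a matter of taste; the point is that the spelling in the code has to match the spelling in the data.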

Anyway now I get a new problem:

visadsl <- vis |> filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>
  mutate(trtn = case_when(
    ACTARMCD == "VEDOIV_VEDOSC" ~ 1,
    ACTARMCD == "VEDOIV_PSC" ~ 2,
    TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
  )) |>
  select(usubjid, trtn)

Error in mutate():
! Problem while computing trtn = case_when(...).
Caused by error in case_when():

I know, I know, I need to be patient - I am too new to R. That's why I am posting this to get help from the gurus. Sorry if I have annoyed anyone with trivial questions.

Sharif


Don't worry about the problems. All R users start the same way.

Maybe you need to convert the column to character.

str(visadsl) # to see the type of each column: numeric, character...

mutate(trtn = case_when(
    as.character(ACTARMCD) == "VEDOIV_VEDOSC" ~ 1,
    as.character(ACTARMCD) == "VEDOIV_PSC" ~ 2,
    TRUE ~ NA))

I still get the same error:

str(visadsl) <- vis |> filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>
  mutate(trtn = case_when(
    ACTARMCD == "VEDOIV_VEDOSC" ~ 1,
    ACTARMCD == "VEDOIV_PSC" ~ 2,
    TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
  )) |>
  select(usubjid, trtn)

Error in mutate():
! Problem while computing trtn = case_when(...).
Caused by error in case_when():

It looks like the problem is with the function case_when().

Run this:

And try to post a reproducible example of your data:

# paste the result of 
dput(vis)
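
For a large dataset, the full dput() output is too long to paste. A minimal sketch of sharing only the relevant columns and the first few rows instead; vis_demo, its columns, and its values are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for a larger dataset
vis_demo <- data.frame(
  USUBJID  = sprintf("S-%03d", 1:8),
  ACTARMCD = rep(c("VEDOIV_VEDOSC", "VEDOIV_PSC"), 4),
  AGE      = c(34, 51, 47, 29, 62, 40, 38, 55),
  stringsAsFactors = FALSE
)

# Keep only the columns involved in the problem and the first five rows
small <- head(vis_demo[, c("USUBJID", "ACTARMCD")], 5)

# Prints R code that recreates `small`; paste that output into the post
dput(small)
```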

Still the same problem:

visadsl <- vis |> filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>
  mutate(trtn <- case_when(
    as.character(ACTARMCD) == "VEDOIV_VEDOSC" ~ 1,
    as.character(ACTARMCD) == "VEDOIV_PSC" ~ 2,
    TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
  )) |>
  select(usubjid, trtn)

Error in mutate():
! Problem while computing ..1 = trtn <- ....
Caused by error in case_when():

Basically, I am trying to create a dataset called visadsl from the dataset vis (a large dataset), where visadsl has only usubjid and trtn, and trtn is a numeric variable equal to 1 when ACTARMCD='VEDOIV_VEDOSC' and 2 when ACTARMCD='VEDOIV_PSC'.

Not sure why it is so difficult :(
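
The error messages quoted in this thread are cut off, so the actual cause cannot be confirmed from them. Three things are worth checking, though. In dplyr versions before 1.1.0, case_when() required every right-hand side to have the same type, so a bare NA (logical) next to 1 and 2 (double) raised an error; NA_real_ works in old and new versions. Inside mutate(), a new column is named with =, not <-. And the columns passed to select() must match the case used in the data. A minimal sketch under those assumptions, with a made-up data frame (vis_demo, the subject IDs, and the arm code "SCRNFAIL" are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for the real data, with upper-case column names
vis_demo <- data.frame(
  USUBJID  = c("S-001", "S-002", "S-003", "S-004"),
  ACTARMCD = c("VEDOIV_VEDOSC", "VEDOIV_PSC", "SCRNFAIL", "VEDOIV_VEDOSC"),
  stringsAsFactors = FALSE
)

visadsl_demo <- vis_demo |>
  # SAS "else delete": drop rows that match neither arm
  filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>
  # Use = (not <-) to name the new column
  mutate(trtn = case_when(
    ACTARMCD == "VEDOIV_VEDOSC" ~ 1,
    ACTARMCD == "VEDOIV_PSC"    ~ 2,
    TRUE                        ~ NA_real_  # typed NA matches the numeric branches
  )) |>
  # SAS "keep": column names must match the data's case
  select(USUBJID, trtn)

visadsl_demo
```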

Without real data it is difficult to give accurate help.

But if I understand correctly, you need something like this with your vis data.

library(dplyr)

my_df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Alice"),
  age = c(25, 30, 35, 50),
  city = c("New York", "San Francisco", "Boston", "Boston")
)

# keep only Alice and Charlie, then add a new column using case_when
my_df2 <- my_df |>
  filter(name %in% c("Alice", "Charlie")) |>
  mutate(region = case_when(
    city == "New York" ~ "East",
    city == "San Francisco" ~ "West",
    TRUE ~ "Other"
  ))

#       name age     city region
# 1   Alice  25 New York   East
# 2 Charlie  35   Boston  Other
# 3   Alice  50   Boston  Other

Okay, that looks fine.
But can you assign a numeric variable instead: 1 for East, 2 for West, and 3 for Other?

And where is Bob? Why is Bob not in the new dataset?

Please send me your contact details so that I can send you the SAS dataset. This is my first attempt to generate in R the same table I generated so easily in SAS, and this is only the first step. Nothing fancy, but I seem to be struggling even to get through the first step.

Sharif

My email is uddin.sharifm@gmail.com - please help to get through it.

Sharif

Bob is missing because I filtered on only the names Alice and Charlie.
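
The numeric coding asked about above can be sketched the same way: put numbers instead of strings on the right-hand side of case_when(). This version also leaves out the filter() step, so Bob stays in the result. The names my_df3 and region_n are made up for this illustration.

```r
library(dplyr)

my_df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Alice"),
  age  = c(25, 30, 35, 50),
  city = c("New York", "San Francisco", "Boston", "Boston")
)

# No filter() step, so all four rows (including Bob) are kept
my_df3 <- my_df |>
  mutate(region_n = case_when(
    city == "New York"      ~ 1,  # East
    city == "San Francisco" ~ 2,  # West
    TRUE                    ~ 3   # Other
  ))

my_df3
```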

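library(readxl)  # read_excel()
library(dplyr)   # filter()

Example <- read_excel("C:\\Users\\Put_user\\Downloads\\example.xlsx")

# Keep only the two treatment arms of interest
Example_filter <- Example |>
  filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC"))

# The '' fallback makes ifelse() return character, so NEW_ACTARMCD is <chr>
Example_filter$NEW_ACTARMCD <- ifelse(Example_filter$ACTARMCD == "VEDOIV_VEDOSC", 1,
                                      ifelse(Example_filter$ACTARMCD == "VEDOIV_PSC", 2, ''))

# Show columns 1, 7 and 9
Example_filter[, c(1, 7, 9)]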
#   USUBJID                  ACTARMCD      NEW_ACTARMCD
#   <chr>                    <chr>         <chr>
# 1 MLN0002SC-3031-02003-506 VEDOIV_VEDOSC 1
# 2 MLN0002SC-3031-02008-501 VEDOIV_PSC    2
# 3 MLN0002SC-3031-02008-505 VEDOIV_VEDOSC 1
# 4 MLN0002SC-3031-02012-503 VEDOIV_PSC    2
# 5 MLN0002SC-3031-02014-502 VEDOIV_VEDOSC 1
# 6 MLN0002SC-3031-02015-502 VEDOIV_VEDOSC 1
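
In the output above the new column is character (<chr>), while the original request was for a numeric trtn. One way to get a numeric column without any extra packages is a named lookup vector; this is a sketch with made-up data (example_demo, arm_codes, the subject IDs, and the arm code "OTHER" are hypothetical), not the only way to do it.

```r
# Hypothetical stand-in for the Excel data
example_demo <- data.frame(
  USUBJID  = c("S-001", "S-002", "S-003", "S-004"),
  ACTARMCD = c("VEDOIV_VEDOSC", "VEDOIV_PSC", "OTHER", "VEDOIV_VEDOSC"),
  stringsAsFactors = FALSE
)

# Named numeric vector: arm code -> treatment number
arm_codes <- c(VEDOIV_VEDOSC = 1, VEDOIV_PSC = 2)

# Look up each row's arm code; codes not in arm_codes become NA
example_demo$trtn <- unname(arm_codes[example_demo$ACTARMCD])

# Drop unmatched rows and keep only the two columns of interest
adslvis_demo <- example_demo[!is.na(example_demo$trtn), c("USUBJID", "trtn")]

str(adslvis_demo)
```

The lookup indexes by name, so it relies on ACTARMCD being a character column rather than a factor.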