I have been using SAS for a very long time but recently got interested in learning R. I am planning to analyze data and produce the same tables in R that I produced in SAS, learning R along the way.
Much of the work in clinical trials involves producing pretty tables to present the data after analyzing them with some statistical methods. Many tables (e.g. demographics, AEs) do not need high-level statistics but must be presented in a visually attractive way. SAS PROC REPORT is a fantastic procedure just for that. R does not have a similar procedure, but recently a few developers (e.g. David Bosak) have written packages to overcome that limitation. I am reading many books and articles on R but thought I would join this community to get some help.
Anyway, here is my question. I have a few lines of SAS code below for a very simple task:
data adslvis;
set vis2.adsl;
if actarmcd="xxxx" then trtn=1;
else if actarmcd="yyyy" then trtn=2;
else delete;
keep usubjid trtn;
run;
The first line (a libname statement, not shown above) creates the library vis2, which is basically a link to the folder where the SAS data are located.
The data step then creates the dataset adslvis by reading the dataset adsl from library vis2, creating a new numeric variable trtn (1 when actarmcd="xxxx", 2 when actarmcd="yyyy"), deleting all other records, and keeping just usubjid and trtn.
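For comparison, here is a minimal base-R sketch of the same data step, with no extra packages. The data frame adsl is a hypothetical stand-in for vis2.adsl (made-up subject IDs and a made-up third arm code "zzzz"), and the column names are assumed to be upper-case, as is usual for ADaM datasets. The named lookup vector plays the role of the if/else if chain.

```r
# Hypothetical stand-in for vis2.adsl (in practice it would come from read_sas())
adsl <- data.frame(
  USUBJID  = c("S-001", "S-002", "S-003", "S-004"),
  ACTARMCD = c("xxxx", "yyyy", "zzzz", "xxxx"),
  stringsAsFactors = FALSE  # keep text as character, not factor
)

# Lookup table: arm code -> trtn; codes not listed give NA
arm_to_trtn <- c(xxxx = 1, yyyy = 2)
adsl$trtn <- unname(arm_to_trtn[adsl$ACTARMCD])

# "else delete;"        -> drop rows with no treatment number
# "keep usubjid trtn;"  -> keep only those two columns
adslvis <- adsl[!is.na(adsl$trtn), c("USUBJID", "trtn")]
adslvis
```

The row names 1, 2 and 4 are carried over from adsl; they are harmless, and rownames(adslvis) <- NULL resets them if you prefer.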
R version:
I can read the sas7bdat data (after installing the packages haven and sas7bdat) using haven's read_sas() function:
library(haven)
# read_sas() expects the path to a .sas7bdat file, not just the folder
vis <- read_sas("J:xxxx/xxxx/xxxx/xxxx/")
How can I accomplish the second part and still keep the dataset name the same? I am sure there are many ways of doing it with different packages and functions, but what is the simplest? It might be trivial for R programmers, but not yet for me.
Here is one version using the dplyr package. I haven't tested it, since I don't have your data.
library(dplyr)
vis <- vis |> filter(actarmcd %in% c("xxxx", "yyyy")) |>
mutate(trtn = case_when(
actarmcd == "xxxx" ~ 1,
actarmcd == "yyyy" ~ 2,
TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
)) |>
select(usubjid, trtn)
This does not seem to work, although when I run vis$actarmcd the variable is there. I get this message:
Error in filter():
! Problem while computing ..1 = actarmcd %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC").
Caused by error in actarmcd %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC"):
! object 'actarmcd' not found
Did you have trouble loading dplyr?
I would expect this error if dplyr wasn't loaded, because then stats::filter() would be called and fail like this, whereas you want dplyr::filter().
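One way to rule out this kind of masking altogether is to call the function with its namespace prefix, so stats::filter() can never be picked up by mistake, whatever order packages were loaded in. A minimal sketch on a made-up two-row data frame (the arm code "OTHER" is hypothetical):

```r
# Hypothetical two-row example; only the first row should survive
vis <- data.frame(
  ACTARMCD = c("VEDOIV_VEDOSC", "OTHER"),
  stringsAsFactors = FALSE
)

# dplyr::filter() is used explicitly, so stats::filter() is never involved
kept <- dplyr::filter(vis, ACTARMCD == "VEDOIV_VEDOSC")
kept
```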
Found the issue: actarmcd should have been capitalized as ACTARMCD. I didn't know these two are different in R.
Anyway, now I get a new problem:
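Because R names are case-sensitive (unlike SAS variable names), it can help to check the column names right after reading the data. A minimal sketch on a hypothetical two-row data frame; lower-casing every name with tolower() is just one optional convention, not something read_sas() does for you.

```r
# Hypothetical data with upper-case names, as in an ADaM dataset
vis <- data.frame(
  USUBJID  = c("S-001", "S-002"),
  ACTARMCD = c("xxxx", "yyyy"),
  stringsAsFactors = FALSE
)

"actarmcd" %in% names(vis)  # FALSE: the lower-case name does not exist
"ACTARMCD" %in% names(vis)  # TRUE
is.null(vis$actarmcd)       # TRUE: $ with the wrong case returns NULL

# Optional: lower-case all column names once, so later code can use
# lower-case names as you would in SAS
names(vis) <- tolower(names(vis))
names(vis)                  # "usubjid" "actarmcd"
```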
visadsl <- vis |> filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>
mutate(trtn = case_when(
ACTARMCD == "VEDOIV_VEDOSC" ~ 1,
ACTARMCD == "VEDOIV_PSC" ~ 2,
TRUE ~ NA #This line shouldn't be necessary because actarmcd should have only two values
)) |>
select(usubjid, trtn)
Error in mutate():
! Problem while computing trtn = case_when(...).
Caused by error in case_when():
I know, I know, I need to be patient; I am very new to R. That's why I am posting here, to get help from the gurus. Sorry if I annoyed anyone with trivial questions.
Basically, I am trying to create a dataset called visadsl from the (large) dataset vis, where visadsl has only usubjid and trtn, and trtn is a numeric variable equal to 1 when ACTARMCD = 'VEDOIV_VEDOSC' and 2 when ACTARMCD = 'VEDOIV_PSC'.
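A likely cause of the case_when() error (the message above is cut off, so this is an assumption) is the TRUE ~ NA line: a bare NA is a logical value while 1 and 2 are doubles, and older dplyr versions of case_when() refuse to mix the two. The typed NA_real_ avoids this, as would dropping that line, since case_when() returns NA for unmatched rows anyway. Also, if USUBJID is upper-case like ACTARMCD, select(usubjid, trtn) would be the next thing to fail. A minimal sketch on a hypothetical four-row data frame (the subject IDs and the arm code "SCRNFAIL" are made up); the native pipe |> needs R 4.1 or later.

```r
library(dplyr)

# Hypothetical stand-in for the real ADSL read with read_sas()
vis <- data.frame(
  USUBJID  = c("S-001", "S-002", "S-003", "S-004"),
  ACTARMCD = c("VEDOIV_VEDOSC", "VEDOIV_PSC", "SCRNFAIL", "VEDOIV_PSC"),
  stringsAsFactors = FALSE
)

visadsl <- vis |>
  filter(ACTARMCD %in% c("VEDOIV_VEDOSC", "VEDOIV_PSC")) |>  # like "else delete;"
  mutate(trtn = case_when(
    ACTARMCD == "VEDOIV_VEDOSC" ~ 1,
    ACTARMCD == "VEDOIV_PSC"    ~ 2,
    TRUE                        ~ NA_real_  # typed NA: a double, like 1 and 2
  )) |>
  select(USUBJID, trtn)                                        # like "keep"

visadsl
```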
Without the real data it is difficult to give accurate help.
But if I understand correctly, you need something like this with your vis data:
library(dplyr)
my_df <- data.frame(
name = c("Alice", "Bob", "Charlie","Alice"),
age = c(25, 30, 35,50),
city = c("New York", "San Francisco", "Boston","Boston")
)
# add a new column using case_when
my_df2 <- my_df |> filter(name %in% c('Alice','Charlie')) |>
mutate(region = case_when(
city == "New York" ~ "East",
city == "San Francisco" ~ "West",
TRUE ~ "Other"
))
# name age city region
# 1 Alice 25 New York East
# 2 Charlie 35 Boston Other
# 3 Alice 50 Boston Other
And where is Bob? Why is Bob not in the new dataset?
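Bob is missing because filter() keeps only the rows where its condition is TRUE, and name %in% c('Alice','Charlie') is FALSE for Bob; it plays the same role as else delete; in the SAS step. A minimal sketch, reusing the toy my_df from the reply, that shows the condition row by row and then runs the same mutate() without filter():

```r
library(dplyr)

my_df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Alice"),
  age  = c(25, 30, 35, 50),
  city = c("New York", "San Francisco", "Boston", "Boston"),
  stringsAsFactors = FALSE
)

# The condition filter() evaluates: Bob's row is FALSE, so it is dropped
my_df$name %in% c("Alice", "Charlie")  # TRUE FALSE TRUE TRUE

# Without filter(), every row is kept and Bob gets region "West"
my_df3 <- my_df |>
  mutate(region = case_when(
    city == "New York"      ~ "East",
    city == "San Francisco" ~ "West",
    TRUE                    ~ "Other"
  ))
my_df3
```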
Please send me your contact details so that I can send you the SAS dataset. This is my first attempt to reproduce in R a table I generated easily in SAS, and this is only the first step. Nothing fancy, but I seem to struggle even to get through the first step.