Create empty dataframes for later population of values from validation checks

Hi, I just started my programming journey and have only some minor experience with SAS. Now I need to create 3 empty datasets (data frames) with varying numbers of variables (columns) in different formats. They will log the output from different validation checks, so they will be populated later on, but for now I just need to create the "shells".
In SAS I would have used the following code:

data log_runs;
format RUNN best.;
length RUNNF $4.;
format RUNDT datetime20.;
length PREV_DVRES $100.;
delete;
run;

data log_res;
format RUNN best.;
length STATUS $3.;
length UTID $300.;
length SUBJID $10.;
length CHKID $12.;
length VISIT $64.;
length PAGE $32.;
format SEQ best.;
format ROW best.;
length SPEC $300.;
length NOTE $500.;
length CLOSURE $1.; *C=Closed/Ignored by monitor response, O=Obsolete (no longer present), X=Closed by DM (error in run, etc);
format CLOSURE_RUNN best.;
delete;
run;

data log_progs;
format RUNN best.;
length DVPROG $32.;
delete;
run;

Is there anyone who can help out here? Please bear with me, I'm really new to programming :slight_smile:

You can do this with code like this:

DF <- data.frame(Name = character(), Value = double(), Fac = factor())

However, I suspect there is no need to do it. I do not remember ever having to do that either for my own work or as a solution on the forum. Can you provide some details of how these data frames will be filled?
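For the three shells in the question, a minimal sketch along the same lines could look like this. The R types are my assumption of the closest match to the SAS formats (numeric for `best.`, character for the `$` lengths, POSIXct for `datetime20.`), and the UTC time zone is an arbitrary choice. R character columns have no fixed width, so the SAS lengths are not reproduced.

```r
# Zero-row "shells" mirroring the SAS datasets; column types are assumed equivalents.
log_runs <- data.frame(
  RUNN       = numeric(),
  RUNNF      = character(),
  RUNDT      = as.POSIXct(numeric(), origin = "1970-01-01", tz = "UTC"),
  PREV_DVRES = character(),
  stringsAsFactors = FALSE  # keeps character columns as character on R < 4.0
)

log_res <- data.frame(
  RUNN         = numeric(),
  STATUS       = character(),
  UTID         = character(),
  SUBJID       = character(),
  CHKID        = character(),
  VISIT        = character(),
  PAGE         = character(),
  SEQ          = numeric(),
  ROW          = numeric(),
  SPEC         = character(),
  NOTE         = character(),
  CLOSURE      = character(),  # C = closed/ignored, O = obsolete, X = closed by DM
  CLOSURE_RUNN = numeric(),
  stringsAsFactors = FALSE
)

log_progs <- data.frame(
  RUNN   = numeric(),
  DVPROG = character(),
  stringsAsFactors = FALSE
)

# Inspect the structure of one shell: 0 observations, typed columns.
str(log_runs)
```

`str()` is a quick way to confirm that each shell has zero rows and the column types you expect.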


Hi,

@FJCC I have used this quite a few times actually. It comes in handy when you need to add data to a dataframe in a loop but you don't want to duplicate the code for the first instance of the dataframe (or don't know at which iteration it will happen).

Here is a dummy example (the data itself does not make sense)

myData = data.frame(id = integer(), value = character())

for(i in 1:5){
  if(i > 2){
    myData = rbind(
      myData[myData$id > 3,], 
      data.frame(id = i, value = "success"))
  }
}

myData
#>   id   value
#> 1  4 success
#> 2  5 success

Created on 2022-03-02 by the reprex package (v2.0.1)

By starting with an empty data frame that already has its columns, you can perform operations on it (filtering, for example) even when the result is still the same empty data frame. Once you actually have data, you can bind it to the empty data frame and continue from there.

Hope this helps,
PJ
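Applied to one of the log tables from the question, the idea might look roughly like this. It is only a sketch: the `check_1` and `check_2` program names are made up for illustration, and the numeric/character column types are an assumption about what the SAS formats map to.

```r
# Empty shell with the two columns from the question's log_progs dataset.
log_progs <- data.frame(
  RUNN   = numeric(),
  DVPROG = character(),
  stringsAsFactors = FALSE
)

# Operations on the empty shell work and simply return zero rows.
empty_hits <- log_progs[log_progs$RUNN > 1, ]
nrow(empty_hits)
#> [1] 0

# Append one row per (hypothetical) validation program run.
for (run in 1:2) {
  new_row <- data.frame(
    RUNN   = as.numeric(run),
    DVPROG = paste0("check_", run),  # made-up program name
    stringsAsFactors = FALSE
  )
  log_progs <- rbind(log_progs, new_row)
}

nrow(log_progs)
#> [1] 2
```

The same loop body runs on the first iteration and on every later one, which is the point of having the shell in place beforehand.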

But the following also works. Making an empty data frame can be useful; I'm not aware of a need to predefine the column class. I even tried swapping the class of id and value in your code (not shown) and the result was unchanged.

myData <- data.frame()

for(i in 1:5){
  if(i > 2){
    myData = rbind(
      myData[myData$id > 3,], 
      data.frame(id = i, value = "success"))
  }
}

myData
#>   id   value
#> 1  4 success
#> 2  5 success
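As far as I can tell, both observations come down to how base R handles empty inputs: a missing column is `NULL`, comparing `NULL` gives a zero-length logical, and the data frame method of `rbind()` drops zero-row arguments before combining. A small sketch (the variable names are only for illustration):

```r
# Part 1: why the completely empty data frame works in base R.
empty <- data.frame()
is.null(empty$id)        # the column does not exist, so $ returns NULL
#> [1] TRUE
length(empty$id > 3)     # NULL > 3 is logical(0)
#> [1] 0
kept <- empty[empty$id > 3, ]
nrow(kept)               # indexing with logical(0) selects nothing
#> [1] 0

# Part 2: why swapping the column classes made no difference.
# The zero-row shell is dropped by rbind(), so the types come from the new row.
shell <- data.frame(id = character(), value = integer())
out <- rbind(shell, data.frame(id = 3L, value = "success"))
class(out$id)
#> [1] "integer"
```

So with `rbind()` the declared classes of an empty shell act more as documentation than as enforcement; the first real row decides the types.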

Hi,

Good catch. I use the Tidyverse all the time and forgot that this works in base R. With dplyr syntax the columns do need to be defined up front, because filter() needs the id column to exist:

ERROR

library(dplyr)

myData = data.frame()

for(i in 1:5){
  if(i > 2){
    myData = rbind(
      myData %>% filter(id > 3), 
      data.frame(id = i, value = "success"))
  }
}
#> Error in `filter()`:
#> ! Problem while computing `..1 = id > 3`.
#> Caused by error in `id > 3`:
#> ! comparison (6) is possible only for atomic and list types

myData
#> data frame with 0 columns and 0 rows

Created on 2022-03-02 by the reprex package (v2.0.1)

CORRECT

library(dplyr)

myData = data.frame(id = integer(), value = character())

for(i in 1:5){
  if(i > 2){
    myData = rbind(
      myData %>% filter(id > 3), 
      data.frame(id = i, value = "success"))
  }
}

myData
#>   id   value
#> 1  4 success
#> 2  5 success

Created on 2022-03-02 by the reprex package (v2.0.1)

PJ


Hi all! Thank you very much for your help. I had to put this aside for a while to prioritize other work, so I have not tried it yet, but I will soon, and hopefully it works fine :slight_smile:
/Ankan
