Creating columns with data.table within a function

Hi,

I would like to loop over a vector of column names and, for any name that does not already exist in the data, create a column with that name (filled with NAs). This is all part of a function.

My usual approach is to use the := operator from the data.table package. But when I try this inside a function, R throws an error:

Error in `:=`(columns[i], NA) : Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

The same thing happens even if I convert the data to a data.table with setDT() or data.table() inside the function. A short reproducible example is below.

library(data.table)

Data <- t(c(n1 = 10, n2 = 35, n3 = "text"))
columns <- c("n1", "n5") # vector of column names to check

colCreate <- function(Data) {
  for (i in seq_along(columns)) {
    Data <- data.table(Data)
    if (sum(colnames(Data) == columns[i]) == 0) {
      Data[, columns[i] := NA]
    }
  }
}
# Calling the function
colCreate(Data)

Would appreciate any hints.
Thanks!

It is usually a good idea to avoid loops when set-based logic can do the same thing. This shows how to find the columns that need to be added, but stops there.

# get the column names from the original data frame
current_names <- names(mtcars)

# create a vector of the new "candidate" columns
new_names <- c("col1", "col2", "mpg", "col4")

# return a logical vector flagging existing columns that are also candidates
current_names %in% new_names
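
To carry the set-based idea one step further, here is a minimal sketch in base R, not tested against the original poster's data. It assumes a copy of mtcars named df and a helper vector named missing_cols, both introduced only for illustration. setdiff() returns the candidates that are not yet columns, and a single assignment adds them all without a loop.

```r
current_names <- names(mtcars)
new_names <- c("col1", "col2", "mpg", "col4")

# candidates that are not already columns (order of new_names is kept)
missing_cols <- setdiff(new_names, current_names)

# work on a copy so the built-in dataset is left alone
df <- mtcars

# add every missing column in one step, filled with NA
df[missing_cols] <- NA

names(df)
```

Here "mpg" is skipped because it already exists, so only col1, col2 and col4 are appended.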

This looks like a job for 'set'

Something like

for (j in columns) {
  set(Data, i = NULL, j = j, value = NA)
}

Untested, as I'm on my phone

This appears to work:

library(data.table)
Data <- t(c(n1 = 10, n2 = 35, n3 = "text"))
columns <- c("n1", "n5")

Data <- data.frame(Data)

my_fun <- function(df, cols_to_check) {
  # convert to a data.table by reference
  DT <- setDT(df)
  current_names <- names(DT)
  # keep only the requested names that are not already columns
  new_cols <- cols_to_check[!cols_to_check %chin% current_names]

  # helper that returns the fill value
  setblank <- function() NA

  for (j in new_cols) {
    set(DT, j = j, value = setblank())
  }
  return(DT)
}

test <- my_fun(Data, columns)

My original reply would also work (and is easier):

my_fun <- function(df,cols_to_check) {
  
  DT <- setDT(df)
  current_names <- names(DT)
  new_cols <- cols_to_check[c(which(!cols_to_check %chin% current_names))]
 
  for (j in new_cols) {
    set(DT, j = j, value = NA)
  }
  return(DT)
}

test <- my_fun(Data, columns)

test
   n1 n2   n3 n5
1: 10 35 text NA
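
Since the question was about := specifically, here is a hedged sketch of the same function using := instead of set(). The name add_missing_cols is made up for this example. It assumes that wrapping the character vector in parentheses on the left-hand side, (new_cols), is what tells data.table to treat it as column names, and it uses as.data.table() so that it works on a copy rather than converting the caller's object by reference.

```r
library(data.table)

# hypothetical helper, named here only for illustration
add_missing_cols <- function(df, cols_to_check) {
  # as.data.table() returns a copy, so the caller's object is not modified
  DT <- as.data.table(df)

  # requested names that are not already columns
  new_cols <- setdiff(cols_to_check, names(DT))

  # guard against an empty vector before assigning by reference
  if (length(new_cols) > 0L) {
    DT[, (new_cols) := NA]
  }

  # trailing [] makes the result print when returned at the console
  DT[]
}

Data <- data.frame(t(c(n1 = 10, n2 = 35, n3 = "text")))
columns <- c("n1", "n5")

result <- add_missing_cols(Data, columns)
names(result)
```

The trade-off compared with setDT() is an extra copy of the data, which should not matter for small tables but may for very large ones.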

Thanks, @johnmackintosh! The first approach worked for me. It seems that the combination of %chin% and setDT() did the trick.
