Create new column within dataframe within function

Hello, I am new to R and am trying to write a function that removes outliers from many variables in my data frame, creating a new variable in which the outliers are blank. I think the function is working, but the new variable does not appear in the data frame.

```r
remove_outliers <- function(dataset, column, column_new) {
  column <- eval(substitute(column), dataset, parent.frame())

  Q <- quantile(column, probs = c(.25, .75), na.rm = FALSE)
  iqr <- IQR(column)
  up <- Q[2] + 1.5 * iqr   # Upper Range
  low <- Q[1] - 1.5 * iqr  # Lower Range

  dataset[[column_new]] <- ifelse(column > up, "",
                                  ifelse(column < low, "",
                                         column))
  return(dataset)
}

remove_outliers(study,
                day35_freecort_cortisone,
                "day35_freecort_cortisone_x")
```
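The cut-offs in this function are the usual 1.5 * IQR fences. As a quick worked check on a small made-up vector (not the study data, and using R's default quantile type), this shows which values those fences would flag:

```r
# Made-up vector: nine ordinary values and one extreme value
x <- c(1:9, 100)

Q   <- quantile(x, probs = c(.25, .75))  # 25th and 75th percentiles (default type 7)
iqr <- IQR(x)                            # same as Q[2] - Q[1]
up  <- Q[2] + 1.5 * iqr                  # upper fence
low <- Q[1] - 1.5 * iqr                  # lower fence

unname(Q)    # 3.25 7.75
iqr          # 4.5
unname(up)   # 14.5
unname(low)  # -3.5
x[x > up | x < low]  # 100 is the only value outside the fences
```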

You have to save the returned value of the function in a variable.

```r
study <- remove_outliers(study, day35_freecort_cortisone, "day35_freecort_cortisone_x")
```
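A minimal sketch of the difference, using a made-up stand-in for `study` and a hypothetical `add_column()` helper rather than your real function:

```r
# Made-up stand-in for the real `study` data
study <- data.frame(id = 1:4, score = c(10, 12, 11, 95))

# Hypothetical helper: adds a column and returns the modified copy
add_column <- function(dataset, column_new) {
  dataset[[column_new]] <- dataset$score * 2
  dataset
}

add_column(study, "score_x")   # prints a data frame that has score_x ...
"score_x" %in% names(study)    # FALSE: study itself was not changed

study <- add_column(study, "score_x")  # store the returned copy
"score_x" %in% names(study)            # TRUE
```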

Also, replacing a number with an empty string will convert the whole column to character type, which you probably do not want. You can use NA instead.
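A small illustration with made-up numbers and a hand-picked cut-off (`up <- 20` is arbitrary, just for the demo):

```r
x  <- c(5, 7, 6, 40)  # made-up values; 40 plays the outlier
up <- 20              # arbitrary cut-off for this demo

with_blank <- ifelse(x > up, "", x)
with_na    <- ifelse(x > up, NA, x)

class(with_blank)            # "character": the numbers became text ("5", "7", ...)
class(with_na)               # "numeric": still usable in calculations
mean(with_na, na.rm = TRUE)  # 6
```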

Thank you very much for your help; I will use NA instead. I thought I was storing the value in a variable by writing the following:

```r
dataset[[column_new]] <- ifelse(column > up, NA,
                                ifelse(column < low, NA,
                                       column))
```

Is there something else I should write? Thank you again!

The line you quote does store the new column in the dataset variable. However, dataset only exists while the function is executing. To get a little technical: the study variable exists in the Global Environment; you can see it listed in the Environment pane in RStudio. When you define the function remove_outliers, that function also lives in the Global Environment. When you call the function, R creates a new environment for that call, and everything the function does happens there; assignments made inside it with <- do not touch the Global Environment. Whatever you return() from the function is handed back to the code that called it. At the console, that value is either stored in a variable, as I showed earlier, or simply printed if you don't assign it to anything. All of the variables inside the function's environment disappear when the function finishes and its environment goes away.
In the code below, you can see two versions of a function that modifies its argument. The first case is similar to what you did. The variable x is passed as the Z argument of the function, and Z is modified and returned. However, x in the Global Environment is unchanged, because the returned value was not stored in x.
In the second case, the argument of the function is also named x, and it is modified within the function. But this is a different x from the one defined in the Global Environment; it is defined inside the function as the function's argument. Modifying it does not affect the x in the Global Environment.
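To make the environment part concrete, here is a short sketch (`f()` and `g()` are throwaway names) showing that each call runs in its own environment and that only the returned value survives:

```r
f <- function() {
  inside <- "only exists while f() runs"
  # environment() is the fresh environment created for this call
  identical(environment(), globalenv())
}

f()                                                    # FALSE: not the Global Environment
exists("inside", envir = globalenv(), inherits = FALSE)  # FALSE: it vanished with the call

g <- function() {
  tmp <- 41
  tmp + 1  # the last value is what the caller gets back
}
result <- g()
result                                              # 42
exists("tmp", envir = globalenv(), inherits = FALSE)  # FALSE
```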

```r
# Version 1: the argument is called Z
x <- 1
MyFunc <- function(Z) {
  Z <- Z + 1
  return(Z)
}
MyFunc(x)
#> [1] 2
# x is still 1
x
#> [1] 1

# Version 2: the argument is also called x
x <- 1
MyFunc <- function(x) {
  x <- x + 1  # not the same x as in the Global Environment
  return(x)
}
MyFunc(x)
#> [1] 2
# x is still 1
x
#> [1] 1
```

That might be more than you wanted to know! The short version is, if you want to save a value defined inside of a function, you must return() it and save that returned value in a variable.
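Since the original goal was to do this for many variables, here is one possible sketch under a few assumptions: it swaps the substitute()/eval() approach for a hypothetical `remove_outliers_chr()` that takes the column name as a string, uses NA instead of "", and passes `na.rm = TRUE` in case the real columns contain missing values. The data are made up.

```r
# Hypothetical string-based variant of remove_outliers()
remove_outliers_chr <- function(dataset, column, column_new) {
  x   <- dataset[[column]]
  Q   <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  iqr <- IQR(x, na.rm = TRUE)
  up  <- Q[2] + 1.5 * iqr  # upper fence
  low <- Q[1] - 1.5 * iqr  # lower fence
  dataset[[column_new]] <- ifelse(x > up | x < low, NA, x)
  dataset  # return the modified copy
}

# Made-up data standing in for `study`
study <- data.frame(
  a = c(1:9, 100),
  b = c(-50, 2:10)
)

for (col in c("a", "b")) {
  # save each returned copy, otherwise the new column is lost
  study <- remove_outliers_chr(study, col, paste0(col, "_x"))
}

names(study)   # "a" "b" "a_x" "b_x"
study$a_x[10]  # NA (100 was outside the fences)
study$b_x[1]   # NA (-50 was outside the fences)
```

The important line is still `study <- ...` inside the loop: each call returns a modified copy, and assigning it back is what keeps the new column.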
