Why doesn't my function work?

I tried to replace NA values in the following tibble:

testdf

     a   |  b
  1  NA  |  b
  2  a   |  NA

by using the following function:

replaceNATest <- function(df, colName, value='Unknown') {
  df[colName] <- ifelse( is.na(df[[colName]]), value, df[[colName]])
}

After running the following command:

replaceNATest(testdf, "a", "Unknown")

testdf didn't change.

I'm not sure why the NA was not replaced by "Unknown".

Thanks

Stephen

Changing df inside the function does not change the testdf object outside the function, because the function works on its own copy. You have to write the function so that it returns the modified object, and then assign that returned object to testdf.

DF <- data.frame(A = c(NA, 2), B = c(3, NA))
DF
#>    A  B
#> 1 NA  3
#> 2  2 NA
replaceNATest <- function(df, colName, value='Unknown') {
  df[colName] <- ifelse( is.na(df[[colName]]), value, df[[colName]])
  df
  }
DF <- replaceNATest(DF, "A")
DF
#>         A  B
#> 1 Unknown  3
#> 2       2 NA

Created on 2019-11-08 by the reprex package (v0.3.0.9000)
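
As a side note, here is a small sketch of what the original function from the question actually does. It assumes a base data.frame as a stand-in for testdf, and the name replaceNAOriginal is only for illustration. Because the last expression in the body is an assignment, the function invisibly returns the assigned value (a character vector), not the modified data frame, and the caller's object stays untouched.

```r
# Stand-in for the tibble in the question (base R only, an assumption)
testdf <- data.frame(a = c(NA, "a"), b = c("b", NA), stringsAsFactors = FALSE)

# Same body as the original function: df is never returned
replaceNAOriginal <- function(df, colName, value = "Unknown") {
  df[colName] <- ifelse(is.na(df[[colName]]), value, df[[colName]])
}

# The result of the final assignment is returned invisibly
res <- replaceNAOriginal(testdf, "a")

res       # the replaced column values, not a data frame
testdf$a  # unchanged: the NA is still there
```

So nothing is printed and nothing is updated, which matches what the question describes.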

Try running these two little blocks of code and you will see that A and x in the global environment are not affected by what goes on in MyFunc, which has its own environment.

MyFunc <- function(x) {
  x <- x + 2  # changes only the x that is local to MyFunc
  x
}
A = 6
MyFunc(A)  # returns 8
A          # still 6

x = 3
MyFunc(x)  # returns 5
x          # still 3

Great exposition on the difference between local and global scope, which is hard in any language.

Thanks FJCC. I think there are three ways to get the dataframe testdf updated:

    library(tidyr)
    testdf <- tibble(
        a = c(NA, "a"),
        b = c("b", NA)
    )
    testdf %>% print()
    testdf_bkp <- testdf

By explicitly returning the data frame from the function:

  replaceNATest <- function(df, colName, value='Unknown') {
     df[colName] <- ifelse( is.na(df[[colName]]), value, df[[colName]]) 
     return(df)
  }
  testdf <- replaceNATest(testdf, "a", "Unknown Value")
  testdf %>% print()

or by assigning directly to the variable in the enclosing environment with <<- from inside the function (not recommended):

  testdf <- testdf_bkp
  replaceNATest <- function(df, colName, value='Unknown') {
     df[colName] <- ifelse( is.na(df[[colName]]), value, df[[colName]])
     testdf <<- df
  }
  replaceNATest(testdf, "a", "Unknown Value")
  testdf %>% print()

or by explicitly assigning into the global environment with assign() from inside the function (not recommended):

  testdf <- testdf_bkp
  replaceNATest <- function(df, colName, value='Unknown', dfName ) {
     df[colName] <- ifelse( is.na(df[[colName]]), value, df[[colName]]) 
     assign(dfName, df, envir = .GlobalEnv)
  }
  replaceNATest(testdf, "a", "Unknown Value", dfName = "testdf")
  testdf %>% print()
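
To illustrate one reason the global-assignment variants are not recommended, here is a minimal sketch. The names replaceNAGlobal and otherdf are hypothetical, and base data.frames stand in for the tibbles. The function always writes to the name testdf, so calling it on a different data frame silently overwrites testdf and leaves the data frame that was passed in unchanged.

```r
# Variant using <<-, which always targets the name testdf
replaceNAGlobal <- function(df, colName, value = "Unknown") {
  df[colName] <- ifelse(is.na(df[[colName]]), value, df[[colName]])
  testdf <<- df  # writes to testdf no matter which data frame was passed in
}

testdf  <- data.frame(a = c(NA, "a"), b = c("b", NA), stringsAsFactors = FALSE)
otherdf <- data.frame(a = c("x", NA), b = c(NA, "y"), stringsAsFactors = FALSE)

# Intended to clean otherdf, but the side effect lands on testdf
replaceNAGlobal(otherdf, "a")

otherdf$a  # still contains the NA
testdf$a   # now holds the cleaned copy of otherdf's column
```

The version that returns df and lets the caller assign the result does not have this problem.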

referenced to:

Why do you want to update the data frame inside the function? Is there anything to gain from it? You can read through the points raised against this in the SO thread you linked to.

If that's not a requirement, don't do it. In my opinion, it makes things unnecessarily complicated. Since you're already using pipes (though unnecessarily, as the print() call is optional in your code), I guess you're using other tidyverse packages too. You can simply use tidyr::replace_na as below:

library(magrittr)  # the %<>% assignment pipe comes from magrittr, not tidyr
testdf %<>% replace_na(list(a = "Unknown Value"))

In this way, you can easily change multiple columns too.
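
For example, a minimal sketch of replacing NA in both columns at once, assuming the same testdf as earlier in the thread. The replacement string "Missing" for column b is only an illustrative choice.

```r
library(tidyr)

testdf <- tibble(
  a = c(NA, "a"),
  b = c("b", NA)
)

# One named list entry per column; each value replaces the NAs in that column
testdf <- replace_na(testdf, list(a = "Unknown Value", b = "Missing"))
testdf
```

Plain assignment is used here instead of %<>%, so tidyr is the only package that needs to be loaded.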