Trying to loop through a dataframe and replace values to a 0 or 1

dsmith_au · March 28, 2020, 6:13am

Sorry for the newbie question. I am trying to loop through a dataframe and replace binary string values like "Yes" and "No" with 1 and 0.

I'm trying really hard to learn the syntax but it just defies logic. Rather than write one line for each factor, like this: 'dat2$Verified.as.Malware <- ifelse(is.na(dat2$Verified.as.Malware), 'No', 'Yes')', I thought I would write a function like this:

replaceStringBinary = function(dataFrame){
  for( currentRow in 1:nrow(dataFrame)){
    for( currentCol in 1:ncol(dataFrame)){
     
      if ( is.na(dataFrame[currentRow, currentCol]) ){
        dataFrame[currentRow, currentCol] = 0
      }
     
      if( dataFrame[currentRow, currentCol] == 'Yes'){
        dataFrame[currentRow, currentCol] = 1
      }
     
      if ( dataFrame[currentRow, currentCol] == 'No'){
        dataFrame[currentRow, currentCol] = 0
      }
     
    }
  }
}

which I feel is syntactically correct but throws this error:
Error in if (dataFrame[currentRow, currentCol] == "No") { :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = 0) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = 1) :
  invalid factor level, NA generated

I account for the NA case in the first if statement so that the subsequent if statements never test a non-string, but I can't work out why it throws this error.

Can anyone assist, please?

technocrat · March 28, 2020, 6:49am

Hi, and welcome. For future reference, please see the FAQ on reprex to optimize discussions.

I’m on a tablet, so if I’m unclear, reply and I’ll try to clarify tomorrow PDT.

The dplyr package has mutate variants, such as mutate_all(), that do this in one pass over every column. Alternatively, create a vector of the variable names and apply your function to each one with purrr::map().

BTW: R is best thought of in terms of a functional language, f(x) = y where everything is an object. Procedural language approaches with control statements are available but best used “under the hood” in packages, rather than on the fly.
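As a rough illustration of the f(x) = y idea, here is a minimal sketch in base R, with lapply() standing in for purrr::map(). The helper name yes_to_binary and the data in example_df are made up for this example and are not from the original post.

```r
# Hypothetical helper: 1 where the value is exactly "Yes",
# 0 for NA and for everything else (including "No").
yes_to_binary <- function(x) {
  as.integer(!is.na(x) & x == "Yes")
}

# Made-up example data (not the original poster's dat2).
example_df <- data.frame(a = c("Yes", "No", "Yes"),
                         b = c(NA, "No", "Yes"))

# Apply the helper to every column in one pass; assigning into
# example_df[] keeps the data frame's shape and column names.
example_df[] <- lapply(example_df, yes_to_binary)
example_df
```

Because the helper works on a whole column at once, no row-by-row loop or if statement is needed, and the same code should work whether the columns are character vectors or factors.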

nirgrahamuk · March 28, 2020, 12:26pm

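You're being tripped up because your code tries to amend an existing data frame in place based on its prior values, which would require switching type: the columns hold strings (stored as factors, judging by the warnings), and those are not compatible with integer values. Assigning 0 or 1 to a factor cell produces NA, and the next if statement then tests that NA and fails with "missing value where TRUE/FALSE needed". To reform your approach, your function needs to build, cell by cell, a new object with integer values and return it.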
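The failure can be reproduced in a few lines. This is only a sketch on a made-up factor f, assuming the original columns were factors, as the warning messages in the question suggest.

```r
# Made-up factor column, like the ones data.frame() created from
# strings by default before R 4.0.
f <- factor(c("Yes", NA, "No"))

# 0 is not one of the levels ("No", "Yes"), so this assignment warns
# "invalid factor level, NA generated" and stores NA instead of 0.
f[2] <- 0
is.na(f[2])

# Comparing that NA with a string gives NA, which if () cannot use.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  if (f[2] == "Yes") "matched" else "not matched",
  error = function(e) conditionMessage(e)
)
result
```

So the first if in the original function does run for NA cells, but its assignment leaves the cell as NA, and the separate if that follows then hits that NA.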

The rewritten loop below is purely to demonstrate that you can program this way. It does not show expertise in R, because it takes no advantage of R's vectorisation or of useful high-level packages.

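# Example data: two string columns, one of them containing NAs
myframe <- data.frame(a = c("Yes", "No", "Yes"),
                      b = c(NA_character_, "No", NA))

replaceStringBinary <- function(dataFrame) {

  # Build a new matrix rather than overwriting the string cells in place;
  # cells that match none of the cases below stay NA
  new_m <- matrix(nrow = nrow(dataFrame), ncol = ncol(dataFrame))

  for (currentRow in seq_len(nrow(dataFrame))) {
    for (currentCol in seq_len(ncol(dataFrame))) {

      # else-if chain: the NA test runs first, so == never sees an NA
      if (is.na(dataFrame[currentRow, currentCol])) {
        new_m[currentRow, currentCol] <- 0
      } else if (dataFrame[currentRow, currentCol] == "Yes") {
        new_m[currentRow, currentCol] <- 1
      } else if (dataFrame[currentRow, currentCol] == "No") {
        new_m[currentRow, currentCol] <- 0
      }

    }
  }

  # Convert back to a data frame and restore the original column names
  new_df <- as.data.frame(new_m)
  names(new_df) <- names(dataFrame)
  return(new_df)
}

myframe
replaceStringBinary(myframe)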

Alternative, using dplyr:

library(dplyr)

mytransform <- function(x) {
  case_when(is.na(x) ~ 0,
            x == "Yes" ~ 1,
            TRUE ~ 0)  # all other values (including "No") become 0
}

mutate_all(myframe, ~ mytransform(.))
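In dplyr 1.0.0 and later, the scoped verbs such as mutate_all() are superseded by across(). A sketch of the same idea in that style might look like the following; the names example_df, to_binary and recoded are made up for this example.

```r
library(dplyr)

# Made-up example data with the same shape as myframe.
example_df <- data.frame(a = c("Yes", "No", "Yes"),
                         b = c(NA, "No", NA))

# Same rule as before: NA -> 0, "Yes" -> 1, anything else -> 0.
to_binary <- function(x) {
  case_when(is.na(x) ~ 0,
            x == "Yes" ~ 1,
            TRUE ~ 0)
}

# across(everything(), ...) applies the function to every column.
recoded <- mutate(example_df, across(everything(), to_binary))
recoded
```

To recode only some columns, everything() could be swapped for a selection such as where(is.character).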
