Quick methods to loop over multiple columns and rows in a data frame/data table

Hi there,

I am looking for a quick method for looping through multiple rows and columns to replace values, especially for a large data set. I want to keep NAs as NA, replace -1 and -3 with NA, replace -7 with 0, and leave all other values as they are.

The following loop works fine for a small data set but is slow on a large one.

for (j in 1:ncol(df)) {
  for (i in 1:nrow(df)) {
    if(is.na(df[i,j])){
      df[i, j] <- NA 
    } 
    else if(df[i,j]==-1|df[i,j]==-3){
      df[i, j] <- NA
    }
    else if(df[i,j]==-7) {
      df[i, j] <- 0
    }
    else if(df[i,j]==df[i,j]) {
      df[i, j] <- df[i,j]
    }
  }
}

Sample data:

id A B C D E
1 2 3 1 1 2
2 -1 -7 -1 2 NA
3 3 3 2 -3 -3
4 -3 9 1 4 2
5 4 NA NA NA NA
6 NA 3 0 1 2
7 3 5 NA 2 3
8 -1 9 0 -3 2
9 4 -3 2 4 -7
10 1 -3 -7 2 NA

Thanks

First, I invite you to learn how to provide example data in a forum-friendly way, i.e. as a reprex.
Here is your data as dput() would have output it. Producing it that way would no doubt have been much easier for you than it was for me to get it into this form...

df1 <- structure(list(
  id = 1:10,
  A = c(2L, -1L, 3L, -3L, 4L, NA, 3L, -1L, 4L, 1L),
  B = c(3L, -7L, 3L, 9L, NA, 3L, 5L, 9L, -3L, -3L),
  C = c(1L, -1L, 2L, 1L, NA, 0L, NA, 0L, 2L, -7L),
  D = c(1L, 2L, -3L, 4L, NA, 1L, 2L, -3L, 4L, 2L),
  E = c(2L, NA, -3L, 2L, NA, 2L, 3L, 2L, -7L, NA)
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
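
For reference, here is a minimal sketch of how dput() is typically used to share data. The data frame example_df is a made-up stand-in, not your real data; the point is only that the printed text can be pasted into a post and run by anyone to rebuild the same object.

```r
# Hypothetical small data frame standing in for the real data
example_df <- data.frame(id = 1:3, A = c(2L, -1L, NA))

# dput() prints R code that rebuilds the object; paste that text into the post
dput(example_df)

# Round trip: capture the printed code as text, then evaluate it again
rebuilt <- eval(parse(text = capture.output(dput(example_df))))

# Check that the rebuilt object matches the original
identical(rebuilt, example_df)
```

For a large data set, something like dput(head(df, 10)) keeps the pasted text short.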

Now a possible solution:


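library(tidyverse)

# Recode one column: -1 and -3 become NA, -7 becomes 0,
# everything else (including existing NAs) is kept as is.
revalue_func <- function(x) {
  case_when(
    x %in% c(-1, -3) ~ NA_integer_,
    x == -7 ~ 0L,
    TRUE ~ as.integer(x)
  )
}

# Apply the recoding to columns A through E; id is left untouched.
mutate(df1,
       across(A:E,
              revalue_func))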
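
If you would rather avoid the tidyverse dependency, the same recoding can probably be done with vectorised base R. This is only a sketch under assumptions: recode_missing and df_demo are names I made up for illustration, and df_demo is a tiny stand-in using the same coding scheme as the sample data. The idea is to replace values a whole column at a time with logical indexing instead of visiting each cell in a nested loop.

```r
# Hypothetical demo data using the same codes as the sample above
df_demo <- data.frame(
  id = 1:4,
  A  = c(2L, -1L, NA, -7L),
  B  = c(-3L, -7L, 9L, NA)
)

# Hypothetical helper: recode one whole column at once.
# %in% never returns NA, so existing NAs are simply left as NA.
recode_missing <- function(x) {
  x[x %in% c(-1, -3)] <- NA  # -1 and -3 become NA
  x[x %in% -7] <- 0L         # -7 becomes 0
  x
}

# Columns to recode; id is left untouched
cols <- c("A", "B")
df_demo[cols] <- lapply(df_demo[cols], recode_missing)

df_demo
```

On the real data, cols would be c("A", "B", "C", "D", "E"). Because each column is processed in one vectorised step, this should scale much better than the cell-by-cell loop, though I have not timed it on a large data set.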
