 Set negative values on subset of columns to 0 in R data.table

I have a large data.table in R and would like to set any negative numbers in a subset of columns to zero. I have looked at multiple options for doing this but I can't find answer I need.

For example if I have the following data.table:

v1 <- c(-32, -45, -92, 0, 11)
v2 <- c(10, 12, -9, -3, 5)
v3 <- c(-12, -65, 10, 4, -9)
v4 <-  c(-54, 45, 65, 34, -12)
DT <- as.data.table(cbind(v1, v2, v3, v4))

While I could convert ALL negative values to 0 with this:

DT[DT<0] <- 0

How could I set only the negative values in v2 and v3 to zero? I feel like this should be simple but the answer is escaping me.

Try using \$ notation to specify the column you want. Here's how to do what you want in two steps...

> DT[DT\$v2 < 0, "v2"] <- 0
> DT[DT\$v3 < 0, "v3"] <- 0
> DT
v1 v2 v3  v4
1 -32 10  0 -54
2 -45 12  0  45
3 -92  0 10  65
4   0  0  4  34
5  11  5  0 -12

Also, I assume you meant as.data.frame rather than as.data.table

1 Like

Hi Riffomonas, thanks for getting back to me. This does work - I was missing the "v2" part, so this is a step forward. However, I would ideally like to do this in a single step, either using the named columns or column indices as this is within a loop. I am also using the R data.table package, so it is a data.table as well as a data.frame, but data.frame options do still work for the most part.

The data.table methods are discussed here:
r - Apply a function to every specified column in a data.table and update by reference - Stack Overflow

one step

DT[DT\$v2 < 0, c("v2","v3")] <- 0

p.s. while data.table is a common library but as demonstratedd on this thread, not ubiquitous. Therefore it would be good to add library(data.table) to your example ( in the future).

Regards,
Nir

HI Nir, thanks for your response. I'll cite the package next time. Unfortunately the code you have supplied doesn't work as it filters only on the negative values in 'v2' and turns the values on those rows in 'v2 and 'v3' to 0. I would like to search for negative values in v2 and v3 and only turn the negative values in those columns to 0, rather than across the row.

sorry about that. data.table is its own syntax that one would have to study and learn.
I will skip that by applying my dplyr knowledge and using dtplyr package to run data.table queries.

library(data.table)
library(dplyr)
library(dtplyr)
v1 <- c(-32, -45, -92, 0, 11)
v2 <- c(10, 12, -9, -3, 5)
v3 <- c(-12, -65, 10, 4, -9)
v4 <-  c(-54, 45, 65, 34, -12)
DT <- lazy_dt(cbind(v1, v2, v3, v4))

(DT2 <- DT %>%
mutate(across(c("v2","v3"),~ifelse(.x<0,0,.x))) %>% as.data.table())

note its best to upgrade to the latest versions of these libraries to increase the odds of this working.

Hi Martin

Thanks for this link, it helped me to see a way through (though if anyone has a better way to do this I'd love to see it). In the end I wrote a neg2zero function and applied that.

library(data.table)

v1 <- c(-32, -45, -92, 0, 11)
v2 <- c(10, 12, -9, -3, 5)
v3 <- c(-12, -65, 10, 4, -9)
v4 <-  c(-54, 45, 65, 34, -12)
DT <- as.data.table(cbind(v1, v2, v3, v4))
DT

cols <- c("v2", "v3")

neg2zero <- function(x){
res <- ifelse(x < 0, 0, x)
return(res)
}

DT[ , (cols) := lapply(.SD, neg2zero), .SDcols = cols]
DT

Thanks Nir, it's great to have options. I do use the tidyverse sometimes, but data.table is fast on the large datasets i normally work with.

1 Like

That looks fine to me. You could include the ifelse() function directly in lapply() rather than use neg2zero, but that's a matter of taste.

Using the set() option from the SO link may be faster on very large tables.

Thanks Martin, I did consider an ifelse function in the lapply(), but I couldn't work out how to call the value of 'x' i.e.

DT[ , (cols) := lapply(.SD, ifelse(x<0, 0, x)), .SDcols = cols]

clearly doesn't work

(I'm not particularly experienced with the apply family)

Try

DT[ , (cols) := lapply(.SD, function(x) ifelse(x<0, 0, x)), .SDcols = cols]

I haven't tested, but pmax(x, 0) might be quicker.

2 Likes

Perfect, thanks Martin.

DT[ , (cols) := lapply(.SD, pmax, 0), .SDcols = cols]
DT
#    v1 v2 v3  v4
#1: -32 10  0 -54
#2: -45 12  0  45
#3: -92  0 10  65
#4:   0  0  4  34
#5:  11  5  0 -12
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.