Rename columns in data.table without breaking chaining operation

In dplyr, I can easily rename columns in a data.frame within my chaining operations by doing things like

%>% rename("newname" = "oldname")

I was wondering how I can do that in data.table. If I use the setnames function, it seems that I need to break my chaining operation and start a new block of code. Maybe something like this:

[...][, rename("newname" = "oldname")]

I believe it is just

[...][, .(newname = oldname)]

and the above is chained.

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

Also, if you ever run into a command that won't chain (which is very unusual for data.table, as it is designed to chain), you can use dot notation (with the wrapr dot-arrow pipe) to try to get out of the corner.

library("wrapr")
library("data.table")

iris %.>% setorderv(., c("Sepal.Length", "Sepal.Width"))[] %.>% head(.)

One can use tricks like %.>% -> .[...] to move in and out of the form.

But honestly I think everything has a chaining version in data.table.

Will the oldname column be replaced or it just creates an identical column with a new name?

I am not a data.table expert (yet).

It looks like you have to name all the columns you want to keep.

The following returns only two columns.

library("data.table")
as.data.table(iris)[, .(WWW = Petal.Width, Petal.Length)]

One could try this (which keeps the uninvolved columns and renames "Petal.Width" to "WWW"):

library("wrapr")
library("data.table")

as.data.table(iris) %.>% setnames(., old = "Petal.Width", new = "WWW")[]

(I have a note here as to why the following does not work.

library("magrittr")
library("data.table")

as.data.table(iris) %>% setnames(., old = "Petal.Width", new = "WWW")[]

The following does work.

library("magrittr")
library("data.table")

as.data.table(iris) %>% {setnames(., old = "Petal.Width", new = "WWW")[]}

)

By doing that, I think the other columns are dropped, which is different from the rename function in dplyr.
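To make the difference concrete, here is a small sketch comparing the two approaches on the iris data used elsewhere in this thread. The object names (dt, selected, renamed) are only for illustration, and copy() is used so the original table is left untouched.

```r
library(data.table)

dt <- as.data.table(iris)

# Selecting with .() builds a new table from only the listed columns,
# so every column that is not named is dropped.
selected <- dt[, .(WWW = Petal.Width, Petal.Length)]
names(selected)

# setnames() renames in place and keeps every column; it is applied to a
# copy here so that dt itself still has the original column names.
renamed <- setnames(copy(dt), old = "Petal.Width", new = "WWW")
names(renamed)
```

So the .() form behaves like a select-with-rename, while setnames is the closer match to dplyr's rename.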

Can I ask why you absolutely want it chained "like a pipe"?

Because data.table works by reference, no assignment is needed, so operations are effectively chained by default without any pipe-like operator. Pipe operators are a way to chain operations without making an intermediate assignment.

library(data.table)
iris_dt <- as.data.table(iris)
iris_dt[, is.SETOSA := Species == "setosa"]
setnames(iris_dt, "is.SETOSA", "is_setosa")
iris_dt
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species is_setosa
#>   1:          5.1         3.5          1.4         0.2    setosa      TRUE
#>   2:          4.9         3.0          1.4         0.2    setosa      TRUE
#>   3:          4.7         3.2          1.3         0.2    setosa      TRUE
#>   4:          4.6         3.1          1.5         0.2    setosa      TRUE
#>   5:          5.0         3.6          1.4         0.2    setosa      TRUE
#>  ---                                                                      
#> 146:          6.7         3.0          5.2         2.3 virginica     FALSE
#> 147:          6.3         2.5          5.0         1.9 virginica     FALSE
#> 148:          6.5         3.0          5.2         2.0 virginica     FALSE
#> 149:          6.2         3.4          5.4         2.3 virginica     FALSE
#> 150:          5.9         3.0          5.1         1.8 virginica     FALSE
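One way to check the by-reference behaviour is to compare the object's memory address before and after the rename. This is only a sketch; address() is data.table's helper for this, and the variable names are illustrative.

```r
library(data.table)

dt <- as.data.table(iris)

# Record where the data.table lives in memory before renaming.
before <- address(dt)

# setnames() changes the column name in place; nothing is assigned back.
setnames(dt, "Petal.Width", "petal_width")

# The same object is still bound to dt, now with the new column name.
identical(before, address(dt))
names(dt)
```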

You can "pipe" it with the := operator, using its multiple-column form to create a column with the new name and delete the old one in the same call.

library(data.table)
iris_dt <- as.data.table(iris)
iris_dt[, is.SETOSA := Species == "setosa"][
  , `:=`(is_setosa = is.SETOSA, is.SETOSA = NULL)]
names(iris_dt)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
#> [5] "Species"      "is_setosa"

Otherwise, as mentioned in a previous post, the %>% pipe should work with setnames, but you may have difficulty continuing the chain without %>%.

library(data.table)
library(magrittr)
iris_dt <- as.data.table(iris)
iris_dt[, is.SETOSA := Species == "setosa"] %>%
  setnames(., "is.SETOSA", "is_setosa")
names(iris_dt)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
#> [5] "Species"      "is_setosa"
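If you want to keep going with "[" after the rename, one option is to wrap the previous step in setnames, which returns the data.table invisibly. A minimal sketch (the res object and the summary column name are made up for the example):

```r
library(data.table)

iris_dt <- as.data.table(iris)

# setnames() renames by reference and returns the data.table invisibly,
# so its result can be indexed again with `[` to continue the chain.
res <- setnames(
  iris_dt[, is.SETOSA := Species == "setosa"],
  "is.SETOSA", "is_setosa"
)[is_setosa == TRUE, .(mean_petal_width = mean(Petal.Width))]

names(iris_dt)
res
```

The trade-off is readability: the steps are nested rather than read left to right, so this is mostly useful for a single rename in the middle of a chain.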

Indeed, it is "][" that is the correct pipe-operator for data.table (though method chaining may be the term of choice).

I forgot this! Thanks.

The best way to take advantage of data.table's pass-by-reference for this is using .SD

as.data.table(iris)[
, is.SETOSA := Species == "setosa"
][, setnames(.SD, "is.SETOSA", "is_setosa")]

The := approach will copy the data to a new column and then delete the old column, which is less efficient than setnames. It is better to use .SD in a chain.
