Rolling mean and standard deviation in data.table

datatable

#1

I need to calculate rolling mean and standard deviations for a couple of columns in a large data (30 million rows and 11 columns). I use the rollify function in tibbletime with data.table, but the code seems very slow.

I want to know how to do it quickly in data.table without having to use functions that are slow. My code is as below


mywindow <- 90

MyMean <- function(x){
  return(mean(x, na.rm = TRUE))
}


mean_roll <- rollify(MyMean, window = mywindow)


MySD <- function(x){
  return(sd(x, na.rm = TRUE))
}

sd_roll <- rollify(MySD, window = mywindow)

MySkewness <- function(x){
  return(skewness(x, na.rm = TRUE))
}

skewness_roll <- rollify(MySkewness, window = mywindow)


MyKurtosis <- function(x){
  return(kurtosis(x, na.rm = TRUE))
         }
  
kurtosis_roll <- rollify(MyKurtosis, window = mywindow)



sample_filter <- crsp_all[, .N, by = permno][N >= 91][,list(permno)]

crsp_all <- merge(crsp_all, sample_filter, by = "permno")

rm(sample_filter)

crsp_all[
  , ':='(prc = abs(prc), impact = abs(ret)/(prc*vol), 
         baspread = coalesce(ask-bid, askhi-bidlo)/coalesce((ask+bid)/2, (askhi+bidlo)/2),
         stockto = vol/shrout)
  ][
    ,':='(mean_ret = mean_roll(ret), sd_ret = sd_roll(ret), 
          skewness_ret = skewness_roll(ret), kurtosis_ret = kurtosis_roll(ret),
          amihud = mean_roll(impact), stockTO = mean_roll(stockto),
          BAspread = mean_roll(baspread)), by = permno
    
  ]

#2

rollify uses purrr under the hood, so I can't imagine it's going to be super performant. If it's simple statistics you're interested in, you could check out some of the functions in the zoo package. It has rollapply(), which takes an analogous approach to rollify but uses apply instead (so maybe not a big performance increase), and rollmean(), which is a performance-optimised rolling mean. The latter will probably give you the best performance for the mean, but if the others aren't fast enough for the SD, you might have to look into writing a rolling SD function using rcpp() :confused:


#3

That is what I am thinking. I used to use zoo::rollapply and I will try it now. I really like the ease of use provided by tidyverse ecosystem, but it seems functions from it have a performance issue.


#4

Is this helpful, at all?
https://cran.r-project.org/package=RcppRoll


#5

I think this could be a great alternative. I was wondering why data.table does not have its own version of rollapply.


#6

The rollapply function is still pretty slow, I will use the rcpproll package.


#7

Oooh yeah, that looks great!


#8

Yeah :frowning_face: Rolling functions tend to be slow in R because they require iteration, and applying an arbitrary function iteratively means doing the iteration in R, which introduces a lot of overhead. Functions like zoo::rollmean() and those in RcppRoll have been compiled with the iteration built-in (because the function is explicitly defined, not arbitrary), so they tend to be faster.


#9

The RcppRoll really saves my life! Do you know if there are any other high performance packages like this one? Thanks!


#10

There's a High-Performance and Parallel Computing with R CRAN tast view maintained by Dirk Eddelbuettel, which you might want to take a look at.


#11

Hi
The package Rolling Window has been a winner for me.
Check my blog and the comments on rolling functions