I need to calculate rolling mean and standard deviations for a couple of columns in a large data (30 million rows and 11 columns). I use the rollify function in tibbletime with data.table, but the code seems very slow.
I want to know how to do it quickly in data.table without having to use functions that are slow. My code is as below
mywindow <- 90
MyMean <- function(x){
return(mean(x, na.rm = TRUE))
}
mean_roll <- rollify(MyMean, window = mywindow)
MySD <- function(x){
return(sd(x, na.rm = TRUE))
}
sd_roll <- rollify(MySD, window = mywindow)
MySkewness <- function(x){
return(skewness(x, na.rm = TRUE))
}
skewness_roll <- rollify(MySkewness, window = mywindow)
MyKurtosis <- function(x){
return(kurtosis(x, na.rm = TRUE))
}
kurtosis_roll <- rollify(MyKurtosis, window = mywindow)
sample_filter <- crsp_all[, .N, by = permno][N >= 91][,list(permno)]
crsp_all <- merge(crsp_all, sample_filter, by = "permno")
rm(sample_filter)
crsp_all[
, ':='(prc = abs(prc), impact = abs(ret)/(prc*vol),
baspread = coalesce(ask-bid, askhi-bidlo)/coalesce((ask+bid)/2, (askhi+bidlo)/2),
stockto = vol/shrout)
][
,':='(mean_ret = mean_roll(ret), sd_ret = sd_roll(ret),
skewness_ret = skewness_roll(ret), kurtosis_ret = kurtosis_roll(ret),
amihud = mean_roll(impact), stockTO = mean_roll(stockto),
BAspread = mean_roll(baspread)), by = permno
]