I need to create two columns in my data.table. Both columns rely on linear regression by groups. Part of my code is as below:
SPI_func <- function(SD){
  model <- lm(ret ~ ret_mkt + ret_ind, data = SD)
  r_squared <- summary(model)$r.squared
  return(log((1 - r_squared)/r_squared))
}
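Gamma_func <- function(SD){
  # shift() from data.table returns the lagged values; stats::lag() on a plain
  # numeric vector only changes the time-series attribute and leaves the values in place
  model <- lm(ret ~ shift(ret) + ret_mkt
              + I(shift(ret) * shift(V)),
              data = SD)
  Gamma <- coef(model)[[4]]  # coefficient on the lagged ret x lagged V interaction
  return(Gamma)
}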
crsp_dsf[, ':='(SPI = SPI_func(.SD),
                Gamma = Gamma_func(.SD)),
         by = .(cusip, qtr)]
My data, crsp_dsf, is large: about 50 million rows and 10 columns.
I guess most of the time is spent in the linear regression part. Since I am already using Microsoft R Open with MKL, will replacing lm with fastLm from RcppEigen improve performance?
Or are there other parts of my code that could be optimized for performance?
Since my regressions involve lagged variables, if I want to use the matrix input for fastLm, I need to create some new columns of lagged variables first.
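To make that last point concrete, here is a rough sketch of what I have in mind: build the lagged columns once with shift() inside each group, then fit on plain matrices. It uses stats::.lm.fit() from base R as a stand-in for a matrix-based fitter, so it is not fastLm itself. The toy data and the helper names spi_fit and gamma_fit are made up for illustration; only the column names come from my real data.

```r
library(data.table)

# Toy data standing in for crsp_dsf (simulated values, illustration only).
set.seed(1)
dt <- data.table(
  cusip   = rep(c("A", "B"), each = 60),
  qtr     = rep(rep(1:2, each = 30), times = 2),
  ret     = rnorm(120),
  ret_mkt = rnorm(120),
  ret_ind = rnorm(120),
  V       = runif(120)
)

# Build the lagged regressors once, within each (cusip, qtr) group.
dt[, `:=`(ret_lag = shift(ret), V_lag = shift(V)), by = .(cusip, qtr)]
dt[, ret_lag_V := ret_lag * V_lag]

# Hypothetical helper: log((1 - R^2) / R^2) from a matrix-based fit.
spi_fit <- function(y, x1, x2) {
  X   <- cbind(1, x1, x2)                      # explicit intercept column
  fit <- .lm.fit(X, y)
  r2  <- 1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
  log((1 - r2) / r2)
}

# Hypothetical helper: coefficient on the lagged interaction term.
gamma_fit <- function(y, y_lag, mkt, inter) {
  ok <- !is.na(y_lag) & !is.na(inter)          # .lm.fit() does not accept NA
  X  <- cbind(1, y_lag[ok], mkt[ok], inter[ok])
  .lm.fit(X, y[ok])$coefficients[4]
}

res <- dt[, .(SPI   = spi_fit(ret, ret_mkt, ret_ind),
              Gamma = gamma_fit(ret, ret_lag, ret_mkt, ret_lag_V)),
          by = .(cusip, qtr)]
```

This returns one row per (cusip, qtr) group. To write the values back onto every row as in my original code, the same two calls could go inside `:=` instead of .(). Note that the lag is taken within each quarter here, so the first observation of every group is dropped from the Gamma regression.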