Fast linear regression by groups

I need to create two columns in my data.table, both computed from linear regressions by group. Part of my code is below:

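```r
library(data.table)

# SPI: log((1 - R^2) / R^2) from regressing ret on market and industry returns
SPI_func <- function(SD){
  model <- lm(ret ~ ret_mkt + ret_ind, data = SD)
  r_squared <- summary(model)$r.squared
  return(log((1 - r_squared)/r_squared))
}

# Gamma: coefficient on lagged ret * lagged V. stats::lag() does not shift the
# values of a plain numeric vector, so data.table::shift() provides the lag.
Gamma_func <- function(SD){
  model <- lm(ret ~ shift(ret) + ret_mkt + I(shift(ret) * shift(V)),
              data = SD)
  Gamma <- coef(model)[[4]]  # 4th coefficient: the interaction term
  return(Gamma)
}

crsp_dsf[, `:=`(SPI = SPI_func(.SD),
                Gamma = Gamma_func(.SD)),
         by = .(cusip, qtr), .SDcols = c("ret", "ret_mkt", "ret_ind", "V")]
```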

My data crsp_dsf is large: about 50 million rows and 10 columns.

I guess most of the time is spent in the linear regression part. Since I am already using Microsoft R Open with MKL, would replacing lm with fastLm from RcppEigen improve performance?
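If the data are daily (crsp_dsf suggests the CRSP daily file, so roughly 60 rows per cusip-quarter, an assumption), each regression is tiny, and much of the cost of `lm()` may be formula parsing, model-frame construction and `summary()` rather than the least-squares solve that MKL accelerates. A minimal sketch of the matrix route using base R's `stats::.lm.fit()` instead of fastLm; `spi_fast()` is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper: same SPI value as SPI_func(), computed from a prebuilt
# design matrix so lm()'s formula and summary() overhead is skipped.
spi_fast <- function(ret, ret_mkt, ret_ind) {
  ok <- stats::complete.cases(ret, ret_mkt, ret_ind)  # drop NA rows like lm()
  y <- ret[ok]
  X <- cbind(1, ret_mkt[ok], ret_ind[ok])             # intercept + regressors
  fit <- stats::.lm.fit(X, y)
  rss <- sum(fit$residuals^2)                         # residual sum of squares
  tss <- sum((y - mean(y))^2)                         # total sum of squares
  r_squared <- 1 - rss / tss
  log((1 - r_squared) / r_squared)
}

# Check against the lm()-based computation on simulated data
set.seed(1)
n <- 250
sim <- data.frame(ret_mkt = rnorm(n), ret_ind = rnorm(n))
sim$ret <- 0.5 * sim$ret_mkt + 0.3 * sim$ret_ind + rnorm(n)
r2 <- summary(lm(ret ~ ret_mkt + ret_ind, data = sim))$r.squared
all.equal(spi_fast(sim$ret, sim$ret_mkt, sim$ret_ind), log((1 - r2) / r2))
```

The same `X` and `y` could be handed to RcppEigen's matrix interface instead; whether that beats `.lm.fit()` on matrices this small is something to measure rather than assume.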

Or are there other parts of my code that could be optimized for performance?
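Two changes that may help regardless of the solver: referencing columns directly in `j` instead of passing `.SD` (which data.table fills with every non-grouping column for each group unless `.SDcols` is set), and computing one row per group before joining the results back, rather than recycling two scalars into every row of a 50-million-row table. A sketch on simulated toy data; only the column names come from the post, and the speedup on the real data is untested:

```r
library(data.table)

# Hypothetical toy data standing in for crsp_dsf
set.seed(42)
toy <- data.table(cusip = rep(c("A", "B"), each = 120),
                  qtr = rep(rep(1:2, each = 60), times = 2),
                  ret_mkt = rnorm(240), ret_ind = rnorm(240), V = runif(240))
toy[, ret := 0.4 * ret_mkt + 0.2 * ret_ind + rnorm(.N)]

# One grouped pass returning one row per (cusip, qtr); the formulas refer to
# the group's columns directly, and shift() lags within each group.
group_stats <- toy[, {
  r2 <- summary(lm(ret ~ ret_mkt + ret_ind))$r.squared
  gm <- lm(ret ~ shift(ret) + ret_mkt + I(shift(ret) * shift(V)))
  list(SPI = log((1 - r2) / r2), Gamma = coef(gm)[[4]])
}, by = .(cusip, qtr)]

# Update join: copy the per-group results onto the rows, only if needed
toy[group_stats, on = .(cusip, qtr), `:=`(SPI = i.SPI, Gamma = i.Gamma)]
```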

Since my regressions involve lagged variables, using matrix input for fastLm would mean first creating new columns for the lagged variables.
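Creating the lag columns is a single grouped `shift()` call, after which the Gamma regression becomes a plain matrix fit. A sketch under these assumptions: the column names `ret_lag` and `V_lag` and the helper `gamma_fast()` are hypothetical, the data are simulated, and base `stats::lm.fit()` stands in for fastLm (RcppEigen's `fastLm()` also has a matrix method that should accept the same `X` and response):

```r
library(data.table)

# Hypothetical toy data; only the original column names come from the post.
set.seed(7)
toy_lag <- data.table(cusip = rep(c("A", "B"), each = 60), qtr = 1L,
                      ret = rnorm(120), ret_mkt = rnorm(120), V = runif(120))

# Lag columns built once, within each (cusip, qtr) group, so the first row of
# a group gets NA instead of the previous group's last value.
toy_lag[, `:=`(ret_lag = shift(ret), V_lag = shift(V)), by = .(cusip, qtr)]

# Hypothetical helper: column 4 of X is the lagged interaction term, matching
# coef(model)[[4]] in Gamma_func(). Incomplete rows are dropped, as lm() does.
gamma_fast <- function(ret, ret_lag, ret_mkt, V_lag) {
  ok <- stats::complete.cases(ret, ret_lag, ret_mkt, V_lag)
  X <- cbind(1, ret_lag[ok], ret_mkt[ok], ret_lag[ok] * V_lag[ok])
  stats::lm.fit(X, ret[ok])$coefficients[[4]]
}

gammas <- toy_lag[, .(Gamma = gamma_fast(ret, ret_lag, ret_mkt, V_lag)),
                  by = .(cusip, qtr)]

# Cross-check one group against the formula interface
g <- toy_lag[cusip == "A"]
all.equal(gammas[cusip == "A", Gamma],
          coef(lm(ret ~ ret_lag + ret_mkt + I(ret_lag * V_lag), data = g))[[4]])
```

`lm.fit()` is used here rather than `.lm.fit()` because it returns coefficients in column order, with `NA` for aliased columns, so index 4 stays meaningful even if a group's design matrix is rank deficient.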


The only way to answer this question is to profile. Where does your code spend most of its time? profvis should be able to answer that, at least at a high level.
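profvis builds on R's sampling profiler, so the same information is available from base `Rprof()` and `summaryRprof()` without an extra package. A minimal sketch on simulated groups; the number of groups and rows per group are arbitrary assumptions, and on the real data the profiled expression would be the `crsp_dsf[...]` call instead:

```r
# Simulated data: 1500 groups of 60 rows each (assumed sizes)
set.seed(3)
n_groups <- 1500
sim_prof <- data.frame(g = rep(seq_len(n_groups), each = 60),
                       ret_mkt = rnorm(60 * n_groups),
                       ret_ind = rnorm(60 * n_groups))
sim_prof$ret <- 0.5 * sim_prof$ret_mkt + rnorm(nrow(sim_prof))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)   # start sampling every 5 ms
spi <- sapply(split(sim_prof, sim_prof$g), function(SD) {
  r2 <- summary(lm(ret ~ ret_mkt + ret_ind, data = SD))$r.squared
  log((1 - r2) / r2)
})
Rprof(NULL)                          # stop sampling

prof <- summaryRprof(prof_file)
head(prof$by.self)   # functions in which the samples were spent
```

`by.self` attributes time to the function actually running, while `by.total` also counts time in its callees, which is the view that shows how much of `lm()` is model-frame work rather than the QR solve.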

