tibbletime::rollify regression with dynamic explanatory variables


#1

I'm attempting to build a rolling regression framework that can process a dynamic list of explanatory variables. I've tried this a few ways but haven't yet found a solution that achieves both of the following:

  1. allows me to pass a dynamically generated list of explanatory variables (I have had success doing this using tidyquant::tq_mutate, but not in the tibbletime::rollify framework)

  2. allows me to run the rolling regressions in parallel (I have had success doing this using the tibbletime::rollify framework, but not in tidyquant::tq_mutate)

# rollify regression example
library(tidyquant)
library(tibbletime)
library(multidplyr)

# price time series
price_df <- c("FB", "AAPL", "GOOG", "NFLX") %>%
    tq_get(
        get = "stock.prices",
        from = "2010-01-01",
        to = Sys.Date()-1) %>%
    group_by(symbol) %>%
    tq_transmute(
        select = adjusted,
        mutate_fun = periodReturn,
        period = "daily",
        col_rename = "daily_return"
    )

# defining index names as a vector so they can be called in function execution later
indexes <- c("^GSPC", "XLK")

# pull index time series
indx_df <- indexes %>%
    tq_get(
        get = "stock.prices",
        from = "2010-01-01",
        to = Sys.Date()-1) %>%
    group_by(symbol) %>%
    tq_transmute(
        select = adjusted, 
        mutate_fun = periodReturn, 
        period = "daily") %>%
    spread(key = symbol, value = daily.returns)


# framework to explicitly pass a determined number of explanatory variables
rolling_regr <- rollify(.f = ~lm(..1 ~ ..2 + ..3), window = 252, unlist = FALSE)

# framework to dynamically pass a vector of explanatory variables
# can this be done in rollify framework?
#rolling_regr <- rollify(.f = ~lm(paste0(.x," ~ `",paste0(.y,collapse = "` + `"),"`")), window = 252, unlist = FALSE)

# join price_df and index_df and run rolling regression - sequential execution
rolling_regr_df <- price_df %>%
    left_join(indx_df, by = "date") %>%
    group_by(symbol) %>%
    # explicit rolling regression - works!
    #mutate(rolling_regr = rolling_regr(..1 = daily_return, ..2 = `^GSPC`, ..3 = `XLK`))
    # dynamic rolling regression framework - does not work
    mutate(rolling_regr = rolling_regr(.x = "daily_return", .y = indexes))

# # bonus credit - multidplyr parallelized rolling regression
#  # not strictly necessary to debug the dynamic rolling regression
# cl <- create_cluster(cores = 4)
# 
# cl %>%
#     cluster_copy(rolling_regr) %>%
#     cluster_assign_value("indexes", indexes)
# 
# # join price_df and index_df and run rolling regression
# rolling_regr_df <- price_df %>%
#     left_join(indx_df, by = "date") %>%
#     partition(symbol, cluster = cl) %>%
#     # explicit rolling regression - works!
#     mutate(rolling_regr = rolling_regr(..1 = daily_return, ..2 = `^GSPC`, ..3 = `XLK`)) %>%
#     # dynamic rolling regression framework - does not work
#     #mutate(rolling_regr = rolling_regr(.x = "daily_return", .y = indexes)) %>%
#     collect()

Here's the error I get when I attempt to run the "dynamic" version:
Error in mutate_impl(.data, dots) :
Evaluation error: Cannot roll apply with a window larger than the length of the data.

The rollify framework seems to be more flexible than using tq_mutate with rollapply to convert functions into rolling functions. Any ideas?


#2

@davis solved it and posted the solution here

"How is this? It let's you specify a dynamic number of regression X variables in the mutate() call. The first argument to rolling_regr_dyn() will be the Y variable, and any variables following it will be interpreted as X variables. If you really want to use that indexes variable, you can wrap it in a quosure and unquote it in the mutate call. An example of that is shown at the end as well." - @davis