I'm trying to build a rolling regression framework that can handle a dynamically generated list of explanatory variables. I've tried a few approaches but haven't found one that does both of the following:
- lets me pass a dynamically generated list of explanatory variables (this works with tidyquant::tq_mutate, but not in the tibbletime::rollify framework)
- lets me run the rolling regressions in parallel (this works in the tibbletime::rollify framework, but not with tidyquant::tq_mutate)
# rollify regression example
library(tidyquant)
library(tibbletime)
library(multidplyr)
# price time series
price_df <- c("FB", "AAPL", "GOOG", "NFLX") %>%
  tq_get(
    get = "stock.prices",
    from = "2010-01-01",
    to = Sys.Date() - 1) %>%
  group_by(symbol) %>%
  tq_transmute(
    select = "adjusted",
    mutate_fun = periodReturn,
    period = "daily",
    col_rename = "daily_return"
  )
# defining index names as a vector so they can be called in function execution later
indexes <- c("^GSPC", "XLK")
# pull index time series
indx_df <- indexes %>%
  tq_get(
    get = "stock.prices",
    from = "2010-01-01",
    to = Sys.Date() - 1) %>%
  group_by(symbol) %>%
  tq_transmute(
    select = adjusted,
    mutate_fun = periodReturn,
    period = "daily") %>%
  spread(key = symbol, value = daily.returns)
# framework to explicitly pass a determined number of explanatory variables
rolling_regr <- rollify(.f = ~lm(..1 ~ ..2 + ..3), window = 252, unlist = FALSE)
# framework to dynamically pass a vector of explanatory variables
# can this be done in rollify framework?
#rolling_regr <- rollify(.f = ~lm(paste0(.x," ~ `",paste0(.y,collapse = "` + `"),"`")), window = 252, unlist = FALSE)
# join price_df and indx_df and run rolling regression - sequential execution
rolling_regr_df <- price_df %>%
  left_join(indx_df, by = "date") %>%
  group_by(symbol) %>%
  # explicit rolling regression - works!
  #mutate(rolling_regr = rolling_regr(..1 = daily_return, ..2 = `^GSPC`, ..3 = `XLK`))
  # dynamic rolling regression framework - does not work
  mutate(rolling_regr = rolling_regr(.x = "daily_return", .y = indexes))
# # bonus credit - multidplyr parallelized rolling regression
# # not strictly necessary to debug the dynamic rolling regression
# cl <- create_cluster(cores = 4)
#
# cl %>%
# cluster_copy(rolling_regr) %>%
# cluster_assign_value("indexes", indexes)
#
# # join price_df and index_df and run rolling regression
# rolling_regr_df <- price_df %>%
# left_join(indx_df, by = "date") %>%
# partition(symbol, cluster = cl) %>%
# # explicit rolling regression - works!
# mutate(rolling_regr = rolling_regr(..1 = daily_return, ..2 = `^GSPC`, ..3 = `XLK`)) %>%
# # dynamic rolling regression framework - does not work
# #mutate(rolling_regr = rolling_regr(.x = "daily_return", .y = indexes)) %>%
# collect()
Here's the error I get when I run the "dynamic" version:
Error in mutate_impl(.data, dots) :
Evaluation error: Cannot roll apply with a window larger than the length of the data.
The rollify framework seems more flexible than using tq_mutate with rollapply to turn functions into rolling functions, so I'd prefer a solution that stays within rollify. Any ideas?
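To make the target behaviour concrete, here is a minimal base-R sketch of what the dynamic version is meant to compute. It is not a rollify solution: `roll_lm` is a hypothetical helper, and `toy_df` holds random numbers standing in for the joined returns table. My guess about the error is that the strings passed as `.x` and `.y` are treated as the data themselves (vectors of length 1 and 2), which are shorter than the 252-row window; the sketch avoids this by building the formula from the names and slicing the data frame by rows.

```r
# Toy data standing in for the joined returns table (random values, not real prices)
set.seed(1)
n <- 300
toy_df <- data.frame(
  daily_return = rnorm(n),
  `^GSPC` = rnorm(n),
  XLK = rnorm(n),
  check.names = FALSE  # keep the non-syntactic column name `^GSPC`
)

# Hypothetical helper: rolling lm() with a dynamic vector of predictor names
roll_lm <- function(data, response, predictors, window) {
  # Backtick-quote every name so columns such as `^GSPC` parse correctly
  rhs <- paste0("`", predictors, "`", collapse = " + ")
  fml <- as.formula(paste0("`", response, "` ~ ", rhs))

  n_rows <- nrow(data)
  out <- vector("list", n_rows)  # entries stay NULL until the first full window
  if (n_rows < window) return(out)

  for (end in window:n_rows) {
    rows <- (end - window + 1):end
    out[[end]] <- lm(fml, data = data[rows, , drop = FALSE])
  }
  out
}

indexes <- c("^GSPC", "XLK")
fits <- roll_lm(toy_df, "daily_return", indexes, window = 252)
```

Each element of `fits` from position 252 onward is an `lm` object fitted on the trailing 252 rows, so the explanatory variables can change just by editing `indexes`. What I'm still missing is the equivalent inside rollify, so that it can be applied per group and parallelized.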