Create a tibble with models to estimate

Hi there,

I want to create a tibble listing all the models I will run and their specifications, based on a list of strings that are column names.

For example, I will use the idea from the book Forecasting: Principles and Practice: 7.5 Selecting predictors | Forecasting: Principles and Practice (3rd ed)

The idea there was to estimate different combinations of regressors to explain US consumption.

The output I am looking for is something like this:

tbl_models <- tibble(
  model_id = 1:16,
  regressors = list(
    list(""),
    list("Income"),
    list("Income", "Production"),
    list("Income", "Production", "Savings"),
    list("Income", "Production", "Unemployment"),
    list("Income", "Production", "Savings", "Unemployment"),
    list("Income", "Savings"),
    list("Income", "Savings", "Unemployment"),
    list("Income", "Unemployment"),
    list("Production"),
    list("Production", "Savings"),
    list("Production", "Unemployment"),
    list("Production", "Savings", "Unemployment"),
    list("Savings"),
    list("Savings", "Unemployment"),
    list("Unemployment"))
)

Actually, each entry of the "regressors" column should just be a plain list of regressors, not nested lists like the ones above. I am struggling to create this tibble.

Here is my attempt:

library(tidyverse)
library(fpp3)

list_of_regressores <- us_change[3:6] %>% colnames()

regressors <- map(.x = 1:length(list_of_regressores),
            .f = ~ combn(x = list_of_regressores,
                       m = .x))

tbl_models <- tibble(
  model_id = 1:16,
  regressors = regressors  # errors: regressors has 4 elements (one matrix per subset size), not 16
)

Here are the problems:
The variable regressors is a list of matrices. Each column of each matrix should become its own list element, and each of those elements should occupy one row of tbl_models.

The column model_id in tbl_models should just number the rows, one per model. So, if I increase the number of regressors, the 16 should update automatically instead of being hard-coded.

Any idea?
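
One possible starting point for the two problems above, as a minimal sketch in base R. It assumes the regressor names are the four shown earlier (hard-coded here in place of colnames(us_change[3:6])) and relies on combn(..., simplify = FALSE), which returns a list of vectors rather than a matrix. The names regs, by_size, sets and models are illustrative only.

```r
# Assumed regressor names, standing in for colnames(us_change[3:6])
regs <- c("Income", "Production", "Savings", "Unemployment")

# For each subset size m, simplify = FALSE yields a list of character
# vectors (one per combination) instead of a matrix
by_size <- lapply(seq_along(regs),
                  function(m) combn(regs, m, simplify = FALSE))

# Flatten one level and prepend the empty set (no regressors)
sets <- c(list(character(0)), unlist(by_size, recursive = FALSE))

# model_id follows the number of combinations, so nothing is hard-coded
models <- data.frame(model_id = seq_along(sets))
models$regressors <- sets   # list column: one character vector per row

nrow(models)
```

The same sets list could presumably be handed to tibble(model_id = seq_along(sets), regressors = sets) to get a tibble with a list column. Adding a fifth name to regs changes the row count without touching the rest of the code.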

Take a look at the {tidymodels} package, which may be more straightforward.

The idea is to use tidymodels, but I would like to create a list of different models first, and then use tidymodels to estimate and evaluate each specification. But how can I specify all the models I want to run?

There are two ways I can imagine. One is the approach above, where I create a tibble with all the models I want to run, and then use tidymodels to estimate and evaluate them.

The other way: instead of specifying the equation, I could create as many datasets as necessary, each containing a different combination of regressors, and then estimate a model for each dataset, where the equation is Consumption ~ ..

I thought it would be more efficient to specify the equations instead of creating multiple datasets.
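
To illustrate the first option (one dataset, many equations), here is a minimal sketch that turns each set of regressor names into a formula with stats::reformulate(). The example sets and the helper make_formula() are hypothetical; "Consumption" is the response named above, and an empty set is mapped to an intercept-only formula by assumption.

```r
# A few example regressor sets (hypothetical subset of the full list)
sets <- list("Income",
             c("Income", "Savings"),
             character(0))

# Hypothetical helper: build `response ~ regressors` from a character vector
make_formula <- function(regs, response = "Consumption") {
  # reformulate() needs at least one term, so use "1" for the empty set
  if (length(regs) == 0) regs <- "1"
  reformulate(regs, response = response)
}

formulas <- lapply(sets, make_formula)
formulas[[2]]
```

Each formula refers to columns by name, so the same data frame can serve every specification; only the formula changes from model to model.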

Would something like this work as a feedstock?

forepart <- "fable::TSLM(Consumption ~ "
aftpart  <- " + trend())"
Vars <- colnames(fpp3::us_change[3:6])
# Cumulative regressor sets: "Income", "Income + Production", ...
steps <- purrr::accumulate(Vars, paste, sep = " + ")

# aftpart already begins with " + ", so no extra "+" is needed between
# the regressors and trend(); paste0() is vectorised over steps
out <- as.list(paste0(forepart, steps, aftpart))

models <- rev(out)
models

# [[1]]
# [1] "fable::TSLM(Consumption ~ Income + Production + Savings + Unemployment + trend())"
# 
# [[2]]
# [1] "fable::TSLM(Consumption ~ Income + Production + Savings + trend())"
# 
# [[3]]
# [1] "fable::TSLM(Consumption ~ Income + Production + trend())"
# 
# [[4]]
# [1] "fable::TSLM(Consumption ~ Income + trend())"

Not exactly.

I made some progress creating a tibble with all the models I want to estimate:

library(tidyverse)
library(tidylog)
library(fpp3)


# "1" stands for the intercept term; c() coerces it to the string "1"
list_of_regressors <- c(1, us_change[3:6] %>% colnames())



regressor_combination <- function(.regressors = c(""),
                                  .min_regs = 1,
                                  .max_regs = length(.regressors)) {
  
  regressors_list <- map(.x = .min_regs:.max_regs,
                         .f = ~ combn(x = .regressors,
                                      m = .x) %>%
                           split(x = .,
                                 f = rep(1:ncol(.), each = nrow(.)))) %>%
    do.call(what = c,
            args = .) %>%
    unname()
  
  output <- tibble(
    regressors = regressors_list
  ) %>%
    mutate(id = row_number()) %>%
    select(id, everything()) %>%
    # map_chr() gives a character column rather than a list column
    mutate(formula = map_chr(.x = regressors,
                             .f = ~ paste0(.x, collapse = " + ")))
  
  return(output)
  
}


test <- regressor_combination(.regressors = list_of_regressors)
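
As a sanity check on the result, the number of rows should equal the number of combinations drawn, i.e. the sum of binomial coefficients over the requested sizes. A minimal sketch, where n_models() is a hypothetical helper mirroring the .min_regs and .max_regs arguments above:

```r
# Hypothetical helper: how many combinations of sizes min_regs..max_regs
# exist for n_regs candidate terms
n_models <- function(n_regs, min_regs = 1, max_regs = n_regs) {
  sum(choose(n_regs, min_regs:max_regs))
}

n_models(5)      # five entries: "1" plus the four regressors
n_models(4)      # four regressors, non-empty subsets only
n_models(4, 0)   # four regressors, including the empty set
```

With the five entries in list_of_regressors, nrow(test) would be expected to match n_models(5). Note that in standard R formulas the intercept is implicit, so "Income" and "1 + Income" describe the same fit; keeping "1" in the candidate list therefore produces some duplicate specifications.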
