I want to fit several regressions to one data set, with the number of variables varying across models, and return predictions for a prediction data set. The problem is that I cannot figure out how to use each model's formula to generate that prediction data set. The following works, but it ignores which variables are actually in the model and therefore builds the prediction data frame from all possible variables.
suppressMessages(library(tidyverse))
reg_form_list <- list(
  as.formula(mpg ~ factor(am)),
  as.formula(mpg ~ factor(am)*factor(gear)),
  as.formula(mpg ~ factor(am)*factor(gear)*factor(cyl))
)
reg_predict <- function(df, reg_form) {
  predict_df <- expand(df, am, gear, cyl) %>%
    mutate(
      var_combinations = interaction(am, gear, cyl, sep = "_")
    )
  df <- df %>%
    mutate(
      var_combinations = interaction(am, gear, cyl, drop = TRUE, sep = "_")
    )
  m <- lm(reg_form, data = df)
  tibble(
    predict(m,
            subset(predict_df, var_combinations %in% df$var_combinations)),
    subset(predict_df, var_combinations %in% df$var_combinations)
  ) %>%
    rename(predicted = contains("predict")) %>%
    right_join(predict_df, by = c("am", "gear", "cyl", "var_combinations"))
}
results <- map(reg_form_list, ~ reg_predict(mtcars, .))
#> Warning in predict.lm(m, subset(predict_df, var_combinations %in%
#> df$var_combinations)): prediction from a rank-deficient fit may be misleading
#> Warning in predict.lm(m, subset(predict_df, var_combinations %in%
#> df$var_combinations)): prediction from a rank-deficient fit may be misleading
Created on 2020-05-26 by the reprex package (v0.3.0)
I thought that maybe using all.vars in the function (as shown below) would work, but I cannot figure out how to manipulate the returned vector of strings so it can be used both in expand and in the right_join at the end.
reg_predict <- function(df, reg_form) {
  x_vars <- all.vars(reg_form)[-1]
  predict_df <- expand(df, x_vars) %>%
    mutate(
      var_combinations = interaction(am, gear, cyl, sep = "_")
    )
  .
  .
  .
}
Any suggestions would be much appreciated. Also, if my basic approach of using a list to hold all the models can be improved, I would love to hear about that as well.
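For reference, here is a minimal sketch of one direction that might work, not a definitive solution. It assumes the character vector from all.vars can be turned into symbols with rlang::syms and spliced into expand with !!!, that interaction accepts a list of columns such as df[x_vars], and that the same vector can be passed to right_join through by. The names observed and seen are placeholders introduced only for this illustration.

```r
suppressMessages(library(tidyverse))

# A bare `y ~ x` expression is already a formula, so as.formula() is optional.
reg_form_list <- list(
  mpg ~ factor(am),
  mpg ~ factor(am) * factor(gear),
  mpg ~ factor(am) * factor(gear) * factor(cyl)
)

reg_predict <- function(df, reg_form) {
  # Predictor names as strings, e.g. c("am", "gear"); [-1] drops the response.
  x_vars <- all.vars(reg_form)[-1]

  # Splice the names into expand() as symbols so that only the model's
  # own variables are crossed.
  predict_df <- expand(df, !!!rlang::syms(x_vars))

  # interaction() accepts a list of columns, so subsetting by name is enough.
  predict_df$var_combinations <- interaction(predict_df[x_vars], sep = "_")

  # Combinations that actually occur in the data (placeholder name: seen).
  seen <- as.character(unique(interaction(df[x_vars], drop = TRUE, sep = "_")))

  m <- lm(reg_form, data = df)

  # Predict only for observed combinations (placeholder name: observed).
  observed <- filter(predict_df, as.character(var_combinations) %in% seen)
  observed$predicted <- unname(predict(m, newdata = observed))

  # Join back so that unobserved combinations are kept with predicted = NA.
  right_join(observed, predict_df, by = c(x_vars, "var_combinations"))
}

# The rank-deficient-fit warnings from predict() can still appear here.
results <- map(reg_form_list, ~ reg_predict(mtcars, .x))
```

With this sketch each element of results should contain only the columns of its own model plus var_combinations and predicted. Keeping the formulas in a list and iterating with map seems reasonable; naming the list elements would make the results easier to tell apart.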