Prediction with missing coefficients

clausp · June 13, 2019, 10:26pm

I am trying to figure out why predict() does not return NA for predictions on combinations whose interaction coefficient is NA. Instead, it simply ignores the interaction (treats it as zero) and calculates a prediction from all the other coefficients. It does give a warning, but no information about which combinations are not present in the original data.

# Predict with missing coefficients
model_results <- lm(mpg ~ wt + factor(am)*factor(carb), data = mtcars)

# Generate new data
wt <- c(3)
am <- c(0:1)
carb <- c(1:4, 6, 8)

# expand.grid provides all possible combinations of variables
prediction_data <- expand.grid(wt = wt,
                               am = am,
                               carb = carb)

# Get a prediction even if combination relies on an NA coefficient
predicted_mpg <- data.frame(
  predict(
    model_results, prediction_data, interval = "confidence"
    ),
  prediction_data 
)
#> Warning in predict.lm(model_results, prediction_data, interval =
#> "confidence"): prediction from a rank-deficient fit may be misleading

Created on 2019-06-13 by the reprex package (v0.2.1)

If there is no way of making predict show NA when relying on NA coefficients, is there an easy way to check which interaction coefficients are NA and get rid of the predictions in those cases?

aosmith · June 14, 2019, 6:51pm

If you make a new factor representing the factor combinations, you can build a prediction dataset that leaves out the combinations that aren't present in the original dataset. (To be honest, I often switch to a "simple effects" model with a single combined factor when factors aren't perfectly crossed, anyway, and then use post hoc comparisons to address specific research questions.)

First, making a prediction dataset that only contains factor combinations present in the original data:

# Make the combined factor with interaction() or paste()
mtcars$am_carb = with(mtcars, interaction(am, carb, drop = TRUE) )

# Make the prediction dataset
prediction_data2 = expand.grid(wt = 3,
                              am_carb = unique(mtcars$am_carb) )

# Separate the combined factor back into am and carb for prediction
# (separate() splits on the "." by default and returns character columns)
prediction_data2 = tidyr::separate(prediction_data2, 
                                   col = am_carb, into = c("am", "carb"))
# Predictions
data.frame(
    predict(model_results, 
            prediction_data2, 
            interval = "confidence"),
    prediction_data2
)
       fit      lwr      upr wt am carb
1 18.84016 15.35725 22.32308  3  1    4
2 25.55748 21.93905 29.17591  3  1    1
3 20.51157 17.03560 23.98755  3  0    1
4 20.94235 18.31477 23.56994  3  0    2
5 19.37929 15.70820 23.05039  3  0    4
6 19.58471 15.64205 23.52737  3  0    3
7 23.27641 19.58356 26.96925  3  1    2
8 18.82153 12.78292 24.86015  3  1    6
9 17.17707 11.03363 23.32052  3  1    8
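
As a quick check on which combinations exist in the original data, a cross-tabulation of the two factors shows the empty cells directly. This is a minimal sketch using base R only; the names cell_counts and empty_cells are just illustrative.

```r
# Count observations for every am x carb combination in the original data
cell_counts <- with(mtcars, table(am, carb))
cell_counts

# Keep only the combinations with no observations
empty_cells <- subset(as.data.frame(cell_counts), Freq == 0)
empty_cells
```

In mtcars this leaves three empty cells (am = 0 with carb 6 or 8, and am = 1 with carb 3), so nine combinations remain, matching the nine rows of predictions above.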

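If you have a reason to keep the rows for combinations that don't exist in the original data, you could predict only for the subset of prediction_data whose combined factor appears in the original data, then merge the predictions back in so that the missing combinations get NA.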

prediction_data = expand.grid(wt = 3,
                              am = 0:1,
                              carb = c(1:4, 6, 8))

# Combined variable
prediction_data$am_carb = with(prediction_data, interaction(am, carb, drop = TRUE))

pred2 = data.frame(
    predict(model_results, 
            subset(prediction_data, am_carb %in% mtcars$am_carb), 
            interval = "confidence"),
    subset(prediction_data, am_carb %in% mtcars$am_carb) 
)

merge(prediction_data, pred2, all.x = TRUE)

   wt am carb am_carb      fit      lwr      upr
1   3  0    1     0.1 20.51157 17.03560 23.98755
2   3  0    2     0.2 20.94235 18.31477 23.56994
3   3  0    3     0.3 19.58471 15.64205 23.52737
4   3  0    4     0.4 19.37929 15.70820 23.05039
5   3  0    6     0.6       NA       NA       NA
6   3  0    8     0.8       NA       NA       NA
7   3  1    1     1.1 25.55748 21.93905 29.17591
8   3  1    2     1.2 23.27641 19.58356 26.96925
9   3  1    3     1.3       NA       NA       NA
10  3  1    4     1.4 18.84016 15.35725 22.32308
11  3  1    6     1.6 18.82153 12.78292 24.86015
12  3  1    8     1.8 17.17707 11.03363 23.32052
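
On the original question of checking which coefficients are NA: their names can be pulled out of coef() directly, as in the sketch below (na_coefs is an illustrative name). One caveat: the NA coefficients don't necessarily line up one-to-one with the missing combinations, because the reference level am = 0 has no interaction terms of its own. Filtering on the combinations present in the data, as above, is probably the more reliable route.

```r
# Refit the model from the question
model_results <- lm(mpg ~ wt + factor(am) * factor(carb), data = mtcars)

# Names of the coefficients lm() could not estimate
na_coefs <- names(which(is.na(coef(model_results))))
na_coefs

# alias() gives more detail on how the dropped terms depend on the others
alias(model_results)
```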

clausp · June 14, 2019, 7:29pm

Thank you so much for your help. I especially like the second solution since the predictions will eventually end up in a graph.
