Predict function error

Hello everyone,

I'm still learning R, and I cannot find a solution to my problem. I have built a regression model with an individual's average wage as the dependent variable (regressand) and their age, years of education, and IQ score as regressors. I based my regression on the data set "wage2" from the wooldridge package. I would like to predict the average change in wages (I assume it will be a reduction) resulting from a 4-year reduction in years of education. Any idea how I could do that using the predict() function? To begin with, I tried to test my model by predicting the wage for a specific value of years of education (12 years), but I get an error: "Error in eval(predvars, data, env) : object 'age' not found". You can see my code below:

library(wooldridge)
data("wage2")

model1 <- lm(wage ~ age + educ + IQ, data = wage2)

predict(model1, data.frame(educ = 12))
#> Error in eval(predvars, data, env) : object 'age' not found

Can anyone help me?

Thank you!

In the data frame where you set educ, you also have to set age and IQ, because predict() needs a value for every predictor in the model. I would set age and IQ to some kind of "typical" value and then set educ to two values separated by four years.
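A minimal sketch of that suggestion, assuming the wooldridge package is installed. Using the sample medians of age and IQ as the "typical" values, and 16 vs. 12 years of education, are illustrative choices, not the only sensible ones:

```r
library(wooldridge)
data("wage2")

model1 <- lm(wage ~ age + educ + IQ, data = wage2)

# Two rows that differ only in educ (16 vs. 12 years);
# age and IQ are held at their sample medians
newdat <- data.frame(
  age  = median(wage2$age),
  IQ   = median(wage2$IQ),
  educ = c(16, 12)
)

pred <- predict(model1, newdata = newdat)
pred

# Predicted change in wage from 4 fewer years of education
wage_change <- pred[[2]] - pred[[1]]
wage_change
```

Because the model is linear with no interaction terms, this difference should come out the same whichever values of age and IQ you plug in.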

As pointed out by @FJCC, you need to provide all the predictors in your new data frame. Please see the follow-up below:

library("wooldridge")
library(ggplot2)
data("wage2")

model1 <- lm(wage ~ age + educ + IQ, data = wage2)

# Hold age and IQ at their sample medians and set educ = 12
predict(model1, data.frame(age = median(wage2[['age']]), IQ = median(wage2[['IQ']]), educ = 12))

# Alternatively, you can pass the rows of your data frame with educ == 12
df_new <- wage2[wage2$educ == 12, ]
new_wage <- predict(model1, newdata = df_new)

# Keep hours and the observed wage, and add the predicted wage, for plotting
df_plt <- data.frame(df_new[c('hours', 'wage')], new_wage = new_wage)

# Let us do the plot. I assume a time series of wages vs hours
df_plt |> reshape2::melt(id.vars = 'hours') |>
  ggplot(aes(x = hours, y = value, group = variable, color = variable)) +
  geom_line(linewidth = 1.0, alpha = 0.75) + geom_point(size = 2.2, alpha = 0.75) +
  labs(title = 'Wages at educ = 12', x = 'Hours', y = 'Wages')
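Following the same idea of passing real rows as newdata, one way to get the *average* change over the whole sample is to predict each individual's wage at their observed education and again with educ reduced by 4 years, then average the differences. This is a sketch, assuming the wooldridge package is installed; the object name `wage2_less` is just for illustration. Note that subtracting 4 years from everyone pushes some individuals below the range of educ observed in the data, so those predictions are extrapolations.

```r
library(wooldridge)
data("wage2")

model1 <- lm(wage ~ age + educ + IQ, data = wage2)

# Counterfactual copy of the data with 4 fewer years of education
wage2_less <- wage2
wage2_less$educ <- wage2_less$educ - 4

# Predicted wages under observed and reduced education
pred_obs  <- predict(model1, newdata = wage2)
pred_less <- predict(model1, newdata = wage2_less)

# Average predicted change across all individuals
avg_change <- mean(pred_less - pred_obs)
avg_change
```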

Thanks to both of you!

Let me point out that this is a linear regression. You don't need predict() to find the effect of a 4-year reduction in education, and the values of the other variables are irrelevant. You just multiply the coefficient on education by 4: the predicted change in wage is -4 times the educ coefficient. (Unless the purpose is to learn how to use predict().)
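A minimal sketch of the coefficient approach, assuming the wooldridge package is installed. The confidence interval is an extra not mentioned above: since the effect is just the educ slope scaled by -4, its interval is the slope's interval scaled (and reordered) the same way.

```r
library(wooldridge)
data("wage2")

model1 <- lm(wage ~ age + educ + IQ, data = wage2)

# Effect of a 4-year reduction: the educ slope times -4
b_educ <- unname(coef(model1)["educ"])
effect_minus4 <- -4 * b_educ
effect_minus4

# 95% confidence interval for the same quantity
# (multiplying by a negative number flips the bounds, so sort them)
ci_minus4 <- sort(-4 * confint(model1, "educ"))
ci_minus4
```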
