Hi. Suppose I have one continuous predictor X1 and one categorical predictor X2. I fit a linear regression, and now I want a prediction for a particular value of X1, averaged over all values of X2. I am not sure how to handle X2.
df <- data.frame(salary=c(10,20,30,40,50,5,10,15,20,25),
                 years=c(1,2,3,4,5,1,2,3,4,5),
                 gender=c("M","M","M","M","M","F","F","F","F","F"))
df$gender <- ifelse(df$gender=="F",0,1)
df$gender <- factor(df$gender)
model <- lm(salary ~ years + gender, df)
summary(model)
newdata <- data.frame(years=1, gender=mean(as.numeric(df$gender)))
predict(model, newdata)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -7.5000     3.4069  -2.201 0.063600 .
years         7.5000     0.9449   7.937 9.58e-05 ***
gender1      15.0000     2.6726   5.612 0.000805 ***
The predict() call gives the following error:
Error: variable 'gender' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning message:
In model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variable 'gender' is not a factor.
I realize I can't really average men and women ...
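
One possible way to get at this, as a rough sketch rather than a definitive answer: instead of averaging the factor itself, predict once for each level of gender at the chosen value of years, then average those predictions. This assumes that "averaged over X2" means averaging the per-level predictions. The names newdata_levels and preds, and the two weighting choices (equal weights versus the observed group sizes), are illustrative assumptions and not part of the code above.

```r
# Same toy data and model as above, with gender coded 0 = F, 1 = M
df <- data.frame(salary = c(10, 20, 30, 40, 50, 5, 10, 15, 20, 25),
                 years  = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 gender = factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)))
model <- lm(salary ~ years + gender, data = df)

# One row per gender level, all at years = 1.
# The factor is built with the same levels the model was fitted with,
# so predict() no longer complains about the variable type.
newdata_levels <- data.frame(
  years  = 1,
  gender = factor(levels(df$gender), levels = levels(df$gender))
)
preds <- predict(model, newdata_levels)

# Option 1 (assumption): every level counts equally
avg_equal <- mean(preds)

# Option 2 (assumption): weight each level by how often it occurs in the data
avg_weighted <- weighted.mean(preds, w = as.numeric(table(df$gender)))

avg_equal
avg_weighted
```

With this balanced data set (five observations per group) the two averages coincide. With unbalanced groups they would generally differ, so which one is appropriate depends on the population the prediction is meant to describe.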