Hi, in the following code, how did the lm() call know to include Verb and Math as variables in the formula? Also, what does the I do in I(Verb^2)? Thank you.
colnames(collgpa)
[1] "ID" "Verb" "Math" "Gpa"
model <- lm(Gpa ~ Verb*Math +I(Verb^2) + I(Math^2), data = collgpa)
summary(model)
Call:
lm(formula = Gpa ~ Verb * Math + I(Verb^2) + I(Math^2), data = collgpa)
Residuals:
     Min       1Q   Median       3Q      Max
-0.50180 -0.05485  0.02719  0.10687  0.35148
The following line tells R to fit a linear model (hence lm()) in which GPA is modeled as a function of the verbal score, the math score, their interaction, and the squares of the two scores.
model <- lm(Gpa ~ Verb*Math +I(Verb^2) + I(Math^2), data = collgpa)
Specifically, your fitted model is GPA = -7.22 + 0.126 × Verb + 0.117 × Math - 0.00113 × Verb^2 - 0.00106 × Math^2 + 0.000878 × Math × Verb. Although it uses only Math and Verb as input variables, the squared terms and the interaction give the linear model 5 coefficients plus the intercept.
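To see where the six coefficients come from, here is a minimal sketch on made-up data. The collgpa data isn't posted in the thread, so the data frame fake and its simulated scores are assumptions for illustration only, and the fitted numbers will not match yours.

```r
# Simulated stand-in for collgpa; every value here is invented.
set.seed(1)
fake <- data.frame(
  Verb = runif(40, 50, 100),
  Math = runif(40, 50, 100)
)
fake$Gpa <- 1 + 0.02 * fake$Verb + 0.01 * fake$Math + rnorm(40, sd = 0.2)

# Same formula as in the question, fitted to the fake data.
fit <- lm(Gpa ~ Verb * Math + I(Verb^2) + I(Math^2), data = fake)

# One coefficient per term in the expanded formula, plus the intercept.
coef_names <- names(coef(fit))
print(coef_names)
```

The names should come back as the intercept, Verb, Math, I(Verb^2), I(Math^2) and Verb:Math, which lines up with the six numbers in the equation above.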
The I() around Verb^2 and Math^2 tells R to evaluate the ^ as ordinary arithmetic, so the squared scores enter the model as separate variables.
When you specify an interaction with *, lm() automatically includes the separate level (main-effect) terms as well. ^ has a special meaning in a formula; that's why the squares need to be inside I().
Just to be more explicit on that first point: lm() expands Verb*Math into three terms, namely Verb, Math and the interaction Verb x Math (written Verb:Math in formula notation).
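If you want to check the expansion yourself, something like the following should work. It is only a sketch: it uses terms() from base R to list the terms a formula expands to, and no data is needed for that.

```r
# terms() shows how R expands a formula; it works on the formula alone.
expanded <- attr(terms(Gpa ~ Verb * Math), "term.labels")
print(expanded)

# The same model written out by hand with the ":" interaction operator.
by_hand <- attr(terms(Gpa ~ Verb + Math + Verb:Math), "term.labels")

# Both spellings should give the same set of terms.
same_terms <- identical(expanded, by_hand)
print(same_terms)
```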
And as @startz said, the ^ character means something other than "to the power of" when used within a model formula such as the one passed to lm(). So, if you want to include the squared term, it needs to be within I(), the so-called "AsIs" function. That preserves the intended arithmetic meaning. See ?I for more details.
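A quick way to see the difference, again as a rough sketch using terms() on bare formulas (the variable names no_asis, with_asis and crossed are just labels for this example):

```r
# Without I(), ^ is formula "crossing": Verb^2 is Verb crossed with itself,
# which collapses back to the single term Verb, so no squared term is fitted.
no_asis <- attr(terms(Gpa ~ Verb^2), "term.labels")
print(no_asis)

# With I(), ^ is ordinary arithmetic and the square becomes its own term.
with_asis <- attr(terms(Gpa ~ Verb + I(Verb^2)), "term.labels")
print(with_asis)

# Applied to a sum, ^2 expands to the main effects plus all two-way interactions.
crossed <- attr(terms(Gpa ~ (Verb + Math)^2), "term.labels")
print(crossed)
```

So Gpa ~ Verb^2 without the I() would quietly fit the same model as Gpa ~ Verb.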