Multiple linear regression models with interaction and coefficient interpretation

Hello everyone,

I’m very new to R and statistics in general and have some questions regarding linear regression models with multiple predictors.

If I have a dataset with 4 predictor variables (2 binary, dummy-coded variables and 2 continuous ones) and a continuous dependent variable, and I want to know how they all interact with each other as well as how they influence my DV, how would I do that?

I have already established that there is an interaction between one continuous and one dummy variable (which however aren’t good individual predictors for the outcome).
Can I fit a model of the following form (is this mathematically possible), and if yes, how do I interpret the coefficients? Do the other variables that aren’t part of the interaction also need to be interpreted in light of the interaction?

Y = b0 + b1*X1 + b2*D1 + b3*X2 + b4*D2 + b5*(X2*D2)

Thank you!

What you propose is fine.

In your example the marginal effect of a change in X2 is b3+b5*D2, so you get different marginal effects depending on whether D2 is 0 or 1.

The marginal effect of X1 is still just b1.
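In case it helps to see this in R, here is a minimal sketch. The data are simulated purely for illustration (the sample size and the "true" coefficients are made up), and the variable names simply mirror the ones in your equation.

```r
# Simulated data for illustration only; the coefficients below are invented.
set.seed(1)
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
D1 <- rbinom(n, 1, 0.5)
D2 <- rbinom(n, 1, 0.5)
Y  <- 1 + 0.5 * X1 + 0.3 * D1 + 0.2 * X2 + 0.4 * D2 + 0.8 * X2 * D2 + rnorm(n)

# X2:D2 adds only the product term; the main effects are listed explicitly.
fit <- lm(Y ~ X1 + D1 + X2 + D2 + X2:D2)
b   <- coef(fit)

# Marginal effect of X2 is b3 + b5 * D2, so it differs by group.
me_X2_D2_0 <- unname(b["X2"])              # D2 = 0: b3
me_X2_D2_1 <- unname(b["X2"] + b["X2:D2"]) # D2 = 1: b3 + b5

# Marginal effect of X1 is b1, whatever the values of D1 and D2.
me_X1 <- unname(b["X1"])
```

Printing `me_X2_D2_0` and `me_X2_D2_1` shows the two slopes of X2, one for each value of D2.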

Thank you for your reply!
Do I understand it correctly then, if I assume that the coefficient b1 of my variable X1 tells me the increase or decrease in my y variable for every one unit increase in my X1 variable? The interaction doesn’t affect the interpretation of the coefficient?

Does the same then apply to the coefficient of the other binary variable which isn’t included in the interaction?

You are exactly right, and yes, same for the other binary variable.

Thank you very much for the clarification! That was very helpful!

I’m assuming if that is the case, when interpreting the effects of X1 and D1, I can only talk about their effect on Y and not on how or whether they affect the interaction between X2 and D2?
Is there any way of testing whether additional variables have an effect on an interaction and how that interaction affects the dependent variable?

Excuse the many questions, I’m still very new to this and am trying to understand what different models do exactly!

You ask questions very clearly, which makes them much easier to answer.

The regression tells you the effect of the independent variables on the dependent variable. What you can test is whether b5=0. If it is, then there is no interaction.
That doesn't mean that X2 and D2 are unrelated. It means that the value of one has no effect on the effect of the other on Y.
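As a rough sketch of how you might read that test off in R: the coefficient table from summary() reports a t-test of each coefficient against zero, so the row for the interaction term is the test of b5=0. The data here are simulated and the numbers are invented, so only the mechanics carry over to your case.

```r
# Simulated data for illustration only; the coefficients below are invented.
set.seed(2)
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
D1 <- rbinom(n, 1, 0.5)
D2 <- rbinom(n, 1, 0.5)
Y  <- 1 + 0.5 * X1 + 0.3 * D1 + 0.2 * X2 + 0.4 * D2 + 0.8 * X2 * D2 + rnorm(n)

fit <- lm(Y ~ X1 + D1 + X2 + D2 + X2:D2)

# One row per coefficient: estimate, standard error, t value, p-value.
coef_table <- summary(fit)$coefficients

# The "X2:D2" row is the test of H0: b5 = 0.
b5_test <- coef_table["X2:D2", ]
p_b5    <- coef_table["X2:D2", "Pr(>|t|)"]
```

A small `p_b5` is evidence against b5=0, that is, evidence that the slope of X2 differs between the two D2 groups.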

Yes, I understand the bit about the interaction. However, I’m not quite clear about the following: within that model, can I get information on how my X1 variable, at different values, affects my dependent variable at different levels of D1 and D2?

Or would I need to create another model for this?

According to your model, X1 has the same effect on Y for all values of D1 and D2. Perhaps you want to add interactions between X1 and D1 and/or D2 in just the same way you've done between X2 and D2.
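Here is a sketch of what that could look like, assuming you add both X1:D1 and X1:D2 (you may only want one of them). The data are simulated and the coefficients are made up; the point is only how the effect of X1 is assembled for each combination of D1 and D2.

```r
# Simulated data for illustration only; the coefficients below are invented.
set.seed(3)
n  <- 300
X1 <- rnorm(n)
X2 <- rnorm(n)
D1 <- rbinom(n, 1, 0.5)
D2 <- rbinom(n, 1, 0.5)
Y  <- 1 + 0.5 * X1 + 0.3 * D1 + 0.2 * X2 + 0.4 * D2 +
  0.8 * X2 * D2 + 0.6 * X1 * D1 - 0.4 * X1 * D2 + rnorm(n)

# Original model plus interactions of X1 with each dummy.
fit2 <- lm(Y ~ X1 + D1 + X2 + D2 + X2:D2 + X1:D1 + X1:D2)
b    <- coef(fit2)

# Effect of X1 on Y is now b["X1"] + b["X1:D1"] * D1 + b["X1:D2"] * D2.
grid <- expand.grid(D1 = 0:1, D2 = 0:1)
grid$effect_X1 <- unname(b["X1"]) +
  unname(b["X1:D1"]) * grid$D1 +
  unname(b["X1:D2"]) * grid$D2
```

Printing `grid` gives one row per combination of D1 and D2 with the estimated slope of X1 in that group.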

Thank you very much!

So if I were to create a new model with the same Y component and the aim to investigate a possible interaction between X1 and X2 as well as X1 and D2, could I do that in one model or would that make it unreliable and be rather unsuitable to have two interactions in the same model with the same term (X1) being used twice?

One model is the right way to go!

Great, I just wasn’t sure whether it is reasonable to create a model with two interaction terms in it.

Then, if I were to see whether X1 and D1 account for effects of X2 and D2, which model would be most suitable for that investigation?

This isn't an easy question. If you have reason to think that all the variables matter, then the best model is one that includes them all.

I see. I’m actually not sure in that case. If I merely want to know whether X1 and D1 account for effects of X2 and D2, how would I go about that? Is that essentially testing for an interaction between those then?

I am still slightly overwhelmed by datasets with more than 2 predictors and one outcome variable, and don’t know enough about how I would test this or which regression model results to evaluate.

One approach is to run the regression with all the variables. Then test the joint hypothesis that the coefficients on X2, D2 and X2*D2 all equal zero. If you can't reject that hypothesis, then there is an argument (though not a perfect one) that it is okay to leave them out.
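One way to sketch that joint test in R is to fit the model with and without those three terms and compare the two fits with anova(), which gives an F-test for nested linear models. The data below are simulated with invented coefficients, so treat this as an illustration of the mechanics rather than a recipe for your data.

```r
# Simulated data for illustration only; the coefficients below are invented.
set.seed(4)
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
D1 <- rbinom(n, 1, 0.5)
D2 <- rbinom(n, 1, 0.5)
Y  <- 1 + 0.5 * X1 + 0.3 * D1 + 0.2 * X2 + 0.4 * D2 + 0.8 * X2 * D2 + rnorm(n)

full    <- lm(Y ~ X1 + D1 + X2 + D2 + X2:D2)
reduced <- lm(Y ~ X1 + D1)  # drops X2, D2 and X2:D2

# F-test of H0: the coefficients on X2, D2 and X2:D2 are all zero.
joint   <- anova(reduced, full)
p_joint <- joint[2, "Pr(>F)"]
```

The second row of `joint` has 3 degrees of freedom, one for each dropped coefficient, and `p_joint` is the p-value for the joint hypothesis.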
