lm interaction categorical/continuous variables

Hi there! I have a question about the mathematical formulation of a model containing an interaction between a categorical and a continuous variable, fitted with the lm function.
It's built this way:
lm(Y~X1*X2, data=data)
with
Y being the variable to predict
X1 being a continuous variable
X2 being a categorical variable.
I obtained these estimates:
a - intercept
b1 - estimate of the continuous effect
b2 - estimate of the categorical variable, with a different value for each level of X2
b3 - estimate of the interaction, with a different value for each level of X2.

Now the fitted values obtained with this model are pretty good and within the range of the observed data. The thing is, when I try to apply this model in another software to estimate Y, I formulated it this way:

Y=a+b1X1+b2+b3X1

And this gives values that are not realistic compared with what should be expected. Do you know how I should translate this interaction into my formula to get a correct estimate of Y?

Because of data privacy, I can't produce a reprex to help, but in my mind this is more a question about the mathematical formulation of the interaction within the lm function than something related to coding... Anyway, thanks for your help, and I'll be happy to give you any further details :smile:

Can you duplicate this in your other software?

library(ggplot2)  # needed for the ggplot() call at the end

set.seed(42)
rv1 <- sample.int(3,1000,replace=TRUE)
rv2 <- sample.int(6,1000,replace=TRUE)
X1 <- 1:1000
X2 <- factor(c(rep("blue",500),rep("red",500)))
Y  <- ifelse(X2=="blue",X1*rv1,
             X1*rv2
)

(mydata <- data.frame(Y,
                     X1,
                     X2))

(my_lm <- lm(Y~X1*X2,data=mydata))

mydata$lm_pred <- predict(my_lm,newdata = mydata)

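manual_pred <- function(a,b){
  # a: continuous predictor (X1); b: factor (X2), with "blue" as the reference level
  int <- my_lm$coefficients[[1]]                  # intercept
  x1_coeff <- my_lm$coefficients[[2]]             # slope of X1 for the reference level
  x2red_coeff <- my_lm$coefficients[[3]]          # intercept shift for "red"
  x1_adjust_for_x2red <- my_lm$coefficients[[4]]  # slope shift for "red" (X1:X2red)

  int + 
    a* ifelse(b=="red",x1_coeff+x1_adjust_for_x2red,x1_coeff) +
    ifelse(b=="red",x2red_coeff,0)
}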
mydata$lm_manpred <- manual_pred(mydata$X1,mydata$X2)

ggplot(data=mydata,
       mapping=aes(x=X1,color=X2,
                   y=Y)) + geom_point() + 
  geom_line(aes(y=lm_pred),color="black",linetype=3,size=2) +
  geom_line(aes(y=lm_manpred),color="red",linetype=2,size=1) 
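
Written as a formula, the manual prediction above amounts to the following. This assumes R's default treatment contrasts, where the alphabetically first level ("blue" here) is the reference level and so gets no b2 or b3 term of its own. The symbol D_red is only a notation introduced here for the 0/1 dummy variable.

```latex
\hat{Y} = a + b_1 X_1 + b_2 D_{\text{red}} + b_3 X_1 D_{\text{red}},
\qquad
D_{\text{red}} =
\begin{cases}
1 & \text{if } X_2 = \text{red} \\
0 & \text{if } X_2 = \text{blue}
\end{cases}
```

So for "blue" the prediction is a + b1*X1, and for "red" it is (a + b2) + (b1 + b3)*X1.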

@nirgrahamuk the other software is QGIS. I don't know how to reproduce this kind of coding in it. I updated the layers with a column holding the appropriate values of the estimates. Theoretically, just using these values in my formula should do it. To be more specific, the estimates of my categorical variable X2 are positive without the interaction, and when I model it in QGIS I get good results. But when I add the interaction, the X2 estimates become strongly negative and the interaction effect doesn't seem to be strong enough, giving me negative values for the chemical concentration I'm trying to predict...

I'm afraid you're not giving any information that might be useful to help you...
First of all, I don't know QGIS, so I can't comment on that.
What is your 'formula' though, is it correct? If it were correct, passing the correct parameters would give correct results.
My example shows how I can construct a manual prediction; this should be reproducible in any coding language. If QGIS can, then it can; if it can't, it can't. You tell me!
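
For a factor with more than two levels, the same manual prediction can be organised as a per-level lookup table, which is roughly what a "column of estimates" in another tool would need to hold. The sketch below uses made-up data: the levels "a", "b", "c", the object names `lookup`, `b2`, `b3`, and all numbers are assumptions for illustration, not the poster's model. With R's default treatment contrasts, the reference level does not appear in the coefficient table, so its b2 and b3 must be entered as 0.

```r
# Hypothetical simulated data: X2 has three levels, "a" is the reference level
set.seed(1)
n  <- 300
X1 <- runif(n, 0, 10)
X2 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
k  <- as.integer(X2)                       # 1 = "a", 2 = "b", 3 = "c"
Y  <- 2 + c(0, 5, -3)[k] + c(1, 2, -0.5)[k] * X1 + rnorm(n)
dat <- data.frame(Y, X1, X2)

fit <- lm(Y ~ X1 * X2, data = dat)
cf  <- coef(fit)   # names: (Intercept), X1, X2b, X2c, X1:X2b, X1:X2c

# One entry per level; the reference level keeps 0 for both b2 and b3
lv <- levels(dat$X2)
b2 <- setNames(rep(0, length(lv)), lv)
b3 <- setNames(rep(0, length(lv)), lv)
for (l in lv[-1]) {
  b2[l] <- cf[[paste0("X2", l)]]       # intercept shift for level l
  b3[l] <- cf[[paste0("X1:X2", l)]]    # slope shift for level l
}
lookup <- data.frame(X2 = lv, b2 = unname(b2), b3 = unname(b3))
print(lookup)

# Y = a + b1*X1 + b2[level] + b3[level]*X1, using the lookup values
lev    <- as.character(dat$X2)
manual <- cf[["(Intercept)"]] + cf[["X1"]] * dat$X1 + b2[lev] + b3[lev] * dat$X1

# Should agree with predict() up to floating point error
print(all.equal(unname(manual), unname(predict(fit, newdata = dat))))
```

If the row for the reference level is missing or non-zero in the other software, the formula Y = a + b1*X1 + b2 + b3*X1 will no longer match what lm fitted.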

I'm trying to do it in QGIS right now. Thanks anyway for the time and help. I'm sorry I can't be more specific, but the data and topic are kept under a strict privacy policy... I'm well aware it doesn't help my case :confused:

Do you understand how to do it in R though, before you try in QGIS?
I.e. can you follow my worked example as it relates to manual predictions?
I would think this is key.
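
One way to check your understanding in R is to look at the design matrix that lm builds from the formula. The sketch below uses made-up data again (the levels "blue", "green", "red" and all numbers are assumptions). It shows that the interaction columns are simply X1 multiplied by the 0/1 dummy columns, and that multiplying the design matrix by the coefficients reproduces the fitted values.

```r
# Hypothetical data: three levels, "blue" is the reference level
set.seed(2)
dat <- data.frame(
  X1 = runif(60, 0, 10),
  X2 = factor(rep(c("blue", "green", "red"), each = 20))
)
dat$Y <- 1 + 0.5 * dat$X1 +
  ifelse(dat$X2 == "red", 4 + 2 * dat$X1, 0) + rnorm(60)

fit <- lm(Y ~ X1 * X2, data = dat)

# Design matrix: intercept, X1, one dummy per non-reference level,
# and one X1 * dummy column per non-reference level
mm <- model.matrix(fit)
print(colnames(mm))
print(head(mm[dat$X2 == "blue", ], 2))  # dummy and interaction columns are 0
print(head(mm[dat$X2 == "red", ], 2))   # X2red is 1, X1:X2red equals X1

# Manual prediction: each row of the design matrix times the coefficients
manual <- as.vector(mm %*% coef(fit))
print(all.equal(manual, unname(fitted(fit))))
```

Any software that can compute a sum of products per row can reproduce this, as long as it uses the same 0/1 coding for the levels.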
