How to compare the model fit between two models?

Hi all,

I am currently building a prediction model and aim to improve the model fit. I came across two methods of improving model fit: (1) adding an interaction term and (2) adding a squared term.

Can someone please show me how to check whether adding these terms actually improves the model fit? That is, what analytical approach can I use to compare the fit of two models, one with the additional term and one without?

Thanks so much!

I would use an ANOVA test (the anova() function in R), which compares two nested models, one with the additional terms and one without, to determine whether the larger model fits significantly better. If it does not, the additional terms can be dropped, since they do not significantly improve the model's fit.

Here is a link to the documentation:
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/anova
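For linear models fit with lm(), calling anova() on two nested fits runs an F-test on the extra terms. A minimal sketch, using the built-in mtcars data purely for illustration (the variable choices and the model names m0, m_int and m_sq are hypothetical, not from the original question):

```r
# Baseline model and two extensions of it
m0    <- lm(mpg ~ wt + hp, data = mtcars)
m_int <- lm(mpg ~ wt * hp, data = mtcars)            # adds the wt:hp interaction
m_sq  <- lm(mpg ~ wt + hp + I(wt^2), data = mtcars)  # adds a squared term for wt

# F-test of each extension against the baseline;
# a small Pr(>F) suggests the extra term improves the fit
cmp_int <- anova(m0, m_int)
cmp_sq  <- anova(m0, m_sq)
print(cmp_int)
print(cmp_sq)
```

The comparison is only valid when the smaller model's terms are a subset of the larger model's terms and both are fit to the same rows of data.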


Thank you very much for this. I will give it a try!
May I ask if you are aware of any other methods (other than interaction & squared terms) that I can use to increase the model fit?

Here I fit 5 models to a modified version of iris, to which I added a random variable (which you would therefore not expect to improve the model fit).
I then assess all the models with the Akaike information criterion (AIC), where smaller values are better. This correctly identifies lm4, the fourth model, as optimal, in the sense that it achieves the best fit relative to the number of parameters it uses.


```r
# Copy iris and add a purely random predictor that should not improve the fit
modiris <- iris
set.seed(42)
modiris$somerand <- runif(n = 150)

# Progressively larger models for Petal.Length
lm1 <- lm(Petal.Length ~ Sepal.Length, data = modiris)
lm2 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = modiris)
lm3 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width, data = modiris)
lm4 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species, data = modiris)
lm5 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species + somerand, data = modiris)

# AIC of each model; smaller is better
lapply(list(lm1, lm2, lm3, lm4, lm5), FUN = AIC)
```
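A variant of the same comparison that also reports BIC, which penalizes each extra parameter by log(n) instead of 2 and so favors smaller models more strongly than AIC. This is a sketch; the names forms, fits, scores and best are illustrative and not from the original post:

```r
modiris <- iris
set.seed(42)
modiris$somerand <- runif(n = 150)

# The same five specifications as above, stored as a named list of formulas
forms <- list(
  lm1 = Petal.Length ~ Sepal.Length,
  lm2 = Petal.Length ~ Sepal.Length + Sepal.Width,
  lm3 = Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width,
  lm4 = Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species,
  lm5 = Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species + somerand
)
fits <- lapply(forms, lm, data = modiris)

# One row per model; smaller values are better for both criteria
scores <- data.frame(
  AIC = sapply(fits, AIC),
  BIC = sapply(fits, BIC),
  row.names = names(fits)
)
print(scores)

# Name of the model with the smallest AIC
best <- rownames(scores)[which.min(scores$AIC)]
print(best)
```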

Thank you very much for this. Can I ask what you meant by accuracy? Are you referring to AUC?

Here is a description:
Akaike information criterion (Wikipedia)
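AIC is not the same thing as AUC. It is defined as AIC = 2k - 2 ln(L), where L is the maximized likelihood of the model and k the number of estimated parameters, so it rewards goodness of fit while penalizing extra parameters. A small sketch checking this by hand against R's AIC() for the lm4 specification (fit to plain iris, since lm4 does not use the random column; fit, ll, k and aic_manual are illustrative names):

```r
fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species, data = iris)

ll <- logLik(fit)
# Number of estimated parameters: the regression coefficients plus the residual variance
k  <- attr(ll, "df")

aic_manual <- -2 * as.numeric(ll) + 2 * k
print(c(manual = aic_manual, builtin = AIC(fit)))
```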


Thanks!! I saw you fit 5 predictors (you have N=150). Do you know how many predictors I can fit if I only have N=50 (40 with the disease and 20 without)?

Thank you for the suggestion. Could you please share what the code would look like? I tried "anova(m1,m2)", where m1 is my first model and m2 is my second model. However, I did not get any p-value (I am sure I made some mistakes...). I am comparing two logistic models, by the way. Does ANOVA still apply in this case? Any help would be much appreciated!
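For logistic models fit with glm(..., family = binomial), anova() compares nested fits by the change in deviance, but depending on the R version it may not report a p-value unless a test is requested. Passing test = "Chisq" explicitly gives a likelihood-ratio test. A minimal sketch on simulated data; the data frame dat and the columns bmi, smoker and disease are hypothetical stand-ins for the real data:

```r
set.seed(1)
n <- 200

# Simulated, hypothetical data: BMI (continuous) and smoking status (yes/no)
dat <- data.frame(
  bmi    = rnorm(n, mean = 27, sd = 4),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE))
)
# Binary outcome generated from a logistic model without an interaction
lin_pred    <- -8 + 0.25 * dat$bmi + 0.8 * (dat$smoker == "yes")
dat$disease <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Smaller model, and a larger model that adds the bmi:smoker interaction
m1 <- glm(disease ~ bmi + smoker, family = binomial, data = dat)
m2 <- glm(disease ~ bmi * smoker, family = binomial, data = dat)

# Likelihood-ratio (chi-squared) test on the change in deviance
cmp <- anova(m1, m2, test = "Chisq")
print(cmp)
```

As with lm(), the two models must be nested and fit to the same rows; if one model drops rows because of missing values, the comparison is not valid.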

Hi @nirgrahamuk, I now understand better how to compare the fit of two models. Do you know how I can improve the fit of my predictive model? I have two predictors in my model: BMI (continuous) and smoking status (yes/no).
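One way to screen the two kinds of terms discussed in this thread (a squared BMI term and a BMI by smoking interaction) is add1(), which refits the model with each candidate term added on its own and reports the AIC and a likelihood-ratio test for each. A sketch on simulated data; dat, bmi, smoker, disease, base and cand are hypothetical names:

```r
set.seed(1)
n <- 200

# Simulated, hypothetical data standing in for the real BMI / smoking data
dat <- data.frame(
  bmi    = rnorm(n, mean = 27, sd = 4),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE))
)
lin_pred    <- -8 + 0.25 * dat$bmi + 0.8 * (dat$smoker == "yes")
dat$disease <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Current model with the two main effects
base <- glm(disease ~ bmi + smoker, family = binomial, data = dat)

# Try each candidate term separately: a squared BMI term and the interaction
cand <- add1(base, scope = ~ . + I(bmi^2) + bmi:smoker, test = "Chisq")
print(cand)
```

A row with a lower AIC than the `<none>` row suggests that term improves the fit even after the penalty for its extra parameter.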