How to compare the model fit between two models?

RRuser199 · November 23, 2021, 2:34am

Hi all,

I am currently building a prediction model and aim to improve the model fit. I came across two methods of increasing model fit: (1) an interaction term and (2) a squared term.

Can someone pls show me how I can confirm if adding these terms will indeed increase the model fit (i.e. what analytical approach can we use to compare the model fit between two models, one with the additional term and one without)?

Thanks so much!

BLukomski · November 23, 2021, 3:09pm

I would use an ANOVA test (the anova() function in R), which compares two nested models to determine whether there is a significant difference in fit between them. If there isn't, the additional terms can be dropped, as they add nothing of significance to the model's fit.

Here is a link to the documentation:
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/anova
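
As a rough illustration of that suggestion, the sketch below compares a baseline linear model against one with an interaction term and one with a squared term. The built-in mtcars data and the chosen predictors (wt, hp) are assumptions made for the example only, not the original poster's data.

```r
# Baseline model and two extensions, fitted to the built-in mtcars data
base_fit  <- lm(mpg ~ wt + hp, data = mtcars)
inter_fit <- lm(mpg ~ wt * hp, data = mtcars)            # adds the wt:hp interaction
quad_fit  <- lm(mpg ~ wt + hp + I(wt^2), data = mtcars)  # adds a squared term for wt

# F-tests for nested models: a small p-value suggests the extra term improves fit
inter_test <- anova(base_fit, inter_fit)
quad_test  <- anova(base_fit, quad_fit)

print(inter_test)
print(quad_test)
```

Each comparison only makes sense when the smaller model is nested in the larger one, i.e. the larger model contains every term of the smaller one plus the extra term being tested.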

RRuser199 · November 23, 2021, 5:27pm

Thank you very much for this. I will give it a try!
May I ask if you are aware of any other methods (other than interaction & squared terms) that I can use to increase the model fit?

nirgrahamuk · November 23, 2021, 5:34pm

Here I fit 5 models to a modified version of iris, to which I added a random variable (which you therefore wouldn't expect to improve model fit).
I then assess all the models with AIC; smaller values are better. This correctly identifies lm4, the fourth version, as optimal, in so far as it has the greatest accuracy from the simplest model.


# Copy iris and add a pure-noise predictor that should not improve fit
modiris <- iris
set.seed(42)
modiris$somerand <- runif(n = 150)

# Nested models of increasing complexity
lm1 <- lm(Petal.Length ~ Sepal.Length, data = modiris)
lm2 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = modiris)
lm3 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width, data = modiris)
lm4 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species, data = modiris)
lm5 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species + somerand, data = modiris)

# AIC for each model (smaller is better)
lapply(list(lm1, lm2, lm3, lm4, lm5), FUN = AIC)
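
A possible variant of the same comparison: AIC() also accepts several fitted models at once and returns a data frame with one row per model, which is easier to sort and read. The sketch refits the same five models so that it runs on its own; the name aic_tab is just a label chosen for this example.

```r
# Rebuild the modified iris data with a pure-noise predictor
modiris <- iris
set.seed(42)
modiris$somerand <- runif(n = 150)

lm1 <- lm(Petal.Length ~ Sepal.Length, data = modiris)
lm2 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = modiris)
lm3 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width, data = modiris)
lm4 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species, data = modiris)
lm5 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species + somerand,
          data = modiris)

# One row per model: df = number of estimated parameters, AIC = criterion value
aic_tab <- AIC(lm1, lm2, lm3, lm4, lm5)

# Sort so the model with the smallest AIC comes first
print(aic_tab[order(aic_tab$AIC), ])
```

The df column counts the residual variance as a parameter, so it is one larger than the number of regression coefficients.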

RRuser199 · November 23, 2021, 5:48pm

Thank you very much for this. Can I ask what you meant by accuracy? Do you refer to AUC?

nirgrahamuk · November 23, 2021, 5:51pm

Here is the description:
Akaike information criterion - Wikipedia

RRuser199 · November 23, 2021, 5:56pm

Thanks!! I saw you fit 5 predictors (you have N=150). Do you know how many predictors I can fit if I only have N=50 (40 with diseases and 20 without)?

RRuser199 · November 24, 2021, 3:35am

Thank you for the suggestion. Could you pls share what the code would look like? I tried "anova(m1,m2)", where m1 is my first model and m2 is my second model. However, I did not get any p-value (I am sure I made a mistake somewhere). I am comparing two logistic models, btw. Does ANOVA still apply in this case? Any help would be much appreciated!
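
For logistic models fitted with glm(), anova() may print only the deviance table unless a test is requested, which would explain the missing p-value. A minimal sketch, assuming simulated data: the variable names (disease, bmi, smoker), the sample size and the coefficients below are hypothetical and chosen only for illustration.

```r
# Simulated data; all names and effect sizes here are assumptions for the example
set.seed(1)
n <- 200
bmi <- rnorm(n, mean = 27, sd = 4)
smoker <- rbinom(n, size = 1, prob = 0.3)
p <- plogis(-6 + 0.2 * bmi + 0.8 * smoker)
disease <- rbinom(n, size = 1, prob = p)
dat <- data.frame(disease, bmi,
                  smoker = factor(smoker, labels = c("no", "yes")))

# m1 is nested in m2: m2 adds the bmi:smoker interaction
m1 <- glm(disease ~ bmi + smoker, data = dat, family = binomial)
m2 <- glm(disease ~ bmi * smoker, data = dat, family = binomial)

# Likelihood-ratio (chi-squared) test; test = "Chisq" requests the p-value column
cmp <- anova(m1, m2, test = "Chisq")
print(cmp)

# AIC works for glm fits as well (smaller is better)
print(AIC(m1, m2))
```

The p-value appears in the Pr(>Chi) column of the printed table. A squared term can be compared the same way by adding I(bmi^2) to the larger model's formula.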

RRuser199 · December 2, 2021, 2:56am

Hi @nirgrahamuk, now I understand better how to compare model fit between two models. Do you know how I can improve the model fit of my predictive model? I have two predictors in my model: BMI (continuous variable) and smoking status (yes/no).
