# Creating a predictive model

Hi all!

I am trying to build a model that predicts whether a specific diagnosis is present, e.g. a CT chest showing calcification in the coronary arteries, or an MRI showing a typical amyloid pattern. I want the best possible model for predicting a positive scan from the variables I have, such as age, gender, comorbidities, etc. Do I need to use a specific package or test, or can I start with just logistic regression? What do I need to consider before starting to build a model? Should I include all the variables at the start, or only the ones I think will be most relevant?
Thank you.
Zafar

You can start with logistic regression, and you can start using all the predictor variables. In R, you do not need a special package, just type

```r
glm(formula = dependentvariable ~ ., family = binomial(link = "logit"), data = traindataset)
```

Then start eliminating predictor variables that are insignificant.
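As a rough sketch of what that workflow can look like end to end, here is a small example on simulated data. The data frame `patients` and its columns (`age`, `male`, `diabetes`, `positive_scan`) are made up for illustration and are not from the original question; the 70/30 split and the 0.5 cutoff are arbitrary choices too.

```r
# Simulated data only; all variable names are hypothetical stand-ins.
set.seed(42)
n <- 500
patients <- data.frame(
  age      = round(runif(n, 40, 85)),
  male     = rbinom(n, 1, 0.5),
  diabetes = rbinom(n, 1, 0.3)
)
# "True" model used to simulate the outcome (unknown with real data)
lin_pred <- -7 + 0.1 * patients$age + 0.8 * patients$diabetes
patients$positive_scan <- rbinom(n, 1, plogis(lin_pred))

# Hold out 30% of the rows for testing
test_idx <- sample(n, size = 0.3 * n)
train <- patients[-test_idx, ]
test  <- patients[test_idx, ]

# Logistic regression on all predictors
fit <- glm(positive_scan ~ ., family = binomial(link = "logit"), data = train)

# Coefficient table: estimates, standard errors, z values, p-values
print(coef(summary(fit)))

# Predicted probabilities for the held-out patients
test$prob <- predict(fit, newdata = test, type = "response")

# Share of held-out patients classified correctly at a 0.5 cutoff
accuracy <- mean((test$prob > 0.5) == (test$positive_scan == 1))
print(accuracy)
```

With real data you would replace `patients` with your own data frame and look at more than accuracy (for example discrimination and calibration), but the `glm` and `predict` calls stay the same.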

But also try other classification models such as decision trees, random forest, k-nearest neighbor, support vector machines, etc.

You might want to search for examples of predicting heart disease and breast cancer using machine learning in R, which seem like similar problems.

Here is a very short introduction to logistic regression that uses a coronary heart disease dataset with age, age cohort and CHD as variables. It illustrates the use of glm, the standard methodology against whose results any other technique should be compared.

There is more to evaluating a model than simply discarding those parameters with a p-value above the selected $\alpha$ in a forward or backward stepwise selection. There is no guarantee that a parameter scored as insignificant in the presence of one set of parameters will also be insignificant with respect to a different set of parameters.
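To make the point about significance depending on the other parameters concrete, here is a minimal simulation, assuming two made-up predictors `x1` and `x2` that are strongly correlated, with the outcome generated from `x1` only. None of these names come from the thread.

```r
# Simulated illustration; x1, x2 and y are hypothetical.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # x2 is strongly correlated with x1
y  <- rbinom(n, 1, plogis(x1))  # the outcome depends on x1 only

# x2 as the only predictor vs. x2 alongside x1
fit_alone <- glm(y ~ x2, family = binomial)
fit_joint <- glm(y ~ x1 + x2, family = binomial)

# p-value of the x2 coefficient in each model
p_alone <- coef(summary(fit_alone))["x2", "Pr(>|z|)"]
p_joint <- coef(summary(fit_joint))["x2", "Pr(>|z|)"]
print(c(alone = p_alone, joint = p_joint))
```

On its own, `x2` acts as a proxy for `x1` and gets a very small p-value. Once `x1` is in the model, `x2` has nothing left to explain, so its p-value will usually be much larger. The same variable can therefore be kept or dropped depending on which other variables happen to be in the model at that step.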

In addition to the Hosmer-Lemeshow-Sturdivant text cited in the link, Frank Harrell’s Regression Modeling Strategies and the associated {rms} package should be reviewed.
