Need help with interpreting model parameters

I am fitting a logistic regression model for my thesis, and I am looking at univariable models to determine significant predictors that I can include in the multivariable model for prediction. The output for one of the variables (wealth quintile) is shown below. Should I conclude that the variable is a significant predictor or not?

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -0.28478    0.04200  -6.781 1.19e-11 ***
wealthquintile.L -0.62622    0.09939  -6.301 2.96e-10 ***
wealthquintile.Q -0.23414    0.09694  -2.415   0.0157 *  
wealthquintile.C -0.10522    0.08983  -1.171   0.2415    
wealthquintile^4 -0.11141    0.08904  -1.251   0.2109

Hi,

Welcome to the RStudio community!

The best way of getting a good answer to your question is to provide us with a more in-depth explanation of the data you're working with, your goals and the reason for using regression. Interpretation of machine learning model parameters is a tricky business and should always be done in the full context of the data and question at hand.

Apart from some more details on the issue itself, it's always a good idea to provide a reprex as well: a minimal dataset together with the code you'd like to run.

Kind regards,
PJ

Hi, and welcome to the community and, as well, to the wonderfully wacky world of logistic regression. I'm in the middle of unpacking, so I can just give you the view from 40,000 feet tonight.

The typical logistic regression model is in the form

glm(y ~ x_1 + ... + x_n, family = binomial)

There are four steps in evaluating a logistic model.

  1. Selection of the parameters. There are several ways to do this. One is backward elimination: start with a full model containing all of the available independent variables, then, for a given \alpha, successively discard the x terms that have a p-value greater than \alpha.

  2. Calculation of odds ratio.

odr <- function(x) {
    exp(cbind(OR = coef(x), confint(x)))
}

This gives an indication whether observing x makes observing y more likely (OR > 1), less likely (OR < 1) or equally likely (OR = 1). The two-sided confidence interval shows whether the OR can be distinguished from 1: if the interval excludes 1, the association is significant at the corresponding level.

  3. Next comes a goodness of fit test, such as the Hosmer-Lemeshow test, whose null hypothesis H_0 is that the model fits the data; accordingly, a low p-value is evidence of a poor fit, while a high p-value means no lack of fit was detected. The hoslem.test function in the ResourceSelection package produces the test statistic along with tables of expected and observed frequencies; the generalhoslem package offers a similar function, logitgof.

  4. The final step is model diagnostics which, if the stars align, may not be needed.
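To make the goodness of fit step concrete, here is a rough base R sketch of the Hosmer-Lemeshow calculation done by hand. The simulated data, the helper name hl_test and the choice of 10 groups are all my own assumptions for illustration; this is not the code from any package. Under the hypothesis that the model fits, the statistic is approximately chi-squared with g - 2 degrees of freedom, so a small p-value points to lack of fit.

```r
# Simulated data (assumption): one continuous predictor, binary outcome
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial())

# Hypothetical helper: Hosmer-Lemeshow statistic with g groups
hl_test <- function(obs, pred, g = 10) {
  # Form g groups by cutting the fitted probabilities at their quantiles
  breaks <- quantile(pred, probs = seq(0, 1, length.out = g + 1))
  grp <- cut(pred, breaks = breaks, include.lowest = TRUE)
  n_g <- tapply(obs, grp, length)   # group sizes
  o1 <- tapply(obs, grp, sum)       # observed events per group
  e1 <- tapply(pred, grp, sum)      # expected events per group
  o0 <- n_g - o1                    # observed non-events
  e0 <- n_g - e1                    # expected non-events
  stat <- sum((o1 - e1)^2 / e1 + (o0 - e0)^2 / e0)
  p <- pchisq(stat, df = g - 2, lower.tail = FALSE)
  list(statistic = stat, df = g - 2, p.value = p,
       table = cbind(n = n_g, observed = o1, expected = e1))
}

hl <- hl_test(y, fitted(fit))
hl$p.value  # small values suggest lack of fit
hl$table    # observed vs expected events per group
```

With tied fitted probabilities (for example, a model with only a few categorical predictors) the quantile breaks may not be unique, so the grouping would need more care than this sketch takes.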

The standard text is Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression, 3rd Edition. 2013. New York, USA: John Wiley and Sons.


Since wealthquintile is encoded as an ordered factor, you get a set of polynomial variables generated from that one column. I don't think that this is the best idea and tend to convert these variables to unordered factors (but that's just my preference).
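In case it helps to see where the .L, .Q, .C and ^4 terms come from, here is a small sketch. The level names are made up, since I don't know how your quintiles are labelled.

```r
# Hypothetical five-level ordered factor standing in for wealth quintile
lv <- c("poorest", "poorer", "middle", "richer", "richest")
wq <- factor(lv, levels = lv, ordered = TRUE)

# Ordered factors get orthogonal polynomial contrasts by default:
# linear, quadratic, cubic and 4th-degree trends across the levels
poly_contr <- contrasts(wq)
colnames(poly_contr)  # ".L" ".Q" ".C" "^4"

# Dropping the ordering switches to treatment (dummy) contrasts,
# where each coefficient compares one level with the first level
wq_unordered <- factor(wq, ordered = FALSE)
trt_contr <- contrasts(wq_unordered)
colnames(trt_contr)   # "poorer" "middle" "richer" "richest"
```

So each row in your coefficient table tests one trend component across the quintiles, not the variable as a whole.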

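Since you have multiple polynomial terms, no single row of the coefficient table tests the variable as a whole; you would have to conduct an overall ANOVA-style test (analysis of deviance) comparing the model with and without this predictor. Here is an example using a different data set: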

library(broom)

set.seed(2424)
dat <- data.frame(
  y = factor(rep(c("yes", "no"), 200)),
  x = ordered(sample(letters[1:4], 400, replace = TRUE))
)

lr_mod <- glm(y ~ x, data = dat, family = binomial())

# 4 level factor => 3 polynomial variables
tidy(lr_mod)
#> # A tibble: 4 x 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)  0.00334     0.101    0.0332   0.973
#> 2 x.L         -0.148       0.206   -0.718    0.473
#> 3 x.Q          0.239       0.201    1.19     0.234
#> 4 x.C          0.0968      0.196    0.494    0.622

# 3 df test for the overall effect of x
anova(lr_mod, test = "LRT")
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: y
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
#> NULL                   399     554.52         
#> x     3   2.2333       396     552.28   0.5254

Created on 2019-10-10 by the reprex package (v0.3.0)
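If you prefer to see the "with and without" comparison spelled out, the sketch below fits both models explicitly on the same simulated data. The object names (null_mod, full_mod and so on) are just ones I picked for illustration. It also checks that the overall test does not depend on whether the factor is ordered, since only the contrast coding changes.

```r
set.seed(2424)
dat <- data.frame(
  y = factor(rep(c("yes", "no"), 200)),
  x = ordered(sample(letters[1:4], 400, replace = TRUE))
)

# Model without the predictor (intercept only) and model with it
null_mod <- glm(y ~ 1, data = dat, family = binomial())
full_mod <- glm(y ~ x, data = dat, family = binomial())

# Likelihood ratio test comparing the two nested models
lrt <- anova(null_mod, full_mod, test = "LRT")
lrt

# The same test by hand: drop in deviance against a chi-squared distribution
dev_drop <- deviance(null_mod) - deviance(full_mod)
df_drop <- df.residual(null_mod) - df.residual(full_mod)
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

# Recoding x as an unordered factor changes the coefficients,
# but not the fit, so the overall test is unchanged
unord_mod <- glm(y ~ factor(x, ordered = FALSE), data = dat,
                 family = binomial())
all.equal(deviance(unord_mod), deviance(full_mod))
```

For the wealth quintile model, the analogous comparison would be a 4 df test, and that single p-value is what answers whether the variable as a whole is a significant predictor.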
