Logistic regression with independent variable as factor

fcas80 · September 15, 2020, 6:34pm

Here is a logistic regression with a factor independent variable having three levels: 0, 1, and 2. The regression creates two dummy variables, x1 and x2. Does x1 correspond to my level 0 or my level 1? And why does the regression show only two variables rather than three?

df <- data.frame(cbind(x=c(0,0,0,0,0,1,1,1,1,1,2,2,2,2,2), y=c(0,1,0,1,1,0,0,1,1,1,0,1,1,1,1)))
df$x <- factor(df$x)
model <- glm(y~x, data=df, family="binomial")
summary(model)

'data.frame': 15 obs. of 2 variables:
 $ x: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 2 2 2 2 2 ...
 $ y: num 0 1 0 1 1 0 0 1 1 1 ...

Call:
glm(formula = y ~ x, family = "binomial", data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7941  -1.3537   0.6681   1.0108   1.0108

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.055e-01  9.129e-01   0.444    0.657
x1          1.282e-16  1.291e+00   0.000    1.000
x2          9.808e-01  1.443e+00   0.680    0.497

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 19.095 on 14 degrees of freedom
Residual deviance: 18.464 on 12 degrees of freedom
AIC: 24.464

Number of Fisher Scoring iterations: 4

joels · September 15, 2020, 6:46pm

The reference level is x=0. The coefficients x1 (for x=1) and x2 (for x=2) are relative to x=0, and the intercept alone gives the predicted log odds when x=0. In other words, here are the fitted values for each possible level of x, where p is the probability that y=1 and log(p/(1-p)) is the log odds of the outcome:

x = 0: log(p/(1-p)) = 0.4055
x = 1: log(p/(1-p)) = 0.4055 + 1.28e-16
x = 2: log(p/(1-p)) = 0.4055 + 0.9808
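
To check this yourself, here is a small sketch that reuses the data from the original post. It prints the dummy columns R builds for the factor and then asks `predict()` for the fitted log odds and probabilities at each level. The names `newdata`, `log_odds`, and `prob` are mine, chosen just for illustration. It also shows why only two dummies appear: with an intercept in the model, a third dummy column for level 0 would be perfectly collinear with the other columns, so the first level is absorbed into the intercept.

```r
# Same data as in the original post
df <- data.frame(
  x = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)),
  y = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)
model <- glm(y ~ x, data = df, family = "binomial")

# The first factor level ("0") is the reference and gets no dummy column
levels(df$x)
unique(model.matrix(model))  # columns: (Intercept), x1, x2

# Fitted log odds and probabilities for each level of x
newdata <- data.frame(x = factor(c("0", "1", "2"), levels = levels(df$x)))
log_odds <- predict(model, newdata = newdata, type = "link")
prob <- predict(model, newdata = newdata, type = "response")
data.frame(x = newdata$x, log_odds = log_odds, prob = prob)
```

Because x is the only predictor, the fitted probabilities should match the observed proportion of y=1 within each level (3/5, 3/5, and 4/5), which is why the x1 coefficient is essentially zero.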

fcas80 · September 15, 2020, 6:50pm

Thanks Joels.

What if I had two independent factor variables u and v? Wouldn't I want to see the regression parameters separately for u0 and v0?

joels · September 15, 2020, 6:55pm

It's the same idea, but now each independent variable has a reference level and one or more other levels. This StackOverflow answer that I wrote a few years ago explains a similar case of logistic regression with two independent categorical variables (although in that example, there's also an interaction term included).

So, let's say u can have categories 0 and 1 and v can have categories 0 and 1. And let's call the regression coefficients u1 for u=1 and v1 for v=1. Then the possible fitted values would be:

u=0, v=0: log(p/(1-p)) = Intercept
u=0, v=1: log(p/(1-p)) = Intercept + v1
u=1, v=0: log(p/(1-p)) = Intercept + u1
u=1, v=1: log(p/(1-p)) = Intercept + u1 + v1
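
If it helps, the four cases above can be reproduced with a rough sketch like the one below. The data frame `d` is made up purely for illustration (it is not from this thread), and the model is additive with no interaction term. The sketch compares the log odds from `predict()` with the sums built by hand from the coefficients.

```r
# Hypothetical data, invented only to illustrate the idea
d <- data.frame(
  u = factor(rep(c(0, 0, 1, 1), each = 4)),
  v = factor(rep(c(0, 1, 0, 1), each = 4)),
  y = c(0, 0, 0, 1,  0, 1, 0, 1,  0, 1, 1, 0,  1, 1, 1, 0)
)
fit <- glm(y ~ u + v, data = d, family = "binomial")
b <- unname(coef(fit))  # order: (Intercept), u1, v1

# Every combination of the two factors
grid <- expand.grid(u = factor(c("0", "1")), v = factor(c("0", "1")))

# Log odds from predict() versus Intercept + u1 + v1 built by hand
grid$log_odds <- predict(fit, newdata = grid, type = "link")
grid$by_hand <- b[1] + b[2] * (grid$u == "1") + b[3] * (grid$v == "1")
grid
```

The two columns should agree row by row. There is no separate coefficient for u=0 or v=0 because both reference levels are folded into the intercept.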
