Reporting intercept-only model

Hiya,

I'm a beginner at RStudio and I'm feeling particularly low in confidence, so I would love some insight into how others would interpret this finding. The idea is to fit an appropriate intercept-only model predicting whether a respondent has felt depressed in the past week, and then to state how the intercept in this model represents the likelihood of having felt depressed.

With this output:

mod1 <- lm(formula = depressed ~ 1, data = dat)
summary(mod1)

Call:
lm(formula = depressed ~ 1, data = dat)

Residuals:
Min 1Q Median 3Q Max
-0.3655 -0.3655 -0.3655 0.6345 0.6345

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.365506   0.008772   41.67   <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4817 on 3014 degrees of freedom

My instinct is to answer:
"The intercept-only model predicts a 0.36 chance of a respondent reporting having felt depressed in the past week. Though the intercept is a small number, with a very small error margin and a high significance, it is worthy to consider the weight of the chance"

But for some reason I feel like this is insufficient or even misguided, in that the 0.36 speaks to chance or percentage. Should anything else stand out to me in this output that could contribute to understanding how the intercept represents the likelihood of feeling depressed?

TIA

If you have a binary outcome (I'm assuming that, because you asked for a probability), you shouldn't use a linear model. There are people who do this and call it a linear probability model, but a logistic regression model is more appropriate.

That said, I suggest using glm(..., family = binomial()). The estimate is then on the log odds scale, and using plogis() should convert it to the probability.
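A minimal sketch of that suggestion follows. The original `dat` is not available here, so the data frame below is a hypothetical stand-in: 1102 ones out of 3015 rows, chosen only so that its mean matches the 0.3655 intercept and the 3014 degrees of freedom in the output above. The names `mod2`, `log_odds`, and `p_hat` are illustrative.

```r
# Hypothetical stand-in for `dat`: 1102 "yes" answers out of 3015 rows,
# chosen so the mean matches the 0.3655 intercept in the lm() output.
dat <- data.frame(depressed = c(rep(1, 1102), rep(0, 1913)))

# Intercept-only logistic regression
mod2 <- glm(depressed ~ 1, family = binomial(), data = dat)

# The intercept is on the log-odds scale
log_odds <- unname(coef(mod2)["(Intercept)"])

# plogis() is the inverse logit: it maps log odds back to a probability
p_hat <- plogis(log_odds)

print(log_odds)
print(p_hat)
```

With only an intercept, the back-transformed value should agree with the observed proportion of respondents who answered yes, up to the fitting tolerance.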

Results may be similar, though.

There is nothing necessarily wrong with a linear probability model. That is especially true when the only variable is an intercept. Note that in this case the coefficient is just the mean.
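The point that the coefficient is just the mean can be checked directly. This is a small sketch on a hypothetical stand-in for `dat` (1102 ones out of 3015 rows, picked to match the output above), not the poster's actual data; `se_lm` and `se_mean` are illustrative names.

```r
# Hypothetical stand-in for `dat`, matching the mean and sample size
# implied by the lm() output in the question.
dat <- data.frame(depressed = c(rep(1, 1102), rep(0, 1913)))

mod1 <- lm(depressed ~ 1, data = dat)

# The intercept of an intercept-only lm() is the sample mean
intercept   <- unname(coef(mod1)[1])
sample_mean <- mean(dat$depressed)

# Its standard error is the usual standard error of a mean
se_lm   <- coef(summary(mod1))[1, "Std. Error"]
se_mean <- sd(dat$depressed) / sqrt(nrow(dat))

print(c(intercept = intercept, sample_mean = sample_mean))
print(c(se_lm = se_lm, se_mean = se_mean))
```

Because the outcome is coded 0/1, that mean is simply the share of respondents who reported feeling depressed in the past week.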

In a linear probability model, neither the estimates nor the confidence intervals are bounded within the [0, 1] range, which makes it potentially less accurate. I see no benefit in using an LPM over a logistic regression model, even if results are similar.

True. But not in this case, where there is only an intercept.

I also like the logit (or probit) model, but so long as predictions stay within [0,1] the linear probability model is much easier to interpret.
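The boundedness discussion above can be illustrated with a deliberately tiny, made-up sample (1 yes out of 20); this is an assumption for illustration only and has nothing to do with the poster's data, where the sample is large and the estimate sits well inside [0, 1]. The sketch compares the `lm()` interval with a Wald interval from `glm()` computed on the log-odds scale via `confint.default()` and back-transformed with `plogis()`.

```r
# Made-up small sample: 1 "yes" out of 20 (illustration only)
small <- data.frame(depressed = c(1, rep(0, 19)))

# Linear probability model: the interval is symmetric around the mean
lpm <- lm(depressed ~ 1, data = small)
ci_lm_small <- confint(lpm)

# Logistic model: Wald interval on the log-odds scale,
# then mapped to the probability scale with plogis()
logit <- glm(depressed ~ 1, family = binomial(), data = small)
ci_glm_small <- plogis(confint.default(logit))

print(ci_lm_small)   # lower limit can fall below 0
print(ci_glm_small)  # stays strictly between 0 and 1
```

The point estimate from `lm()` is still the sample proportion and so stays within [0, 1]; it is the interval around it that can spill over when the proportion is near a boundary and the sample is small.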

Thank you both for the insight. It's valuable to know that in lm, neither the estimates nor the confidence intervals are bounded within the [0, 1] range.
