Logistic Regression with Proportion Data

Hello, I would like help creating a logistic model for collisions per population within census tracts. What would be the best R code to use for this?
I currently have:
mylogit <- glm(CollsPerPop ~ WalkPerc + PROP_LIM_AT + MalesToFemales + Non_Can_Perc + TransitPerc + TransitStopsPerLandArea + CR20_C20_R20 + C_COR1 + C_COR2 + MU_1 + S_CS + R_CG, data = data, family = "quasibinomial")

I am wondering whether this is the best approach, or whether I should instead use an offset or weights to account for population differences between census tracts.

This looks like count data: the number of collisions, expressed as a rate (collisions per unit of population per year). Poisson regression or some other model for count data may therefore be more appropriate. For example, here and here are articles on modelling road collisions.
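To make the offset-versus-weights question concrete, here is a minimal sketch of both options. It assumes you have the raw collision count and the tract population as separate columns; the names Collisions and Population are hypothetical, the data are simulated stand-ins, and only two of your predictors are used to keep it short.

```r
# Simulated stand-in data. Collisions and Population are hypothetical
# column names; replace them with the columns in your own data frame.
set.seed(42)
n <- 200
tracts <- data.frame(
  Population  = round(runif(n, 1000, 8000)),
  WalkPerc    = runif(n, 0, 30),
  TransitPerc = runif(n, 0, 40)
)
# Made-up per-person collision rate that depends on the predictors.
rate <- exp(-6 + 0.03 * tracts$WalkPerc + 0.01 * tracts$TransitPerc)
tracts$Collisions <- rpois(n, lambda = rate * tracts$Population)

# Option 1: Poisson regression on the counts. The offset log(Population)
# turns it into a model of the collision rate per person.
pois_fit <- glm(Collisions ~ WalkPerc + TransitPerc + offset(log(Population)),
                family = poisson(link = "log"), data = tracts)
summary(pois_fit)

# Rough overdispersion check: Pearson chi-square divided by residual df.
# Values well above 1 suggest the Poisson variance assumption is too tight.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion

# Option 2: keep the proportion as the response and pass the population
# as weights, so larger tracts count for more in the fit.
tracts$CollsPerPop <- tracts$Collisions / tracts$Population
qb_fit <- glm(CollsPerPop ~ WalkPerc + TransitPerc, weights = Population,
              family = quasibinomial, data = tracts)
summary(qb_fit)
```

When collisions are rare relative to population, the two fits usually give similar slope estimates. The offset form has the advantage of modelling the counts directly.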

Hi joels. Thanks for your response. I have also tried modeling my data with negative binomial regression because the data exhibit overdispersion. Should I be worried that this model gives a low pseudo R-squared using the McFadden method? And would you have an example of R code for negative binomial regression for modeling a rate/proportion?
Thank you again.

I don't have much experience with negative binomial regression, and I'm not sure how useful the pseudo-R-squared statistic is for this type of regression model (see here for example). Generally, it's probably better to focus on what type of model structure makes sense for the process you're trying to represent (though advice like that is probably too general to be of practical use on its own). This vignette discusses assessing model fit for count data (comparing Poisson and negative binomial models) in a Bayesian context and uses the rstanarm package to fit the models.
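For a non-Bayesian starting point, a negative binomial rate model can be sketched with glm.nb() from the MASS package, using the same log(population) offset as a Poisson model would. Treat this as a rough sketch rather than a recommendation: the data are simulated, and Collisions and Population are hypothetical column names standing in for your own.

```r
library(MASS)  # provides glm.nb()

# Simulated overdispersed counts. Collisions and Population are
# hypothetical column names used only for illustration.
set.seed(1)
n <- 300
tracts <- data.frame(
  Population = round(runif(n, 1000, 8000)),
  WalkPerc   = runif(n, 0, 30)
)
mu <- exp(-6 + 0.03 * tracts$WalkPerc) * tracts$Population
tracts$Collisions <- rnbinom(n, mu = mu, size = 2)

# Poisson and negative binomial fits with the same rate structure:
# offset(log(Population)) makes both models of collisions per person.
pois_fit <- glm(Collisions ~ WalkPerc + offset(log(Population)),
                family = poisson, data = tracts)
nb_fit <- glm.nb(Collisions ~ WalkPerc + offset(log(Population)),
                 data = tracts)

# Compare the two fits; a lower AIC indicates the better-supported model.
AIC(pois_fit, nb_fit)

# Estimated dispersion parameter; smaller theta means more overdispersion.
nb_fit$theta

# Exponentiated coefficients read as rate ratios per unit of the predictor.
exp(coef(nb_fit))
```

Comparing AIC between the Poisson and negative binomial fits, or checking residuals, tends to say more about whether the model suits the data than a pseudo R-squared does.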
