Classify the output of Logistic Regression into bins

Hi All,
I have a logistic regression in R whose goal is to predict the probability of default on some test data:
glm(default ~ X1 + X2 + X3 + X4 + X5 + X1:term + term:X5 - 1, family="binomial", data=mydata)

What I'd like to do is 'bin' this data so that bins 1 to n each have a certain rate of default. How can I bin the logistic regression results in this way? For example, the bins might look like this (the counts are only illustrative):

Bin#  P(Default)  Count
1     10%         4000
2     20%         1100
3     30%         1500

That is, I set in advance the probabilities I want each bin to have (0.1, 0.2, 0.3), and the bins are then created around those settings.

Thanks in advance

I think you can do this with the predict function. The help file for predict.glm says this about the type argument:

type: the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities.
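To make the two scales concrete, here is a small sketch on simulated data. The data frame `toy` and its columns `x` and `y` are made up for illustration and are not from the question. It checks that applying the inverse logit, `plogis()`, to the default (log-odds) predictions gives the same numbers as `type = "response"`.

```r
set.seed(1)

# Hypothetical toy data, for illustration only
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(-1 + 0.8 * toy$x))

fit <- glm(y ~ x, family = "binomial", data = toy)

log_odds <- predict(fit)                     # default scale: linear predictor (log-odds)
probs    <- predict(fit, type = "response")  # predicted probabilities

# The inverse logit of the log-odds should match the response-scale predictions
same_scale <- isTRUE(all.equal(unname(plogis(log_odds)), unname(probs)))
```

Binning needs values between 0 and 1, so the response scale is the one to use for it.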

You should be able to use code like this to make bins.

FIT <- glm(Result ~ X1 + X2, family = "binomial", data = DF)
DF$prob <- predict(FIT, type = "response")
DF$bin <- cut(DF$prob, breaks = seq(0, 1, 0.1))
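
If you want bins built around preset rates such as 0.1, 0.2 and 0.3 rather than equal-width bins, one possible approach is to pass your own cut points to `cut()` and then tabulate each bin. The sketch below uses simulated data standing in for `DF`, and the cut points (midpoints between the target rates) are my assumption, not something from the question. The observed default rate in each bin will only approximate the target, so it is worth checking.

```r
set.seed(42)

# Hypothetical data standing in for DF; column names follow the code above
n  <- 2000
DF <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
DF$Result <- rbinom(n, 1, plogis(-1.5 + 0.9 * DF$X1 - 0.5 * DF$X2))

FIT <- glm(Result ~ X1 + X2, family = "binomial", data = DF)
DF$prob <- predict(FIT, type = "response")

# Assumed cut points: midpoints between the target rates 0.1, 0.2 and 0.3
breaks <- c(0, 0.15, 0.25, 1)
# include.lowest = TRUE keeps a probability of exactly 0 in the first bin
DF$bin <- cut(DF$prob, breaks = breaks, include.lowest = TRUE)

# One row per bin: count, mean predicted probability, observed default rate
bin_summary <- data.frame(
  count     = as.vector(table(DF$bin)),
  mean_prob = as.vector(tapply(DF$prob, DF$bin, mean)),
  obs_rate  = as.vector(tapply(DF$Result, DF$bin, mean)),
  row.names = levels(DF$bin)
)
print(bin_summary)
```

To score held-out data instead of the training data, `predict()` accepts a `newdata` argument, e.g. `predict(FIT, newdata = test_df, type = "response")`, where `test_df` is a hypothetical data frame with the same predictor columns.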

It worked just fine for me. Thanks a lot.
