I work with human resources data, and my goal is to build a logistic regression model to predict employee attrition (the outcome is coded 0 = still active, 1 = left the business). From that model I want to calculate a risk score for each employee, indicating whether an employee with given characteristics has a high, medium, or low risk of leaving the business.
I have about 40 candidate predictors for this, e.g. overtime, sick leave taken, compensation data and department, a mix of numerical and categorical variables.
Building the model with numerical variables is more or less fine for me (I have been studying R for a while, but I am far from an expert). However, I struggle to understand how exactly to use and interpret categorical predictors in a multiple logistic regression model. The explanations I have found so far are rather rough or vague on this point, so I do not see how to apply them in practice. Can you suggest a good source of information for someone who is learning R and is moderately comfortable with statistics in general?
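To make the categorical part concrete: as far as I understand, a categorical predictor such as department enters the model as a set of 0/1 indicator columns, one per level except a reference level (treatment coding; this is what R's glm() does by default with an unordered factor, using the first factor level as the reference). Below is a minimal sketch in Python, with hypothetical departments and made-up counts, of what the coefficients then mean. With department as the only predictor, the fitted intercept is the log-odds of leaving in the reference group, and each indicator's coefficient is the log of the odds ratio relative to that group.

```python
import math

# Hypothetical data: department -> (number who left, number who stayed)
COUNTS = {"Sales": (20, 80), "Developer": (45, 55), "HR": (10, 90)}
REFERENCE = "Sales"  # the level every other department is compared against


def dummy_encode(value, levels, reference):
    """Treatment coding: one 0/1 indicator per non-reference level."""
    return {f"dept_{lvl}": int(value == lvl) for lvl in levels if lvl != reference}


def log_odds(left, stayed):
    return math.log(left / stayed)


# With department as the only predictor, the fitted logistic regression reproduces
# each group's observed log-odds, so the coefficients can be computed from the counts.
intercept = log_odds(*COUNTS[REFERENCE])
coefs = {
    f"dept_{lvl}": log_odds(*c) - intercept
    for lvl, c in COUNTS.items()
    if lvl != REFERENCE
}

levels = list(COUNTS)
print(dummy_encode("Developer", levels, REFERENCE))  # a Developer's row in the design matrix
print(dummy_encode("Sales", levels, REFERENCE))      # reference level: all indicators are 0
for name, b in coefs.items():
    print(f"{name}: coefficient {b:.3f}, odds ratio vs {REFERENCE} {math.exp(b):.2f}")
```

Once other predictors are added, each department coefficient becomes an odds ratio adjusted for those other variables, but it is still a comparison against the same reference level.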
My other question is about the risk score calculation. Once my model is accurate enough, what method should I use to turn it into risk scores?
What I have in mind is something like this: employee001 works as a developer (+60 risk points), has been promoted in the last 12 months (-15 risk points), but has a commute of more than 45 minutes (+25 risk points), giving a total score of 70, while employee002 has a score of only 20. So employee001 has a high chance of leaving, while employee002 has a low one. What would be the appropriate steps to come up with something like this?
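For the risk score part, one common approach (similar to credit scorecards) is to derive the points from the fitted coefficients instead of assigning them by hand. Each characteristic's points are its coefficient on the log-odds scale multiplied by a constant, so the point total is a rescaled linear predictor and ranks employees in the same order as their predicted probability of leaving (apart from rounding). Below is a minimal sketch, assuming 0/1 indicator variables and made-up coefficients chosen to mirror the example above. The variable names, the intercept, the scaling constant of 50 and the band cut-offs are all hypothetical.

```python
import math

# Hypothetical fitted model on the log-odds scale (in practice, taken from the fitted model).
INTERCEPT = -1.0
COEFS = {"is_developer": 1.2, "promoted_last_12m": -0.3, "commute_over_45min": 0.5}
POINTS_PER_LOG_ODDS = 50  # arbitrary scaling so that points are readable whole numbers


def risk_points(employee):
    """Sum the rescaled coefficient of every 0/1 characteristic the employee has."""
    return sum(round(b * POINTS_PER_LOG_ODDS) * employee.get(k, 0) for k, b in COEFS.items())


def probability_of_leaving(employee):
    """Inverse logit of the linear predictor (intercept plus coefficient times indicator)."""
    lp = INTERCEPT + sum(b * employee.get(k, 0) for k, b in COEFS.items())
    return 1.0 / (1.0 + math.exp(-lp))


def risk_band(p, low=0.25, high=0.5):
    """Map a predicted probability to a band using hypothetical cut-offs."""
    if p >= high:
        return "high"
    if p >= low:
        return "medium"
    return "low"


employee001 = {"is_developer": 1, "promoted_last_12m": 1, "commute_over_45min": 1}
employee002 = {"promoted_last_12m": 1}

for name, emp in [("employee001", employee001), ("employee002", employee002)]:
    p = probability_of_leaving(emp)
    print(name, risk_points(emp), round(p, 3), risk_band(p))
```

The high/medium/low cut-offs are a business decision rather than something the model provides; they could, for example, be chosen from the distribution of predicted probabilities across all employees.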
I would appreciate any help with the above.