I want to run a multiple regression with "posn_neg_5" as the dependent variable and a mixture of continuous and binary independent variables. However, posn_neg_5 is not normally distributed (see below). I would like to avoid a transformation so that the output remains in units the readers can understand and interpret.
The statistics that are typically calculated along with linear regression assume that the residuals are normally distributed. It's not necessarily a problem that your response variable is non-normal; what matters are the residuals. I would go ahead and fit the model you have in mind and then check the residuals for normality.
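As a rough sketch of that workflow: the data below are simulated stand-ins, and the predictor names x_cont and x_bin are hypothetical placeholders for your own variables. The point is only that the Q-Q plot and the Shapiro-Wilk test are applied to residuals(fit), not to the raw response.

```r
# Simulated stand-in data; x_cont and x_bin are hypothetical predictors.
set.seed(1)
n <- 200
dat <- data.frame(
  x_cont = rnorm(n),                        # continuous predictor
  x_bin  = rbinom(n, size = 1, prob = 0.5)  # binary predictor
)
dat$posn_neg_5 <- 2 + 0.5 * dat$x_cont + 1.5 * dat$x_bin + rnorm(n)

fit <- lm(posn_neg_5 ~ x_cont + x_bin, data = dat)
summary(fit)

res <- residuals(fit)

# Visual check: points should lie close to the reference line.
qqnorm(res)
qqline(res)

# Formal check: Shapiro-Wilk on the residuals, not on the raw response.
sw <- shapiro.test(res)
sw
```

With a large sample, Shapiro-Wilk tends to flag even trivial departures from normality, so it is probably best read alongside the Q-Q plot rather than instead of it.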
If you still think you need a transformation after that, then I would apply it and use effects plots drawn on the original scale to help your readers understand the relationship between the predictors and the response.
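One way to get such a plot with base R only, assuming a log transformation (an assumption for illustration; your transformation may differ): fit on the log scale, predict over a grid, and back-transform with exp() before plotting. The data and the names y, x_cont and x_bin are simulated and hypothetical.

```r
# Simulated right-skewed response; all variable names are hypothetical.
set.seed(4)
n <- 200
dat <- data.frame(x_cont = rnorm(n), x_bin = rbinom(n, 1, 0.5))
dat$y <- exp(1 + 0.4 * dat$x_cont + 0.3 * dat$x_bin + rnorm(n, sd = 0.5))

# Fit on the log scale.
log_fit <- lm(log(y) ~ x_cont + x_bin, data = dat)

# Predict over a grid of predictor values, with confidence limits.
grid <- expand.grid(x_cont = seq(-2, 2, length.out = 50), x_bin = c(0, 1))
pred <- predict(log_fit, newdata = grid, interval = "confidence")

# Back-transform so readers see the original units.
grid$fit <- exp(pred[, "fit"])
grid$lwr <- exp(pred[, "lwr"])
grid$upr <- exp(pred[, "upr"])

# Plot the raw data and one fitted curve per level of the binary predictor.
plot(y ~ x_cont, data = dat, col = "grey60",
     xlab = "x_cont", ylab = "y (original units)")
for (b in c(0, 1)) {
  g <- grid[grid$x_bin == b, ]
  lines(g$x_cont, g$fit, lty = b + 1)
}
legend("topleft", legend = c("x_bin = 0", "x_bin = 1"), lty = 1:2)
```

Note that exp() of a prediction made on the log scale estimates the median of the response on the original scale, not its mean, so the axis label or caption should say so.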
One thing I notice is that your response variable appears to be on a discrete scale. That might push you to treat it as an ordinal categorical variable and fit an ordinal regression with polr (from the MASS package).
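A minimal sketch of what that could look like, assuming the response has five ordered levels (a guess based on the variable name, not something I know about your data). The data are simulated and the predictor names are hypothetical.

```r
library(MASS)  # provides polr()

# Simulated stand-in data; x_cont and x_bin are hypothetical predictors.
set.seed(2)
n <- 300
dat <- data.frame(x_cont = rnorm(n), x_bin = rbinom(n, 1, 0.5))

# Hypothetical 5-level ordinal response built by cutting a latent score.
latent <- 0.8 * dat$x_cont + 0.8 * dat$x_bin + rlogis(n)
dat$posn_neg_5 <- cut(latent, breaks = quantile(latent, 0:5 / 5),
                      include.lowest = TRUE, labels = 1:5,
                      ordered_result = TRUE)

# Proportional-odds logistic regression; Hess = TRUE keeps the Hessian
# so that summary() can report standard errors.
ord_fit <- polr(posn_neg_5 ~ x_cont + x_bin, data = dat, Hess = TRUE)
summary(ord_fit)

# Predicted probability of each category for two hypothetical cases.
newdat <- data.frame(x_cont = c(0, 0), x_bin = c(0, 1))
probs <- predict(ord_fit, newdata = newdat, type = "probs")
probs
```

The predicted category probabilities are often easier for readers to interpret than the coefficients, which are on the log-odds scale.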
Let me add a bit to @arthur.t's helpful response. If you have a large sample, then it's not very important that the residuals be normally distributed. As @arthur.t points out, though, it does matter for the auxiliary statistics (standard errors, t-statistics, p-values) if the sample is small.
However, when the dependent variable is 0/1 (a linear regression on such a variable is sometimes called a linear probability model), it is likely that you have heteroskedasticity, which does affect the validity of the auxiliary statistics.
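To illustrate, here is a sketch that fits a linear probability model to simulated 0/1 data and compares the classical standard errors with White (HC0) heteroskedasticity-robust ones, computed by hand in base R. The data and variable names are hypothetical, and HC0 is just one of several robust estimators.

```r
# Simulated 0/1 outcome; all variable names are hypothetical.
set.seed(3)
n <- 500
dat <- data.frame(x_cont = rnorm(n), x_bin = rbinom(n, 1, 0.5))
p <- plogis(-0.5 + 1.0 * dat$x_cont + 0.5 * dat$x_bin)
dat$y <- rbinom(n, 1, p)

# Linear probability model: ordinary least squares on a 0/1 response.
lpm <- lm(y ~ x_cont + x_bin, data = dat)

# Classical standard errors assume constant error variance.
se_classical <- sqrt(diag(vcov(lpm)))

# White (HC0) robust covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X <- model.matrix(lpm)
e <- residuals(lpm)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)  # row i of X scaled by e_i, so this is X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vcov_hc0))

cbind(estimate = coef(lpm), se_classical, se_robust)
```

In practice the same kind of result is usually obtained with vcovHC() from the sandwich package together with coeftest() from lmtest; the hand computation is only there to keep the example free of extra packages.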
Thanks both,
And yes to both points. I was showing my ignorance: it's the residuals I need to check.
Normality: Deciding whether a Q-Q plot is adequately normal has always seemed a bit arbitrary. How would I apply a Shapiro-Wilk test directly after the 'Summary' stats?
"polr": I had been concerned about the dependent variable not being a true continuous variable. I take it 'polr' would be the safer option?