I need some help. I am currently writing my master's thesis in economics, analyzing the second-home rate in Swiss municipalities. In total I have 30 independent variables (2 dummy variables; the rest are numeric). My goal is to find out which of these variables have a significant effect on the second-home rate. So far I have fitted a model with the lm function that includes all my variables (without any transformations or interaction terms), and afterwards I used the regsubsets function in R to pick the best model by the BIC criterion. So far so good, but then I began to test the OLS assumptions.
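To make the workflow concrete, here is a minimal sketch of what I did, run on simulated data rather than my real data set. The variable names (x1 to x4, d1, second_home_rate) and the coefficients are made up for illustration; regsubsets comes from the leaps package.

```r
library(leaps)  # provides regsubsets()

set.seed(1)
n <- 200
# Simulated stand-in for the municipality data; all names are hypothetical
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n),
  d1 = rbinom(n, 1, 0.5)
)
dat$second_home_rate <- 0.2 + 0.5 * dat$x1 - 0.3 * dat$d1 + rnorm(n, sd = 0.5)

# Full model with every regressor, no transformations or interaction terms
full <- lm(second_home_rate ~ ., data = dat)

# Best-subset search; nvmax caps the largest model size considered
subsets <- regsubsets(second_home_rate ~ ., data = dat, nvmax = 5)
s <- summary(subsets)

best_size <- which.min(s$bic)          # model size with the lowest BIC
best_coef <- coef(subsets, best_size)  # coefficients of that model
print(best_coef)
```

With my 30 variables I would have to raise nvmax accordingly, since the default only searches models with up to 8 regressors.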
I got significant results for the Breusch-Pagan test (to test homoscedasticity), for the rainbow test, raintest (to test linearity), and for the RESET test (to test the model specification). I read this as a violation of the linearity assumption, the assumption of normally distributed residuals and the homoscedasticity assumption, although none of these three tests checks normality directly.
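For reference, this is roughly how I ran the three tests, sketched here on simulated data with a deliberately nonlinear mean and a variance that grows with x. The data-generating process is an assumption for illustration only; bptest, raintest and resettest are from the lmtest package.

```r
library(lmtest)  # provides bptest(), raintest(), resettest()

set.seed(2)
n <- 200
# Sorted so that the rainbow test's default observation ordering follows x
x <- sort(runif(n, 1, 10))
# Hypothetical data: quadratic mean, error standard deviation proportional to x
y <- 1 + 0.5 * x^2 + rnorm(n, sd = x)
fit <- lm(y ~ x)

bp <- bptest(fit)                                   # H0: homoscedastic errors
rb <- raintest(fit)                                 # H0: linear fit is adequate
rs <- resettest(fit, power = 2:3, type = "fitted")  # H0: no omitted powers of the fitted values

print(c(breusch_pagan = unname(bp$p.value),
        rainbow       = unname(rb$p.value),
        reset         = unname(rs$p.value)))
```

Small p-values reject the respective null hypothesis, which is what I see on my real data for all three tests.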
Now there are two possibilities: either I transform variables, or I use, for example, glm. Honestly, I would like to keep working with OLS, so I need to transform variables. But how do I know which variables I need to transform, and which transformation each of them needs? Do I need to consider all 30 independent variables for transformation?
I know there is the powerTransform function (car package), and I also looked at histograms of every variable to pick a transformation, but neither approach worked well. Does anyone have a good suggestion for me?
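In case it helps to see what I tried, here is a minimal sketch of the powerTransform approach on simulated, strictly positive data. The variables x1, x2 and y and their distributions are assumptions for illustration; the Box-Cox family used by default requires positive values, so this would not apply to my dummy variables or to regressors containing zeros.

```r
library(car)  # provides powerTransform()

set.seed(3)
n <- 300
# Hypothetical strictly positive, right-skewed regressors
dat <- data.frame(
  x1 = rlnorm(n, meanlog = 0, sdlog = 1),
  x2 = rgamma(n, shape = 2, rate = 1)
)
dat$y <- exp(0.3 * log(dat$x1) + 0.1 * dat$x2 + rnorm(n, sd = 0.3))

# Estimated Box-Cox powers that bring the regressors closer to joint normality
pt_x <- powerTransform(cbind(x1, x2) ~ 1, data = dat)
print(summary(pt_x))
lambda_x <- coef(pt_x)   # a power near 0 suggests a log transformation

# Estimated Box-Cox power for the response, conditional on the regressors
pt_y <- powerTransform(lm(y ~ x1 + x2, data = dat))
lambda_y <- coef(pt_y)
print(lambda_y)
```

My difficulty is that this makes the marginal distributions look more normal, which is not the same as fixing the linearity or variance problems in the regression itself.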
And here are some component+residual plots for some of the variables. The heteroscedasticity is clearly visible.
I hope my explanation is clear. If not, let me know and I will provide further information.