I have landing data (kg) for five species, and I am trying to identify which factors contribute most strongly to high catches. Originally I had 4 predictor variables (Depth, Chlorophyll, Temperature and Bottom Type). The last one was a categorical variable with 17 levels, which made the results very difficult to interpret. Because of that, I reduced its levels to just 4 variables, giving 7 predictor variables in total (Depth, Chlorophyll, Temperature, Reef, Rhodolith, Seamount Slope and Unconsolidated). I converted the 4 new variables to percentages and then applied an arcsine transformation to further dilute the effect of the zeros, so that, being numerical, the results would be easier to interpret.

However, I am unsure whether there is any problem with using compositional data in a GLM alongside other continuous predictors. The VIF test showed high collinearity, but I saw a discussion arguing that worrying about it is a lot of noise for nothing, since multicollinearity is not a statistical problem but a substantive one, and that I should not worry until the correlation is greater than 0.95, which would mean I am using almost the same information twice. I have VIF = 36 for two compositional variables in one model. I saw that one option is to remove one of the variables, but I do not think it is feasible to remove any variable from the model given my objective.

Note: I am using the Gaussian family, and I used a Box-Cox transformation to normalize my dependent variable (CPUE). My model:

glm(CPUE ~ Depth + Chlorophyll + SST + Reef + Rodolith + SS + Uncons, data = chrysurus, family = gaussian)
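
To relate my VIF to that correlation threshold: as far as I understand the standard definition, the VIF of predictor $j$ comes from the $R_j^2$ obtained by regressing that predictor on all the other predictors. The symbol $R_j^2$ is my own notation here, not output from my model, so please correct me if this reasoning is off:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \quad\Longrightarrow\quad R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j} = 1 - \frac{1}{36} \approx 0.972, \qquad R_j \approx 0.986$$

If that is right, a VIF of 36 means that about 97% of the variance of each of those two variables is explained by the other predictors. This is a multiple correlation rather than a pairwise one, so I am not sure it is directly comparable to the threshold mentioned in that discussion.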
Thanks a lot.