Hi there all!
I am struggling to understand how to check the assumptions of my models. As I am still learning statistics and how to use RStudio for this, some advice would be very much appreciated!
I've built 2 x 4 models: 2 dependent variables (calcium and ketone bodies), each with 4 different random-factor structures (none, i.e. a GLM; farm; parity; farm and parity). In total there are 5 x 3 independent variables, namely the 5 behavioural classes in each of 3 weeks (week -3 to -1), which I reduced to a final model based on the AIC values.
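To make the AIC step concrete, here is a minimal sketch on simulated data. The variable names (`eating`, `walking`, `calcium`) and all numbers are hypothetical, and plain `lm()` fits stand in for the real models, so this only illustrates the idea of comparing candidate models by AIC.

```r
# Simulated data; names and values are made up for illustration only
set.seed(3)
n <- 100
eating  <- rnorm(n)
walking <- rnorm(n)
calcium <- 2 + 0.3 * eating + rnorm(n, sd = 0.5)

# Two candidate models: with and without one of the predictors
full    <- lm(calcium ~ eating + walking)
reduced <- lm(calcium ~ eating)

# AIC() on several models returns a data frame with columns df and AIC;
# the model with the lower AIC is the preferred one
aic_table <- AIC(full, reduced)
print(aic_table)
```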
Beforehand, I checked the distribution of both dependent variables using histograms. One of them was positively skewed, so I log-transformed it; the other was normally distributed. The independent variables were all normally distributed except for one, which was mildly positively skewed, but I didn't transform that one (as I've read that normally distributed independent variables are not one of the assumptions of a GLM or LME). Furthermore, I visually checked both the independent and dependent variables per random factor. The independent variables differed per random factor (in my case: per farm, certain mean behaviour times were definitely different; on one farm the cows eat more, at a certain parity they walk less, etc.). The dependent variables did not differ per random factor; they were just normally distributed.
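As a rough illustration of that first step, this is a minimal sketch using simulated data. The variable `ketones` and its values are hypothetical and only stand in for the skewed dependent variable.

```r
# Simulated, positively skewed variable (made-up data, not the real measurements)
set.seed(1)
ketones <- rlnorm(200, meanlog = 0, sdlog = 0.6)

# Histogram of the raw variable
hist(ketones, main = "Raw", xlab = "ketones")

# Log transformation, then the histogram again
log_ketones <- log(ketones)
hist(log_ketones, main = "Log-transformed", xlab = "log(ketones)")
```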
I understand that once the final model is fitted, some visualization is needed to check whether all the model assumptions were met. At first I found this a bit confusing, as I thought I had already done this by checking the distributions with the histograms. But that's not enough? So I made new plots using plot(finalmodelname) and qqnorm(residuals(finalmodelname)), based on what I've read and seen (YouTube tutorials) so far.
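For reference, the two checks look roughly like this as a minimal sketch on simulated data. The predictor `eating_wk1`, the numbers, and the use of `lm()` in place of the real models are all assumptions for illustration.

```r
# Simulated data; names and values are hypothetical
set.seed(2)
n <- 120
eating_wk1 <- rnorm(n, mean = 300, sd = 40)
calcium    <- 2.2 + 0.001 * eating_wk1 + rnorm(n, sd = 0.15)

# A simple linear model standing in for the final model
finalmodel <- lm(calcium ~ eating_wk1)
res <- residuals(finalmodel)

# Residuals against fitted values: look for an even band around zero
plot(fitted(finalmodel), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal QQ plot of the residuals, with a reference line
qqnorm(res)
qqline(res)
```

The point of this step is that both plots use the residuals of the fitted model rather than the raw variables.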
If it's okay, I would like to share some screenshots of my results to make sure I am actually doing this correctly. Your help would be very much appreciated!
So here are some photos:
These are from the first GLM for calcium, so without random factors. I did these for myself, as a check, as I am a real dummy and am doing everything step by step. Anyway, based on the plot and the QQ plot I would say the assumptions are met? But then two other plots suddenly appear, and I am in the dark about what they mean (this happens with the other GLM as well). Obviously this isn't the most important model; I could even throw it out, because the differences in behaviour per random factor make the inclusion of the random factors important.
Then, for all the other final LME models except two, I get the plots below.
Based on these, I would say this is good and so the assumptions are met? Is that correct?
And then, for some reason, two LME models look different, and this scares me:
Because there are two clouds of dots, and I don't know, but that seems weird? But then the QQ plot looks good?
I hope the questions are reasonably clear, and I hope you can help me!
Thanks in advance!
(PS: I'm not a native English speaker and not a native statistician, so sometimes I have trouble understanding the explanations that I read online. Sorry in advance!)
(PS2: I plotted everything at the beginning too, and it looked like this:
Should I have done the QQ plot here too? Then how do I do this? Because