I calculated the Mahalanobis distance in order to detect "careless responders", in other words people who were not attentive while filling in a questionnaire.
Here's what I've got so far:
mahad(x, plot = TRUE, flag = TRUE, confidence = 0.99, na.rm = TRUE)
x = data set
plot = Plot the resulting QQ graph
flag = Flag potential outliers using the confidence level specified in the confidence parameter
confidence = The desired confidence level of the result
na.rm = Should missing data be removed?
# Sample 1 = Ver1_items
mahad(Ver1_items, plot = TRUE, flag = TRUE, confidence = 0.99, na.rm = TRUE)
#1 53.28543 FALSE
#2 59.82937 FALSE
#3 70.93420 FALSE
#4 40.99005 FALSE
#5 61.38863 FALSE
#6 91.87906 TRUE
#7 50.07120 FALSE
# The numbers on the left identify the individual respondents
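For reference, here is a minimal base-R sketch of what I understand this kind of call to compute. It is not the code of mahad() itself, and it runs on simulated data: sim_items, d_sq_base, cutoff and flagged_base are made-up names for illustration only. It assumes the flag is a chi-square cutoff with degrees of freedom equal to the number of items, which is my reading and may not match the function exactly.

```r
# Simulated data for illustration only: 200 respondents, 10 items on a 1-5 scale.
set.seed(1)
sim_items <- as.data.frame(
  matrix(sample(1:5, 200 * 10, replace = TRUE), ncol = 10)
)

# Squared Mahalanobis distance of each respondent from the sample centroid.
d_sq_base <- mahalanobis(
  sim_items,
  center = colMeans(sim_items),
  cov    = cov(sim_items)
)

# Assumed flagging rule: chi-square quantile with df = number of items.
cutoff       <- qchisq(0.99, df = ncol(sim_items))
flagged_base <- d_sq_base > cutoff

# One distance and one flag per respondent (row), not per item.
length(d_sq_base)
```

Note that this gives a single value per person that summarizes all of their items at once.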
The problem now is that I have several samples that completed different numbers of items. For example, sample 1 answered 40 items and sample 2 answered 100 items. Person x belongs to sample 1, and 20 of her answers stand out ("TRUE") compared to the sample's mean, so half of her answers suggest she was inattentive or careless. Person y belongs to sample 2, and 20 of her answers stand out as well, but in her case that is only 20 % of her answers.
I hope this example makes my question clearer. Please excuse my English.
Maybe it's also useful to add that my intention is not to categorize people as "careless" or "not careless", since I intend to use continuous variables. So the column "d_sq" is more important for my analysis than the column "flagged". But that might be unnecessary information...
To compare the Mahalanobis distances, I need the values from the different samples to be comparable, so I might need proportions or something similar. But I'm really struggling to find a function that lets me do this. I guess Mahalanobis is not the most common function, so I hope there is somebody who can help or has some advice. I would be super grateful, as I'm really struggling with R.
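To make concrete what I mean by "comparable", here is a rough sketch of two possible rescalings, not a definitive solution. It assumes the squared distance behaves approximately like a chi-square variable with degrees of freedom equal to the number of items. That only holds roughly, and questionnaire items are often far from multivariate normal. rescale_d_sq and make_items are hypothetical helpers I made up, and the data are simulated.

```r
# Hypothetical helper: put squared Mahalanobis distances from samples with
# different numbers of items on a common scale.
rescale_d_sq <- function(d_sq, n_items) {
  data.frame(
    d_sq          = d_sq,
    # A chi-square variable with df = n_items has mean n_items,
    # so this ratio is centered near 1 regardless of questionnaire length.
    d_sq_per_item = d_sq / n_items,
    # Upper-tail probability under the assumed chi-square distribution;
    # small values indicate unusual response patterns.
    p_upper       = pchisq(d_sq, df = n_items, lower.tail = FALSE)
  )
}

# Simulated samples for illustration only: 40 items vs. 100 items.
set.seed(2)
make_items <- function(n_persons, n_items) {
  matrix(rnorm(n_persons * n_items), ncol = n_items)
}
s1 <- make_items(500, 40)
s2 <- make_items(500, 100)

d1 <- mahalanobis(s1, colMeans(s1), cov(s1))
d2 <- mahalanobis(s2, colMeans(s2), cov(s2))

r1 <- rescale_d_sq(d1, ncol(s1))
r2 <- rescale_d_sq(d2, ncol(s2))

# The raw means differ with the number of items; the rescaled means do not.
c(raw_1 = mean(r1$d_sq), raw_2 = mean(r2$d_sq))
c(scaled_1 = mean(r1$d_sq_per_item), scaled_2 = mean(r2$d_sq_per_item))
```

With real data, the d_sq column returned for each sample could be passed to such a helper together with that sample's number of items. Both d_sq_per_item and p_upper stay continuous, so no categorization is needed.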
Thanks very much.