I'm using the R code at the end of this post in Power Query, and it works well: it gives me the studentized residuals that I use to flag outliers in a large dataset.
As output, I get the 'df' object, which contains the "group" column plus the augment output, including the studentized residuals column.
My goal is to get the original dataset plus the studentized residuals column, so that I don't end up with two large tables (my original table has 20M rows, and keeping two tables of that size is impractical). Thanks a lot for your help!
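    library(dplyr)  # group_by, mutate, filter, do, n_distinct
    library(broom)  # augment

    # Coerce the Power Query input to the types the model expects
    dataset <- as.data.frame(dataset)
    dataset$perf <- as.numeric(dataset$perf)
    dataset$factor1 <- as.factor(dataset$factor1)
    dataset$factor2 <- as.factor(dataset$factor2)

    df <- dataset %>%
      group_by(group) %>%
      # Per-group checks: each factor needs more than one level, and perf must vary
      mutate(unique_factor1 = n_distinct(factor1),
             unique_factor2 = n_distinct(factor2),
             var = var(perf)) %>%
      filter(unique_factor1 != 1 & unique_factor2 != 1 & var != 0) %>%
      # Fit one model per group and bind the group label to the augment output
      do(cbind(group = .$group,
               augment(lm(perf ~ factor1 + factor2, data = .))))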
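One possible way to end up with a single table is to add the residuals to each group's rows in place, rather than binding a separate augment output. The snippet below is a minimal sketch of that idea, not a tested solution for the 20M-row case. It makes a few assumptions: the toy `dataset`, the helper `add_resid` and the column name `std_resid` are made up for illustration; `perf` is assumed to have no missing values; and it uses base R's `rstandard()`, which as far as I know is what augment's `.std.resid` column corresponds to (swap in `rstudent()` if externally studentized residuals are what you need). Groups that fail the checks are kept and get `NA` instead of being filtered out.

```r
library(dplyr)

# Hypothetical toy data standing in for the real Power Query `dataset`
set.seed(42)
dataset <- data.frame(
  group   = rep(c("a", "b", "c"), each = 6),
  factor1 = c(rep(c("x", "y", "z"), 4), rep("x", 6)),  # group "c" has a single level
  factor2 = rep(rep(c("u", "v"), each = 3), 3),
  perf    = rnorm(18)
)

# Hypothetical helper: adds a residual column to one group's rows.
# .x holds the rows of the group (without the grouping column), .y the group key.
add_resid <- function(.x, .y) {
  ok <- n_distinct(.x$factor1) > 1 &&
    n_distinct(.x$factor2) > 1 &&
    var(.x$perf) != 0
  .x$std_resid <- if (ok) {
    rstandard(lm(perf ~ factor1 + factor2, data = .x))
  } else {
    NA_real_  # group cannot be modelled, keep its rows anyway
  }
  .x
}

result <- dataset %>%
  group_by(group) %>%
  group_modify(add_resid) %>%
  ungroup() %>%
  as.data.frame()
```

Here `result` has the same number of rows as `dataset` and one extra column. Note that `group_modify()` returns the rows ordered by group, so the row order may differ from the input.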