Dear all,
I'm using the R code below in Power Query, and it works well: it gives me the studentized residuals that I use to flag outliers in a large dataset.
As output, I get the 'df' object containing the "group" column plus the augment output, including the studentized residuals column.
My objective is to obtain the original dataset plus the studentized residuals column, to avoid having two big tables (my original table has 20 million rows, so keeping two tables of that size is not practical). Thanks a lot for your help!
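```
library(tidyverse)
library(broom)

# Power Query passes the table in as `dataset`
dataset <- as.data.frame(dataset)
dataset$perf <- as.numeric(dataset$perf)
dataset$factor1 <- as.factor(dataset$factor1)
dataset$factor2 <- as.factor(dataset$factor2)

df <- dataset %>%
  group_by(group) %>%
  # Per-group number of levels of each factor and variance of the response
  mutate(unique_factor1 = n_distinct(factor1),
         unique_factor2 = n_distinct(factor2),
         var = var(perf)) %>%
  # Keep only groups where both factors vary and perf is not constant
  filter(unique_factor1 != 1 & unique_factor2 != 1 & var != 0) %>%
  # Fit one model per group and return broom's augmented output
  do(cbind(group = .$group, lm(perf ~ factor1 + factor2, data = .) %>% augment()))
```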
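
One possible direction, given only as a sketch under assumptions and not tested on the real data: `augment()` accepts a `data` argument, and passing each group's rows to it should return those rows with the diagnostic columns (including `.std.resid`) appended. The toy `dataset` and the `row_id` key column below are hypothetical stand-ins. The final `left_join()` is only needed if rows from the filtered-out groups must also be kept.

```r
library(dplyr)
library(broom)

# Hypothetical toy data standing in for the Power Query `dataset`
set.seed(1)
dataset <- data.frame(
  group   = rep(c("a", "b", "c"), each = 8),
  factor1 = factor(rep(c("x", "y"), times = 12)),
  factor2 = factor(rep(c("p", "p", "q", "q"), times = 6)),
  perf    = rnorm(24)
)
dataset$perf[dataset$group == "c"] <- 1   # zero variance, so group "c" is filtered out
dataset$row_id <- seq_len(nrow(dataset))  # hypothetical key used to join back

resid_tbl <- dataset %>%
  group_by(group) %>%
  # Same conditions as the original filter, computed per group
  filter(n_distinct(factor1) != 1, n_distinct(factor2) != 1, var(perf) != 0) %>%
  # data = .x asks augment() to append its columns to the group's own rows
  group_modify(~ augment(lm(perf ~ factor1 + factor2, data = .x), data = .x)) %>%
  ungroup() %>%
  select(row_id, .std.resid)

# One table: every original row, with NA residuals for the excluded groups
result <- left_join(dataset, resid_tbl, by = "row_id")
```

Selecting only `row_id` and `.std.resid` before the join keeps the intermediate table narrow, which may matter at 20 million rows. Whether this is fast enough inside Power Query is an open question.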