I find it hard to state this question in general terms, so here is an example: finding the first row whose value is less than half of the column maximum, for columns A, B, and C. Column X is the exception: instead of its own maximum, it should use the average of the maxima of the normal columns (A, B, and C). The code below works for ungrouped data:
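library(tidyverse)
df <- tribble(
  ~A, ~B, ~C, ~X,
  10, 12,  8,  5,
   8,  5,  7,  4,
   4,  2,  4,  3,
   2,  1,  2,  2,
   0,  0,  0,  1
)
df %>%
  {
    # Reference maximum for X: the mean of the maxima of A, B, and C
    X_max <- summarise_at(., vars(A:C), max) %>% rowMeans()
    bind_cols(
      # A:C: first row below half of that column's own maximum
      summarise_at(., vars(A:C), list(~ match(TRUE, . < 0.5 * max(.)))),
      # X: first row below half of X_max
      summarise_at(., vars(X), list(~ match(TRUE, . < 0.5 * X_max)))
    )
  }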
#> # A tibble: 1 x 4
#>       A     B     C     X
#>   <int> <int> <int> <int>
#> 1     3     2     4     2
Created on 2019-03-30 by the reprex package (v0.2.1)
The result for column A is 3 because A[3] = 4 < 0.5 * 10; the result for column X is 2 because X[2] = 4 < mean(c(10, 12, 8)) * 0.5 = 5.
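To make the arithmetic behind those two results easy to check, here is a minimal base R sketch of the same ungrouped calculation. It is only an illustration of the logic, not a replacement for the tidyverse code; `first_below()` is a hypothetical helper name introduced for this example, and the data is retyped as a plain data frame.

```r
# Same data as the example, as a plain data frame (no packages needed)
df <- data.frame(
  A = c(10, 8, 4, 2, 0),
  B = c(12, 5, 2, 1, 0),
  C = c(8, 7, 4, 2, 0),
  X = c(5, 4, 3, 2, 1)
)

# Hypothetical helper: index of the first element below a threshold
first_below <- function(x, threshold) match(TRUE, x < threshold)

# Column maxima of the "normal" columns: A = 10, B = 12, C = 8
col_max <- sapply(df[c("A", "B", "C")], max)

# Reference maximum for X is the mean of those maxima.
# Note the c(): mean(10, 12, 8) would only average the first argument.
X_max <- mean(col_max)

res <- c(
  # A:C are compared against half of their own maximum
  mapply(first_below, df[c("A", "B", "C")], 0.5 * col_max),
  # X is compared against half of X_max
  X = first_below(df$X, 0.5 * X_max)
)
res
```

The named vector `res` should hold the same four row indices as the tibble printed above.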
My question is: how can I do the same for grouped data? For example: df2 <- bind_rows(df, df + 2, .id = "grp") %>% group_by(grp)