Trouble writing function to find outliers

Longshot408 · February 18, 2021, 1:40pm

I found a useful formula just now to identify outliers in a data set and am trying to come up with R code for it, but I'm running into two snags; any help would be greatly appreciated.

First, I tried recreating the example from the textbook I'm reading, which used the following 5 numbers and formula to identify outliers:

df=tibble::tribble(~num,
           19,
           25,
           28,
           32,
           10000)

round(abs(df$num[]-median(df$num))/(1.483*4),2)

The number 1.483 is a constant, while 4 is supposed to be the MAD of the data set (I got this from the textbook). However, when I use the mad() call in R, it gives me a different answer than the textbook:

round(abs(df$num[]-median(df$num))/(1.483*mad(df$num)),2)

Anyone know why R's MAD is different? Is it a rounding thing, or a problem with my code?

Second, I'm trying to write a function that adds a 0/1 coding column to this tibble to identify outliers so I can filter them easily. When I write it as a single line of code, it works perfectly, but I can't get the function version to work:

df=df %>% mutate(outlier=ifelse(round(abs(num[]-median(num))/(1.483*mad(num)),2)>2.24,1,0))

find_outliers=function(df,col){
  df=df %>% mutate(outlier=if_else(round(abs(col[]-median(col))/(1.483*mad(col)),2)>2.24,1,0))
  
  return(df)
}

find_outliers(df,"num")

nirgrahamuk · February 18, 2021, 2:00pm

try
mad(df$num,constant = 1)
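
For context, this likely explains the mismatch: by default, R's mad() multiplies the raw median absolute deviation by 1.4826, so the textbook's 1.483 factor is already built in. A quick check with the five numbers from the question (the names x and raw_mad are just for illustration):

```r
x <- c(19, 25, 28, 32, 10000)

# Raw median absolute deviation, computed by hand:
# median of the absolute deviations from the median
raw_mad <- median(abs(x - median(x)))
raw_mad                 # 4, the textbook value

# constant = 1 switches off the scaling, so mad() returns the raw MAD
mad(x, constant = 1)    # 4

# The default (constant = 1.4826) returns the raw MAD times 1.4826
mad(x)                  # 5.9304
```

So, if I'm reading the textbook formula right, either 1.483 * mad(x, constant = 1) or plain mad(x) gives the intended denominator, while 1.483 * mad(x) applies the scaling twice.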

Longshot408 · February 18, 2021, 2:06pm

That worked, thanks!! One problem down at least!

jeremy · February 18, 2021, 2:32pm

This works for me. Inside the function, you need to wrap the column argument in double curly braces, {{ }}, and you don't need to quote the column name when you call the function.

library(tidyverse)
library(rlang)

dat <- tibble::tribble(~num,
                   19,
                   25,
                   28,
                   32,
                   10000)

dat %>% mutate(outlier=ifelse(round(abs(num-median(num))/(1.483*mad(num,constant = 1)),2)>2.24,1,0))

find_outliers <- function(df,col){
 df %>%
  mutate(outlier = if_else(round(abs({{col}} - median({{col}}))/(1.483*mad({{col}},constant = 1)), 2) > 2.24, 1, 0))
}

find_outliers(df = dat, col = num)

More details are in the "Programming with dplyr" vignette on the dplyr site.
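
If you would rather keep passing the column name as a quoted string, as in the original find_outliers(df, "num") call, one option is the .data pronoun, which the same vignette covers. This is only a sketch: find_outliers_chr is a made-up name, and the formula is copied from the function above.

```r
library(dplyr)

dat <- tibble::tribble(~num,
                       19,
                       25,
                       28,
                       32,
                       10000)

# Hypothetical variant: `col` is a character string such as "num",
# so the column is looked up with .data[[col]] instead of {{ col }}
find_outliers_chr <- function(df, col) {
  df %>%
    mutate(outlier = if_else(
      round(abs(.data[[col]] - median(.data[[col]])) /
              (1.483 * mad(.data[[col]], constant = 1)), 2) > 2.24,
      1, 0
    ))
}

res <- find_outliers_chr(dat, "num")
res
```

With this data, only the 10000 row should end up with outlier = 1.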

Longshot408 · February 18, 2021, 2:41pm

Amazing! I was so confused about why some of my functions broke without quotes and some didn't. Thanks!

jeremy · February 18, 2021, 2:51pm

Also, you may not need to load rlang. I loaded it just in case. Try just loading tidyverse and see if it still works.
