I just found a useful formula for identifying outliers in a data set and am trying to write R code for it, but I've run into two snags. Any help would be greatly appreciated.
First, I tried to recreate the example from the textbook I'm reading, which uses the following five numbers and formula to identify outliers:
df <- tibble::tribble(~num,
                      19,
                      25,
                      28,
                      32,
                      10000)
round(abs(df$num - median(df$num)) / (1.483 * 4), 2)
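For reference, the textbook's numbers can be reproduced step by step. This is a minimal sketch that assumes the textbook's MAD means the plain median of the absolute deviations from the median, with no scaling factor applied:

```r
x <- c(19, 25, 28, 32, 10000)

m <- median(x)                # 28
abs_dev <- abs(x - m)         # 9 3 0 4 9972
raw_mad <- median(abs_dev)    # 4, the unscaled MAD the textbook plugs in
scores <- round(abs_dev / (1.483 * raw_mad), 2)
scores                        # 1.52 0.51 0.00 0.67 1681.05
```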
The number 1.483 is a constant, and 4 is the MAD (median absolute deviation) of the data set, according to the textbook. However, when I replace the hard-coded 4 with R's mad() function, I get different results from the textbook:
round(abs(df$num - median(df$num)) / (1.483 * mad(df$num)), 2)
Does anyone know why R's MAD differs? Is it a rounding issue, or a problem with my code?
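A likely explanation is that stats::mad() does not return the raw MAD. Its `constant` argument defaults to 1.4826, so for these five values `mad(x)` returns 1.4826 × 4 ≈ 5.93, and `1.483 * mad(x)` applies the scaling factor twice. Rounding plays no part. The sketch below compares two ways of matching the textbook; the variable names are only illustrative:

```r
x <- c(19, 25, 28, 32, 10000)

mad(x)                # 5.9304, i.e. median(abs(x - median(x))) * 1.4826
mad(x, constant = 1)  # 4, the unscaled MAD used in the textbook

# Option 1: keep the textbook's 1.483 and switch off mad()'s built-in constant
textbook <- round(abs(x - median(x)) / (1.483 * mad(x, constant = 1)), 2)

# Option 2: let mad() supply its own 1.4826 (about 1.483) and don't multiply again
builtin <- round(abs(x - median(x)) / mad(x), 2)
```

The two options differ only because 1.4826 and 1.483 are not quite equal; for this data only the largest score changes (1681.05 versus 1681.51).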
Second, I'm trying to write a function that adds a 0/1 column to this tibble flagging the outliers, so I can filter them out easily. Written as a single line of code it works perfectly, but I can't get the function version to work:
library(dplyr)

df <- df %>%
  mutate(outlier = ifelse(round(abs(num - median(num)) / (1.483 * mad(num)), 2) > 2.24, 1, 0))
find_outliers <- function(df, col) {
  df <- df %>%
    mutate(outlier = if_else(round(abs(col - median(col)) / (1.483 * mad(col)), 2) > 2.24, 1, 0))
  return(df)
}
find_outliers(df, "num")
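In the function, `col` is just the character string "num", not the column it names, so inside mutate() the calculation is attempted on that string rather than on the values in `num`. One way around this, sketched here assuming a reasonably recent dplyr (1.0 or later) and a quoted column name, is the `.data[[col]]` pronoun. The `cutoff` argument is a hypothetical convenience addition, the round() is dropped from the comparison, and `mad(..., constant = 1)` is used so that 1.483 is applied to the raw MAD only once, as in the textbook formula:

```r
library(dplyr)

df <- tibble::tribble(~num, 19, 25, 28, 32, 10000)

# `col` is a column *name* such as "num"; .data[[col]] looks up that column
# inside mutate(). `cutoff` is a hypothetical argument defaulting to 2.24.
find_outliers <- function(df, col, cutoff = 2.24) {
  df %>%
    mutate(outlier = if_else(
      # MAD-median score: |x - median| / (1.483 * raw MAD)
      abs(.data[[col]] - median(.data[[col]])) /
        (1.483 * mad(.data[[col]], constant = 1)) > cutoff,
      1, 0
    ))
}

flagged <- find_outliers(df, "num")
flagged
filter(flagged, outlier == 0)   # keep only the non-outliers
```

If you would rather call it with a bare column name, as in `find_outliers(df, num)`, the same body should work with each `.data[[col]]` replaced by `{{ col }}`.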