Calculation of outlier values

Hi. I have a survey dataset with hundreds of variables; a small sample is given below. I want to find outlier values, so I would like to compute a z-score for each variable. How can I do that?

library(tidyverse)

data <- tibble::tribble(
  ~student_name,    ~id, ~l1c1, ~l1c2, ~l1c3, ~total,
            "a", "S001",    1L,    0L,    1L,     2L,
            "b", "S002",    1L,    0L,    2L,     3L,
            "c", "S003",    1L,    2L,    1L,     4L,
            "d", "S004",    0L,    1L,    1L,     2L,
            "e", "S005",    0L,    1L,    0L,     1L,
            "f", "S006",    1L,    1L,    2L,     4L,
            "g", "S007",    1L,    0L,    0L,     1L,
            "h", "S008",    0L,    2L,    1L,     3L
  )
Created on 2022-05-09 by the reprex package (v2.0.1)

Something like this should give the z-scores. The integer columns are converted to doubles first.

data %>% 
  mutate(across(l1c1:total, as.numeric),
         across(l1c1:total, ~(. - mean(.))/sd(.)))

# A tibble: 8 × 6
  student_name id      l1c1   l1c2  l1c3  total
  <chr>        <chr>  <dbl>  <dbl> <dbl>  <dbl>
1 a            S001   0.725 -1.05   0    -0.418
2 b            S002   0.725 -1.05   1.32  0.418
3 c            S003   0.725  1.35   0     1.25 
4 d            S004  -1.21   0.150  0    -0.418
5 e            S005  -1.21   0.150 -1.32 -1.25 
6 f            S006   0.725  0.150  1.32  1.25 
7 g            S007   0.725 -1.05  -1.32 -1.25 
8 h            S008  -1.21   1.35   0     0.418
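
To go from z-scores to outlier values, one option is to reshape to long format and keep the values whose absolute z-score exceeds a cutoff. This is only a rough sketch: the cutoff of 3 is a common rule of thumb and not something from this thread, and the names survey, z_cutoff and outliers are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample values as in the question (id plus the score columns).
survey <- tibble::tibble(
  id    = sprintf("S%03d", 1:8),
  l1c1  = c(1, 1, 1, 0, 0, 1, 1, 0),
  l1c2  = c(0, 0, 2, 1, 1, 1, 0, 2),
  l1c3  = c(1, 2, 1, 1, 0, 2, 0, 1),
  total = c(2, 3, 4, 2, 1, 4, 1, 3)
)

# Assumption: |z| > 3 counts as an outlier; adjust to suit the data.
z_cutoff <- 3

outliers <- survey %>%
  # One row per id/variable pair, so each variable can be grouped.
  pivot_longer(l1c1:total, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  # z-score within each variable, using that variable's own mean and sd.
  mutate(z = (value - mean(value)) / sd(value)) %>%
  ungroup() %>%
  # Keep only the values beyond the cutoff.
  filter(abs(z) > z_cutoff)

outliers
```

On this eight-row sample nothing is flagged, because with n = 8 the largest possible absolute z-score is about 2.47. On the full survey data the cutoff would need to be chosen to suit the variables.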

Can't I get it variable-wise, like the z-score of l1c1, the z-score of l1c2, etc.?

My point was: rather than row-wise, can't it be computed column-wise?

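It is column-wise. Each column is standardized with its own mean and standard deviation. Computing it directly for l1c1 gives the same values as the l1c1 column above: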

> (data$l1c1 - mean(data$l1c1))/sd(data$l1c1)
[1]  0.7245688  0.7245688  0.7245688 -1.2076147 -1.2076147  0.7245688  0.7245688 -1.2076147

across() is only there to apply the same calculation to several columns at once.
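
If the aim is to keep the original columns and add one z-score column per variable, across() accepts a .names argument. A minimal sketch, assuming every numeric column should be scored (hence where(is.numeric), which may be convenient with hundreds of variables); the names survey and z_scores and the "_z" suffix are made up for illustration.

```r
library(dplyr)

# Same sample values as in the question (id plus the score columns).
survey <- tibble::tibble(
  id    = sprintf("S%03d", 1:8),
  l1c1  = c(1, 1, 1, 0, 0, 1, 1, 0),
  l1c2  = c(0, 0, 2, 1, 1, 1, 0, 2),
  l1c3  = c(1, 2, 1, 1, 0, 2, 0, 1),
  total = c(2, 3, 4, 2, 1, 4, 1, 3)
)

z_scores <- survey %>%
  mutate(across(
    where(is.numeric),            # assumption: score all numeric columns
    ~ (.x - mean(.x)) / sd(.x),   # column-wise z-score
    .names = "{.col}_z"           # adds l1c1_z, l1c2_z, ... next to the originals
  ))

# Each new column matches the single-column calculation, and base R's scale().
all.equal(z_scores$l1c1_z, (survey$l1c1 - mean(survey$l1c1)) / sd(survey$l1c1))
all.equal(z_scores$l1c1_z, as.numeric(scale(survey$l1c1)))
```

The original columns stay untouched, so the raw value and its z-score can be read side by side.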

Oh Ok. I got it now.
Thanks a lot.

Regards,
Nithin
