Normality test for split data?

PaulMaul · February 28, 2020, 6:10pm

Maybe it's a trivial question, but I couldn't find anything so far. I have a dataset with about 23,100 returns, split into groups of 50 returns each via:

split(dat, sample(rep(1:462, 50)))

There is probably a way to run a normality test on every single subset with a loop, but I don't know how. Any help is very much appreciated.

Thanks in advance.

mattwarkentin · February 28, 2020, 7:44pm

Hi @PaulMaul,

This is probably how I would go about it: keep everything self-contained in a data frame and use functional programming rather than a for loop. Assuming you have 10,000 data points, with an ID column indicating your 50 groups...

library(tidyverse)

data <- tibble(
  id = rep(1:50, each = 200),
  x = rnorm(10000)
)

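# Nest the rows of each group, pull column x out of the nested data as a
# list column, then store the Shapiro-Wilk p-value for each group.
data %>%
  group_nest(id) %>%
  hoist(data, x = 'x') %>%
  mutate(normality_test = map_dbl(x, ~shapiro.test(.)$p.value))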
#> # A tibble: 50 x 4
#>       id x           data               normality_test
#>    <int> <list>      <list>                      <dbl>
#>  1     1 <dbl [200]> <tibble [200 × 0]>         0.774 
#>  2     2 <dbl [200]> <tibble [200 × 0]>         0.142 
#>  3     3 <dbl [200]> <tibble [200 × 0]>         0.360 
#>  4     4 <dbl [200]> <tibble [200 × 0]>         0.743 
#>  5     5 <dbl [200]> <tibble [200 × 0]>         0.342 
#>  6     6 <dbl [200]> <tibble [200 × 0]>         0.0921
#>  7     7 <dbl [200]> <tibble [200 × 0]>         0.169 
#>  8     8 <dbl [200]> <tibble [200 × 0]>         0.909 
#>  9     9 <dbl [200]> <tibble [200 × 0]>         0.675 
#> 10    10 <dbl [200]> <tibble [200 × 0]>         0.657 
#> # … with 40 more rows

Created on 2020-02-28 by the reprex package (v0.3.0)
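
For the layout in the original question (462 random groups of 50 returns each), the same idea can also be sketched in base R by applying the test to each element of the list that split() returns. This is only a sketch under assumptions: `dat` is taken to be a plain numeric vector of returns and is simulated here with rnorm(), and the names `groups` and `p_values` are made up for illustration. If `dat` is actually a data frame, the function would need to pick out the returns column first.

```r
# Simulated stand-in for the real returns (assumption: dat is a numeric vector).
set.seed(1)
dat <- rnorm(23100)

# Randomly assign each return to one of 462 groups of 50, as in the question.
groups <- split(dat, sample(rep(1:462, 50)))

# Run the Shapiro-Wilk test on each group and keep only the p-value.
# vapply() checks that every result is a single numeric value.
p_values <- vapply(groups, function(g) shapiro.test(g)$p.value, numeric(1))

# Named numeric vector: one p-value per group, named by group number.
head(p_values)
```

The result is a named vector with one p-value per group, so something like `p_values[p_values < 0.05]` would list the groups for which the test rejects normality at the 5% level.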
