Maybe it's a trivial question, but I couldn't find anything so far. I have a dataset with about 23,100 returns, split into groups of 50 returns each via:
split(dat, sample(rep(1:462, 50)))
Probably there is a way to perform a normality test with a loop for every single subset, but I don't know how. Help is very much appreciated.
Thanks in advance.
Hi @PaulMaul,
This is probably how I would go about it: keep everything self-contained in a data frame and use functional programming rather than a for loop. Assuming you have 10,000 data points, with an ID column indicating your 50 groups...
library(tidyverse)
data <- tibble(
id = rep(1:50, each = 200),
x = rnorm(10000)
)
data %>%
group_nest(id) %>%
hoist(data, x = 'x') %>%
mutate(normality_test = map_dbl(x, ~shapiro.test(.)$p.value))
#> # A tibble: 50 x 4
#> id x data normality_test
#> <int> <list> <list> <dbl>
#> 1 1 <dbl [200]> <tibble [200 × 0]> 0.774
#> 2 2 <dbl [200]> <tibble [200 × 0]> 0.142
#> 3 3 <dbl [200]> <tibble [200 × 0]> 0.360
#> 4 4 <dbl [200]> <tibble [200 × 0]> 0.743
#> 5 5 <dbl [200]> <tibble [200 × 0]> 0.342
#> 6 6 <dbl [200]> <tibble [200 × 0]> 0.0921
#> 7 7 <dbl [200]> <tibble [200 × 0]> 0.169
#> 8 8 <dbl [200]> <tibble [200 × 0]> 0.909
#> 9 9 <dbl [200]> <tibble [200 × 0]> 0.675
#> 10 10 <dbl [200]> <tibble [200 × 0]> 0.657
#> # … with 40 more rows
Created on 2020-02-28 by the reprex package (v0.3.0)
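If you would rather stay with the base R `split()` call from the question, something along these lines might also work. This is only a sketch: the data are simulated, and the column name `ret` is an assumption, since the original post does not show the structure of `dat`.

```r
# Simulated stand-in for the real data: 23,100 returns.
# The column name `ret` is hypothetical; substitute your own.
set.seed(1)
dat <- data.frame(ret = rnorm(23100))

# Randomly assign each row to one of 462 groups of 50 rows, as in the question
groups <- split(dat, sample(rep(1:462, 50)))

# Run the Shapiro-Wilk test on each group and keep only the p-value
p_values <- sapply(groups, function(g) shapiro.test(g$ret)$p.value)

head(p_values)
```

`sapply()` returns a named numeric vector with one p-value per group, so no explicit loop is needed. Note that `shapiro.test()` only accepts between 3 and 5000 observations per call, which groups of 50 satisfy.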