# Normality test for split data?

Maybe it's a trivial question, but I couldn't find anything so far. I have a dataset with about 23,100 returns, split into groups of 50 returns each via:

`split(dat, sample(rep(1:462, 50)))`

Probably there is a way to perform a normality test with a loop for every single subset, but I don't know how. Help is very much appreciated.

Hi @PaulMaul,

This is probably how I would go about it: keep everything self-contained in a data frame and use functional programming rather than a `for` loop. Assuming you have 10,000 data points, with some ID column indicating your 50 groups...

```r
library(tidyverse)

# Simulated data: 50 groups (id) with 200 observations each
data <- tibble(
  id = rep(1:50, each = 200),
  x = rnorm(10000)
)

data %>%
  group_nest(id) %>%        # one row per group, observations nested in a list-column
  hoist(data, x = 'x') %>%  # pull x out of each nested tibble as a numeric vector
  mutate(normality_test = map_dbl(x, ~shapiro.test(.)$p.value))  # Shapiro-Wilk p-value per group
#> # A tibble: 50 x 4
#>       id x           data               normality_test
#>    <int> <list>      <list>                      <dbl>
#>  1     1 <dbl [200]> <tibble [200 × 0]>         0.774
#>  2     2 <dbl [200]> <tibble [200 × 0]>         0.142
#>  3     3 <dbl [200]> <tibble [200 × 0]>         0.360
#>  4     4 <dbl [200]> <tibble [200 × 0]>         0.743
#>  5     5 <dbl [200]> <tibble [200 × 0]>         0.342
#>  6     6 <dbl [200]> <tibble [200 × 0]>         0.0921
#>  7     7 <dbl [200]> <tibble [200 × 0]>         0.169
#>  8     8 <dbl [200]> <tibble [200 × 0]>         0.909
#>  9     9 <dbl [200]> <tibble [200 × 0]>         0.675
#> 10    10 <dbl [200]> <tibble [200 × 0]>         0.657
#> # … with 40 more rows
```

Created on 2020-02-28 by the reprex package (v0.3.0)
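
For the setup in the original question (462 random groups of 50), a base R version might look roughly like the sketch below. It assumes the returns are stored in a plain numeric vector; `returns` is a simulated stand-in rather than the real data, and `groups` and `p_values` are just illustrative names.

```r
set.seed(1)  # only to make this illustration reproducible

# Assumption: the returns are a numeric vector; simulated here as a stand-in
returns <- rnorm(23100)

# Randomly assign each return to one of 462 groups of 50, as in the question
groups <- split(returns, sample(rep(1:462, 50)))

# Apply the Shapiro-Wilk test to every group and keep only the p-value
p_values <- sapply(groups, function(g) shapiro.test(g)$p.value)

head(p_values)

# Share of groups where normality is rejected at the 5% level
mean(p_values < 0.05)
```

`sapply()` returns a named numeric vector with one p-value per group, so no explicit loop is needed. With 462 separate tests, a handful of p-values below 0.05 is expected by chance alone even if every group is normally distributed.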
