Applying dunn.test using purrr map()

purrr

#1

Hi again. I'm trying to use the map function to use the dunn.test package to perform some post-hoc analyses. The below code sort of works in that I can a) split the data by one variable and b) apply a dunn test of some values against another variable. Unfortuantely the returned results are identical for each of the split groupings.


library(dunn.test)
data <- data.frame(
              groupA = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A",
                         "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A",
                         "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
              groupB = c("L", "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L",
                         "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L",
                         "M", "N", "L", "M", "L", "M", "N", "L", "M", "N"),
               value = c(10, 111.5, 360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100,
                         360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100, 360,
                         90, 120, 180, 50, 60, 70, 80, 90)
        )


data %>%
  split(.$groupA) %>%
  map(~dunn.test(x = data$value, g = data$groupB, method = "bonferroni")) %>%
  map_dfr(~ as.data.frame(.), id = "groupA")

  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 1.0649, df = 2, p-value = 0.59


                           Comparison of x by group                            
                                 (Bonferroni)                                  
Col Mean-|
Row Mean |          L          M
---------+----------------------
       M |  -0.477697
         |     0.9493
         |
       N |  -1.031944  -0.589682
         |     0.4531     0.8331

alpha = 0.05
Reject Ho if p <= alpha/2
  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 1.0649, df = 2, p-value = 0.59


                           Comparison of x by group                            
                                 (Bonferroni)                                  
Col Mean-|
Row Mean |          L          M
---------+----------------------
       M |  -0.477697
         |     0.9493
         |
       N |  -1.031944  -0.589682
         |     0.4531     0.8331

alpha = 0.05
Reject Ho if p <= alpha/2

At first, what I thought was happening is that dunn.test is getting applied to the whole dataset rather than first splitting the data, then doing a dunn.test within each group, but as you can see by the output, some of the values do not match up with the above (edit: I missed out the method = bonferroni which when included outputs the same as above):

dunn.test(x = data$value, g = data$groupB)

  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 1.0649, df = 2, p-value = 0.59


                           Comparison of x by group                            
                                (No adjustment)                                
Col Mean-|
Row Mean |          L          M
---------+----------------------
       M |  -0.477697
         |     0.3164
         |
       N |  -1.031944  -0.589682
         |     0.1510     0.2777

alpha = 0.05
Reject Ho if p <= alpha/2

Finally, I'm also trying to output the data into a data.frame - or anything really - which I can pass to KableExtra to neatly output a table (I'm using this in a Rmarkdown document). map_dfr almost works, but it drops the variable by which I split the data.

EDIT: So I think the dunn.test is getting applied to the whole dataset, I missed out the method = "bonferroni" in the second bit of code... When I include that, the output is the same as in the first example


#2

You can do this pretty nicely with the group_by/nest/mutate method. It looks like this:

library(dunn.test)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(tidyr)

data <- data.frame(
  groupA = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A",
             "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A",
             "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  groupB = c("L", "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L",
             "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L",
             "M", "N", "L", "M", "L", "M", "N", "L", "M", "N"),
  value = c(10, 111.5, 360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100,
            360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100, 360,
            90, 120, 180, 50, 60, 70, 80, 90)
)


results_tbl <- data %>%
  group_by(groupA) %>%
  nest() %>% 
  mutate(model = map(data, ~dunn.test(x = .x$value, g = .x$groupB, method = "bonferroni") %>% as_tibble())) %>% 
  unnest(model)
#>   Kruskal-Wallis rank sum test
#> 
#> data: x and group
#> Kruskal-Wallis chi-squared = 12.5304, df = 2, p-value = 0
#> 
#> 
#>                            Comparison of x by group                            
#>                                  (Bonferroni)                                  
#> Col Mean-|
#> Row Mean |          L          M
#> ---------+----------------------
#>        M |  -2.359885
#>          |     0.0274
#>          |
#>        N |  -3.371967  -1.445128
#>          |    0.0011*     0.2226
#> 
#> alpha = 0.05
#> Reject Ho if p <= alpha/2
#>   Kruskal-Wallis rank sum test
#> 
#> data: x and group
#> Kruskal-Wallis chi-squared = 3.8857, df = 2, p-value = 0.14
#> 
#> 
#>                            Comparison of x by group                            
#>                                  (Bonferroni)                                  
#> Col Mean-|
#> Row Mean |          L          M
#> ---------+----------------------
#>        M |   1.971221
#>          |     0.0730
#>          |
#>        N |   0.985610  -0.985610
#>          |     0.4865     0.4865
#> 
#> alpha = 0.05
#> Reject Ho if p <= alpha/2

results_tbl
#> # A tibble: 6 x 6
#>   groupA  chi2      Z        P P.adjusted comparisons
#>   <fct>  <dbl>  <dbl>    <dbl>      <dbl> <chr>      
#> 1 A      12.5  -2.36  0.00914     0.0274  L - M      
#> 2 A      12.5  -3.37  0.000373    0.00112 L - N      
#> 3 A      12.5  -1.45  0.0742      0.223   M - N      
#> 4 B       3.89  1.97  0.0243      0.0730  L - M      
#> 5 B       3.89  0.986 0.162       0.486   L - N      
#> 6 B       3.89 -0.986 0.162       0.486   M - N

Created on 2018-09-26 by the reprex package (v0.2.0).

The nice thing about this is that it keeps the results of the comparisons with the grouping variable itself, which is something that is lost with the split/map method. It looks like the dunn.test model prints outputs by default but the results can be captured in a tibble pretty nicely that can then be used in a kable like you are looking for.


#3

Thanks @tbradley, that works a treat. And wouldn't you know it, despite already searching the forum for the past hour before posting, I just happened upon one of your own old posts detailing precisely this.... How to use `map` with `cor`. :roll_eyes: Thanks very much


#4

No worries! If you are interested, I have also written a blog post about using this type of method with PCA too, which you can find here


#5

You have a small mistake in your original code. You refer to the whole dataset, data, within map() rather than the element of the list from split(). Using the formula notation, you can refer to each element of the list you are passing to map() in via .x.

data %>%
     split(.$groupA) %>%
     map(~dunn.test(x = .x$value, g = .x$groupB, method = "bonferroni")) %>%
     map_dfr(~ as.data.frame(.), .id = "groupA")