Please help with tidyr::unnest()

Andrzej · May 2, 2020, 7:35am

Hi all,
I am working with the iris dataset.

library(tidyverse)
iris %>%
  group_by(Species) %>%
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)))) %>%
  tidyr::unnest()

This gives me the following warning:

Warning message:
`cols` is now required.
Please use `cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)`

So I added the cols argument:

iris %>%
  group_by(Species) %>%
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)))) %>%
  tidyr::unnest(cols = c(Sepal.Length, Sepal.Width,
                         Petal.Length, Petal.Width))

but this resulted in an error:

Error: Column names `name`, `value`, `name`, `value`, `name`, 
and 1 more must not be duplicated.

What did I do wrong?

siddharthprabhu · May 2, 2020, 7:51am

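Each list-column produced by summarise_all() holds a tibble with the columns name and value, so unnesting all four of them yields four name and four value columns. As the error message states, the columns resulting from unnest() must not have duplicated names. You can fix this by specifying the names_repair argument.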

library(tidyverse)

# With names_repair
iris %>% 
  group_by(Species) %>% 
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>% 
  unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), 
         names_repair = "unique")
#> New names:
#> * name -> name...2
#> * value -> value...3
#> * name -> name...4
#> * value -> value...5
#> * name -> name...6
#> * ...
#> # A tibble: 9 x 9
#>   Species name...2 value...3 name...4 value...5 name...6 value...7 name...8
#>   <fct>   <chr>        <dbl> <chr>        <dbl> <chr>        <dbl> <chr>   
#> 1 setosa  25%           4.8  25%           3.2  25%           1.4  25%     
#> 2 setosa  50%           5    50%           3.4  50%           1.5  50%     
#> 3 setosa  75%           5.2  75%           3.68 75%           1.58 75%     
#> 4 versic~ 25%           5.6  25%           2.52 25%           4    25%     
#> 5 versic~ 50%           5.9  50%           2.8  50%           4.35 50%     
#> 6 versic~ 75%           6.3  75%           3    75%           4.6  75%     
#> 7 virgin~ 25%           6.22 25%           2.8  25%           5.1  25%     
#> 8 virgin~ 50%           6.5  50%           3    50%           5.55 50%     
#> 9 virgin~ 75%           6.9  75%           3.18 75%           5.88 75%     
#> # ... with 1 more variable: value...9 <dbl>

Created on 2020-05-02 by the reprex package (v0.3.0)
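To see where the duplicated names come from, you can inspect the summarised data before unnesting. This is a minimal sketch assuming dplyr and tibble are installed; `nested` is just an illustrative name:

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

# Same summarise step as above, stopped before unnest()
nested <- iris %>%
  group_by(Species) %>%
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))))

# Every cell of every list-column is a 3-row tibble with the same two columns,
# so unnesting all four list-columns yields four `name` and four `value` columns
names(nested$Sepal.Length[[1]])
names(nested$Petal.Width[[1]])
```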

names_sep can also be used, which results in more meaningful names.

iris %>% 
  group_by(Species) %>% 
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>% 
  unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), 
         names_sep = "_")
#> # A tibble: 9 x 9
#>   Species Sepal.Length_na~ Sepal.Length_va~ Sepal.Width_name Sepal.Width_val~
#>   <fct>   <chr>                       <dbl> <chr>                       <dbl>
#> 1 setosa  25%                          4.8  25%                          3.2 
#> 2 setosa  50%                          5    50%                          3.4 
#> 3 setosa  75%                          5.2  75%                          3.68
#> 4 versic~ 25%                          5.6  25%                          2.52
#> 5 versic~ 50%                          5.9  50%                          2.8 
#> 6 versic~ 75%                          6.3  75%                          3   
#> 7 virgin~ 25%                          6.22 25%                          2.8 
#> 8 virgin~ 50%                          6.5  50%                          3   
#> 9 virgin~ 75%                          6.9  75%                          3.18
#> # ... with 4 more variables: Petal.Length_name <chr>, Petal.Length_value <dbl>,
#> #   Petal.Width_name <chr>, Petal.Width_value <dbl>

Created on 2020-05-02 by the reprex package (v0.3.0)
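With dplyr 1.0.0 or later, summarise_all() is superseded by across(), and because cols accepts any tidyselect expression, -Species can stand in for the four column names. A sketch under those assumptions (`iris_q` is an illustrative name):

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(tibble)

iris_q <- iris %>%
  group_by(Species) %>%
  # across() applies the lambda to every non-grouping column
  summarise(across(everything(),
                   ~ list(enframe(quantile(.x, probs = c(0.25, 0.5, 0.75),
                                           na.rm = TRUE)))),
            .groups = "drop") %>%
  # -Species selects all four list-columns at once
  unnest(cols = -Species, names_sep = "_")
```

The result should have the same 9 rows and 9 columns as the names_sep output above.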

Andrzej · May 2, 2020, 11:34am

Thank you for the help.

It works and adds _name and _value suffixes. I then renamed the _name columns:

library(stringr)
# df is the wide result of unnest(..., names_sep = "_")
df2 <- df %>% rename_at(vars(ends_with("_name")),
                        list(~ str_replace(., "_name", "_Quantile")))
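In dplyr 1.0.0 and later, rename_with() supersedes rename_at(). A sketch on a hypothetical one-row stand-in for df (values copied from the setosa row above):

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Hypothetical stand-in for the wide result of unnest(..., names_sep = "_")
df <- tibble(Species = "setosa",
             Sepal.Length_name = "25%",
             Sepal.Length_value = 4.8)

# Apply the function only to columns whose names end in "_name";
# "$" anchors the pattern so only the suffix is replaced
df2 <- df %>%
  rename_with(~ str_replace(.x, "_name$", "_Quantile"), ends_with("_name"))

names(df2)
```

Alternatively, enframe() has a name argument, so enframe(quantile(...), name = "Quantile") combined with names_sep = "_" would produce columns such as Sepal.Length_Quantile directly, with no rename step.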

There are also unnest_auto() in tidyr and nest_by() in dplyr, so if you have some examples of using them in mind, I would be very grateful.
Thanks.

siddharthprabhu · May 2, 2020, 11:52am

These links have some examples of those functions:
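As a rough illustration of both functions, here is a minimal sketch assuming dplyr >= 1.0.0 (for nest_by()) and tidyr >= 1.0.0 (for unnest_auto()); the toy list-column `x` is hypothetical:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# nest_by(): one row per Species, remaining columns stored in the list-column `data`.
# The result is rowwise, so inside mutate() `data` is that row's tibble.
by_species <- iris %>% nest_by(Species)
sizes <- by_species %>% mutate(n = nrow(data))

# unnest_auto(): inspects a list-column and picks unnest_wider() or unnest_longer().
# Here every element is a list with the same names, so it should choose unnest_wider().
toy <- tibble(x = list(list(a = 1, b = 2), list(a = 3, b = 4)))
wide <- toy %>% unnest_auto(x)
```

unnest_auto() prints a message saying which function it chose; calling unnest_wider() or unnest_longer() directly is more explicit once you know the shape of the data.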

Andrzej · May 2, 2020, 1:00pm

Thank you.
This data frame looks wide to me:

Would it be sensible to make it long somehow?

siddharthprabhu · May 2, 2020, 3:00pm

You can use tidyr::pivot_longer(). But if a long data frame is your ultimate goal, your task can be simplified a lot with the development version of dplyr. I assume you already have this version since you're asking about nest_by(), which is a new function.
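Applied to the wide table from the names_sep example, the pivot_longer() route might look roughly like this. It is a sketch assuming tidyr >= 1.0.0; `wide` and `long` are illustrative names, and the wide table is rebuilt so the block runs on its own:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(tibble)

# Rebuild the wide table from the names_sep example
wide <- iris %>%
  group_by(Species) %>%
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)))) %>%
  unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
         names_sep = "_")

# Split e.g. "Sepal.Length_name" at "_": the first part goes to `variable`,
# the second part ("name" or "value") becomes its own column via ".value"
long <- wide %>%
  pivot_longer(cols = -Species,
               names_to = c("variable", ".value"),
               names_sep = "_")
```

Each of the 9 wide rows becomes 4 long rows (one per measurement), giving 36 rows with the columns Species, variable, name and value.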

library(dplyr, warn.conflicts = FALSE)
packageVersion("dplyr")
#> [1] '0.8.99.9002'
library(tidyr)

iris %>% 
  pivot_longer(cols = -Species) %>% 
  group_by(Species, name) %>% 
  summarise(quantile = c(0.25, 0.5, 0.75), 
            value = quantile(value, c(0.25, 0.5, 0.75)))
#> # A tibble: 36 x 4
#> # Groups:   Species [3]
#>    Species name         quantile value
#>    <fct>   <chr>           <dbl> <dbl>
#>  1 setosa  Petal.Length     0.25  1.4 
#>  2 setosa  Petal.Length     0.5   1.5 
#>  3 setosa  Petal.Length     0.75  1.58
#>  4 setosa  Petal.Width      0.25  0.2 
#>  5 setosa  Petal.Width      0.5   0.2 
#>  6 setosa  Petal.Width      0.75  0.3 
#>  7 setosa  Sepal.Length     0.25  4.8 
#>  8 setosa  Sepal.Length     0.5   5   
#>  9 setosa  Sepal.Length     0.75  5.2 
#> 10 setosa  Sepal.Width      0.25  3.2 
#> # ... with 26 more rows

Created on 2020-05-02 by the reprex package (v0.3.0)
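In dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated (it still runs but warns), and reframe() is the intended replacement. A sketch of the same pipeline under that assumption; note that reframe() always returns an ungrouped tibble:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

probs <- c(0.25, 0.5, 0.75)

long_q <- iris %>%
  pivot_longer(cols = -Species) %>%
  group_by(Species, name) %>%
  # reframe() allows several rows per group: one per requested probability
  reframe(quantile = probs,
          value = quantile(value, probs))
```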

Andrzej · May 2, 2020, 8:26pm

Thank you, this is exactly what I was looking for.
Thanks again.
