Hi All,
I am working with the iris dataset and ran this code:
iris %>% group_by(Species) %>%
summarise_all(.funs = function(x) list(enframe(quantile(x,
probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>% tidyr::unnest()
This gives me the following warning:
Warning message:
`cols` is now required.
Please use `cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)`
So I did it:
iris %>% group_by(Species) %>%
summarise_all(.funs = function(x) list(enframe(quantile(x,
probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>%
tidyr::unnest(cols = c(Sepal.Length, Sepal.Width,
Petal.Length, Petal.Width))
but this resulted in an error:
Error: Column names `name`, `value`, `name`, `value`, `name`,
and 1 more must not be duplicated.
What did I do wrong?
As the error message states, the columns resulting from unnest()
must not have duplicated names. You can fix this by specifying the names_repair
parameter.
library(tidyverse)
# With names_repair
iris %>%
group_by(Species) %>%
summarise_all(.funs = function(x) list(enframe(
quantile(x, probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>%
unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
names_repair = "unique")
#> New names:
#> * name -> name...2
#> * value -> value...3
#> * name -> name...4
#> * value -> value...5
#> * name -> name...6
#> * ...
#> # A tibble: 9 x 9
#> Species name...2 value...3 name...4 value...5 name...6 value...7 name...8
#> <fct> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr>
#> 1 setosa 25% 4.8 25% 3.2 25% 1.4 25%
#> 2 setosa 50% 5 50% 3.4 50% 1.5 50%
#> 3 setosa 75% 5.2 75% 3.68 75% 1.58 75%
#> 4 versic~ 25% 5.6 25% 2.52 25% 4 25%
#> 5 versic~ 50% 5.9 50% 2.8 50% 4.35 50%
#> 6 versic~ 75% 6.3 75% 3 75% 4.6 75%
#> 7 virgin~ 25% 6.22 25% 2.8 25% 5.1 25%
#> 8 virgin~ 50% 6.5 50% 3 50% 5.55 50%
#> 9 virgin~ 75% 6.9 75% 3.18 75% 5.88 75%
#> # ... with 1 more variable: value...9 <dbl>
Created on 2020-05-02 by the reprex package (v0.3.0)
names_sep can also be used, which results in more meaningful names.
iris %>%
group_by(Species) %>%
summarise_all(.funs = function(x) list(enframe(
quantile(x, probs = c(0.25,0.5,0.75), na.rm = TRUE)))) %>%
unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
names_sep = "_")
#> # A tibble: 9 x 9
#> Species Sepal.Length_na~ Sepal.Length_va~ Sepal.Width_name Sepal.Width_val~
#> <fct> <chr> <dbl> <chr> <dbl>
#> 1 setosa 25% 4.8 25% 3.2
#> 2 setosa 50% 5 50% 3.4
#> 3 setosa 75% 5.2 75% 3.68
#> 4 versic~ 25% 5.6 25% 2.52
#> 5 versic~ 50% 5.9 50% 2.8
#> 6 versic~ 75% 6.3 75% 3
#> 7 virgin~ 25% 6.22 25% 2.8
#> 8 virgin~ 50% 6.5 50% 3
#> 9 virgin~ 75% 6.9 75% 3.18
#> # ... with 4 more variables: Petal.Length_name <chr>, Petal.Length_value <dbl>,
#> # Petal.Width_name <chr>, Petal.Width_value <dbl>
Thank you for the help.
It works and adds _name and _value suffixes, but I have renamed the columns:
df2 <- df %>% rename_at(vars(ends_with("_name")),
list(~str_replace(.,"_name","_Quantile")))
There are unnest_auto() and nest_by() functions in tidyr as well, so if you have in mind some
examples of using them I will be very grateful.
Thanks.
These links have some examples of those functions:
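In case the links do not come through, here is a minimal sketch of nest_by(), assuming dplyr 1.0.0 or later is installed. The column name median_sepal_length is made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# nest_by() returns one row per group; the remaining columns are
# stored as a data frame in a list-column called `data`.
nested <- iris %>% nest_by(Species)

# The result is rowwise, so inside mutate() `data` refers to the
# data frame of a single group.
medians <- nested %>%
  mutate(median_sepal_length = median(data$Sepal.Length)) %>%
  ungroup() %>%
  select(Species, median_sepal_length)

medians
```

Because the nested result is rowwise, there is no need for map() or lapply() to reach into the list-column.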
Thank you,
This data frame looks wide to me.
Would it be sensible to make it long somehow?
You can use tidyr::pivot_longer(). But if a long data frame is your ultimate goal, your task can be simplified a lot with the development version of dplyr. I assume you already have this version since you're asking about nest_by(), which is a new function.
library(dplyr, warn.conflicts = FALSE)
packageVersion("dplyr")
#> [1] '0.8.99.9002'
library(tidyr)
iris %>%
pivot_longer(cols = -Species) %>%
group_by(Species, name) %>%
summarise(quantile = c(0.25, 0.5, 0.75),
value = quantile(value, c(0.25, 0.5, 0.75)))
#> # A tibble: 36 x 4
#> # Groups: Species [3]
#> Species name quantile value
#> <fct> <chr> <dbl> <dbl>
#> 1 setosa Petal.Length 0.25 1.4
#> 2 setosa Petal.Length 0.5 1.5
#> 3 setosa Petal.Length 0.75 1.58
#> 4 setosa Petal.Width 0.25 0.2
#> 5 setosa Petal.Width 0.5 0.2
#> 6 setosa Petal.Width 0.75 0.3
#> 7 setosa Sepal.Length 0.25 4.8
#> 8 setosa Sepal.Length 0.5 5
#> 9 setosa Sepal.Length 0.75 5.2
#> 10 setosa Sepal.Width 0.25 3.2
#> # ... with 26 more rows
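If you would rather keep the wide result from unnest(names_sep = "_") and reshape it afterwards, something along these lines should work. This is only a sketch: the column name variable is my own choice, and it relies on the wide column names having the form <measurement>_name and <measurement>_value.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(tibble)

# Rebuild the wide result from earlier in the thread
wide <- iris %>%
  group_by(Species) %>%
  summarise_all(.funs = function(x) list(enframe(
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)))) %>%
  unnest(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
         names_sep = "_")

# Split each column name at "_": the first part goes into `variable`,
# the second part (".value") becomes the output columns `name` and `value`.
long <- wide %>%
  pivot_longer(cols = -Species,
               names_to = c("variable", ".value"),
               names_sep = "_")

long
```

The ".value" entry in names_to tells pivot_longer() to use that part of the column name as the name of an output column, which is what keeps the quantile label and the quantile value side by side.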
Thank you, this is exactly what I was looking for.
Thanks again.