Multiple assignment in dplyr::mutate()

gray · September 10, 2018, 4:10pm

I can't find a good tidyverse solution to create two separate columns for the lower and upper bounds of a confidence interval:

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>% 
    mutate(low_high = map(a, ~t.test(., conf.level = 0.95)$conf.int))
#> # A tibble: 2 x 2
#>   a         low_high 
#>   <list>    <list>   
#> 1 <dbl [2]> <dbl [2]>
#> 2 <dbl [2]> <dbl [2]>

Is it possible to do something like

... %>% mutate(c("low", "high") = map(conf_int_function(some_column)))

when we apply some function to each row?

cderv · September 10, 2018, 4:27pm

You can try formatting the result of your t-test so that you can use tidy tools on it.
Here is one way - surely not the only one, nor the most efficient.

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>% 
  mutate(low_high = map(a, ~t.test(., conf.level = 0.95)$conf.int %>%
                          # name the numeric vector returned by the t-test
                          set_names(c("low", "high")) %>% 
                          # transform the vector into a tibble
                          enframe)) %>%
  # unnest only this column (keep the list column a)
  unnest(low_high, .drop = FALSE) %>%
  # spread the result with tidyr to get two columns
  spread(name, value)
#> # A tibble: 2 x 3
#>   a          high   low
#>   <list>    <dbl> <dbl>
#> 1 <dbl [2]>  7.85 -4.85
#> 2 <dbl [2]>  9.85 -2.85

Created on 2018-09-10 by the reprex package (v0.2.0).

Another way: the `broom` package

You can use broom to tidy the t.test result. You'll get a tibble that you can filter to select the results you need.

library(tidyverse)
library(broom)
tibble(a = list(c(1, 2), c(3, 4))) %>%
  mutate(ttest = map(a, t.test, conf.level = 0.95) %>% map(tidy) %>% map(~ select(., conf.low, conf.high))) %>%
  unnest(ttest, .drop = FALSE)
#> # A tibble: 2 x 3
#>   a         conf.low conf.high
#>   <list>       <dbl>     <dbl>
#> 1 <dbl [2]>    -4.85      7.85
#> 2 <dbl [2]>    -2.85      9.85

I'll let you see what broom::tidy returns.
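
For a quick look at those columns, here is a minimal sketch. It assumes the broom package is installed and reuses the first vector from the example above.

```r
library(broom)

# Tidy a single t-test result into a one-row tibble
res <- tidy(t.test(c(1, 2), conf.level = 0.95))

# List the available columns; conf.low and conf.high are among them
names(res)
```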

gray · September 10, 2018, 4:57pm

Thanks, cderv.
They are good workarounds for this specific case, but unfortunately they don't solve the main problem.
Anyway, I was impressed by the broom solution.

cderv · September 10, 2018, 6:02pm

You mean you are not looking for a solution specific to t.test?
What is your desired output?

Sorry, if I misunderstood.

gray · September 10, 2018, 10:35pm

That's a very narrow, specific case.
In other words, I think it would be great to have a short expression (1-2 lines of code) for when you calculate something once and create more than one column from it. I believed I had missed something in the tidyverse system of packages. If not, it should work like this:

start with some complex object in each row
split it into a number of other columns (in my specific case, 2)
name them with purrr::set_names(), the easiest part
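
One way this workflow might be written today is sketched below. It assumes tidyr 1.0.0 or later, whose unnest_wider() was released after this thread, so it was not an option at the time. The column name `ci` is made up for illustration.

```r
library(tidyverse)

res <- tibble(a = list(c(1, 2), c(3, 4))) %>%
  # compute the interval once per row and name its two elements;
  # as.numeric() drops the conf.level attribute attached by t.test()
  mutate(ci = map(a, ~ set_names(as.numeric(t.test(.x, conf.level = 0.95)$conf.int),
                                 c("low", "high")))) %>%
  # spread the named elements of each vector into their own columns
  unnest_wider(ci)

res
```

The result should have the columns a, low and high, matching the output of the other solutions in this thread.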

gray · September 10, 2018, 10:37pm

Should I close this topic if the exact solution (with dplyr::mutate) doesn't exist for the moment? Should I fix the description?

mara · September 10, 2018, 11:36pm

You can mark it as solved when your issue's been resolved.

rensa · September 11, 2018, 1:57am

I feel like it would be easier, if it's a fixed number of elements in each vector (as with a confidence interval), to just pull them out individually with map_dbl:

library(tidyverse)
df1 = tibble(
  x = 1:5,
  low_high = list(1:2, 2:3, 3:4, 4:5, 5:6))
df1
# A tibble: 5 x 2
#        x low_high 
#    <int> <list>   
#  1     1 <int [2]>
#  2     2 <int [2]>
#  3     3 <int [2]>
#  4     4 <int [2]>
#  5     5 <int [2]>

df1 %>% mutate(
  low = map_dbl(low_high, 1),
  high = map_dbl(low_high, 2))
# A tibble: 5 x 4
#        x low_high    low  high
#    <int> <list>    <dbl> <dbl>
#  1     1 <int [2]>     1     2
#  2     2 <int [2]>     2     3
#  3     3 <int [2]>     3     4
#  4     4 <int [2]>     4     5
#  5     5 <int [2]>     5     6

And then just select(-low_high) afterward if you don't want the original list column.

It sounds like you want a more general solution, though, so maybe I'm stating the obvious! If I were going to write a function to tackle that workflow, I'd probably try to "unnest" the vectors by creating rows, not columns, as there's no guarantee with a list-column of vectors that each vector will be of the same length (though I suppose you could fill the extra columns for shorter vectors with NA or something).
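
The NA-padding idea mentioned above could look roughly like this in base R. The list `x` and the column names `v1`, `v2`, ... are made up for illustration.

```r
# Hypothetical list of vectors with unequal lengths
x <- list(c(1, 2), c(3, 4, 5), 6)

# Pad every vector with NA up to the length of the longest one
n_max <- max(lengths(x))
padded <- lapply(x, function(v) {
  length(v) <- n_max  # extending a vector fills the new slots with NA
  v
})

# Bind into a matrix with one row per original list element
wide <- do.call(rbind, padded)
colnames(wide) <- paste0("v", seq_len(n_max))
wide
```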

cderv · September 11, 2018, 6:05am

I think the approach with enframe is rather general, as it is row-wise and you don't need to know the number of elements in your complex element. It should apply to all list columns that you want to transform.

Name the vector elements in the list column
enframe to get a tibble (this is the step that makes it row-wise)
unnest the result
spread as desired.

It is not specific to t.test or a two-element list.

I think that to approach this column-wise, as @rensa said and showed in the example, you need to know the number of elements.

In fact, if the complex list is in the correct form, you could use the splice operator !!! from rlang. The correct form is a list of the desired columns.

library(tidyverse)

# takes a list column and formats the result as a list of the desired columns
conf_int_function <- function(column) {
  res <- map(column, ~ t.test(.x, conf.level = 0.95))
  conf <- map(res, "conf.int") %>% map(set_names, c('low', 'high'))
  transpose(conf) %>% simplify_all()
}

tibble(a = list(c(1, 2), c(3, 4))) %>%
  # use the splice operator to get the columns
  # You need `.$a` - it is not working with tidyeval
  mutate(!!!conf_int_function(.$a))
#> # A tibble: 2 x 3
#>   a           low  high
#>   <list>    <dbl> <dbl>
#> 1 <dbl [2]> -4.85  7.85
#> 2 <dbl [2]> -2.85  9.85

Created on 2018-09-11 by the reprex package (v0.2.0).

gray · September 11, 2018, 2:31pm

Based on @cderv's answer and his idea of using !!!:

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>%
    mutate(!!!map(transpose(.$a), unlist))
#> # A tibble: 2 x 3
#>   a         `c(1, 3)` `c(2, 4)`
#>   <list>        <dbl>     <dbl>
#> 1 <dbl [2]>         1         2
#> 2 <dbl [2]>         3         4

Created on 2018-09-11 by the reprex package (v0.2.0).

rensa · September 12, 2018, 12:14am

Looks good! The only caveat I'd mention is that this fails if the vectors in your list-column vary in size at all. That might not be a problem for your use-case, but it might be worth keeping in mind if you use it in production!
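
A small guard for that caveat might look like the sketch below. The helper name `check_equal_lengths` is hypothetical and does not come from any package.

```r
# Hypothetical helper: stop early when the vectors in a list differ in length
check_equal_lengths <- function(x) {
  n <- unique(lengths(x))
  if (length(n) > 1) {
    stop("list elements have different lengths: ", paste(n, collapse = ", "))
  }
  invisible(x)
}

# Passes silently when all vectors have the same length
check_equal_lengths(list(c(1, 2), c(3, 4)))
```

Calling it on the list column before splicing turns a confusing downstream failure into an explicit error message.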

gray · September 12, 2018, 11:27am

You're right. I think it is a common problem for all functions of this kind, such as tidyr::separate().
