Multiple assignment in dplyr::mutate()


#1

I can't find a good tidyverse solution for creating two separate columns for the lower and upper bounds of a confidence interval:

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>% 
    mutate(low_high = map(a, ~t.test(., conf.level = 0.95)$conf.int))
#> # A tibble: 2 x 2
#>   a         low_high 
#>   <list>    <list>   
#> 1 <dbl [2]> <dbl [2]>
#> 2 <dbl [2]> <dbl [2]>

Is it possible to do something like

... %>% mutate(c("low", "high") = map(conf_int_function(some_column)))

when we apply some function to each row?


#2

You can try reformatting the result of your t-test so that you can use tidy tools.
Here is one way; it is surely not the only one, nor the most efficient.

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>% 
  mutate(low_high = map(a, ~t.test(., conf.level = 0.95)$conf.int %>%
                          # name the t-test resulting numeric vector
                          set_names(c("low", "high")) %>% 
                          # transform the vector into a tibble
                          enframe)) %>%
  # unnest only this column (keep the a list column)
  unnest(low_high, .drop = FALSE) %>%
  # spread the result with tidyr to get two columns
  spread(name, value)
#> # A tibble: 2 x 3
#>   a          high   low
#>   <list>    <dbl> <dbl>
#> 1 <dbl [2]>  7.85 -4.85
#> 2 <dbl [2]>  9.85 -2.85

Created on 2018-09-10 by the reprex package (v0.2.0).

Another way: the broom package

You can use broom to tidy the t.test result. You'll get a tibble that you can filter to select the results you need.

library(tidyverse)
library(broom)
tibble(a = list(c(1, 2), c(3, 4))) %>%
  mutate(ttest = map(a, t.test, conf.level = 0.95) %>% map(tidy) %>% map(~ select(., conf.low, conf.high))) %>%
  unnest(ttest, .drop = FALSE)
#> # A tibble: 2 x 3
#>   a         conf.low conf.high
#>   <list>       <dbl>     <dbl>
#> 1 <dbl [2]>    -4.85      7.85
#> 2 <dbl [2]>    -2.85      9.85

I'll let you see what broom::tidy() returns.


#3

Thanks, cderv.
These are good workarounds for this specific case, but unfortunately they don't solve the main problem.
Anyway, I was impressed by the broom solution.


#4

You mean you are not looking for a solution specific to t.test?
What is your desired output?

Sorry if I misunderstood.


#5

The t.test example is just a very narrow, specific case.
In other words, I think it would be great to have a short expression (1-2 lines of code) for when you calculate something once and create more than one column from it. I thought I had missed something in the tidyverse system of packages. If not, the workflow should be:

  1. compute some complex object in each row
  2. split it into a number of other columns (in my specific case, 2)
  3. purrr::set_names(), the easiest part

#6

Should I close this topic if an exact solution (with dplyr::mutate) doesn't exist for the moment? Should I fix the description?


#7

You can mark it as solved when your issue's been resolved.:+1:


#8

I feel like it would be easier, if it's a fixed number of elements in each vector (as with a confidence interval), to just pull them out individually with map_dbl:

library(tidyverse)
df1 = tibble(
  x = 1:5,
  low_high = list(1:2, 2:3, 3:4, 4:5, 5:6))
df1
# A tibble: 5 x 2
#        x low_high 
#    <int> <list>   
#  1     1 <int [2]>
#  2     2 <int [2]>
#  3     3 <int [2]>
#  4     4 <int [2]>
#  5     5 <int [2]>

df1 %>% mutate(
  low = map_dbl(low_high, 1),
  high = map_dbl(low_high, 2))
# A tibble: 5 x 4
#        x low_high    low  high
#    <int> <list>    <dbl> <dbl>
#  1     1 <int [2]>     1     2
#  2     2 <int [2]>     2     3
#  3     3 <int [2]>     3     4
#  4     4 <int [2]>     4     5
#  5     5 <int [2]>     5     6

And then just select(-low_high) afterward if you don't want the original list column :slight_smile:

It sounds like you want a more general solution, though, so maybe I'm stating the obvious! If I were going to write a function to tackle that workflow, I'd probably try to "unnest" the vectors by creating rows, not columns, as there's no guarantee with a list-column of vectors that each vector will be of the same length (though I suppose you could fill the extra columns for shorter vectors with NA or something).
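
A minimal sketch of that rows-first idea, with made-up data in which the vectors have different lengths. The `id` column and the `pos` name are assumptions for illustration only. Each vector becomes one row per element, and spreading back to columns then fills the missing positions with NA:

```r
library(tidyverse)

# A list-column whose vectors have different lengths
df <- tibble(id = 1:3, vals = list(c(1, 2), c(3, 4, 5), 6))

# One row per element; `pos` records each element's position in its vector
long <- df %>%
  mutate(vals = map(vals, enframe, name = "pos")) %>%
  unnest(vals)

# Back to one row per id; positions a shorter vector lacks become NA
wide <- long %>% spread(pos, value)
```

Here `long` has six rows (one per element), and `wide` has one column per position, named `1`, `2` and `3`.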


#9

I think the approach with enframe is rather general, as it is row-wise and you don't need to know the number of elements in your complex object. It should apply to any list column that you want to transform.

  1. Name the vector elements in the list column
  2. enframe to get a tibble (this is the step that makes it row-wise)
  3. unnest the result
  4. spread as desired.

It is not specific to t.test or a two element list.
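
To illustrate, here is a rough sketch of the same four steps applied to a five-element result instead of a confidence interval. The helper `five_num()` and the `id` column are hypothetical, added only for this example, and the original list column is dropped before unnesting so the sketch does not depend on how a given tidyr version treats other list columns:

```r
library(tidyverse)

# Hypothetical helper: returns a named vector with five elements
five_num <- function(x) {
  set_names(fivenum(x), c("min", "q1", "median", "q3", "max"))
}

df <- tibble(id = 1:2, a = list(c(1, 2, 3, 4, 5), c(10, 20, 30, 40, 50)))

wide <- df %>%
  # steps 1 and 2: name the elements, then enframe each vector into a tibble
  mutate(stats = map(a, ~ enframe(five_num(.x)))) %>%
  select(id, stats) %>%
  # step 3: one row per (id, name) pair
  unnest(stats) %>%
  # step 4: one column per name
  spread(name, value)
```

Nothing in the pipeline depends on the result having five elements; only `five_num()` knows that.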

I think that to approach this column-wise, as @rensa said and showed in the example, you need to know the number of elements.

In fact, if the complex list is in the correct form, you could use the splice operator !!! from rlang. The correct form is a list of the desired columns.

library(tidyverse)

# takes a column and formats the result as a list of the desired columns
conf_int_function <- function(column) {
  res <- map(column, ~ t.test(.x, conf.level = 0.95))
  conf <- map(res, "conf.int") %>% map(set_names, c('low', 'high'))
  transpose(conf) %>% simplify_all()
}

tibble(a = list(c(1, 2), c(3, 4))) %>%
  # use the splice operator to get the columns
  # You need `.$a`; a bare `a` does not work because !!! is evaluated before mutate() sees the data
  mutate(!!!conf_int_function(.$a))
#> # A tibble: 2 x 3
#>   a           low  high
#>   <list>    <dbl> <dbl>
#> 1 <dbl [2]> -4.85  7.85
#> 2 <dbl [2]> -2.85  9.85

Created on 2018-09-11 by the reprex package (v0.2.0).


#10

Based on @cderv's answer and his idea of using !!!:

library(tidyverse)
tibble(a = list(c(1, 2), c(3, 4))) %>%
    mutate(!!!map(transpose(.$a), unlist))
#> # A tibble: 2 x 3
#>   a         `c(1, 3)` `c(2, 4)`
#>   <list>        <dbl>     <dbl>
#> 1 <dbl [2]>         1         2
#> 2 <dbl [2]>         3         4

Created on 2018-09-11 by the reprex package (v0.2.0).


#11

Looks good! The only caveat I'd mention is that this fails if the vectors in your list-column vary in size at all. That might not be a problem for your use-case, but it might be worth keeping in mind if you use it in production!
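
One possible guard, sketched under the assumption that padding shorter vectors with NA is acceptable for your data. The column names `v1`, `v2`, ... are made up for illustration:

```r
library(tidyverse)

df <- tibble(a = list(c(1, 2), c(3, 4, 5)))

# Pad every vector with NA up to the longest length
# (indexing past the end of a vector yields NA)
n_max <- max(lengths(df$a))
padded <- map(df$a, ~ .x[seq_len(n_max)])

# Build one column per position: element i of every padded vector
cols <- seq_len(n_max) %>%
  map(~ map_dbl(padded, .x)) %>%
  set_names(paste0("v", seq_len(n_max)))

# Splice the named list of columns into mutate()
result <- df %>% mutate(!!!cols)
```

In `result`, the first row has NA in `v3` because its vector only had two elements.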


#12

You're right. I think it is a common problem for all such functions, like tidyr::separate().