Working with a nested list as a column in a tibble

tw0handt0uch · December 14, 2019, 6:10am

Hello. I created a function that returns multiple values. I then used the function inside mutate() with map(), which returns a column of type list. I want to mutate another new column that applies a simple function to all the values stored in the cell to the left, but I'm getting errors. Could you help me understand how to access the values stored in the list for additional mutates? I have tried various versions of unnest() and indexing but haven't found the proper answer. Thank you!

library(tidyverse)

example_seq_tbl <- tibble(var_a = seq(from = 1, to = 5, by = 1))

#a function that returns 10 values
x_function <- 
function(x){
  rnorm(n = 10,
        mean = x
  )
}         
 
#map function over var_a, now each observation in new_col has 10 values     
another_tbl <- example_seq_tbl %>%
  mutate(new_col = map(var_a, x_function)) 

#this code below doesn't work but shows what I am trying to do
#i want to add a new column that operates on the 10 values in new_col like, max, mean, quantile, etc
another_tbl %>%
  mutate(another_col = max(new_col))

FJCC · December 14, 2019, 6:43am

Two methods:

library(purrr)
#> Warning: package 'purrr' was built under R version 3.5.3
library(tibble)
#> Warning: package 'tibble' was built under R version 3.5.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.5.3
example_seq_tbl <- tibble(var_a = seq(from = 1, to = 5, by = 1))

#a function that returns 10 values
x_function <- 
  function(x){
    rnorm(n = 10,
          mean = x
    )
  }         

#map function over var_a, now each observation in new_col has 10 values     
another_tbl <- example_seq_tbl %>%
  mutate(new_col = map(var_a, x_function)) 

third_tbl <- another_tbl %>%
  mutate(another_col = map_dbl(new_col, ~max(.)))
third_tbl
#> # A tibble: 5 x 3
#>   var_a new_col    another_col
#>   <dbl> <list>           <dbl>
#> 1     1 <dbl [10]>        2.09
#> 2     2 <dbl [10]>        3.19
#> 3     3 <dbl [10]>        3.84
#> 4     4 <dbl [10]>        5.52
#> 5     5 <dbl [10]>        5.96


Again <- another_tbl %>% unnest(cols = new_col) %>% 
  group_by(var_a) %>% summarize(Max = max(new_col))
Again
#> # A tibble: 5 x 2
#>   var_a   Max
#>   <dbl> <dbl>
#> 1     1  2.09
#> 2     2  3.19
#> 3     3  3.84
#> 4     4  5.52
#> 5     5  5.96

Created on 2019-12-13 by the reprex package (v0.3.0.9000)
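The first pattern extends to the other summaries mentioned in the question (mean, quantile). This is a minimal sketch, not the only way to do it. It assumes the 90th percentile is the quantile of interest and adds a seed only so the numbers are reproducible. The names summary_tbl, max_col, mean_col and q90_col are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# same setup as in the question: each cell of new_col holds 10 draws
another_tbl <- tibble(var_a = seq(from = 1, to = 5, by = 1)) %>%
  mutate(new_col = map(var_a, ~ rnorm(n = 10, mean = .x)))

# map_dbl() calls the function once per list element and returns one
# double per row, so each new column is an ordinary numeric column
summary_tbl <- another_tbl %>%
  mutate(
    max_col  = map_dbl(new_col, max),
    mean_col = map_dbl(new_col, mean),
    # names = FALSE drops the "90%" label so a bare number is returned
    q90_col  = map_dbl(new_col, ~ quantile(.x, probs = 0.9, names = FALSE))
  )

summary_tbl
```

A summary that returns several values per row (for example several quantiles at once) would need map() rather than map_dbl(), which again produces a list column.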

ramirabal · December 18, 2019, 8:51pm

When is it required to add ~ and "(.)"? I don't know exactly what they mean, because if you just use max instead, it produces the same kind of result:

third_tbl <- another_tbl %>%
  mutate(another_col = map_dbl(new_col, max))
#> # A tibble: 5 x 3
#> var_a new_col    another_col
#> <dbl> <list>           <dbl>
#> 1     1 <dbl [10]>        1.78
#> 2     2 <dbl [10]>        3.09
#> 3     3 <dbl [10]>        5.18
#> 4     4 <dbl [10]>        6.08
#> 5     5 <dbl [10]>        6.73

siddharthprabhu · December 19, 2019, 6:03am

@ramirabal In purrr, ~ is shorthand for defining anonymous functions, while . refers to the current element of the iterable (much like the iterator i you might define in a for loop).

For example, these are equivalent:

x <- c(2, 3, 4)

map_dbl(x, function(x) x ^ 2)
#> [1]  4  9 16

map_dbl(x, ~ . ^ 2)
#> [1]  4  9 16
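Applied to the original question, this suggests that the three spellings below are interchangeable: the bare function name, a full anonymous function, and the formula shorthand. The list vals is made-up sample data, not taken from the thread.

```r
library(purrr)

vals <- list(c(1, 5, 3), c(10, 2), c(7, 7, 8))  # hypothetical sample data

by_name    <- map_dbl(vals, max)                 # pass the function itself
by_anon    <- map_dbl(vals, function(v) max(v))  # explicit anonymous function
by_formula <- map_dbl(vals, ~ max(.))            # purrr formula shorthand

identical(by_name, by_anon) && identical(by_anon, by_formula)
#> [1] TRUE

# the shorthand becomes useful once extra arguments are involved
map_dbl(list(c(1, NA, 3)), ~ max(., na.rm = TRUE))
#> [1] 3
```

So for a plain call like max, the ~ and (.) are optional. They start to matter when the call needs extra arguments or more than one step.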

Andrzej · December 19, 2019, 9:41am

Hi,
can we also use .x to refer to the current element of the iterable (the current subset of the dataset)?
Here it gives the same result, but will that always be the case?

library(purrr)
x <- c(2, 3, 4)

map_dbl(x, ~ .x ^ 2)
#> [1]  4  9 16

Created on 2019-12-19 by the reprex package (v0.3.0)

siddharthprabhu · December 19, 2019, 11:36am

Edit: Apparently, I was mistaken. Turns out using .x is the recommended approach. See the post by @andresrcs below.

@Andrzej You can but it's better practice to use . for the single argument case.

From the purrr documentation:

.f
A function, formula, or vector (not necessarily atomic).

If a function, it is used as is.

If a formula, e.g. ~ .x + 2, it is converted to a function. There are three ways to refer to the arguments:

For a single argument function, use .

For a two argument function, use .x and .y

For more arguments, use ..1, ..2, ..3 etc
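A short sketch of the three argument styles quoted above, using made-up vectors a, b and c3 (the names are only for illustration). With a single input, . and .x refer to the same element.

```r
library(purrr)

a  <- c(2, 3, 4)        # hypothetical inputs
b  <- c(10, 20, 30)
c3 <- c(100, 200, 300)

# one argument: `.` and `.x` both refer to the current element
map_dbl(a, ~ . ^ 2)
#> [1]  4  9 16
map_dbl(a, ~ .x ^ 2)
#> [1]  4  9 16

# two arguments: .x comes from the first input, .y from the second
map2_dbl(a, b, ~ .x + .y)
#> [1] 12 23 34

# more arguments: positional placeholders ..1, ..2, ..3
pmap_dbl(list(a, b, c3), ~ ..1 + ..2 + ..3)
#> [1] 112 223 334
```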

andresrcs · December 19, 2019, 12:28pm

This is not always true: . can be confusing when used inside a pipe, because the magrittr pipe also uses . as its own placeholder. See this related post.
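One way the clash can show up is sketched below. This is an assumption about the kind of confusion meant here, since the linked post is not reproduced in this thread, and vals is made-up data. As far as I know, magrittr treats a pipeline whose left-hand side is a bare . as shorthand for building a function, and that rule still applies inside a purrr formula.

```r
library(purrr)
library(magrittr)

vals <- list(1:3, 4:6)  # hypothetical data

# .x is only the lambda argument, so the inner pipe starts from the element
with_x <- map(vals, ~ .x %>% sum())
unlist(with_x)
#> [1]  6 15

# `. %>% sum()` is magrittr's notation for a function definition, so each
# result here is expected to be a function rather than a number
with_dot <- map(vals, ~ . %>% sum())
is.function(with_dot[[1]])
```

Using .x inside the formula keeps the lambda's argument visually distinct from the pipe's placeholder.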

Andrzej · December 20, 2019, 1:52pm

Thank you @andresrcs for your detailed explanation,

best regards,
Andrzej
