Unexpected map() result

Using this nested data_frame:

library(tidyverse)

df <- diamonds %>%
  head(n=1000) %>% 
  nest(-cut, -color, -clarity, -table)

I can map() down the nested column like so:


df %>% 
  mutate(new = map(data, ~ sum(c(.x$x, .x$y, .x$z))))

However, I would like to use some of the other column values as arguments to the function in map(), and when I try something like this the result is unexpected:


df %>% 
  mutate(new = map(data, ~ mean(.x$x)  +  table )) 

I get:

# A tibble: 560 x 6
         cut color clarity table             data         new
       <ord> <ord>   <ord> <dbl>           <list>      <list>
 1     Ideal     E     SI2    55 <tibble [3 x 6]> <dbl [560]>
 2   Premium     E     SI1    61 <tibble [4 x 6]> <dbl [560]>
 3      Good     E     VS1    65 <tibble [1 x 6]> <dbl [560]>
 4   Premium     I     VS2    58 <tibble [2 x 6]> <dbl [560]>
 5      Good     J     SI2    58 <tibble [1 x 6]> <dbl [560]>
 6 Very Good     J    VVS2    57 <tibble [1 x 6]> <dbl [560]>
 7 Very Good     I    VVS1    57 <tibble [1 x 6]> <dbl [560]>
 8 Very Good     H     SI1    55 <tibble [2 x 6]> <dbl [560]>
 9      Fair     E     VS2    61 <tibble [1 x 6]> <dbl [560]>
10 Very Good     H     VS1    61 <tibble [1 x 6]> <dbl [560]>
# ... with 550 more rows

Each value of mean(.x$x) has been added to the entire table column instead of just the value for that row. I expected the following lines to be equivalent:


 mutate(new = map(data, ~ table ))
 mutate(new = table )

but they are not. How do I specify just the appropriate row values of table in map()?

Note this is a toy example for a custom function that requires this approach. I can't just extract the mean(.x$x) into a new column and add it to table. I've also tried nesting table within data but that doesn't work with my function either.

A prose description plus code snippets isn't enough; you also need to make a simple reprex that:

  1. Builds the input data you are using.
  2. Includes the function you are trying to write, even if it doesn't work.
  3. Shows usage of the function you are trying to write, even if it doesn't work.
  4. Builds the output data you want the function to produce.
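As a sketch of what those four pieces can look like for this question, here is a tiny reprex built on a hand-made two-row tibble standing in for the nested diamonds data. The data and the function name add_mean_x are hypothetical, chosen only for illustration.

```r
library(tibble)
library(dplyr)
library(purrr)

# 1. Build the input data (a tiny, hypothetical stand-in for the nested diamonds data)
input <- tibble(
  table = c(55, 61),
  data  = list(tibble(x = c(1, 2)), tibble(x = 3))
)

# 2. The function being written (hypothetical name, toy body)
add_mean_x <- function(d, t) mean(d$x) + t

# 3. Usage that does not work yet: `table` is the whole column here,
#    so every element of `new` has length 2 instead of 1
attempt <- input %>% mutate(new = map(data, ~ add_mean_x(.x, table)))

# 4. The output actually wanted, built by hand: mean(x) + table for each row
expected <- input %>% mutate(new = c(1.5 + 55, 3 + 61))
```

Comparing attempt with expected makes the gap obvious to anyone reading the post.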

You can learn more about reprexes here:

Right now there is an issue with the version of reprex that is on CRAN, so you should install it directly from GitHub.

Until CRAN catches up with the latest version, install reprex with:

devtools::install_github("tidyverse/reprex")

The reason we ask for a reprex is that it is the quickest and easiest way for someone to understand the issue you are running into, see the results you are seeing, and answer it.

Nearly everyone here who answers questions does it on their own time and really appreciates anything you can do to minimize that time.

In any case, the issue you are running into is that, inside the map() lambda, table refers to the whole column, i.e. a vector, not to the value in the current row. It looks as if map should pass in the value of table for each row, but map only iterates over its first argument (data); anything else the lambda refers to is used as-is.
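A minimal way to see this, assuming a hypothetical three-row tibble in place of the nested diamonds data: inside the lambda, length(table) reports the length of the whole column, while passing table to map2() as its second argument hands the lambda one value per row.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical toy data: one nested tibble and one table value per row
toy <- tibble(
  table = c(55, 61, 65),
  data  = list(tibble(x = c(1, 2)), tibble(x = 3), tibble(x = c(4, 5, 6)))
)

seen <- toy %>%
  mutate(
    # table is looked up in mutate()'s data mask: the full column every time
    captured = map_int(data, ~ length(table)),
    # table is iterated alongside data: .y is a single value per row
    iterated = map2_int(data, table, ~ length(.y))
  )

seen$captured  # 3 3 3
seen$iterated  # 1 1 1
```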

I think the reprex below will help you see what is happening and how to fix it.

suppressPackageStartupMessages(library(tidyverse))

df <- diamonds %>% 
    head(n=1000) %>% 
    nest(-cut, -color, -clarity, -table)


# map returns a list, not
# an atomic value. map_dbl might be better for what
# you are trying to do because it prints the value directly
# and can be easier to work with
df %>% 
    mutate(new = map_dbl(data, ~ sum(c(.x$x, .x$y, .x$z))))
#> # A tibble: 560 x 6
#>    cut       color clarity table data               new
#>    <ord>     <ord> <ord>   <dbl> <list>           <dbl>
#>  1 Ideal     E     SI2       55. <tibble [3 × 6]>  41.2
#>  2 Premium   E     SI1       61. <tibble [4 × 6]>  55.9
#>  3 Good      E     VS1       65. <tibble [1 × 6]>  10.4
#>  4 Premium   I     VS2       58. <tibble [2 × 6]>  27.2
#>  5 Good      J     SI2       58. <tibble [1 × 6]>  11.4
#>  6 Very Good J     VVS2      57. <tibble [1 × 6]>  10.4
#>  7 Very Good I     VVS1      57. <tibble [1 × 6]>  10.4
#>  8 Very Good H     SI1       55. <tibble [2 × 6]>  26.2
#>  9 Fair      E     VS2       61. <tibble [1 × 6]>  10.1
#> 10 Very Good H     VS1       61. <tibble [1 × 6]>  10.4
#> # ... with 550 more rows



# when you pass in table here you are passing in the
# whole column, i.e. a vector
df %>% 
    mutate(new = map(data, ~ mean(.x$x)  +  table ))
#> # A tibble: 560 x 6
#>    cut       color clarity table data             new        
#>    <ord>     <ord> <ord>   <dbl> <list>           <list>     
#>  1 Ideal     E     SI2       55. <tibble [3 × 6]> <dbl [560]>
#>  2 Premium   E     SI1       61. <tibble [4 × 6]> <dbl [560]>
#>  3 Good      E     VS1       65. <tibble [1 × 6]> <dbl [560]>
#>  4 Premium   I     VS2       58. <tibble [2 × 6]> <dbl [560]>
#>  5 Good      J     SI2       58. <tibble [1 × 6]> <dbl [560]>
#>  6 Very Good J     VVS2      57. <tibble [1 × 6]> <dbl [560]>
#>  7 Very Good I     VVS1      57. <tibble [1 × 6]> <dbl [560]>
#>  8 Very Good H     SI1       55. <tibble [2 × 6]> <dbl [560]>
#>  9 Fair      E     VS2       61. <tibble [1 × 6]> <dbl [560]>
#> 10 Very Good H     VS1       61. <tibble [1 × 6]> <dbl [560]>
#> # ... with 550 more rows

# you can see that more clearly by just passing in table
df %>% 
    mutate(new = map(data, ~ table ))
#> # A tibble: 560 x 6
#>    cut       color clarity table data             new        
#>    <ord>     <ord> <ord>   <dbl> <list>           <list>     
#>  1 Ideal     E     SI2       55. <tibble [3 × 6]> <dbl [560]>
#>  2 Premium   E     SI1       61. <tibble [4 × 6]> <dbl [560]>
#>  3 Good      E     VS1       65. <tibble [1 × 6]> <dbl [560]>
#>  4 Premium   I     VS2       58. <tibble [2 × 6]> <dbl [560]>
#>  5 Good      J     SI2       58. <tibble [1 × 6]> <dbl [560]>
#>  6 Very Good J     VVS2      57. <tibble [1 × 6]> <dbl [560]>
#>  7 Very Good I     VVS1      57. <tibble [1 × 6]> <dbl [560]>
#>  8 Very Good H     SI1       55. <tibble [2 × 6]> <dbl [560]>
#>  9 Fair      E     VS2       61. <tibble [1 × 6]> <dbl [560]>
#> 10 Very Good H     VS1       61. <tibble [1 × 6]> <dbl [560]>
#> # ... with 550 more rows


# what you need to do is pass the table column into map as
# the first argument so that it iterates over each row of that column

df %>% mutate(new = map_dbl(.$table, ~ .))
#> # A tibble: 560 x 6
#>    cut       color clarity table data               new
#>    <ord>     <ord> <ord>   <dbl> <list>           <dbl>
#>  1 Ideal     E     SI2       55. <tibble [3 × 6]>   55.
#>  2 Premium   E     SI1       61. <tibble [4 × 6]>   61.
#>  3 Good      E     VS1       65. <tibble [1 × 6]>   65.
#>  4 Premium   I     VS2       58. <tibble [2 × 6]>   58.
#>  5 Good      J     SI2       58. <tibble [1 × 6]>   58.
#>  6 Very Good J     VVS2      57. <tibble [1 × 6]>   57.
#>  7 Very Good I     VVS1      57. <tibble [1 × 6]>   57.
#>  8 Very Good H     SI1       55. <tibble [2 × 6]>   55.
#>  9 Fair      E     VS2       61. <tibble [1 × 6]>   61.
#> 10 Very Good H     VS1       61. <tibble [1 × 6]>   61.
#> # ... with 550 more rows


# but you actually need two columns, so use map2
# (or pmap if you need more than two columns).
# Note that `.` in the second argument of map2 (`.$table`) is the
# data frame piped in by %>%, a different variable from
# the `.x`/`.y` inside the lambda in the third argument

df %>% mutate(new = map2_dbl(data, .$table, ~ sum(c(.x$x, .x$y, .x$z) + .y)))
#> # A tibble: 560 x 6
#>    cut       color clarity table data               new
#>    <ord>     <ord> <ord>   <dbl> <list>           <dbl>
#>  1 Ideal     E     SI2       55. <tibble [3 × 6]>  536.
#>  2 Premium   E     SI1       61. <tibble [4 × 6]>  788.
#>  3 Good      E     VS1       65. <tibble [1 × 6]>  205.
#>  4 Premium   I     VS2       58. <tibble [2 × 6]>  375.
#>  5 Good      J     SI2       58. <tibble [1 × 6]>  185.
#>  6 Very Good J     VVS2      57. <tibble [1 × 6]>  181.
#>  7 Very Good I     VVS1      57. <tibble [1 × 6]>  181.
#>  8 Very Good H     SI1       55. <tibble [2 × 6]>  356.
#>  9 Fair      E     VS2       61. <tibble [1 × 6]>  193.
#> 10 Very Good H     VS1       61. <tibble [1 × 6]>  193.
#> # ... with 550 more rows

Created on 2018-03-14 by the reprex package (v0.2.0).
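Applied to the expression from the original question (mean(.x$x) + table), the same map2 idea might look like the sketch below, again on a hypothetical toy tibble rather than the diamonds data. It refers to table directly instead of .$table; for an ungrouped data frame both point at the same column.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for the nested diamonds data
toy <- tibble(
  table = c(55, 61, 65),
  data  = list(tibble(x = c(1, 2)), tibble(x = 3), tibble(x = c(4, 5, 6)))
)

# .x is one nested tibble, .y is the matching single value of table
result <- toy %>%
  mutate(new = map2_dbl(data, table, ~ mean(.x$x) + .y))

result$new  # 56.5 64.0 70.0
```

Referring to the column by name keeps the lookup inside mutate()'s data mask, so it should also behave sensibly on grouped data, whereas .$table always pulls the full column from the piped-in data frame.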


Thank you, @danr.

I was looking for a generalisable solution for more than one column and your suggestion of pmap() fits the bill.

library(tidyverse)

#data
df <- diamonds %>% 
  head(n=1000) %>% 
  nest(-cut, -color, -clarity, -table)

#function
df %>% 
  mutate(new = pmap_dbl(list(data, .$table), ~ sum(c(..1$x, ..1$y, ..1$z) + ..2)))
#> # A tibble: 560 x 6
#>          cut color clarity table             data    new
#>        <ord> <ord>   <ord> <dbl>           <list>  <dbl>
#>  1     Ideal     E     SI2    55 <tibble [3 x 6]> 536.22
#>  2   Premium     E     SI1    61 <tibble [4 x 6]> 787.91
#>  3      Good     E     VS1    65 <tibble [1 x 6]> 205.43
#>  4   Premium     I     VS2    58 <tibble [2 x 6]> 375.21
#>  5      Good     J     SI2    58 <tibble [1 x 6]> 185.44
#>  6 Very Good     J    VVS2    57 <tibble [1 x 6]> 181.38
#>  7 Very Good     I    VVS1    57 <tibble [1 x 6]> 181.40
#>  8 Very Good     H     SI1    55 <tibble [2 x 6]> 356.18
#>  9      Fair     E     VS2    61 <tibble [1 x 6]> 193.14
#> 10 Very Good     H     VS1    61 <tibble [1 x 6]> 193.44
#> # ... with 550 more rows

Created on 2018-03-15 by the reprex package (v0.2.0).
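When a custom function needs more than two columns, one option with pmap() is to pass a named list, so each column is matched to the function argument of the same name instead of being referred to as ..1, ..2. The sketch below assumes a hypothetical extra column depth and a hypothetical function my_fun; with a real function you would list whichever columns it takes.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical toy data with a third column the function needs
toy <- tibble(
  table = c(55, 61, 65),
  depth = c(60, 62, 64),
  data  = list(tibble(x = c(1, 2)), tibble(x = 3), tibble(x = c(4, 5, 6)))
)

# Hypothetical custom function; argument names match the list names below
my_fun <- function(data, table, depth) mean(data$x) + table + depth

# pmap_dbl calls my_fun(data = ..., table = ..., depth = ...) once per row
result <- toy %>%
  mutate(new = pmap_dbl(list(data = data, table = table, depth = depth), my_fun))

result$new  # 116.5 126.0 134.0
```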


Looks good and thanks for using a reprex to show the results!
