Purrr and Mutate not working together?


I'm running into an unexpected problem, because I could swear this used to work.

I have a nested dataframe, with an "ID" column (n = 55) and a "data" list-column. I wish to create another list-column which is a lagged version of the "data" column.

nested_df <- nested_df %>%
  mutate(data_lag = map(.x = .$data,
                        .f = lag, 1))

I get the following error:
Error: Column data_lag must be length 1 (the group size), not 55

Doing the following works, however:

nested_df$data_lag <- map(.x = nested_df$data,
                          .f = lag, 1)

Where am I going wrong here?

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.


If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

My apologies, I thought it would be clear enough. I have included an example.


library(tidyverse)

my_df <- data.frame("ID" = rep(1:3, each = 3),
                    "X1" = rbinom(9, 1, 0.5),
                    "X2" = rbinom(9, 1, 0.5),
                    "X3" = rbinom(9, 1, 0.5))
nested_df <- my_df %>%
  group_by(ID) %>% 
  nest()

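# The definition of lagthemats() was cut off in this post. The body below is an
# assumed reconstruction, chosen to match the output shown later in the thread.
lagthemats <- function(mat, lag) {
  if (lag > 0) {
    for (i in 1:lag) {
      # drop the first row and pad the end with a row of NA
      mat <- mat[c(seq_len(nrow(mat))[-1], NA), ]
    }
  } else if (lag == 0) {
    # no shift requested: leave the input as it is
  }
  mat
}
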
nested_df <- nested_df %>% 
  mutate(data_lag = map(.x = .$data,
                        .f = lagthemats, 1))
#> Error: Column `data_lag` must be length 1 (the group size), not 3

Created on 2019-12-04 by the reprex package (v0.3.0)


Thanks for this! If I understand correctly, you want to run your lagthemats function on the piece of data in your nested data frame that corresponds to each ID?

This should do the trick: mutate(nested_df, data_lag = map(data, lagthemats, 1))

When you use map() inside mutate() you can refer to variables in the data directly and don't need to use the $ selection syntax that you had originally.
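To make that concrete, here is a minimal sketch of the bare-column version on made-up data. Note that shift_rows() is a hypothetical stand-in for lagthemats() (whose full definition isn't shown in the thread), and the toy values are assumptions for illustration only:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data with the same shape as the reprex (values are made up).
my_df <- data.frame(ID = rep(1:3, each = 3), X1 = 1:9, X2 = 11:19)

nested_df <- my_df %>%
  group_by(ID) %>%
  nest()

# Hypothetical stand-in for lagthemats(): shifts every column down by n rows.
shift_rows <- function(d, n) {
  mutate(d, across(everything(), function(col) dplyr::lag(col, n)))
}

# Bare column name: `data` is evaluated inside mutate(), one group at a time,
# so map() receives a list holding only the current group's data frame.
with_lag <- nested_df %>%
  mutate(data_lag = map(data, shift_rows, 1))
```

Each element of with_lag$data_lag should then be a data frame with the same number of rows as the matching element of data.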

A word of caution about lagthemats(), though: it looks like it may be calculating the lead of each row, not the lag (at least based on how I would define lead/lag). That is, you're getting the next row each time, not the previous one:

> nested_df %>% 
+     mutate(data_lag = map(data, lagthemats, 1)) %>% 
+     unnest()
# A tibble: 9 x 7
# Groups:   ID [3]
     ID    X1    X2    X3   X11   X21   X31
  <dbl> <int> <int> <int> <int> <int> <int>
1     1     0     1     0     0     1     1
2     1     0     1     1     1     1     1
3     1     1     1     1    NA    NA    NA
4     2     0     0     0     0     0     0
5     2     0     0     0     0     1     1
6     2     0     1     1    NA    NA    NA
7     3     1     1     1     0     1     1
8     3     0     1     1     1     0     1
9     3     1     0     1    NA    NA    NA
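
For comparison, this is how dplyr's own lag() and lead() behave on a plain vector. It's only a small illustration with made-up values, not part of the original reprex:

```r
library(dplyr)

x <- c(10, 20, 30)

# lag(): each position gets the PREVIOUS value, so the NA lands at the start.
lagged <- dplyr::lag(x, 1)

# lead(): each position gets the NEXT value, so the NA lands at the end,
# which is the pattern in the data_lag columns above.
led <- dplyr::lead(x, 1)

print(lagged)
print(led)
```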

Thanks Jim, it does indeed work! In the past I think I've always used .$ though, strange!
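
A possible reason the .$ version used to work: as far as I know, group_by() followed by nest() returned an ungrouped data frame before tidyr 1.0.0, and keeps the grouping since then. With groups of size 1, mutate() expects one value per group, while .$data always hands over the whole column. The sketch below, on made-up data, assumes that explanation and shows that dropping the grouping first makes .$data line up again:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data with the same shape as the reprex (values are made up).
my_df <- data.frame(ID = rep(1:3, each = 3), X1 = 1:9, X2 = 11:19)

nested_df <- my_df %>%
  group_by(ID) %>%
  nest()

# With tidyr >= 1.0.0 this should print TRUE: the nested result is still grouped.
print(is_grouped_df(nested_df))

# After ungroup(), mutate() works on all 3 rows at once, so the full-length
# .$data column matches the expected size.
ungrouped <- nested_df %>%
  ungroup() %>%
  mutate(n_rows = map_int(.$data, nrow))
```

Referring to the column by its bare name is still the safer habit, since it works whether or not the data frame is grouped.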

Yes, I suppose it's a lead and not a lag, it's not my function though. I'd have to ask the reasoning of the colleague who named it! Thanks for the heads up though!

@jim89 I was wondering how to efficiently rename the columns now, so that instead of X11, ..., X31 I get X1 Lag, ..., X3 Lag?
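
One way this could be done is to rename the columns inside each nested data frame before unnesting, so the names never clash in the first place. This is only a sketch on made-up data: lag_rows() is a hypothetical stand-in for lagthemats(), add_lag_suffix() is a hypothetical helper, and the "_lag" suffix is an assumption (swap in " Lag" if you really want the space):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data with the same shape as the reprex (values are made up).
my_df <- data.frame(ID = rep(1:3, each = 3),
                    X1 = 1:9, X2 = 11:19, X3 = 21:29)

# Hypothetical stand-in for lagthemats(): shifts every column down by n rows.
lag_rows <- function(d, n = 1) {
  mutate(d, across(everything(), function(col) dplyr::lag(col, n)))
}

# Hypothetical helper: append "_lag" to every column name of a data frame.
add_lag_suffix <- function(d) {
  rename_with(d, function(nm) paste0(nm, "_lag"))
}

result <- my_df %>%
  group_by(ID) %>%
  nest() %>%
  mutate(data_lag = map(data, lag_rows, 1),
         data_lag = map(data_lag, add_lag_suffix)) %>%
  unnest(cols = c(data, data_lag)) %>%
  ungroup()

print(names(result))
```

Because the renaming happens per nested data frame, unnest() has no duplicate names to disambiguate and the X11-style suffixes don't appear.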
