Pivot_wider troubles

Hi, I'm having issues with pivot_wider. I keep getting this error code:
Warning message:
Values in ppg are not uniquely identified; output will contain list-cols.

  • Use values_fn = list(ppg = list) to suppress this warning.
  • Use values_fn = list(ppg = length) to identify where the duplicates arise
  • Use values_fn = list(ppg = summary_fun) to summarise duplicates

I have tried creating a row number but this doesn't work. Any help would be appreciated.

I have run code to re-produce my example:

dput(head(df_long, n = 20))
structure(list(pos = c("D", "F", "F", "F", "F", "F", "F", "F",
"D", "D", "D", "D", "D", "F", "F", "F", "F", "F", "F", "F"),
age.group = c("late", "early", "early", "early", "early",
"early", "early", "early", "early", "early", "early", "early",
"early", "early", "early", "early", "early", "early", "early",
"early"), DY+/- = structure(c(1L, 1L, 4L, 5L, 1L, 4L, 5L,
6L, 2L, 1L, 4L, 5L, 6L, 1L, 4L, 5L, 6L, 5L, 5L, 1L), .Label = c("DY",
"DY-1", "DY-2", "DY+1", "DY+2", "DY+3"), class = "factor"),
ppg = c(0.16, 0.8088, 0.8525, 1.2115, 0.3051, 0.403, 0.6508,
1.0909, 0.2381, 0.2623, 0.4167, 0.2615, 0.3, 0.3433, 0.7794,
0.8676, 1.6912, 0, 0.5116, 0.0909), player = c("a.j.-cook",
"a.j.-jenks", "a.j.-jenks", "a.j.-jenks", "aaron-berisha",
"aaron-berisha", "aaron-berisha", "aaron-berisha", "aaron-haydon",
"aaron-haydon", "aaron-haydon", "aaron-haydon", "aaron-haydon",
"aaron-luchuk", "aaron-luchuk", "aaron-luchuk", "aaron-luchuk",
"aaron-scott", "aaron-snow", "aaron-taylor")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))

Thank you.

You haven't specified which variable you want to pivot. Here's an example where the categories in DY+/- are being pivoted into columns and filled with values from the ppg variable.

library(tidyr)

df_long <- structure(list(pos = c("D", "F", "F", "F", "F", "F", "F", "F", "D", "D", 
                                  "D", "D", "D", "F", "F", "F", "F", "F", "F", "F"),
                          age.group = c("late", "early", "early", "early", "early",
                                        "early", "early", "early", "early", "early", 
                                        "early", "early", "early", "early", "early", 
                                        "early", "early", "early", "early", "early"), 
                          `DY+/-` = structure(c(1L, 1L, 4L, 5L, 1L, 4L, 5L, 6L, 2L, 1L, 
                                                4L, 5L, 6L, 1L, 4L, 5L, 6L, 5L, 5L, 1L), 
                                              .Label = c("DY", "DY-1", "DY-2", "DY+1", "DY+2", "DY+3"), class = "factor"),
                          ppg = c(0.16, 0.8088, 0.8525, 1.2115, 0.3051, 0.403, 0.6508, 1.0909, 0.2381, 0.2623, 
                                  0.4167, 0.2615, 0.3, 0.3433, 0.7794, 0.8676, 1.6912, 0, 0.5116, 0.0909), 
                          player = c("a.j.-cook", "a.j.-jenks", "a.j.-jenks", "a.j.-jenks", "aaron-berisha", 
                                     "aaron-berisha", "aaron-berisha", "aaron-berisha", "aaron-haydon", "aaron-haydon", 
                                     "aaron-haydon", "aaron-haydon", "aaron-haydon", "aaron-luchuk", "aaron-luchuk", 
                                     "aaron-luchuk", "aaron-luchuk", "aaron-scott", "aaron-snow", "aaron-taylor")), 
                     row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

pivot_wider(df_long, names_from = `DY+/-`, values_from = ppg)
#> # A tibble: 8 x 8
#>   pos   age.group player             DY `DY+1` `DY+2` `DY+3` `DY-1`
#>   <chr> <chr>     <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 D     late      a.j.-cook      0.16   NA     NA      NA    NA    
#> 2 F     early     a.j.-jenks     0.809   0.852  1.21   NA    NA    
#> 3 F     early     aaron-berisha  0.305   0.403  0.651   1.09 NA    
#> 4 D     early     aaron-haydon   0.262   0.417  0.262   0.3   0.238
#> 5 F     early     aaron-luchuk   0.343   0.779  0.868   1.69 NA    
#> 6 F     early     aaron-scott   NA      NA      0      NA    NA    
#> 7 F     early     aaron-snow    NA      NA      0.512  NA    NA    
#> 8 F     early     aaron-taylor   0.0909 NA     NA      NA    NA

Created on 2020-05-19 by the reprex package (v0.3.0)

Yes, sorry I wanted exactly what you did.

However, I'm not sure what you did to make the pivot_wider work because when i run a str() command the str(df_long) looks to me to be the same as my str(overage) **** see below

When I copy your code
pivot_wider(overage, names_from = DY+/-, values_from = ppg)

I get the same error message
10 D early adam-bignell <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>

… with 2,188 more rows

Warning message:
Values in ppg are not uniquely identified; output will contain list-cols.

  • Use values_fn = list(ppg = list) to suppress this warning.
  • Use values_fn = list(ppg = length) to identify where the duplicates arise
  • Use values_fn = list(ppg = summary_fun) to summarise duplicates

tibble [20 × 5] (S3: tbl_df/tbl/data.frame)
pos : chr [1:20] "D" "F" "F" "F" ... age.group: chr [1:20] "late" "early" "early" "early" ...
DY+/- : Factor w/ 6 levels "DY","DY-1","DY-2",..: 1 1 4 5 1 4 5 6 2 1 ... ppg : num [1:20] 0.16 0.809 0.853 1.212 0.305 ...
$ player : chr [1:20] "a.j.-cook" "a.j.-jenks" "a.j.-jenks" "a.j.-jenks" ...

str(overage)
tibble [5,719 × 5] (S3: tbl_df/tbl/data.frame)
pos : chr [1:5719] "D" "F" "F" "F" ... age.group: chr [1:5719] "late" "early" "early" "early" ...
DY+/- : Factor w/ 6 levels "DY","DY-1","DY-2",..: 1 1 4 5 1 4 5 6 2 1 ... ppg : num [1:5719] 0.16 0.809 0.853 1.212 0.305 ...
$ player : chr [1:5719] "a.j.-cook" "a.j.-jenks" "a.j.-jenks" "a.j.-jenks" ...

The warning is telling you that your original data contains multiple ppg values for at least some combinations of pos, age.group & player. As a result, pivot_wider() doesn't know how to handle them and is asking you to specify an aggregation method via values_fn (in the absence of which it will return list-cols).

To identify where your duplicates are, you can use the values_fn = list(ppg = length) as suggested in the warning. Here I've applied it to the sample data you provided earlier.

> pivot_wider(df_long, names_from = `DY+/-`, values_from = ppg, values_fn = list(ppg = length))
# A tibble: 8 x 8
  pos   age.group player           DY `DY+1` `DY+2` `DY+3` `DY-1`
  <chr> <chr>     <chr>         <int>  <int>  <int>  <int>  <int>
1 D     late      a.j.-cook         1     NA     NA     NA     NA
2 F     early     a.j.-jenks        1      1      1     NA     NA
3 F     early     aaron-berisha     1      1      1      1     NA
4 D     early     aaron-haydon      1      1      1      1      1
5 F     early     aaron-luchuk      1      1      1      1     NA
6 F     early     aaron-scott      NA     NA      1     NA     NA
7 F     early     aaron-snow       NA     NA      1     NA     NA
8 F     early     aaron-taylor      1     NA     NA     NA     NA

As you can see, the maximum length of each element after the pivoting operation is 1. If you apply this function to your data, you will find values greater than 1. Those are the duplicate entries that are causing the warning. You need to decide how to aggregate those values.

1 Like

Thank you for your time.