Pivot_wider troubles

turkjr19 · May 19, 2020, 4:28pm

Hi, I'm having issues with pivot_wider. I keep getting this error code:
Warning message:
Values in ppg are not uniquely identified; output will contain list-cols.

Use values_fn = list(ppg = list) to suppress this warning.
Use values_fn = list(ppg = length) to identify where the duplicates arise
Use values_fn = list(ppg = summary_fun) to summarise duplicates

I have tried creating a row number but this doesn't work. Any help would be appreciated.

I have run code to re-produce my example:

dput(head(df_long, n = 20))
structure(list(pos = c("D", "F", "F", "F", "F", "F", "F", "F",
"D", "D", "D", "D", "D", "F", "F", "F", "F", "F", "F", "F"),
age.group = c("late", "early", "early", "early", "early",
"early", "early", "early", "early", "early", "early", "early",
"early", "early", "early", "early", "early", "early", "early",
"early"), DY+/- = structure(c(1L, 1L, 4L, 5L, 1L, 4L, 5L,
6L, 2L, 1L, 4L, 5L, 6L, 1L, 4L, 5L, 6L, 5L, 5L, 1L), .Label = c("DY",
"DY-1", "DY-2", "DY+1", "DY+2", "DY+3"), class = "factor"),
ppg = c(0.16, 0.8088, 0.8525, 1.2115, 0.3051, 0.403, 0.6508,
1.0909, 0.2381, 0.2623, 0.4167, 0.2615, 0.3, 0.3433, 0.7794,
0.8676, 1.6912, 0, 0.5116, 0.0909), player = c("a.j.-cook",
"a.j.-jenks", "a.j.-jenks", "a.j.-jenks", "aaron-berisha",
"aaron-berisha", "aaron-berisha", "aaron-berisha", "aaron-haydon",
"aaron-haydon", "aaron-haydon", "aaron-haydon", "aaron-haydon",
"aaron-luchuk", "aaron-luchuk", "aaron-luchuk", "aaron-luchuk",
"aaron-scott", "aaron-snow", "aaron-taylor")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))

Thank you.

siddharthprabhu · May 19, 2020, 4:39pm

You haven't specified which variable you want to pivot. Here's an example where the categories in DY+/- are being pivoted into columns and filled with values from the ppg variable.

library(tidyr)

df_long <- structure(list(pos = c("D", "F", "F", "F", "F", "F", "F", "F", "D", "D", 
                                  "D", "D", "D", "F", "F", "F", "F", "F", "F", "F"),
                          age.group = c("late", "early", "early", "early", "early",
                                        "early", "early", "early", "early", "early", 
                                        "early", "early", "early", "early", "early", 
                                        "early", "early", "early", "early", "early"), 
                          `DY+/-` = structure(c(1L, 1L, 4L, 5L, 1L, 4L, 5L, 6L, 2L, 1L, 
                                                4L, 5L, 6L, 1L, 4L, 5L, 6L, 5L, 5L, 1L), 
                                              .Label = c("DY", "DY-1", "DY-2", "DY+1", "DY+2", "DY+3"), class = "factor"),
                          ppg = c(0.16, 0.8088, 0.8525, 1.2115, 0.3051, 0.403, 0.6508, 1.0909, 0.2381, 0.2623, 
                                  0.4167, 0.2615, 0.3, 0.3433, 0.7794, 0.8676, 1.6912, 0, 0.5116, 0.0909), 
                          player = c("a.j.-cook", "a.j.-jenks", "a.j.-jenks", "a.j.-jenks", "aaron-berisha", 
                                     "aaron-berisha", "aaron-berisha", "aaron-berisha", "aaron-haydon", "aaron-haydon", 
                                     "aaron-haydon", "aaron-haydon", "aaron-haydon", "aaron-luchuk", "aaron-luchuk", 
                                     "aaron-luchuk", "aaron-luchuk", "aaron-scott", "aaron-snow", "aaron-taylor")), 
                     row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

pivot_wider(df_long, names_from = `DY+/-`, values_from = ppg)
#> # A tibble: 8 x 8
#>   pos   age.group player             DY `DY+1` `DY+2` `DY+3` `DY-1`
#>   <chr> <chr>     <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 D     late      a.j.-cook      0.16   NA     NA      NA    NA    
#> 2 F     early     a.j.-jenks     0.809   0.852  1.21   NA    NA    
#> 3 F     early     aaron-berisha  0.305   0.403  0.651   1.09 NA    
#> 4 D     early     aaron-haydon   0.262   0.417  0.262   0.3   0.238
#> 5 F     early     aaron-luchuk   0.343   0.779  0.868   1.69 NA    
#> 6 F     early     aaron-scott   NA      NA      0      NA    NA    
#> 7 F     early     aaron-snow    NA      NA      0.512  NA    NA    
#> 8 F     early     aaron-taylor   0.0909 NA     NA      NA    NA

^{Created on 2020-05-19 by the reprex package (v0.3.0)}

turkjr19 · May 19, 2020, 7:01pm

Yes, sorry I wanted exactly what you did.

However, I'm not sure what you did to make the pivot_wider work because when i run a str() command the str(df_long) looks to me to be the same as my str(overage) **** see below

When I copy your code
pivot_wider(overage, names_from = DY+/-, values_from = ppg)

I get the same error message
10 D early adam-bignell <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>

… with 2,188 more rows

Warning message:
Values in ppg are not uniquely identified; output will contain list-cols.

Use values_fn = list(ppg = list) to suppress this warning.
Use values_fn = list(ppg = length) to identify where the duplicates arise
Use values_fn = list(ppg = summary_fun) to summarise duplicates

tibble [20 × 5] (S3: tbl_df/tbl/data.frame)
pos : chr [1:20] "D" "F" "F" "F" ... age.group: chr [1:20] "late" "early" "early" "early" ...
DY+/- : Factor w/ 6 levels "DY","DY-1","DY-2",..: 1 1 4 5 1 4 5 6 2 1 ... ppg : num [1:20] 0.16 0.809 0.853 1.212 0.305 ...
$ player : chr [1:20] "a.j.-cook" "a.j.-jenks" "a.j.-jenks" "a.j.-jenks" ...

str(overage)
tibble [5,719 × 5] (S3: tbl_df/tbl/data.frame)
pos : chr [1:5719] "D" "F" "F" "F" ... age.group: chr [1:5719] "late" "early" "early" "early" ...
DY+/- : Factor w/ 6 levels "DY","DY-1","DY-2",..: 1 1 4 5 1 4 5 6 2 1 ... ppg : num [1:5719] 0.16 0.809 0.853 1.212 0.305 ...
$ player : chr [1:5719] "a.j.-cook" "a.j.-jenks" "a.j.-jenks" "a.j.-jenks" ...

siddharthprabhu · May 19, 2020, 8:23pm

The warning is telling you that your original data contains multiple ppg values for at least some combinations of pos, age.group & player. As a result, pivot_wider() doesn't know how to handle them and is asking you to specify an aggregation method via values_fn (in the absence of which it will return list-cols).

To identify where your duplicates are, you can use the values_fn = list(ppg = length) as suggested in the warning. Here I've applied it to the sample data you provided earlier.

> pivot_wider(df_long, names_from = `DY+/-`, values_from = ppg, values_fn = list(ppg = length))
# A tibble: 8 x 8
  pos   age.group player           DY `DY+1` `DY+2` `DY+3` `DY-1`
  <chr> <chr>     <chr>         <int>  <int>  <int>  <int>  <int>
1 D     late      a.j.-cook         1     NA     NA     NA     NA
2 F     early     a.j.-jenks        1      1      1     NA     NA
3 F     early     aaron-berisha     1      1      1      1     NA
4 D     early     aaron-haydon      1      1      1      1      1
5 F     early     aaron-luchuk      1      1      1      1     NA
6 F     early     aaron-scott      NA     NA      1     NA     NA
7 F     early     aaron-snow       NA     NA      1     NA     NA
8 F     early     aaron-taylor      1     NA     NA     NA     NA

As you can see, the maximum length of each element after the pivoting operation is 1. If you apply this function to your data, you will find values greater than 1. Those are the duplicate entries that are causing the warning. You need to decide how to aggregate those values.

turkjr19 · May 19, 2020, 9:03pm

Thank you for your time.

system · June 9, 2020, 9:03pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.