pivot_wider changes the class of 'values' column from double numeric to integer

Here is the puzzle I've been scratching my head over:

library(tidyr)

df = data.frame(variables = c('a',    'b',     'c',     'c',     'd'),
                values =    c(1.12345, 2.12345, 3.12345, 4.12345, 1 ))

# All double numerics
.class2(df$values)

# The results do not matter at the moment, but what matters is that they're all integers for some reason
df |>
  pivot_wider(names_from = variables,
              values_from = values,
              values_fill = NA)

# Does not work
df |>
  pivot_wider(names_from = variables,
              values_from = as.double(values),
              values_fill = NA)

Is this a bug?

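This is not a bug, and nothing is converted to integer. With your data, pivot_wider() has to put two numbers, 3.12345 and 4.12345, in the same cell of the pivoted row, because 'c' appears twice and no other column tells the two rows apart. A cell can only hold two values if the column is a list, so every output column becomes a list-column. Each list element is still a double vector. The printed tibble shows only the type and length of each element (for example, <dbl [2]> is a double vector of length 2), not its contents.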

library(tidyverse)

df = data.frame(variables = c('a',    'b',     'c',     'c',     'd'),
                values =    c(1.12345, 2.12345, 3.12345, 4.12345, 1 ))

# The duplicated 'c' forces list-columns; the values inside are still doubles
df |>
  pivot_wider(names_from = variables,
              values_from = values,
              values_fill = NA)
#> Warning: Values from `values` are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = {summary_fun}` to summarise duplicates.
#> * Use the following dplyr code to identify duplicates.
#>   {data} %>%
#>     dplyr::group_by(variables) %>%
#>     dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
#>     dplyr::filter(n > 1L)
#> # A tibble: 1 × 4
#>   a         b         c         d        
#>   <list>    <list>    <list>    <list>   
#> 1 <dbl [1]> <dbl [1]> <dbl [2]> <dbl [1]>

Created on 2022-11-22 with reprex v2.0.2
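
To check what the list-columns actually hold, here is a minimal sketch that pivots with values_fn = list (the option the warning itself suggests) and then inspects the first element of each column. The names wide and col_types are only illustrative.

```r
library(tidyr)

df <- data.frame(variables = c("a", "b", "c", "c", "d"),
                 values    = c(1.12345, 2.12345, 3.12345, 4.12345, 1))

# values_fn = list keeps every value and suppresses the duplicate warning
wide <- pivot_wider(df,
                    names_from  = variables,
                    values_from = values,
                    values_fn   = list)

# Storage type of the first (and only) element in each list-column
col_types <- vapply(wide, function(col) typeof(col[[1]]), character(1))
print(col_types)

# The two values that collided in the 'c' cell
print(wide$c[[1]])
```

Every column should report "double", and the 'c' cell should contain both original values.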

If you add a new variable, "other", so that "other" and "variables" together uniquely identify each row in df, each cell receives at most one value and the output columns are plain doubles:

library(tidyverse)

df = data.frame(variables = c('a',    'b',     'c',     'c',     'd'),
                values =    c(1.12345, 2.12345, 3.12345, 4.12345, 1 ),
                other = c('e', 'e', 'e', 'f', 'e'))

df |>
  pivot_wider(names_from = variables,
              values_from = values,
              values_fill = NA)
#> # A tibble: 2 × 5
#>   other     a     b     c     d
#>   <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 e      1.12  2.12  3.12     1
#> 2 f     NA    NA     4.12    NA

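Created on 2022-11-22 with reprex v2.0.2

As noted in the warning message, you can also pass a summary function, such as values_fn = max or values_fn = sum, to collapse the duplicated values in each cell into a single number. The braces in the warning ({summary_fun}) are only a placeholder and are not part of the syntax.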
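
As a minimal sketch of the summarising approach, assuming the largest value is the one you want to keep for duplicated names (max is just one possible choice, and wide_max is an illustrative name):

```r
library(tidyr)

df <- data.frame(variables = c("a", "b", "c", "c", "d"),
                 values    = c(1.12345, 2.12345, 3.12345, 4.12345, 1))

# Collapse duplicates within each cell to a single number
wide_max <- pivot_wider(df,
                        names_from  = variables,
                        values_from = values,
                        values_fn   = max)

print(wide_max)

# Each column is now an ordinary double vector rather than a list
print(vapply(wide_max, typeof, character(1)))
```

With a summary function, the two 'c' values are reduced to one, so no list-columns are needed.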
