values_fn = length in pivot_wider gives NULL length NA instead of zero

Hi,

  1. I am trying to use values_fn argument in pivot_wider function.
  2. Initially I end up with list column, and I want to use length function (in values_fn) for each cell to get the length
  3. My list columns contain NULL. It seems, length function does not work with list element NULL, it gives NA. But, length(NULL) = 0. So, I am expecting 0 instead of NA.
library(tidyverse)
dx<-data.frame(A = c(1,2,2,4,5),B = c("B1","B3","B3","B1","B3"),C = c("C1","C4","C5","C8","C9"))
dx
#>   A  B  C
#> 1 1 B1 C1
#> 2 2 B3 C4
#> 3 2 B3 C5
#> 4 4 B1 C8
#> 5 5 B3 C9
dx %>%
  pivot_wider(
    names_from = c(B),
    values_from = c(C),
    names_glue = "{B}_{.value}",
    values_fn = list
  )
#> # A tibble: 4 x 3
#>       A B1_C      B3_C     
#>   <dbl> <list>    <list>   
#> 1     1 <chr [1]> <NULL>   
#> 2     2 <NULL>    <chr [2]>
#> 3     4 <chr [1]> <NULL>   
#> 4     5 <NULL>    <chr [1]>

dx %>%
  pivot_wider(
    names_from = c(B),
    values_from = c(C),
    names_glue = "{B}_{.value}",
    values_fn = length
  )
#> # A tibble: 4 x 3
#>       A  B1_C  B3_C
#>   <dbl> <int> <int>
#> 1     1     1    NA
#> 2     2    NA     2
#> 3     4     1    NA
#> 4     5    NA     1

print(length(NULL))
#> [1] 0

Hi, I think you want values_fill:

dx %>%
  pivot_wider(
    names_from = c(B),
    values_from = c(C),
    names_glue = "{B}_{.value}",
    values_fn = length,
    values_fill = list(C = 0)
  )

out of interest, isn't this just counting, like

dx %>%
  group_by(A, B) %>%
  count() %>%
  pivot_wider(A, names_from = B, values_from = n, values_fill = list(n = 0))

rather than thinking about list columns and length and wide tables?

2 Likes

Thank you very much for the reply. It definitely solves the problem using values_fill. But, I was trying to understand how values_fn works and later use it for other purpose.

library(tidyverse)
dx<-data.frame(A = c(1,2,2,4,5),B = c("B1","B3","B3","B1","B3"),C = c("C1","C4","C5","C8","C9"))
p1<-dx %>%
  pivot_wider(
    names_from = c(B),
    values_from = c(C),
    names_glue = "{B}_{.value}",
    values_fn = list
  )
map_dbl(p1$B1_C, length)
#> [1] 1 0 1 0
map_dbl(p1$B3_C, length)
#> [1] 0 2 0 1

When I try to use values_fn = length in pivot_wider, technically it is not working correctly for each cell (as it's documentation says " Optionally, a function applied to the value in each cell in the output"). NULL is nothing special and length(NULL)=0. I was expecting output as the output of map_dbl as above. The other thing I have noticed problematic, other argument of values_fn (say na.rm of mean) can not be passed to ... (dot dot dot of pivot_wider). Thanks

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.