pivot_wider with extra rows instead of duplicate errors?

df = data.frame(variables = c('a', 'b', 'c', 'c'),
                values = c(1, 2, 3, 4),
                dates = c(6, 6, 6, 6))

# What df looks like
df
# variables values dates
#1         a      1     6
#2         b      2     6
#3         c      3     6
#4         c      4     6

###

# Output 1
df |>
  pivot_wider.(names_from = variables,
               values_from = values,
               values_fill = NA,
               values_fn = list)

#  dates     a     b       c
#      6     1     2  c(3,4)

###

# Output 2
df |>
  pivot_wider.(names_from = variables,
               values_from = values,
               values_fill = NA)

#  dates     a     b     c
#      6     1     1     2
# Where do b = 1 and c = 2 come from???

Neither of these outputs make sense. This is my desired output:

#  dates     a     b     c
#      6     1     2     3
#      6     1     2     4

How can I make this possible?

Hello,

first of all, I cannot replicate your Output 2. I assume you use tidyr::pivot_wider() (I write this, because there is a dot after your pivot_wider, so maybe those are indeed different functions I am not aware of)?

To achieve your desired goal, you can use the following:

df <- data.frame(variables = c('a', 'b', 'c', 'c'),
                 values = c(1, 2, 3, 4),
                 dates = c(6, 6, 6, 6))
library(tidyr); library(dplyr)

df |>
  ### add truly unique dates (IDs)
  group_by(variables) |> 
  mutate(unique_dates = paste(dates,row_number(), sep = '.')) |>
  ungroup() |>
  pivot_wider(id_cols = unique_dates,
              names_from = variables, values_from = values) |>
  ### remove the suffix
  mutate(unique_dates = stringr::str_extract(unique_dates, "[0-9]{1}")) |>
  ### fill the gaps with values from above
  fill(c(a,b), .direction = 'updown') |>
  ### rename unique_dates
  rename('dates' = unique_dates)
#> # A tibble: 2 × 4
#>   dates     a     b     c
#>   <chr> <dbl> <dbl> <dbl>
#> 1 6         1     2     3
#> 2 6         1     2     4

Created on 2022-11-22 by the reprex package (v2.0.1)

The behaviour of pivot_wider() however is logical. You force it to reshape the data on non unique entries (e.g. your id column dates is always the same, so it should be one row after reshaping, but the column names come from variables, which contain c twice). Hence, pivot_wider() does not know how to put two values into one cell, other than creating a list and informing you to specify a function which reduces all duplicates to just one actual entry.

Kind regards

1 Like

Thanks! But using paste(), str_extract() and fill() seem to be very error prone, especially when paste() forces convert to character.

I thought using id_cols = unique_dates would solve this problem, but it doesn't seem to be the case.

As a side note, pivot_wider.() is from older version of tidytable. It's identical to dplyr.

1 Like

Basically your pivoted groups need to all have length == 1. Group "a" has 1 row, group "b" has 1 row, and group "c" has two rows. tidyr defaults to returning your results in a list column when this doesn't happen. data.table (and tidytable) return the length of each group when length != 1. So that's why Output 2 has a result of a=1, b=1 and c=2.

Output 1 is returning things in a list because that's what you're choosing. Try using values_fn = mean and you'll see values_fn is essentially applying an aggregation function to each group.

That being said - since you're returning lists (which is the correct approach), all we need to do is convert all of the lists to vectors using unlist(), grouped by your non-list columns (in this case "date"). We can do this using summarize(), and this will expand the data frame.

library(tidytable, w = FALSE)

df = data.frame(variables = c('a', 'b', 'c', 'c'),
                values = c(1, 2, 3, 4),
                dates = c(6, 6, 6, 6))

df |>
  pivot_wider(names_from = variables,
              values_from = values,
              values_fill = NA,
              values_fn = list) %>%
  summarize(across(where(is.list), unlist),
            .by = !where(is.list))
#> # A tidytable: 2 × 4
#>   dates     a     b     c
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     6     1     2     3
#> 2     6     1     2     4

Hope that helps! If you have any questions let me know.

4 Likes

!where(is.list) is a neat solution! I definitely would not have thought of that. Thanks!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.