not get expected results from 'pivot_wider'

YuJiang · February 4, 2023, 8:37pm

Hello I have a problem about 'pivot_wider' in R. I have a transaction-level dataset of three columns: unique ID, customers, and purchase time. The example is shown below.

The first column, unique_ID is distinctive for each row. The second column is the customer and some customers only purchased only, like Grace or Mark while some other customers repurchased, like Jack and Alice. I want to widen this table in the following way since I need to calculate the number of days between the orders if they repurchased but I also need to keep the unique ID for each customer since I need to join other tables later. For Jack, he has two different unique_id and how can I keep these two different unique_id for him? Thanks for your help.

customer <- c('Jack', 'Grace', 'Jack','Mark', 'Alice', 'Ann', 'Alice', 'Jack')
unique_id <- 1:8
time <- c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34")

data <- as.data.frame(cbind(unique_id, customer, time))

test <- data %>%
  group_by(customer) %>%
  arrange(time) %>%     # unnecessary if sorted already
  mutate(instance = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(unique_id, customer), names_from = instance, values_from = time) %>%
  mutate(whether_to_repurchase = 1* !is.na('2'))  # 0 if 2 NA; 1 if not

There is the output.

Then how can I know whether the customer repurchase or not in the future? The column 'whether to repurchase' is always 1. Thanks for your help.

nirgrahamuk · February 4, 2023, 8:47pm

library(tidyverse)

df_1 <- tibble::tribble(
  ~unique_id, ~customer,                 ~time,
  "1",    "Jack", "2018-10-03 19:51:51",
  "2",   "Grace", "2018-10-05 19:55:15",
  "3",    "Jack", "2018-11-19 06:26:02",
  "4",    "Mark", "2019-01-06 15:24:30",
  "5",   "Alice", "2018-10-01 15:15:43",
  "6",     "Ann", "2018-10-05 11:12:54",
  "7",   "Alice", "2019-01-27 00:49:19",
  "8",    "Jack", "2018-10-03 10:10:34"
)

group_by(df_1,customer) |> summarise(
  transaction_ids = list(unique_id),
  time = list(time)
) |> unnest_wider(time,names_sep = "_") |> 
mutate(ever_reordered = !is.na(time_2))

YuJiang · February 5, 2023, 4:01am

Thanks for your help but the output does not include the unique_id?

nirgrahamuk · February 5, 2023, 7:48am

All the transaction ids are kept and can be retrieved.

YuJiang · February 6, 2023, 3:32am

nirgrahamuk:

library(tidyverse)

df_1 <- tibble::tribble(
  ~unique_id, ~customer,                 ~time,
  "1",    "Jack", "2018-10-03 19:51:51",
  "2",   "Grace", "2018-10-05 19:55:15",
  "3",    "Jack", "2018-11-19 06:26:02",
  "4",    "Mark", "2019-01-06 15:24:30",
  "5",   "Alice", "2018-10-01 15:15:43",
  "6",     "Ann", "2018-10-05 11:12:54",
  "7",   "Alice", "2019-01-27 00:49:19",
  "8",    "Jack", "2018-10-03 10:10:34"
)

group_by(df_1,customer) |> summarise(
  transaction_ids = list(unique_id),
  time = list(time)
) |> unnest_wider(time,names_sep = "_") |> 
mutate(ever_reordered = !is.na(time_2))

Gotcha and it works for me now. Thanks for your help and response.

system · February 13, 2023, 3:32am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.