pivot_wider() doesn't change the dimensions of the dataset after applying it

Hello, I have a question about pivot_wider(). After applying it, the dimensions of the original dataset don't change.

I have a large dataset of order data; some customers repurchased, while others did not.

A simple sample dataset is as follows.

library(tidyverse)

data_sample <- c('Jack', 'Grace', 'Jack', 'Mark', 'Alice', 'Ann', 'Alice', 'Jack')
id_sample <- c(1, 2, 1, 3, 4, 5, 4, 1)
time_sample <- c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34")

dat_sample <- as.data.frame(cbind(data_sample, id_sample, time_sample))

dat_sample

# order rows by each customer's first appearance, then by order time
dat_cluster <- dat_sample %>% arrange(match(data_sample, unique(data_sample)), time_sample)

dat_cluster

It means Jack has ordered the item three times, so he repurchased; Alice ordered it twice; Ann, Mark, and Grace only ordered once, so they did not repurchase. How can I mutate a new variable, say whether_to_purchase, for these customers, where 1 means repurchase and 0 means no repurchase?

Since I want to turn dat_sample into the wide data frame I laid out in Excel (shown below), how can I do this in R with the tidyverse? I know I first need to determine whether each person repurchased, then count how many orders each person placed in total if they repurchased, and finally reshape the long dataset into a wider one. However, I am having trouble implementing these steps. Any suggestions or help?

Thanks so much.

Since you tried pivot_wider() but did not get the expected results, it would be best to show us that code. Then we could run it on the data you provided and track down the source of the problem.

My attempt uses group_by(Customer) to create a new variable, Order_number, for each customer. It is a sequence of integers whose length equals the customer's number of order dates. Jack has three order dates, so he gets the sequence 1, 2, 3, meaning he made a first, second, and third purchase. Ann has only one, so her sequence is just 1, her first and only purchase.

Try pivot_wider() on the resulting data frame and see what you get; one possible call is sketched after the output below.

library(tidyverse)

Customer <- c('Jack', 'Grace', 'Jack','Mark', 'Alice', 'Ann', 'Alice', 'Jack')
Order_Date <- c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34")

data_sample <- data.frame(Customer, Order_Date)

data_sample %>% 
  arrange(Customer, Order_Date) %>%                 # sort each customer's orders chronologically
  group_by(Customer) %>% 
  mutate(Order_number = 1:length(Order_Date)) %>%   # 1, 2, 3, ... within each customer
  ungroup()
#> # A tibble: 8 × 3
#>   Customer Order_Date          Order_number
#>   <chr>    <chr>                      <int>
#> 1 Alice    2018-10-01 15:15:43            1
#> 2 Alice    2019-01-27 00:49:19            2
#> 3 Ann      2018-10-05 11:12:54            1
#> 4 Grace    2018-10-05 19:55:15            1
#> 5 Jack     2018-10-03 10:10:34            1
#> 6 Jack     2018-10-03 19:51:51            2
#> 7 Jack     2018-11-19 06:26:02            3
#> 8 Mark     2019-01-06 15:24:30            1

Created on 2022-12-03 with reprex v2.0.2
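
For example, continuing from the data frame above, the pivot_wider() step could look something like this. This is just a sketch: the names_prefix value "order_" and the resulting column names are my own illustrative choices, not something from your post.

# Spread each customer's order dates into one row per customer;
# names_prefix turns Order_number 1, 2, 3 into columns order_1, order_2, order_3.
data_sample %>% 
  arrange(Customer, Order_Date) %>% 
  group_by(Customer) %>% 
  mutate(Order_number = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Order_number,
              values_from = Order_Date,
              names_prefix = "order_")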

Is this what you want?

library(tidyverse)

data_sample <- c('Jack', 'Grace', 'Jack','Mark', 'Alice', 'Ann', 'Alice', 'Jack')
id_sample <- c(1, 2, 1, 3, 4, 5, 4, 1)
time_sample <- c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34")

dat_sample <- as.data.frame(cbind(data_sample, id_sample, time_sample))

dat_sample

dat_cluster <- dat_sample %>% arrange(match(data_sample, unique(data_sample)), time_sample)

Code


dat_cluster %>%
  group_by(data_sample) %>%
  mutate(order = row_number(),                           # 1, 2, 3, ... within each customer
         total_orders = max(order),                      # total number of orders per customer
         order = paste(order, "order", sep = "_")) %>%   # labels: 1_order, 2_order, ...
  pivot_wider(names_from = order, values_from = time_sample) %>% 
  ungroup()


Output

# A tibble: 5 × 6
  data_sample id_sample total_orders `1_order`           `2_order`      3_ord…¹
  <chr>       <chr>            <int> <chr>               <chr>          <chr>  
1 Jack        1                    3 2018-10-03 10:10:34 2018-10-03 19… 2018-1…
2 Grace       2                    1 2018-10-05 19:55:15 NA             NA     
3 Mark        3                    1 2019-01-06 15:24:30 NA             NA     
4 Alice       4                    2 2018-10-01 15:15:43 2019-01-27 00… NA     
5 Ann         5                    1 2018-10-05 11:12:54 NA             NA     

EDIT
Using whether_to_purchase (1 or 0)

dat_cluster %>%
  group_by(data_sample) %>%
  mutate(order = row_number(),
         whether_to_purchase = ifelse(max(order) > 1, 1, 0),   # 1 = repurchased, 0 = single order
         order = paste(order, "order", sep = "_")) %>%
  pivot_wider(names_from = order, values_from = time_sample) %>% 
  ungroup()

Output

# A tibble: 5 × 6
  data_sample id_sample whether_to_purchase `1_order`           2_ord…¹ 3_ord…²
  <chr>       <chr>                   <dbl> <chr>               <chr>   <chr>  
1 Jack        1                           1 2018-10-03 10:10:34 2018-1… 2018-1…
2 Grace       2                           0 2018-10-05 19:55:15 NA      NA     
3 Mark        3                           0 2019-01-06 15:24:30 NA      NA     
4 Alice       4                           1 2018-10-01 15:15:43 2019-0… NA     
5 Ann         5                           0 2018-10-05 11:12:54 NA      NA     
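
If all you need is the repurchase flag per customer, without spreading the order dates into wide columns, a plain count per customer also works. A minimal sketch along the same lines:

# Count orders per customer and flag anyone with more than one order.
dat_cluster %>%
  group_by(data_sample, id_sample) %>%
  summarise(total_orders = n(), .groups = "drop") %>%
  mutate(whether_to_purchase = as.integer(total_orders > 1))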
