Picking the first date of purchase from different Groups

Rscotty · September 30, 2018, 10:32pm

I'd like to have a table of the first purchase date of Fruits bought by each customer.

I have attempted to answer it. However, the result is not correct. If anybody could come up with the correct result without using "arrange" function [I have used it in my code] that would be such a great help. Thank you

Data:

library(tidyverse)

sample_1 <- tibble::tribble(
  ~Purchased_Date, ~Customer_ID,  ~Fruits,
     "24/02/2017",         123L,  "Apple",
     "24/02/2017",         123L,  "Apple",
      "3/10/2016",         124L, "Banana",
     "30/09/2016",         124L,  "Apple",
     "25/10/2016",         124L,  "Apple",
     "13/10/2016",         125L,  "Apple",
     "20/10/2016",         125L,  "Apple",
      "4/05/2017",         125L,  "Apple",
     "15/05/2017",         125L, "Banana",
     "18/04/2017",         125L, "Banana",
     "18/04/2017",         125L, "Banana",
     "18/04/2017",         125L, "Banana"
  )

My Attempt:

library(tidyverse)

sample_1 %>% 
  distinct() %>% 
  arrange(Purchased_Date, Customer_ID, Fruits) %>% 
  group_by(Customer_ID, Fruits) %>% 
  slice(1L)

Expected output:

library(tidyverse)

expected_output <- tibble::tribble(
  ~Purchased_Date, ~Customer_ID,  ~Fruits,
     "24/02/2017",         123L,  "Apple",
      "3/10/2016",         124L, "Banana",
     "30/09/2016",         124L,  "Apple",
     "13/10/2016",         125L,  "Apple",
     "15/05/2017",         125L, "Banana"
  )

As you can see my attempt output is not matching the expected output since something happens with arrange function and doesn't give correct output.

I am hoping to have a solution using tidyverse and without using arrange function.

mfherman · September 30, 2018, 10:49pm

I'm not totally sure if this is what you are looking for, but I think you are having a problem with arrange() because your Purchased_Date is currently a character variable, instead of a date variable.

You can use the lubridate package to easily turn that column into dates, and then your code works:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

sample_1 <- tibble::tribble(
  ~Purchased_Date, ~Customer_ID,  ~Fruits,
  "24/02/2017",         123L,  "Apple",
  "24/02/2017",         123L,  "Apple",
  "3/10/2016",         124L, "Banana",
  "30/09/2016",         124L,  "Apple",
  "25/10/2016",         124L,  "Apple",
  "13/10/2016",         125L,  "Apple",
  "20/10/2016",         125L,  "Apple",
  "4/05/2017",         125L,  "Apple",
  "15/05/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana"
)

sample_1 %>%
  mutate(Purchased_Date = dmy(Purchased_Date)) %>% 
  arrange(Purchased_Date) %>% 
  group_by(Customer_ID, Fruits) %>% 
  slice(1)
#> # A tibble: 5 x 3
#> # Groups:   Customer_ID, Fruits [5]
#>   Purchased_Date Customer_ID Fruits
#>   <date>               <int> <chr> 
#> 1 2017-02-24             123 Apple 
#> 2 2016-09-30             124 Apple 
#> 3 2016-10-03             124 Banana
#> 4 2016-10-13             125 Apple 
#> 5 2017-04-18             125 Banana

^{Created on 2018-09-30 by the reprex package (v0.2.1)}

Rscotty · September 30, 2018, 11:47pm

Ah yes, you're correct - Purchased_Date is a character variable, instead of a date variable. This will give the expected output. Thank you

However, is there a better way of getting the expected result without using arrange function?

mfherman · October 1, 2018, 12:31am

Why would you like to avoid arrange? In any case, I think this should get the job done as well:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

sample_1 <- tibble::tribble(
  ~Purchased_Date, ~Customer_ID,  ~Fruits,
  "24/02/2017",         123L,  "Apple",
  "24/02/2017",         123L,  "Apple",
  "3/10/2016",         124L, "Banana",
  "30/09/2016",         124L,  "Apple",
  "25/10/2016",         124L,  "Apple",
  "13/10/2016",         125L,  "Apple",
  "20/10/2016",         125L,  "Apple",
  "4/05/2017",         125L,  "Apple",
  "15/05/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana",
  "18/04/2017",         125L, "Banana"
)

sample_1 %>%
  mutate(Purchased_Date = dmy(Purchased_Date)) %>%
  group_by(Customer_ID, Fruits) %>% 
  summarize(Purchased_Date = min(Purchased_Date))
#> # A tibble: 5 x 3
#> # Groups:   Customer_ID [?]
#>   Customer_ID Fruits Purchased_Date
#>         <int> <chr>  <date>        
#> 1         123 Apple  2017-02-24    
#> 2         124 Apple  2016-09-30    
#> 3         124 Banana 2016-10-03    
#> 4         125 Apple  2016-10-13    
#> 5         125 Banana 2017-04-18

^{Created on 2018-09-30 by the reprex package (v0.2.1)}