Get only orders where 2 products are bought within

Dear all,

Sorry in advance if my question looks too basic.
But I have a list or transactionid, and for each productID.

What would be the best approach (function, library) if I wish to extract only the order, where product X and Y has been bought but within the same order ?

Example output below is transactionID 123 and 789
Capture

Thanks again for your help,
Bastien

It would be easier to help you if you provide a REPRoducible EXample (reprex), but here is an example of a possible solution.

library(dplyr)
library(stringr)

set.seed(1)
sample_data <- data.frame(transaction_id = sample(1:3, 7, replace = TRUE),
                          product = sample(c("X", "Y", "Z"), 7, replace = TRUE))

sample_data %>% 
    arrange(transaction_id)
#>   transaction_id product
#> 1              1       Y
#> 2              1       Z
#> 3              1       X
#> 4              2       Z
#> 5              3       Y
#> 6              3       X
#> 7              3       X

sample_data %>% 
    arrange(transaction_id) %>% 
    group_by(transaction_id) %>% 
    summarise(product = paste(product, collapse = ",")) %>% 
    filter(str_detect(product, "[XY]{1}.*[XY]{1}"))
#> # A tibble: 2 x 2
#>   transaction_id product
#>            <int> <chr>  
#> 1              1 Y,Z,X  
#> 2              3 Y,X,X

Created on 2019-10-17 by the reprex package (v0.3.0.9000)

2 Likes

Logic is perfect - thanks a lot

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.