Summarizing logic with till slip transaction data

I have data representing transactions from a shop. Each basket (or person) has one line of data for each item bought, so basket sizes can range from 1 to many. The data is sorted by Till Slip, which identifies the unique baskets. A simplified example is shown below.

My issue is how to identify baskets using logic. In this example I have 3 baskets (1, 2, 3) with basket sizes of 3, 2, 3. I want to know which baskets contain both items b and c, and flag them yes or no.

How do I apply logic over selected rows, within a basket? Ideally I would like to do it within the tidyverse / RStudio space.

Who has products b&c in their repertoire?

Answer: TillSlip # 1 & 3

TillSlip  Item  Flag
1         a     yes
1         b     yes
1         c     yes
2         b     no
2         d     no
3         a     yes
3         c     yes
3         b     yes
etc

This is one way to do it

library(tidyverse)

sample_df <- data.frame(stringsAsFactors=FALSE,
    TillSlip = c(1, 1, 1, 2, 2, 3, 3, 3),
        Item = c("a", "b", "c", "b", "d", "a", "c", "b"),
        Flag = c("yes", "yes", "yes", "no", "no", "yes", "yes", "yes")
)

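sample_df %>% 
    group_by(TillSlip) %>%
    # sort items within each basket so "b" always comes before "c"
    arrange(TillSlip, Item) %>% 
    # collapse each basket into one string, e.g. "a b c"
    summarise(Item = paste(Item, collapse = " ")) %>% 
    # TRUE when the string contains "b" followed (anywhere later) by "c"
    mutate(flag = str_detect(Item, "b.*c")) %>% 
    # split the string back into one row per item
    separate_rows(Item)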
#> # A tibble: 8 x 3
#>   TillSlip Item  flag 
#>      <dbl> <chr> <lgl>
#> 1        1 a     TRUE 
#> 2        1 b     TRUE 
#> 3        1 c     TRUE 
#> 4        2 b     FALSE
#> 5        2 d     FALSE
#> 6        3 a     TRUE 
#> 7        3 b     TRUE 
#> 8        3 c     TRUE

Created on 2019-12-13 by the reprex package (v0.3.0.9000)
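
In case it helps to see why the regular expression works, here is a minimal sketch of the intermediate step, stopping the pipeline after summarise(). The name `collapsed` is only for illustration and is not part of the original answer.

```r
library(dplyr)

sample_df <- data.frame(
  TillSlip = c(1, 1, 1, 2, 2, 3, 3, 3),
  Item = c("a", "b", "c", "b", "d", "a", "c", "b"),
  stringsAsFactors = FALSE
)

# One row per basket; items are sorted first so "b" always precedes "c"
collapsed <- sample_df %>%
  arrange(TillSlip, Item) %>%
  group_by(TillSlip) %>%
  summarise(Item = paste(Item, collapse = " "))

collapsed$Item
#> [1] "a b c" "b d"   "a b c"
```

Each basket is now a single string, so str_detect() only has to look for a "b" that is followed somewhere later by a "c".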

Many thanks for this.
I follow it all except the *c in the str_detect section.
What does the * do? a.c without the * also works.
Regards

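This is a regular expression. In this context, "." and "*" are metacharacters with special meaning: "." matches any single character (including a space), and "*" is a quantifier meaning "zero or more times". So "b.*c" matches a "b" followed by any number of characters and then a "c". Without the "*", "b.c" only matches when exactly one character separates "b" and "c", which happens to be the case in this sample data.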
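
A small sketch of the difference, using made-up collapsed baskets (the "b d c" basket is hypothetical and does not appear in the sample data):

```r
library(stringr)

# Hypothetical collapsed baskets, items separated by a single space
baskets <- c("b c", "b d c", "b d")

# "." alone allows exactly one character between "b" and "c"
one_char <- str_detect(baskets, "b.c")
one_char
#> [1]  TRUE FALSE FALSE

# ".*" allows any number of characters between "b" and "c"
any_chars <- str_detect(baskets, "b.*c")
any_chars
#> [1]  TRUE  TRUE FALSE
```

So the pattern without "*" would miss a basket where another item sorts between "b" and "c".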

I have other variables in the data frame, for example Sex.
library(tidyverse)

sample_df <- data.frame(stringsAsFactors=FALSE,
    TillSlip = c(1, 1, 1, 2, 2, 3, 3, 3),
    Item = c("a", "b", "c", "b", "d", "a", "c", "b"),
    Sex = c(1, 2, 2, 2, 1, 2, 1, 2),
    Flag = c("yes", "yes", "yes", "no", "no", "yes", "yes", "yes")
)
Is there a way that Sex, and any other variables, can remain in the new data frame without being used in the str_detect?
I want to determine things like: 80% of people with item structure a.c are female, or 40% of residents of XXX have a structure of b.c.
Thanks again.

I think I need a little more information to help you with this: why can a single TillSlip have two values for Sex?
It seems a little odd, and the answer might be different if this is just a mistake. If it's not, then you can do something like this:

library(tidyverse)

sample_df <- data.frame(stringsAsFactors=FALSE,
                        TillSlip = c(1, 1, 1, 2, 2, 3, 3, 3),
                        Item = c("a", "b", "c", "b", "d", "a", "c", "b"),
                        Sex = c(1,2,2,2,1,2,1,2),
                        Flag = c("yes", "yes", "yes", "no", "no", "yes", "yes", "yes")
)

sample_df %>% 
    group_by(TillSlip) %>%
    arrange(TillSlip, Item) %>% 
    summarise(Item = paste(Item, collapse = " "), 
              Sex = paste(Sex, collapse = " ")) %>% 
    mutate(flag = str_detect(Item, "b.*c")) %>% 
    separate_rows(Item, Sex)
#> # A tibble: 8 x 4
#>   TillSlip Item  Sex   flag 
#>      <dbl> <chr> <chr> <lgl>
#> 1        1 a     1     TRUE 
#> 2        1 b     2     TRUE 
#> 3        1 c     2     TRUE 
#> 4        2 b     2     FALSE
#> 5        2 d     1     FALSE
#> 6        3 a     2     TRUE 
#> 7        3 b     2     TRUE 
#> 8        3 c     1     TRUE
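
As a possible alternative (a sketch, not the approach used above), a grouped mutate() with %in% leaves every other column untouched, so nothing has to be collapsed and separated again. The names `flagged` and `sex_share` are made up for illustration, and the share is computed per row because Sex varies within a TillSlip in this sample.

```r
library(dplyr)

sample_df <- data.frame(
  TillSlip = c(1, 1, 1, 2, 2, 3, 3, 3),
  Item = c("a", "b", "c", "b", "d", "a", "c", "b"),
  Sex = c(1, 2, 2, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Flag every row of a basket that contains both "b" and "c";
# Sex (and any other column) is carried along unchanged
flagged <- sample_df %>%
  group_by(TillSlip) %>%
  mutate(flag = all(c("b", "c") %in% Item)) %>%
  ungroup()

# Share of each Sex value among the rows of flagged baskets
sex_share <- flagged %>%
  filter(flag) %>%
  count(Sex) %>%
  mutate(share = n / sum(n))
```

This also avoids relying on the sort order of the items, since %in% checks membership rather than position in a string.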

Thanks again - I follow the logic now - bring each variable into the summarize/collapse logic.
This works perfectly for my requirements.
Just one more query. What does the * in the b.*c do?
Things seem to work without it, but for you to use it there must be a reason.
Regards

I already answered that; please check my previous reply.

Sorry - see your response now. Thanks

