Count matches between two categories of columns

I exploded items from two categories (basket and shelf) using separate into separate columns, with one item per column. Now, I am trying to count the number of matches between the two categories, regardless of which column order they are in. I thought about using map2_lgl in a mutate but I am not sure how to get it to work and how to broadcast each column to all other columns in the other category. Appreciate any advice!

df <- tibble::tribble(
    ~basket_1, ~basket_2,  ~basket_3,  ~shelf_1,    ~shelf_2,     ~shelf_3,
      "apple",  "orange",    "mango",  "orange",     "mango", NA,
       "milk", "sausage",     "beef",    "beef",      "coke",      "towel",
  "batteries", "lighter", "gasoline", "lighter", "batteries",   "gasoline"
  )

df

# number of matches: 2, 1, 3 
# A tibble: 3 x 6
  basket_1  basket_2 basket_3 shelf_1 shelf_2   shelf_3 
  <chr>     <chr>    <chr>    <chr>   <chr>     <chr>   
1 apple     orange   mango    orange  mango     NA      
2 milk      sausage  beef     beef    coke      towel   
3 batteries lighter  gasoline lighter batteries gasoline

# I thought about something like this (of course, this does not work): 
df %>% mutate(matches = map2_lgl(starts_with("basket_"), 
                                 starts_with("shelf_"), 
                                 ~str_detect(.x, .y) %>% sum())

I do not know how to do this with purrr.

library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tibble::tribble(
  ~basket_1, ~basket_2,  ~basket_3,  ~shelf_1,    ~shelf_2,     ~shelf_3,
  "apple",  "orange",    "mango",  "orange",     "mango", NA,
  "milk", "sausage",     "beef",    "beef",      "coke",      "towel",
  "batteries", "lighter", "gasoline", "lighter", "batteries",   "gasoline"
)

baskets <- df %>% select(starts_with("basket")) 
shelves <-  df %>% select(starts_with("shelf")) 
Vec <- vector("numeric",length = nrow(baskets))  
for(i in 1:nrow(baskets)) Vec[i] <- sum(baskets[i,] %in% shelves[i,])
Vec
#> [1] 2 1 3

Created on 2021-09-04 by the reprex package (v0.3.0)

Thanks @FJCC. I found another way to do it inside mutate, though it is unwieldy as I have to type the columns out.

df %>% 
  rowwise() %>% 
  mutate(matches = match(c(basket_1, basket_2, basket_3),
                         c(shelf_1, shelf_2, shelf_3),
                         incomparables = NA
                         ) %>% n_distinct(na.rm = TRUE)
         ) %>% ungroup()

Would be great to extract the column names then convert the vector of strings to a vector of objects. I tried sapply(df %>% select(starts_with("basket_")) %>% colnames(), get) and eval(parse(text = df %>% select(starts_with("basket_")) %>% colnames()) to no avail though.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.