Reply to Conditional tidy_comb_all, and reinserting fields?

@CKaplonski - did you ever find a solution to this problem?

I have a very similar issue and wondered what approach you took in the end?

many thanks,
Natalie

Hi Natalie - Short answer: I did something completely different. Things are a bit hectic right now, but I'll get back to you by the end of the week with the code and more details. I just wanted to acknowledge your post for now.

Chris

library(tidyverse)

library(tidystringdist)

them <- tribble(~name,~producer,~price,
                "14 Year Old 2006 - Single Grain Collection", "Port Dundas", "42.95", 
                "8 Year Old 2013 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Staoisha", "35.95", 
                "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Teaninich", "36.95", 
                "10 Year Old 2011 (cask 386) - Un-Chillfiltered Collection", "Edradour", "39.95", 
                "12 Year Old 2009 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Benrinnes", "54.95", 
                "25 Year Old 1996 (cask 962101) - Celebration of the Cask", "Benrinnes", "252.95" )

us <- tribble(~name,~producer,~price,
              "14 Year Old 2006 - Single Grain Collection", "Benrinnes", "47.95", 
              "9 Year Old 2012 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Laphroig", "36.95", 
              "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Staoisha", "36.95", 
              "10 Year Old 2011 (cask 386)", "Teaninich", "39.95", 
              "10 Year Old 2011 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Edradour", "54.95", 
              "25 Year Old 1996 (cask 962103) - Celebration of the Cask", "Benrinnes", "252.95")



(names_to_match <- expand_grid(V1=pull(them,name),
                                V2=pull(us,name)))

(matched_names <- tidy_stringdist(names_to_match,method="osa"))

(all_together <- inner_join(them,
          matched_names,by=c("name"="V1")) |> 
  inner_join(us,
             by=c("V2"="name")))


(best_matched_names <- group_by(all_together,
                               name) |>  slice_min(osa,n=1) |> ungroup())

(my_ordering_fac <- as_factor(them$name))

(result <- mutate(best_matched_names,
                  name=factor(name,levels=levels(my_ordering_fac))) |> 
    arrange(name))

them
result

That is probably more elegant than what I did, but for completeness' sake, here's the core part of my approach.

library(tidyverse)
library(stringdist)  # provides stringsim(), used further down

them <- tribble(~their_name,~producer,~their_price,
               "14 Year Old 2006 - Single Grain Collection", "Port Dundas", "42.95", 
               "8 Year Old 2013 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Staoisha", "35.95", 
               "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Teaninich", "36.95", 
               "10 Year Old 2011 (cask 386) - Un-Chillfiltered Collection", "Edradour", "39.95", 
               "12 Year Old 2009 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Benrinnes", "54.95", 
               "25 Year Old 1996 (cask 962101) - Celebration of the Cask", "Benrinnes", "252.95" )

us <- tribble(~our_name,~producer,~our_price,
              "14 Year Old 2006 - Single Grain Collection", "Benrinnes", "47.95", 
              "9 Year Old 2012 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Laphroig", "36.95", 
              "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Staoisha", "36.95", 
              "10 Year Old 2011 (cask 386)", "Teaninich", "39.95", 
              "10 Year Old 2011 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Edradour", "54.95", 
              "25 Year Old 1996 (cask 962103) - Celebration of the Cask", "Benrinnes", "252.95")

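# Pair up the two tables by producer (the tibbles defined above are `them` and `us`)
combo <- full_join(them, us, by = "producer")
combo <- group_by(combo, producer)
# Within each producer, generate every combination of our names and their names
combo2 <- complete(combo, our_name, their_name)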
                 
# Works out a similarity indicator for the names. 
combo2 <- mutate(combo2, similar = stringsim(our_name, their_name))

# You can set similar to different levels to give more or less strict matching
combo2 <- dplyr::filter(combo2, similar > 0.6)
view(combo2)
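
To get a feel for how the cutoff changes what survives the filter, here is a minimal sketch. It assumes the stringdist package is installed, uses three name pairs copied from the example data, and the `pairs`, `cutoffs` and `kept` names are just for illustration, not part of the code above.

```r
library(stringdist)

# Three illustrative pairs taken from the example tables above
pairs <- data.frame(
  their_name = c(
    "14 Year Old 2006 - Single Grain Collection",
    "25 Year Old 1996 (cask 962101) - Celebration of the Cask",
    "10 Year Old 2011 (cask 386) - Un-Chillfiltered Collection"
  ),
  our_name = c(
    "14 Year Old 2006 - Single Grain Collection",
    "25 Year Old 1996 (cask 962103) - Celebration of the Cask",
    "10 Year Old 2011 (cask 386)"
  ),
  stringsAsFactors = FALSE
)

# stringsim() defaults to method = "osa" and returns a score between 0 and 1
pairs$similar <- stringsim(pairs$our_name, pairs$their_name)

# Count how many pairs would survive each candidate cutoff
cutoffs <- c(0.4, 0.6, 0.8)
kept <- sapply(cutoffs, function(cut) sum(pairs$similar > cut))
names(kept) <- cutoffs

print(pairs[, "similar", drop = FALSE])
print(kept)
```

The third pair is the interesting one: the shorter name is missing more than half of the longer one, so its score lands below 0.6 even though a person would call it the same product. That is the kind of case where a second field to compare on helps more than lowering the cutoff.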

In the code I'm currently using, there are other fields I was able to scrape and compare, such as vintages for wines. I then add a column, based on the similarity score, that flags whether or not a pair is a probable match. For this example, you'd need to raise the cutoff in the line below from 0.74 to 0.80 for it to flag them properly.

combo2 <- mutate(combo2, Product_Same = ifelse(similar>0.74,"Probably","Probably Not"))
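
As a rough illustration of comparing an extra field alongside the name similarity, the sketch below pulls a four-digit year out of each name and checks whether the two sides agree. This is not my production code: `extract_year()` is a hypothetical helper, and treating the first 19xx/20xx number in the name as the vintage is an assumption that happens to hold for the example data.

```r
# Hypothetical helper: return the first standalone 19xx or 20xx year in each
# string as an integer, or NA when there is none
extract_year <- function(x) {
  m <- regexpr("\\b(19|20)[0-9]{2}\\b", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)  # regmatches() only returns the matched elements
  as.integer(out)
}

# Two illustrative pairs taken from the example tables above
pairs <- data.frame(
  their_name = c(
    "8 Year Old 2013 (casks 900052 & 900059) - Un-Chilfiltered Collection",
    "25 Year Old 1996 (cask 962101) - Celebration of the Cask"
  ),
  our_name = c(
    "9 Year Old 2012 (casks 900052 & 900059) - Un-Chilfiltered Collection",
    "25 Year Old 1996 (cask 962103) - Celebration of the Cask"
  ),
  stringsAsFactors = FALSE
)

pairs$their_year <- extract_year(pairs$their_name)
pairs$our_year   <- extract_year(pairs$our_name)

# TRUE only when both names carry the same year
pairs$same_year <- pairs$their_year == pairs$our_year

print(pairs[, c("their_year", "our_year", "same_year")])
```

A flag like `same_year` could then be combined with the similarity cutoff, so that two names that look almost identical but carry different years are not marked as "Probably".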
