Conditional tidy_comb_all, and reinserting fields?

Hello,

I should start with a disclaimer: I'm still relatively new to R and teaching myself as I go, so apologies in advance if I've missed something basic, either in what I should know or in question etiquette.

I'm trying to do a fuzzy match of sorts between a scraped data set and our own stock ("them" and "us" in my example below). There's almost no chance of an exact match in any case, given how flexibly people write names, so I just want a list of things that are close. With my full data set, I can then filter on the OSA (optimal string alignment) distance to narrow down the possible matches.

I have two questions, one of which is quite possibly something obvious that I'm missing:

  1. Is there a way to do a conditional tidy_comb_all? By which I mean, can I do a group_by and then effectively run tidy_comb_all within each group? I could achieve this by splitting my original data files into separate files by producer, but I was wondering whether there was a way to do it that wouldn't involve that.

  2. This is the stupid / obvious question I'm probably missing. Can I 'carry along' other fields in a tidy_comb? To put it another way: once I have my possible pairings and matches, I need to re-add the price data, which is the point of this exercise: to be able to compare us and them on prices. What's the best way to re-add columns?
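One way to get "tidy_comb_all within each producer" without splitting any files is to join the two tables on producer: every "them" row is paired with every "us" row from the same producer, and the other columns (price included) come along automatically, which also covers question 2. This is a sketch under assumptions, not tidystringdist's own API: it uses dplyr's inner_join() (the relationship argument needs dplyr 1.1.0 or later) and stringdist::stringdist(), on a few rows of the sample data with price stored as a number rather than a string. The names candidates, price_diff and the _them/_us suffixes are illustrative choices.

```r
library(dplyr)
library(stringdist)

# A few rows in the same shape as the sample data, with price as numeric
them <- tibble::tribble(
  ~name, ~producer, ~price,
  "12 Year Old 2009 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Benrinnes", 54.95,
  "25 Year Old 1996 (cask 962101) - Celebration of the Cask", "Benrinnes", 252.95,
  "10 Year Old 2011 (cask 386) - Un-Chillfiltered Collection", "Edradour", 39.95
)
us <- tibble::tribble(
  ~name, ~producer, ~price,
  "14 Year Old 2006 - Single Grain Collection", "Benrinnes", 47.95,
  "25 Year Old 1996 (cask 962103) - Celebration of the Cask", "Benrinnes", 252.95,
  "10 Year Old 2011 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Edradour", 54.95
)

# Joining on producer pairs each "them" row with every "us" row from the same
# producer (an all-combinations comparison within each group). Columns present
# in both tables (name, price) get the suffixes _them / _us.
candidates <- inner_join(them, us, by = "producer",
                         suffix = c("_them", "_us"),
                         relationship = "many-to-many") %>%
  # OSA distance between the two names in each pair
  mutate(osa = stringdist(name_them, name_us, method = "osa"),
         price_diff = price_us - price_them) %>%
  # closest pairs first within each producer
  arrange(producer, osa)
```

From there, something like filter(candidates, osa <= 5) (the cutoff is arbitrary) keeps only the closer pairs, and price_diff compares the two prices directly.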

Many thanks!

Chris

library(tidyverse)

library(tidystringdist)

them <- tribble(~name,~producer,~price,
"14 Year Old 2006 - Single Grain Collection", "Port Dundas", "42.95", 
"8 Year Old 2013 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Staoisha", "35.95", 
"13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Teaninich", "36.95", 
"10 Year Old 2011 (cask 386) - Un-Chillfiltered Collection", "Edradour", "39.95", 
"12 Year Old 2009 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Benrinnes", "54.95", 
"25 Year Old 1996 (cask 962101) - Celebration of the Cask", "Benrinnes", "252.95" )

us <- tribble(~name,~producer,~price,
"14 Year Old 2006 - Single Grain Collection", "Benrinnes", "47.95", 
"9 Year Old 2012 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Laphroig", "36.95", 
"13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Staoisha", "36.95", 
"10 Year Old 2011 (cask 386)", "Teaninich", "39.95", 
"10 Year Old 2011 (casks 305117 & 305118) - Un-Chillfiltered Collection", "Edradour", "54.95", 
"25 Year Old 1996 (cask 962103) - Celebration of the Cask", "Benrinnes", "252.95")
# Build candidate name pairs
combo <- tidy_comb_all(them$name, us$name)
view(combo)
# Add an OSA (optimal string alignment) distance for each pair
combo2 <- tidy_stringdist(combo, method = "osa")
view(combo2)
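If you would rather keep a plain pair table like combo2, the extra columns can be re-added afterwards by joining on the name columns. This sketch assumes the pair table holds the two names in columns V1 and V2 (the default column names tidy_stringdist() works with) and builds the full them-by-us cross product with tidyr::crossing() instead of tidy_comb_all(); the column names producer_them, price_them and so on are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(stringdist)

them <- tibble::tribble(
  ~name, ~producer, ~price,
  "8 Year Old 2013 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Staoisha", 35.95,
  "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Teaninich", 36.95
)
us <- tibble::tribble(
  ~name, ~producer, ~price,
  "9 Year Old 2012 (casks 900052 & 900059) - Un-Chilfiltered Collection", "Laphroig", 36.95,
  "13 Year Old 2008 (casks 715728 & 715734) - Un-Chillfiltered Collection", "Staoisha", 36.95
)

# Every "them" name against every "us" name, in columns V1 / V2
pairs <- crossing(V1 = them$name, V2 = us$name) %>%
  mutate(osa = stringdist(V1, V2, method = "osa"))

# Re-attach producer and price for each side by joining on the name columns
priced <- pairs %>%
  left_join(select(them, V1 = name, producer_them = producer, price_them = price),
            by = "V1") %>%
  left_join(select(us, V2 = name, producer_us = producer, price_us = price),
            by = "V2") %>%
  arrange(osa)
```

Joining on name assumes each name appears only once per table; if names can repeat, join on a unique ID column instead, otherwise the join will duplicate rows.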

OK, it turns out, as most of you will immediately realise, that I misunderstood how tidy_comb works. Stupid me. Anyway, I will leave this up for a day or two to see if people have suggestions on how to approach fuzzy matching across two different data frames. (I've had no luck with fuzzy_join, although that's almost certainly me.)
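For the fuzzy-join route, the fuzzyjoin package's stringdist_inner_join() does the pairing, the distance calculation and the carrying along of the other columns in one step. A minimal sketch, assuming fuzzyjoin is installed; max_dist = 5 and the distance column name osa_dist are arbitrary choices. Columns that appear in both tables (name, producer, price) should come out with .x and .y suffixes.

```r
library(dplyr)
library(fuzzyjoin)

them <- tibble::tribble(
  ~name, ~producer, ~price,
  "25 Year Old 1996 (cask 962101) - Celebration of the Cask", "Benrinnes", 252.95,
  "14 Year Old 2006 - Single Grain Collection", "Port Dundas", 42.95
)
us <- tibble::tribble(
  ~name, ~producer, ~price,
  "25 Year Old 1996 (cask 962103) - Celebration of the Cask", "Benrinnes", 252.95,
  "10 Year Old 2011 (cask 386)", "Teaninich", 39.95
)

# Keep only pairs whose names are within an OSA distance of 5,
# and record that distance in the column osa_dist
matches <- stringdist_inner_join(them, us, by = "name",
                                 method = "osa", max_dist = 5,
                                 distance_col = "osa_dist")
```

Raising max_dist returns more (and looser) candidate pairs, which can then be narrowed down with filter() on osa_dist.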

So, any pointers gratefully received.

chris
