Join 2 dataframes using fuzzy lookup (partial/pattern matching)

I have 2 tables:

df1

+-------+--------------+  
|   Id  |  Title       |
+-------+--------------+  
|   1   |   AAA12345   |
|   2   |   BBB3245    |
|   3   |   CCC876     |
+-------+--------------+

df2

+-------+---------------+------------------------------------+
|   Id  |      Sub      |               Body                 |
+-------+---------------+------------------------------------+  
|   1   |   CC878       | some mail body CC some text here   |
|   2   |   AZ5464      | some text here AZ continues here   |
|   3   |   BB93245     | some text BB present here          |
|   4   |   some sub4   | AAA string is present here also    |
|   5   |   some sub5   | CCC string is present here         |
+-------+---------------+------------------------------------+

I want to match the Title from df1 with the Sub column of the df2 with partial or closest matching pattern.

The output data frame should be like:

df3

+--------------+---------------+------------------------------------+
|   Title      |      Sub      |               Body                 |
+--------------+---------------+------------------------------------+  
|   CCC876     |   CC878       | some mail body CC some text here   |
|   BBB3245    |   BB93245     | some text here BB continues here  |
+--------------+---------------+------------------------------------+

Some additional information: I receive data with a lot of typos and need to match their product/sub description to their closest match.

Much appreciated.

You can use this:
GitHub - dgrtwo/fuzzyjoin: Join tables together on inexact matching

1 Like

I would appreciate if there is a working example or a code snippet. I can progress further from there.

The link provides examples which you should be able to adapt.

Thanks. I got it resolved

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.