Text identification

Hi there,
I have 2 dataframes.

The first one has two rows:

Id model
1 Renault Laguna 1.9 DCI 120cv luxe privilege
2 Seat Tarraco 2.0 business Intelligence

The second dataframe has the following rows:

Id model
X1 Rena Laguna 1.9dci 120 lux priv
X2 Rena Laguna 19dci 120 luxe
X3 Rena Laguna 1927 CCM 119 cv base
X4 Seat Tarraco 2000 CCM inteligence
X5 Seat Tarraco 2.0 bus. Intell.
X6 Seat Tarraco 2.0 business.. Intel..

So, I would like to connect each of the two vehicles in the first dataframe with its best-matching vehicle in the second one.
The result should have 2 columns: the first with the 2 vehicles from dataframe 1, and the second with the best-matching vehicles from dataframe 2.

Thank you

What are the criteria for considering a vehicle "the best"?

For this you can use the fuzzyjoin package.

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.
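
In the meantime, here is a minimal sketch of the idea. It uses base R's adist() instead of fuzzyjoin so that it runs without extra packages. The names df1, df2, and result are made up for illustration, and "best" is assumed to mean the smallest edit distance, which may not be the criterion you actually want.

```r
# Rebuild the two example dataframes from the post (names df1/df2 are assumptions)
df1 <- data.frame(
  Id = c("1", "2"),
  model = c("Renault Laguna 1.9 DCI 120cv luxe privilege",
            "Seat Tarraco 2.0 business Intelligence"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Id = paste0("X", 1:6),
  model = c("Rena Laguna 1.9dci 120 lux priv",
            "Rena Laguna 19dci 120 luxe",
            "Rena Laguna 1927 CCM 119 cv base",
            "Seat Tarraco 2000 CCM inteligence",
            "Seat Tarraco 2.0 bus. Intell.",
            "Seat Tarraco 2.0 business.. Intel.."),
  stringsAsFactors = FALSE
)

# Edit distance between every df1 model (rows) and every df2 model (columns)
d <- adist(df1$model, df2$model, ignore.case = TRUE)

# For each df1 row, the index of the closest df2 row
best <- apply(d, 1, which.min)

result <- data.frame(
  model_1    = df1$model,
  best_match = df2$model[best],
  distance   = d[cbind(seq_along(best), best)],
  stringsAsFactors = FALSE
)
print(result)
```

Edit distance treats every character the same, so abbreviations such as "bus." can score worse than you might expect. If that matters, the criterion has to be defined more precisely first.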

Hi there,

I would like to match the dataframes "x" and "y" using the common terms stored in the vector "z".
The final result should connect the 2 vehicles in dataframe "y" with the most similar vehicles in dataframe "x".

The idea is to use the vector "z" as a helper for finding the strings that the two dataframes have in common.

Dataframe x:
X.Description
1 PEUGEOT 308 5P Active 1.6 BlueHDi 73KW (100CV)
2 PEUGEOT 308 5P Business 1.6 BlueHDi 73KW (100CV)
3 PEUGEOT 308 5P ACTIVE 1.6 BlueHDi 84KW (115CV)
4 PEUGEOT 308 5P Allure 1.6 BlueHDi 73KW (100CV)
5 PEUGEOT 308 5P Allure 1.6 BlueHDi 84KW (114CV)

Dataframe y:
Description
1 PEUGEOT 308 5P Active 1.6 BlueHDi 73KW (100CV) manual gear box
2 PEUGEOT 308 5P Allure 1.6 BlueHDi 84KW (114CV) manual gear box

Vector "z":
z <- c("allure", "active", "business", "1.6", "BlueHDi",
"84KW", "73KW", "Peugeot", "308", "5p")
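
One possible way to use "z" as described above, as a rough sketch only: flag which terms of "z" occur in each description, then pair every row of "y" with the row of "x" that shares the most terms. The helper term_matrix() and the objects shared, best, and result are hypothetical names, and the matching is assumed to be case-insensitive and literal.

```r
x <- data.frame(Description = c(
  "PEUGEOT 308 5P Active 1.6 BlueHDi 73KW (100CV)",
  "PEUGEOT 308 5P Business 1.6 BlueHDi 73KW (100CV)",
  "PEUGEOT 308 5P ACTIVE 1.6 BlueHDi 84KW (115CV)",
  "PEUGEOT 308 5P Allure 1.6 BlueHDi 73KW (100CV)",
  "PEUGEOT 308 5P Allure 1.6 BlueHDi 84KW (114CV)"
), stringsAsFactors = FALSE)

y <- data.frame(Description = c(
  "PEUGEOT 308 5P Active 1.6 BlueHDi 73KW (100CV) manual gear box",
  "PEUGEOT 308 5P Allure 1.6 BlueHDi 84KW (114CV) manual gear box"
), stringsAsFactors = FALSE)

z <- c("allure", "active", "business", "1.6", "BlueHDi",
       "84KW", "73KW", "Peugeot", "308", "5p")

# Hypothetical helper: 0/1 matrix, one row per description, one column per term
term_matrix <- function(text, terms) {
  hits <- lapply(tolower(terms), grepl, x = tolower(text), fixed = TRUE)
  matrix(as.integer(unlist(hits)), nrow = length(text),
         dimnames = list(NULL, terms))
}

tx <- term_matrix(x$Description, z)
ty <- term_matrix(y$Description, z)

# shared[i, j] = number of z terms present in both y row i and x row j
shared <- ty %*% t(tx)

# For each y row, the x row sharing the most terms (first one wins on ties)
best <- apply(shared, 1, which.max)

result <- data.frame(
  y_description = y$Description,
  best_x        = x$Description[best],
  shared_terms  = shared[cbind(seq_along(best), best)],
  stringsAsFactors = FALSE
)
print(result)
```

Note that "z" contains no term for the horsepower figure, so rows that differ only in (114CV) versus (115CV) cannot be told apart this way.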
