Sapply to deliver new column with similar text

Hi guys,
I have 2 data frames. I need to add a new column to data frame x, using the most similar values from data frame y.

x<- data.frame(customer=c("2BOSA7A6T"))

y<- data.frame(supplier=c("2BOS A7A6T;SC4","2BOS A7A6T;SL4", 
                          "2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))

### The desired result would be:

x<- data.frame(customer=c("2BOSA7A6T"),
               supplier=c("2BOS A7A6T;SC4",
                         "2BOS A7A6T;SL4"))

### I think the sapply function could help

I think you need to perform a join instead.

library(tidyverse)

x <- data.frame(customer = "2BOSA7A6T")
y <- data.frame(supplier = c("2BOS A7A6T;SC4","2BOS A7A6T;SL4", "2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))

y %>% 
  separate(supplier, into = c("customer", "supplier_id"), sep = ";", remove = FALSE) %>% 
  mutate(customer = str_replace_all(customer, "\\s", "")) %>% 
  inner_join(x, by = "customer") %>% 
  select(customer, supplier)
#>    customer       supplier
#> 1 2BOSA7A6T 2BOS A7A6T;SC4
#> 2 2BOSA7A6T 2BOS A7A6T;SL4

Created on 2020-06-04 by the reprex package (v0.3.0)

Hi again,
Thank you for the quick response.
I'm still having issues because it does not work for the following example:

x <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9","SHBUF7;WF2,ZIE,Z22,1T8",
                             "SHBUF7;WF2,ZIE,Z22,1T", "SHB  UF7;WF2,ZIE,Z22",
                             "SHBUF7;WF2,ZIE,Z22,1T9999999"))

#### The solution would be

x<- data.frame(customer=c("SHBUF7WF2ZIEZ221T"),
               supplier=c("SHBUF7;WF2,ZIE,Z22,1T9",
                          "SHBUF7;WF2,ZIE,Z22,1T8","SHBUF7;WF2,ZIE,Z22,1T",
                          "SHBUF7;WF2,ZIE,Z22,1T9999999"))


Thanks

It won't, because the join I designed earlier assumes that the entire customer ID precedes the semicolon, which is not the case in your second example. It's even more problematic because in your second example the customer ID itself appears to be split by a semicolon, the same character that separates the customer ID from the supplier ID in the first example.

If your data contains both these cases, it'll be hard to create a generalized solution.
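That said, one heuristic that happens to cover both examples posted in this thread is to strip every non-alphanumeric character from both columns and keep the suppliers whose cleaned string starts with the cleaned customer ID. The sketch below is base R only and rests entirely on that assumption; `normalize_id` and `match_suppliers` are hypothetical helper names, and real data may well need a stricter rule.

```r
# Hypothetical helper: drop everything except letters and digits
# (spaces, semicolons, commas, ...).
normalize_id <- function(id) gsub("[^[:alnum:]]", "", as.character(id))

# Hypothetical helper: for each customer, keep the suppliers whose
# normalized string starts with the normalized customer ID.
match_suppliers <- function(x, y) {
  customers <- as.character(x$customer)
  suppliers <- as.character(y$supplier)
  cust_key <- normalize_id(customers)
  supp_key <- normalize_id(suppliers)

  matches <- lapply(seq_along(cust_key), function(i) {
    hits <- suppliers[startsWith(supp_key, cust_key[i])]
    data.frame(customer = rep(customers[i], length(hits)),
               supplier = hits,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, matches)
}

x <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9", "SHBUF7;WF2,ZIE,Z22,1T8",
                             "SHBUF7;WF2,ZIE,Z22,1T", "SHB  UF7;WF2,ZIE,Z22",
                             "SHBUF7;WF2,ZIE,Z22,1T9999999"))

# Keeps four suppliers and drops "SHB  UF7;WF2,ZIE,Z22",
# whose cleaned string is shorter than the customer ID.
match_suppliers(x, y)
```

The same call on the first example keeps only the two `2BOS A7A6T` rows. Note that a prefix match will also accept any supplier that merely begins with the customer ID, so it can over-match if one customer ID is a prefix of another.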
