Sapply to deliver new column with similar text

Hi guys,
I have two data frames. I need to add a new column to data frame x, using the most similar values from data frame y.

x<- data.frame(customer=c("2BOSA7A6T"))

y<- data.frame(supplier=c("2BOS A7A6T;SC4","2BOS A7A6T;SL4", 
                          "2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))

### The desired result would be:

x<- data.frame(customer=c("2BOSA7A6T"),
               supplier=c("2BOS A7A6T;SC4",
                         "2BOS A7A6T;SL4"))

### I think sapply function could help

I think you need to perform a join instead.

library(tidyverse)

x <- data.frame(customer = "2BOSA7A6T")
y <- data.frame(supplier = c("2BOS A7A6T;SC4","2BOS A7A6T;SL4", "2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))

y %>% 
  separate(supplier, into = c("customer", "supplier_id"), sep = ";", remove = FALSE) %>% 
  mutate(customer = str_replace_all(customer, "\\s", "")) %>% 
  inner_join(x, by = "customer") %>% 
  select(customer, supplier)
#>    customer       supplier
#> 1 2BOSA7A6T 2BOS A7A6T;SC4
#> 2 2BOSA7A6T 2BOS A7A6T;SL4

Created on 2020-06-04 by the reprex package (v0.3.0)


It won't, because the join I designed earlier assumes that the entire customer ID precedes the semicolon, which is not the case in your second example. It's even more problematic because in your second example the customer ID itself is split by a semicolon, the same character that separates the customer ID from the supplier ID in the first example.

If your data contains both these cases, it'll be hard to create a generalized solution.
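That said, if it is acceptable to treat a supplier as a match whenever its string, stripped of everything except letters and digits, starts with the customer ID, a rough base R sketch could look like the one below. The prefix rule is an assumption on my part, not something the data guarantees, and `normalize_id()` and `match_suppliers()` are hypothetical helper names made up for this illustration.

```r
# Hypothetical helper: drop everything except letters and digits so that
# spaces, semicolons and commas no longer affect the comparison.
normalize_id <- function(id) gsub("[^[:alnum:]]", "", as.character(id))

# Assumption: a supplier matches a customer when the normalized supplier
# string starts with the normalized customer ID.
match_suppliers <- function(x, y) {
  suppliers <- as.character(y$supplier)
  supplier_key <- normalize_id(suppliers)
  matches <- lapply(as.character(x$customer), function(cust) {
    hit <- startsWith(supplier_key, normalize_id(cust))
    data.frame(customer = rep(cust, sum(hit)),
               supplier = suppliers[hit],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, matches)
}

# First example: only the two "A7A6T" suppliers share the customer prefix.
x1 <- data.frame(customer = "2BOSA7A6T")
y1 <- data.frame(supplier = c("2BOS A7A6T;SC4", "2BOS A7A6T;SL4",
                              "2BOS A7M6T;SC4", "2BOS A7M6T;SL4"))
match_suppliers(x1, y1)

# Second example: "SHB  UF7;WF2,ZIE,Z22" is dropped because its normalized
# form is shorter than the customer ID and so cannot start with it.
x2 <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y2 <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9", "SHBUF7;WF2,ZIE,Z22,1T8",
                              "SHBUF7;WF2,ZIE,Z22,1T", "SHB  UF7;WF2,ZIE,Z22",
                              "SHBUF7;WF2,ZIE,Z22,1T9999999"))
match_suppliers(x2, y2)
```

Bear in mind that prefix matching is fairly greedy: a short customer ID would also pick up suppliers belonging to any longer ID that begins with the same characters, so it is worth checking the result against your real data.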

Hi again,
Thank you for the quick response.
I am still having issues, because it does not work for the following example:

x <- data.frame(customer = "SHBUF7WF2ZIEZ221T")
y <- data.frame(supplier = c("SHBUF7;WF2,ZIE,Z22,1T9","SHBUF7;WF2,ZIE,Z22,1T8",
                             "SHBUF7;WF2,ZIE,Z22,1T", "SHB  UF7;WF2,ZIE,Z22",
                             "SHBUF7;WF2,ZIE,Z22,1T9999999"))

#### The solution would be

x<- data.frame(customer=c("SHBUF7WF2ZIEZ221T"),
               supplier=c("SHBUF7;WF2,ZIE,Z22,1T9",
                          "SHBUF7;WF2,ZIE,Z22,1T8","SHBUF7;WF2,ZIE,Z22,1T",
                          "SHBUF7;WF2,ZIE,Z22,1T9999999"))


Thanks