How to replace words based on similarity with words in another list

Hi
I would appreciate help with this:
How can I replace words (file1.txt) with the most similar entries from another list (file2.txt), in batch mode for hundreds of files? I mean replacing AAAAAAA with AAAAAAA_dddddd, then BBBBBBB with BBBBBBB_eeeeee, and so on for thousands of words. Example files:
file1.txt looks like this:
AAAAAAA
BBBBBBB
CCCCCCC

file2.txt looks like this:
AAAAAAA_dddddd
BBBBBBB_eeeeee
CCCCCCC_ffffffffff

thanks

Are file1.txt and file2.txt identical except for the _xxxxxx suffix, in the same order, and with the same number of lines? If not, we need a more realistic example.

The fuzzyjoin package lets you match strings based on similarity.
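As a rough illustration of what similarity-based matching means here, the following is a minimal base-R sketch that uses edit distance via utils::adist instead of a dedicated fuzzy-matching package. The vectors source_words and target_words are made-up stand-ins for the contents of file1.txt and file2.txt, and picking the single closest target is an assumption that may not hold for real data.

```r
# Hypothetical inputs mirroring the example files in the question
source_words <- c("AAAAAAA", "BBBBBBB", "CCCCCCC")
target_words <- c("AAAAAAA_dddddd", "BBBBBBB_eeeeee", "CCCCCCC_ffffffffff")

# adist() returns a matrix of edit distances:
# one row per source word, one column per target word
dist_matrix <- utils::adist(source_words, target_words)

# For each source word, take the target with the smallest distance
closest_index <- apply(dist_matrix, 1, which.min)

matches <- data.frame(
  name_source = source_words,
  name_target = target_words[closest_index],
  distance = apply(dist_matrix, 1, min),
  stringsAsFactors = FALSE
)
matches
```

With thousands of words the full distance matrix can get large, so it is worth checking the distance column for suspicious matches before replacing anything.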

However, if the first part (before the underscore) matches the entry in file1 exactly, it would be easier to split the names in file2 at the underscore and match the first part against the entries in file1.

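library(tidyverse)

# Stand-ins for the contents of file1.txt and file2.txt
file1 <- data.frame(name_source = c("AAAAAAA",
                                    "BBBBBBB",
                                    "CCCCCCC"))

file2 <- data.frame(name_target = c("AAAAAAA_dddddd",
                                    "BBBBBBB_eeeeee",
                                    "CCCCCCC_ffffffffff"))

# Split each target name at the first underscore, keeping the original column;
# extra = "merge" keeps any later underscores inside the appendix
file2_sep <- separate(file2,
                      name_target, into = c("name_source", "appendix"),
                      sep = "_", extra = "merge",
                      remove = FALSE)

# Match on the shared prefix; naming the key avoids the join message
combined <- full_join(file1, file2_sep, by = "name_source")
combined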
  name_source        name_target   appendix
1     AAAAAAA     AAAAAAA_dddddd     dddddd
2     BBBBBBB     BBBBBBB_eeeeee     eeeeee
3     CCCCCCC CCCCCCC_ffffffffff ffffffffff
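To carry this over to the batch replacement asked about in the question, one possible sketch is shown below. It assumes the words contain only letters and digits (so they are safe to use inside a regular expression) and that overwriting each file in place is acceptable. The helpers replace_words and replace_in_files are hypothetical names, and the lookup vector is written out by hand here, although it could equally be built with setNames(combined$name_target, combined$name_source).

```r
# Lookup table: names are the words to find, values are their replacements
lookup <- c(AAAAAAA = "AAAAAAA_dddddd",
            BBBBBBB = "BBBBBBB_eeeeee",
            CCCCCCC = "CCCCCCC_ffffffffff")

# Replace every whole-word occurrence of each lookup name in a character vector
replace_words <- function(lines, lookup) {
  for (word in names(lookup)) {
    # \\b anchors the match at word boundaries; since "_" counts as a word
    # character, text that was already replaced is not matched again
    pattern <- paste0("\\b", word, "\\b")
    lines <- gsub(pattern, lookup[[word]], lines, perl = TRUE)
  }
  lines
}

# Apply the replacement to each file and overwrite it in place
replace_in_files <- function(paths, lookup) {
  for (path in paths) {
    writeLines(replace_words(readLines(path), lookup), path)
  }
  invisible(paths)
}

# Small demonstration on a temporary file
demo_file <- tempfile(fileext = ".txt")
writeLines(c("AAAAAAA", "BBBBBBB and CCCCCCC"), demo_file)
replace_in_files(demo_file, lookup)
readLines(demo_file)
```

For hundreds of files, the paths could come from something like list.files("some_folder", pattern = "\\.txt$", full.names = TRUE), where some_folder is a placeholder. Testing on copies first is a good idea because the files are overwritten.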