Regex drives me nuts. Subtitled "Replacing a misspelling in the middle of multiple unique strings"

Hello everyone,

I have a regex question and, I think, a tidyverse question.

I need to correct the spelling of "Missouri" in multiple unique strings. My goal is to keep the text surrounding the misspelling but replace only the misspelling ("old misosuri bank" to "old missouri bank").

Where I'm coming up short is how to accomplish this when the misspelling is flanked by differing strings without resorting to writing a specific gsub() for each unique string. See "#specific gsub of a type that I'd like to avoid using#" in the code chunk.

There has to be a way to use regex and the tidyverse to tell RStudio, "Look for this misspelling in the Name variable regardless of where it occurs within a string and regardless of the strings/characters flanking it. Then replace only the misspelling while leaving the flanking strings/characters intact," but I'll be darned if I can figure it out. (Will someone please tell the regex demons to stop that infernal cackling?)

I'd appreciate any suggestions you might have.

Thanks in advance.

Linda

library(tidyverse)

#incorrect spellings
incorrect_MO_spell  <- c("old misosuri bank", "old misosuri bank", 
"southwest misosuri bank", "security bank of southwest misosuri",
"regional misosuri bank","old misosuri bank",
"first national bank of nevada misosuri","farmers state bank of northern misosuri", 
"missouri is spelled correctly")
incorrect_MO_spell <- as_tibble(incorrect_MO_spell)
colnames(incorrect_MO_spell) <- c("Name")

#specific gsub of a type that I'd like to avoid using#
incorrect_MO_spell$Name <-
  gsub("old misosuri bank",
       "old missouri bank", incorrect_MO_spell$Name)

#output with corrected spellings that I want
correct_MO_spell  <- c("old missouri bank", "old missouri bank", 
"southwest missouri bank", "security bank of southwest missouri",
"regional missouri bank","old missouri bank",
"first national bank of nevada missouri","farmers state bank of northern missouri", 
"missouri is spelled correctly")
correct_MO_spell <- as_tibble(correct_MO_spell)
colnames(correct_MO_spell) <- c("Name")

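Here are two solutions. The first uses base R's gsub() and the second uses str_replace() from stringr inside mutate(). In both, the pattern is just the misspelled word, so only that word is replaced and the text around it is left alone.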

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
incorrect_MO_spell  <- c("old misosuri bank", "old misosuri bank", 
                         "southwest misosuri bank", "security bank of southwest misosuri",
                         "regional misosuri bank","old misosuri bank",
                         "first national bank of nevada misosuri","farmers state bank of northern misosuri", 
                         "missouri is spelled correctly")
incorrect_MO_spell <- as_tibble(incorrect_MO_spell)
colnames(incorrect_MO_spell) = c("Name")
incorrect_MO_spell
#> # A tibble: 9 x 1
#>   Name                                   
#>   <chr>                                  
#> 1 old misosuri bank                      
#> 2 old misosuri bank                      
#> 3 southwest misosuri bank                
#> 4 security bank of southwest misosuri    
#> 5 regional misosuri bank                 
#> 6 old misosuri bank                      
#> 7 first national bank of nevada misosuri 
#> 8 farmers state bank of northern misosuri
#> 9 missouri is spelled correctly
incorrect_MO_spell$Name <- gsub("misosuri", 
                                "missouri", 
                                incorrect_MO_spell$Name) 

incorrect_MO_spell
#> # A tibble: 9 x 1
#>   Name                                   
#>   <chr>                                  
#> 1 old missouri bank                      
#> 2 old missouri bank                      
#> 3 southwest missouri bank                
#> 4 security bank of southwest missouri    
#> 5 regional missouri bank                 
#> 6 old missouri bank                      
#> 7 first national bank of nevada missouri 
#> 8 farmers state bank of northern missouri
#> 9 missouri is spelled correctly

#Method2
incorrect_MO_spell  <- c("old misosuri bank", "old misosuri bank", 
                         "southwest misosuri bank", "security bank of southwest misosuri",
                         "regional misosuri bank","old misosuri bank",
                         "first national bank of nevada misosuri","farmers state bank of northern misosuri", 
                         "missouri is spelled correctly")
incorrect_MO_spell <- as_tibble(incorrect_MO_spell)
colnames(incorrect_MO_spell) = c("Name")
incorrect_MO_spell <- incorrect_MO_spell |> 
  mutate(Name = str_replace(Name, "misosuri", "missouri"))
incorrect_MO_spell
#> # A tibble: 9 x 1
#>   Name                                   
#>   <chr>                                  
#> 1 old missouri bank                      
#> 2 old missouri bank                      
#> 3 southwest missouri bank                
#> 4 security bank of southwest missouri    
#> 5 regional missouri bank                 
#> 6 old missouri bank                      
#> 7 first national bank of nevada missouri 
#> 8 farmers state bank of northern missouri
#> 9 missouri is spelled correctly

Created on 2022-04-09 by the reprex package (v2.0.1)
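
One difference between the two methods may matter on other data: gsub() replaces every match in a string, while str_replace() replaces only the first. The following is a minimal sketch of that difference, not part of the reprex above. The vector bank_names and its contents are made up for illustration, and the word-boundary, case-insensitive pattern is an optional extra that assumes capitalized misspellings should also be lowercased to "missouri".

```r
library(stringr)

# Hypothetical example names: one repeats the misspelling, one capitalizes it
bank_names <- c("misosuri bank of misosuri",
                "old Misosuri bank",
                "missouri is spelled correctly")

# str_replace() changes only the first match in each string
first_only <- str_replace(bank_names, "misosuri", "missouri")

# str_replace_all() changes every match in each string (like gsub())
all_matches <- str_replace_all(bank_names, "misosuri", "missouri")

# \\b limits the match to whole words; ignore_case = TRUE also catches "Misosuri"
any_case <- str_replace_all(
  bank_names,
  regex("\\bmisosuri\\b", ignore_case = TRUE),
  "missouri"
)
```

With the data in the original question each name contains the misspelling at most once, so str_replace() and str_replace_all() give the same result there.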


@FJCC, thanks for your reply and suggestions. I appreciate it.

In my original working script, I tried what I thought was equivalent to both of these solutions, but my result ended up with the entire string being replaced with "Missouri", rather than just replacing the misspelling of Missouri.

I must have a typo in my original script code somewhere. I'm searching for it....

I KNEW this wasn't that hard. It's reassuring that I was in the ballpark, even if I didn't have the exact script.

Thanks!
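
For anyone who hits the same symptom (the whole string turning into the replacement), here is a sketch of two patterns that commonly cause it. This is only a guess at the kind of code involved, since the original script isn't shown in the thread; the vector bank_names is made up for illustration.

```r
# Hypothetical example names
bank_names <- c("old misosuri bank", "missouri is spelled correctly")

# Pitfall 1: wrapping the pattern in .* makes the match cover the whole string,
# so the whole string is replaced
whole_string_regex <- gsub(".*misosuri.*", "missouri", bank_names)

# Pitfall 2: detecting the misspelling, then assigning the replacement
# as the new value instead of substituting within the string
whole_string_ifelse <- ifelse(grepl("misosuri", bank_names),
                              "missouri", bank_names)

# Fix: match only the misspelled word; fixed = TRUE treats it as literal text
word_only <- gsub("misosuri", "missouri", bank_names, fixed = TRUE)
```

Both pitfalls turn "old misosuri bank" into just "missouri", while the last call gives "old missouri bank".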