Extract strings using fuzzy LR patterns in R

I have been struggling with this for a long time.

I can extract everything between my left and right patterns in a string, as the following example shows.

library(tidyverse)

data=c("everything will be ok one day")

str_extract(string = data, pattern = "(?<=thing).*(?=ok one)")
#> [1] " will be "

Created on 2022-01-26 by the reprex package (v2.0.1)

As you can see in the code, I extract everything between "thing" and "ok one".

I need to allow for mismatches inside these two flanking patterns.
I want to allow at most two mismatches, including indels (insertions and deletions).


PS:
This is just a simplified example. My actual data does not contain gaps, and it is more complicated. I would appreciate any help and guidance.

I haven't tried these, but there is base::agrep() for approximate string matching, plus packages like fuzzyjoin that you could try (GitHub: dgrtwo/fuzzyjoin, "Join tables together on inexact matching").
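As a rough illustration of the approximate-matching route, here is a minimal sketch using aregexec(), a relative of agrep() that ships with R and reports where a fuzzy match starts and how long it is. The helper name extract_between_fuzzy() and its max_dist argument are made up for this example. It treats both flanking patterns as literal strings and allows up to max_dist edits (insertions, deletions, substitutions) in each one separately, which is an assumption about what "two mismatches" should mean. When a flank is matched inexactly, the exact boundaries chosen by the matcher can shift by a character or so, so check the results on real data.

```r
# Sketch only: extract_between_fuzzy() is a hypothetical helper, not a package function.
extract_between_fuzzy <- function(x, left, right, max_dist = 2) {
  # Approximate position of the left pattern (literal string, not a regex)
  m_left <- aregexec(left, x, max.distance = max_dist, fixed = TRUE)[[1]]
  if (m_left[1] == -1) return(NA_character_)
  left_end <- m_left[1] + attr(m_left, "match.length")[1] - 1

  # Look for the right pattern only in the text that follows the left match
  rest <- substring(x, left_end + 1)
  m_right <- aregexec(right, rest, max.distance = max_dist, fixed = TRUE)[[1]]
  if (m_right[1] == -1) return(NA_character_)

  # Everything before the right match is the part we want
  substr(rest, 1, m_right[1] - 1)
}

# Exact flanks, as in the question
extract_between_fuzzy("everything will be ok one day", "thing", "ok one")

# Flanks with a substitution ("thinq") and an insertion ("ok onne")
extract_between_fuzzy("everythinq will be ok onne day", "thing", "ok one")
```

With exact flanks this should give the same result as the str_extract() call in the question. The function handles one string at a time; for a vector of strings, wrapping it in vapply() or purrr::map_chr() would be one option.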


This thread may be helpful.

