Extract strings using fuzzy LR patterns in R

I have been struggling with this for a long time.

I can extract everything between my left and right patterns in a string, as you can see in the following example.


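library(stringr)  # provides str_extract()

data <- "everything will be ok one day"

# lookbehind/lookahead keep the flanking patterns out of the match
str_extract(string = data, pattern = "(?<=thing).*(?=ok one)")
#> [1] " will be "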

Created on 2022-01-26 by the reprex package (v2.0.1)

As the code shows, I extract everything between "thing" and "ok one".

I need to allow for mismatches inside these patterns.
I want to allow a maximum of two mismatches, counting substitutions as well as indels (insertions and deletions).

This is just a simplified example. My actual data does not contain gaps, and it is more complicated. Any help or guidance would be much appreciated.

I haven't tried these, but there is base::agrep() for approximate string matching, plus packages like fuzzyjoin (dgrtwo/fuzzyjoin on GitHub, "Join tables together on inexact matching") that you could try.
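
In the same spirit as the base R suggestion, here is a minimal brute-force sketch that relies only on utils::adist(), which computes the generalized Levenshtein distance (substitutions, insertions and deletions). The helper names fuzzy_locate() and extract_between_fuzzy() are made up for this illustration and are not part of any package. The tie-breaking rule (lowest distance first, then earliest start) is also just an assumption about what "best match" should mean. Treat it as a starting point, not a tested solution for real data.

```r
# Hypothetical helper (not part of any package): list every substring of
# `text` whose edit distance to `pattern` is at most `max_dist`.
fuzzy_locate <- function(text, pattern, max_dist = 2L) {
  n <- nchar(text)
  m <- nchar(pattern)
  # Candidate windows: every start position paired with every length that an
  # occurrence with up to `max_dist` insertions or deletions could have.
  win <- expand.grid(start = seq_len(n),
                     len = seq(max(1L, m - max_dist), m + max_dist))
  win$end <- win$start + win$len - 1L
  win <- win[win$end <= n, ]
  # adist() counts substitutions, insertions and deletions.
  win$dist <- as.vector(adist(substring(text, win$start, win$end), pattern))
  win[win$dist <= max_dist, c("start", "end", "dist")]
}

# Hypothetical helper: return the text between the best fuzzy match of `left`
# and the best fuzzy match of `right` that starts after it.
extract_between_fuzzy <- function(text, left, right, max_dist = 2L) {
  l <- fuzzy_locate(text, left, max_dist)
  if (nrow(l) == 0L) return(NA_character_)
  # Assumed tie-break: lowest distance, then earliest start, then shortest.
  l <- l[order(l$dist, l$start, l$end), ][1L, ]

  r <- fuzzy_locate(text, right, max_dist)
  r <- r[r$start > l$end, ]
  if (nrow(r) == 0L) return(NA_character_)
  r <- r[order(r$dist, r$start, r$end), ][1L, ]

  substr(text, l$end + 1L, r$start - 1L)
}

# Exact flanking patterns, as in the original example
extract_between_fuzzy("everything will be ok one day", "thing", "ok one")
#> [1] " will be "

# One substitution in each flanking pattern ("thang", "ok ome")
extract_between_fuzzy("everythang will be ok ome day", "thing", "ok one")
#> [1] " will be "
```

Because every candidate window is scored separately, this is only practical for short strings. For longer inputs, base::aregexec() together with regmatches() may be a faster route, though which match it reports should be checked on real data.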


This thread may be helpful.

