Regex Replacement

I have a string "545 1st ave", and I want to remove the "st" after the 1. So far, I've been using str_replace("545 1st ave", "[0-9]st", " "), but the output is "545 ave". I want the output to be "545 1 ave". How do I write the regex correctly?

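You can use a positive look-behind, or just remove "st". Note that the second call removes the first "st" anywhere in the string, so it is only safe when no other "st" comes before the ordinal: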

str_remove("545 1st ave", "(?<=[0-9])st")
#> [1] "545 1 ave"
str_remove("545 1st ave", "st")
#> [1] "545 1 ave"
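
The two calls agree above only because the first "st" in that string happens to follow the digit. As a rough sketch of where they diverge, here is a made-up address (an assumption for illustration, not from the question) in which "st" also occurs inside an ordinary word:

```r
library(stringr)

# Hypothetical address where "st" also occurs inside an ordinary word
x <- "12 east 1st ave"

# Plain "st" removes the first occurrence, which is inside "east"
plain <- str_remove(x, "st")

# The look-behind only matches an "st" that directly follows a digit
lookbehind <- str_remove(x, "(?<=[0-9])st")

print(plain)
#> [1] "12 ea 1st ave"
print(lookbehind)
#> [1] "12 east 1 ave"
```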

On a tablet so I can't test it, but what I like to use (probably because I used to be a Perl person when Perl was a thing) would be something like
str_replace("545 1st ave", " ([0-9])st ", " \\1 ")
where the parens say "save this for later" and \\1 says "put in the first thing you saved". I also wrapped the expression in blanks so that it will only key off "number"st as an isolated thing; the trade-off is that it will not match at the very start or end of the string. Can't remember if the parens need to be protected with a backslash or not (in stringr's regex flavor they do not).
Note that for 21st, 31st, etc., you could use ([0-9]+)st or (\\d+)st.
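
A minimal sketch of the multi-digit variant, keeping the blank-wrapped pattern from this reply. Only the first address comes from the question; the other two are invented for illustration:

```r
library(stringr)

# Hypothetical addresses; only the first one appears in the question
addresses <- c("545 1st ave", "12 21st st", "9 131st place")

# ([0-9]+) captures one or more digits; \\1 puts them back without the "st".
# The surrounding blanks keep the trailing street abbreviation "st" untouched.
cleaned <- str_replace(addresses, " ([0-9]+)st ", " \\1 ")

print(cleaned)
#> [1] "545 1 ave"   "12 21 st"    "9 131 place"
```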

The \\b special marker matches boundaries between word characters ([a-zA-Z0-9_]) and non-word characters. This includes the start and end of strings:

str_replace(c("545 1st ave", "1st ave, 545", "545 1st"), "\\b([0-9])st\\b", "\\1")
#> [1] "545 1 ave"  "1 ave, 545" "545 1"

Good point. Writing good regular expressions takes practice and care. It is distressingly easy for an expression to hit a string you didn't know about and make changes you did not intend. The more the target can be restricted and defined, the better. Displaying the targets first, with something like grep, can be useful.
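
One possible way to preview the targets with stringr rather than grep is sketched below. The input strings are made up to include cases the pattern should leave alone, and the multi-digit \\b pattern is an assumption combining the suggestions above:

```r
library(stringr)

# Hypothetical inputs, including strings the pattern should not touch
x <- c("545 1st ave", "21 east st", "first 1st", "test1st")
pattern <- "\\b([0-9]+)st\\b"

# Which strings would be changed?
hits <- str_subset(x, pattern)

# What exactly would match in each string (NA where nothing matches)?
matched <- str_extract(x, pattern)

print(hits)
#> [1] "545 1st ave" "first 1st"
print(matched)
#> [1] "1st" NA    "1st" NA
```

Checking the hits before running str_replace makes it easier to spot a pattern that is too loose or too strict.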
