Strictly answering your question, @Hassanhijazi: what you are looking for is a regular expression with a negative lookahead:
Negative lookahead is indispensable if you want to match something not followed by something else.
That is exactly what you are asking for: "match something" (e.g. '98') "not followed by something else" (e.g. '.1').
So your first regular expression must match '98', but must include a "negative lookahead" to make explicit that the matched pattern '98' must not be followed by '.1'. Therefore, your "lookup" vector must look like:
lookup <- c('98(?!\\.1)' = 'A', '98.1' = 'B', '56' = 'C')
Let's analyze the "weird element" here, the string '98(?!\\.1)':
98 is the pattern you want to match; without further specification, it matches every occurrence of '98', including the one inside '98.1'.
(?!<string>) is the "negative lookahead" assertion; it specifies that the preceding pattern matches except when it is immediately followed by <string>.
\\.1 is <string>, the negative lookahead pattern; it specifies what the matched pattern must not be followed by (without making it part of the match).
\\. is the pattern that matches a literal dot (.). As an unescaped dot is a special character in regular expressions (it matches any character), it must be escaped with a backslash.
\\ is a single backslash; as the backslash is itself an escape character in R string syntax, it must also be escaped, so the regex engine receives \. in the end.
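To make the double escaping more tangible, here is a small sketch in base R (no stringr needed). It assumes R >= 4.0.0 for the raw string syntax, and the variable names are just illustrative:

```r
# The pattern as typed in R source code (backslash doubled for the R parser).
pattern <- "98(?!\\.1)"

# What the regex engine actually receives: a single backslash before the dot.
writeLines(pattern)
#> 98(?!\.1)

# A raw string (R >= 4.0.0) avoids the double escaping altogether.
raw_pattern <- r"(98(?!\.1))"
identical(pattern, raw_pattern)
#> [1] TRUE

# In base R, lookaheads need the Perl-compatible engine, hence perl = TRUE.
matches <- grepl(pattern, c("98", "98.1", "98.2"), perl = TRUE)
matches
#> [1]  TRUE FALSE  TRUE
```

Note that '98.2' still matches, because the lookahead only rules out '.1' specifically.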
Here is a reprex with the solution:
library(stringr)
str <- "98-98.1-56"
lookup <- c('98(?!\\.1)' = 'A', '98.1' = 'B', '56' = 'C')
str |> str_replace_all(lookup)
#> [1] "A-B-C"
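For contrast, here is a sketch comparing a lookup without the lookahead against the one above. The "naive" vector is my assumption of what the problematic lookup looks like, not something taken from your code. I have also escaped the dot in the second key ('98\\.1'), since an unescaped dot matches any character (so '98.1' would match '98x1' too); that is optional for this input. As far as I know, str_replace_all() applies the named replacements one after another, which is why the first key matters so much:

```r
library(stringr)

str <- "98-98.1-56"

# Assumed naive lookup (hypothetical, for contrast): no lookahead on '98'.
naive <- c('98' = 'A', '98\\.1' = 'B', '56' = 'C')

# Lookup with the negative lookahead; the dot in the second key is escaped too.
fixed <- c('98(?!\\.1)' = 'A', '98\\.1' = 'B', '56' = 'C')

# In the naive case, '98' -> 'A' also fires inside '98.1', so the second
# pattern no longer finds anything to replace.
naive_result <- str_replace_all(str, naive)
naive_result
#> [1] "A-A.1-C"

# With the lookahead, the '98' inside '98.1' is left alone for the second key.
fixed_result <- str_replace_all(str, fixed)
fixed_result
#> [1] "A-B-C"
```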
There are of course other solutions; for example, you can specify that the '98' must be followed by a dash (-) character with a "positive lookahead" (i.e., '98(?=-)'). But I think the solution I proposed is the one that most meaningfully represents "98 as a whole number" (because it explicitly says "not followed by a decimal marker and the digit '1'"), and it is also the easiest to generalize to other cases (e.g. "any whole number, not followed by a decimal marker and another digit").
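If you want to try that generalization, here is a rough sketch in base R (perl = TRUE). The patterns and names are my own, not part of your original code. One caveat: simply swapping the literals for \\d is not enough, because the engine backtracks and matches part of a decimal number, so the sketch also adds a negative lookbehind:

```r
x <- "98-98.1-56"

# Naive generalization: digits not followed by a dot and a digit.
# Backtracking lets it match the "9" of "98.1" and the "1" after the dot.
naive_pattern <- "\\d+(?!\\.\\d)"
naive_result <- gsub(naive_pattern, "N", x, perl = TRUE)
naive_result
#> [1] "N-N8.N-N"

# Stricter version:
#   (?<![\d.])  not preceded by a digit or a dot
#   \d+         one or more digits
#   (?!\.?\d)   not followed by a digit, nor by a dot plus a digit
strict_pattern <- "(?<![\\d.])\\d+(?!\\.?\\d)"
strict_result <- gsub(strict_pattern, "N", x, perl = TRUE)
strict_result
#> [1] "N-98.1-N"
```

The same strict pattern should also work as a key in a stringr lookup vector, but I have only reasoned it through with base R here.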
Hope it helps!