str_replace_all() and its arguments

I have a question:
The rules for the arguments of str_replace_all() are not fully clear to me. I would like to find different patterns in a character vector, truncate each match, and replace the match with the first part of the match.
For example:
library(stringr)
(words <- c("axapplexe", "baxanaxanaxa", "coxocoxonuxut", "froxoppylaxand"))
str_replace_all(words, pattern = …, replacement = unlist(str_split("\\1", "x"))[1])

with the desired result being c("apple", "banana", "coconut", "froppyland").

So I want to split the match at a certain letter and use the part before that letter as the replacement. But this doesn't seem to work. Does anyone have a tip? I have also tried other replacements, for example replacement = tolower("\\1"), which also didn't work.

Thanks a lot
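A short sketch of why these attempts cannot work, assuming the stringr package is installed: R evaluates the `replacement` argument once, before str_replace_all() runs, so str_split() and tolower() only ever see the literal string "\\1", never the matched text. Recent stringr versions also accept a function as `replacement`, which receives the matched text; using str_remove() with the pattern "x.*" to keep the part before the x is an assumed choice here, not the only way to do it.

```r
library(stringr)

# Arguments are evaluated before str_replace_all() is called, so these
# expressions work on the literal two-character string "\\1" (backslash, 1).
split_attempt <- unlist(str_split("\\1", "x"))[1]  # no "x" to split on: still "\\1"
lower_attempt <- tolower("\\1")                     # nothing to lowercase: still "\\1"

words <- c("axapplexe", "baxanaxanaxa", "coxocoxonuxut", "froxoppylaxand")

# Assumes a stringr version that accepts a function as `replacement`:
# the function receives the matched strings (e.g. "axa") and returns
# their replacements, here everything before the first "x".
fixed <- str_replace_all(words, "[aeiou]x[aeiou]", function(m) str_remove(m, "x.*"))
print(fixed)
```

Because the function sees the actual match, any string manipulation (splitting, case changes) can be applied per match, which is what the original `replacement` expressions were trying to do.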


I suspect I don't understand exactly what you need to do, but to get from your initial text to the desired result, I would do this:

library(stringr)
words <- c("axapplexe", "baxanaxanaxa", "coxocoxonuxut", "froxoppylaxand")
str_replace_all(string = words, pattern = "([aeiou])x[aeiou]", replacement = "\\1")
[1] "apple"      "banana"     "coconut"    "froppyland"

OK, this seems to work, but isn't "\\1" the regular-expression notation for "the last match", which in the first case would be "axa"? Why does this replace the match with only its first letter?

In the replacement, \\1 means "whatever was matched by the first part of the pattern enclosed in parentheses" (the first capture group). In the pattern I used, that is the vowel before the x. If my pattern had been [aeiou](x)[aeiou], \\1 would refer to the x. If two parts of the pattern are enclosed in parentheses, you can use \\1 and \\2 to refer to them; the groups are numbered in the order of their opening parentheses.
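A small sketch of the numbering, assuming stringr is installed and reusing the example words: the same matches are replaced differently depending on which part of the pattern is wrapped in parentheses, and with two groups "\\2\\1" writes them in reverse order.

```r
library(stringr)

words <- c("axapplexe", "baxanaxanaxa", "coxocoxonuxut", "froxoppylaxand")

# Group 1 is the vowel before the "x", so each vowel-x-vowel match
# is replaced by that single vowel.
vowel_first <- str_replace_all(words, "([aeiou])x[aeiou]", "\\1")

# Group 1 is now the "x" itself, so each match is replaced by "x".
x_only <- str_replace_all(words, "[aeiou](x)[aeiou]", "\\1")

# Two groups: \\1 is the vowel, \\2 is the "x"; "\\2\\1" swaps them.
swapped <- str_replace_all(words[1], "([aeiou])(x)[aeiou]", "\\2\\1")

print(vowel_first)  # "apple" "banana" "coconut" "froppyland"
print(x_only)       # "xpplx" "bxnxnx" "cxcxnxt" "frxppylxnd"
print(swapped)      # "xapplxe"
```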


OK, and what happens if I use the or operator |? Let's say "([aeiou])x[aeiou]|([bcd])x[bcd]|(y)k". Would this mean that, whatever is matched, the first thing in parentheses can be referred to as "\\1"?

I have never tried anything like that before, but the following results suggest that two groups in parentheses separated by | are still numbered \\1 and \\2. The group that did not take part in the match is treated as an empty string in the result, so placing the references next to each other (\\1\\2) effectively works as "\\1 OR \\2".

[1] "a#zzz" "#bzzz"
[1] "azzz" "bzzz"
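To check this with the three-branch pattern from the question (written here without spaces around |, since spaces in a regex are matched literally), here is a sketch on a made-up test string, assuming stringr is installed. The groups are numbered \\1, \\2, \\3 from left to right regardless of which branch matches, and the groups from the other branches contribute empty strings.

```r
library(stringr)

# Made-up input containing one match for each branch of the alternation.
x <- "axa bxc yk"
pattern <- "([aeiou])x[aeiou]|([bcd])x[bcd]|(y)k"

# Separators make the empty (non-participating) groups visible.
with_separators <- str_replace_all(x, pattern, "\\1#\\2#\\3")

# Concatenating the references acts as "whichever group matched".
whichever <- str_replace_all(x, pattern, "\\1\\2\\3")

print(with_separators)  # "a## #b# ##y"
print(whichever)        # "a b y"
```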
