separating a column into two columns based on the last occurrence of the separator

smrts · January 12, 2022, 10:28am

I have a column (the left cotext in a concordance list) which I would like to separate into two columns based on a tag, say, "QQ". However, some rows contain more than one of these tags. I want to delete everything to the left of the final QQ in each cell, but by default the rows are split at the first QQ. Is there a way to make it split at the final occurrence of the separator instead?

xvalda · January 12, 2022, 11:23am

Hi @smrts, welcome to the community.

This should be in the form of:

separate(data, col, into, sep = "QQ(?!.*QQ)")

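The regex reads: match a QQ that is not followed, anywhere later in the string, by another QQ. The negative lookahead (?!.*QQ) does the checking, with .* allowing any characters (zero or more) in between, so only the last QQ in each string matches.
A couple of examples: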

library(stringr)

text1 <- "12345QQ78910"
text2 <- "123QQjabcQQd fQQ hijQQ789"
# split each string at its last "QQ" only
str_split(c(text1, text2), "QQ(?!.*QQ)")

#> [[1]]
#> [1] "12345" "78910"
#> 
#> [[2]]
#> [1] "123QQjabcQQd fQQ hij" "789"

I guess you'll know how to use separate(). If you run into any problems, post a reprex (your code and example data).
Hope it helps.
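
For reference, here is a minimal sketch of how the same regex could be plugged into separate(). The data frame `conc` and the column names `left`, `discard` and `left_context` are made up for illustration; they are not from the original question. It assumes every row contains at least one QQ (rows without the tag would get NA in the second column, with a warning). The last line is a base R alternative, not the separate() approach: a greedy `.*QQ` in sub() removes everything up to and including the final QQ.

```r
library(tidyr)

# Hypothetical concordance data; names are assumptions for illustration
conc <- data.frame(
  left = c("12345QQ78910", "123QQjabcQQd fQQ hijQQ789"),
  stringsAsFactors = FALSE
)

# Split at the last "QQ": the lookahead rejects any "QQ"
# that still has another "QQ" somewhere after it
result <- separate(conc, left,
                   into = c("discard", "left_context"),
                   sep = "QQ(?!.*QQ)")

# Drop everything that was to the left of the final tag
result$discard <- NULL

# Base R cross-check: greedy ".*QQ" consumes up to the final "QQ"
base_result <- sub(".*QQ", "", conc$left)
```

With this sample data, `result$left_context` and `base_result` should both come out as "78910" and "789".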

smrts · January 12, 2022, 11:37am

Thank you very much! The regex was all I needed, really. Works like a charm!
