regex str_extract assistance

Hi, I have been struggling to write a regex pattern for the following sample data:

data               data_manip

cat_loves_dog      dog
dog_love_cat       cat
me_ow_sith_love    love
animal_cat_do      "" (empty, but not NA)
dog_love_kiss      kiss
monkey_see_do      "" (empty, but not NA)

I want to make a new column that keeps everything after the last "_" (underscore). However, if the string ends with the word "do", the result should be an empty string "", like a blank cell, rather than NA. Any help is appreciated, thanks.

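I do not know of a way to make str_extract() itself return an empty string in some cases. Is it sufficient to wrap str_extract() in str_remove()? Here str_extract() pulls out the text after the last underscore, and str_remove() then deletes that result when it is exactly "do", leaving "".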

library(stringr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(Text= c("cat_loves_dog",
                      "dog_love_cat",
                      "me_ow_sith_love",
                      "animal_cat_do",
                      "dog_love_kiss",
                      "monkey_see_do"))
DF <- DF |> 
  mutate(data_manip = str_remove(str_extract(Text, "(?<=_)[^_]+$"),"^do$"))
DF
#>              Text data_manip
#> 1   cat_loves_dog        dog
#> 2    dog_love_cat        cat
#> 3 me_ow_sith_love       love
#> 4   animal_cat_do           
#> 5   dog_love_kiss       kiss
#> 6   monkey_see_do

Created on 2022-08-29 with reprex v2.0.2
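
For comparison, the same result can probably be had with a single substitution. This is only a sketch in base R, not the approach posted above: it assumes PCRE syntax via perl = TRUE, and the names x and last_word are made up for illustration. The pattern removes everything up to and including the last underscore and, if what remains is exactly "do", removes that as well.

```r
# Hypothetical sample vector mirroring the data in this thread
x <- c("cat_loves_dog", "dog_love_cat", "me_ow_sith_love",
       "animal_cat_do", "dog_love_kiss", "monkey_see_do")

# "^.*_"     : greedy match from the start through the LAST underscore
# "(?:do$)?" : optionally also consume a trailing "do" that ends the string
# Whatever is matched is replaced by "", so only the wanted part remains.
last_word <- sub("^.*_(?:do$)?", "", x, perl = TRUE)

print(last_word)
```

One difference to keep in mind: for a string with no underscore at all, sub() returns the string unchanged, whereas str_extract() returns NA, so the two only agree on data shaped like the sample.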


This makes a lot of sense. Thank you, Fjcc.

Hi, I have a question. Is it possible to keep the "_" in cases where the string does not end in "do"?

DF
#>              Text data_manip
#> 1   cat_loves_dog       _dog
#> 2    dog_love_cat       _cat
#> 3 me_ow_sith_love      _love
#> 4   animal_cat_do           
#> 5   dog_love_kiss      _kiss
#> 6   monkey_see_do

library(stringr)
library(dplyr)

DF <- data.frame(Text= c("cat_loves_dog",
                          "dog_love_cat",
                          "me_ow_sith_love",
                          "animal_cat_do",
                          "dog_love_kiss",
                          "monkey_see_do"))
DF <- DF |> 
   mutate(data_manip = str_remove(str_extract(Text, "_[^_]+$"),"^_do$"))
DF
             Text data_manip
1   cat_loves_dog       _dog
2    dog_love_cat       _cat
3 me_ow_sith_love      _love
4   animal_cat_do           
5   dog_love_kiss      _kiss
6   monkey_see_do
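
If a single call is preferred, a lookahead can keep the underscore. Again, this is just a base R sketch under the assumption that perl = TRUE (PCRE) is acceptable; the names x and with_sep are hypothetical and not from the posts above. The match stops right before the last underscore, and an ending of "_do" is swallowed as well.

```r
# Hypothetical sample vector mirroring the data in this thread
x <- c("cat_loves_dog", "dog_love_cat", "me_ow_sith_love",
       "animal_cat_do", "dog_love_kiss", "monkey_see_do")

# "^.*(?=_)"  : greedy match up to, but not including, the LAST underscore
# "(?:_do$)?" : optionally also consume a final "_do"
# The matched part is replaced by "", so "_word" (or "") is what remains.
with_sep <- sub("^.*(?=_)(?:_do$)?", "", x, perl = TRUE)

print(with_sep)
```

As with any sub() approach, a string that contains no underscore is returned unchanged rather than as NA.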