Extracting string before and including the first appearance of a word (str_extract)

Hello,

I've been really struggling with extracting the body of water when given a column of station names in a data frame. Here's an example set of data.


df <- data.frame(Station = c("Santa Cruz Creek at Doe Ave", 
                             "Mendocino Creek below Oroville Reservoir", 
                             "Banos Stream along Foothill Drive", 
                             "San Diego Creek by San Diego Creek Trail", 
                             "San Mateo Creek at Creekside", 
                             "Los Angeles River below Santa Clara River"))


I've been using this piece of code to extract the body of water.



library(tidyverse)  # provides %>%, mutate() and str_extract()
df %>% mutate(waterbody = str_extract(Station, "[\\w\\s]+(Creek|Stream|River)"))

Most of the time, it works pretty well in extracting the body of water's name and designation, but unfortunately, when there's more than one instance of creek/stream/river in the station name, it has a tendency to capture too much.

It works well for the first three stations, but for the last three it captures everything up to and including the last Creek/Stream/River rather than the first. I've been struggling to find a solution and was wondering how I could fix my str_extract call so that it stops at the first occurrence.

Try a non-greedy dot-star (.*?), which stops at the first Creek/Stream/River instead of the last:

df %>%
  mutate(waterbody = str_extract(Station, "^(.*?)(Creek|Stream|River)"))
##                                     Station         waterbody
## 1               Santa Cruz Creek at Doe Ave  Santa Cruz Creek
## 2  Mendocino Creek below Oroville Reservoir   Mendocino Creek
## 3         Banos Stream along Foothill Drive      Banos Stream
## 4  San Diego Creek by San Diego Creek Trail   San Diego Creek
## 5              San Mateo Creek at Creekside   San Mateo Creek
## 6 Los Angeles River below Santa Clara River Los Angeles River
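
One caveat worth sketching: the pattern above matches the first occurrence of the letters Creek/Stream/River, even inside a longer word. That is not a problem for the six stations in the question, but a name that starts with something like "Creekside" would be cut short. The snippet below is only an illustration under that assumption: the third station name is made up, and adding word boundaries (\\b) around the keyword is one possible tweak, not the only one.

```r
library(stringr)

stations <- c("San Diego Creek by San Diego Creek Trail",
              "San Mateo Creek at Creekside",
              "Creekside Stream at Main St")  # hypothetical name, not from the question

# Greedy (original): + grabs as much as it can, so the match ends at the LAST keyword
greedy <- str_extract(stations, "[\\w\\s]+(Creek|Stream|River)")
# "San Diego Creek by San Diego Creek" "San Mateo Creek at Creek" "Creekside Stream"

# Non-greedy dot-star: stops at the FIRST keyword, even inside another word
lazy_any <- str_extract(stations, "^(.*?)(Creek|Stream|River)")
# "San Diego Creek" "San Mateo Creek" "Creek"

# Non-greedy plus word boundaries: the keyword must be a whole word
lazy_word <- str_extract(stations, "^[\\w\\s]+?\\b(Creek|Stream|River)\\b")
# "San Diego Creek" "San Mateo Creek" "Creekside Stream"
```

If your station names never embed those keywords in longer words, the simpler non-greedy pattern is enough.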

Thank you! This works perfectly.
