df<-data.frame(products=c('1 kg pears','appears to be a dog','a pear','apples red','red apple','1 kg anana','1 kg banana'))
and I have a vector of products:
vector<-c('pear','apple','banana','anana')
I need to classify each product in df based on the words in the vector. I was thinking about something like:
df$class<-NA
for(i in 1:length(vector)){
rows_product<-which(grepl(vector[[i]],df[[1]]))
df$class[rows_product]<-vector[[i]]
}
But I realized the match has to be at the start of a word: when I look for 'pear' it should not match 'appears', and when I look for 'anana' it should not match 'banana'.
Is there any way I can do this? I think there might be a way to do it with regex, but I could not find how.
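Hi! Are you looking for something like this? Put '\s' in front of a word so that it only matches when preceded by white space. Below I left "banana" out of the vector to show that 'anana' indeed does not match 'banana', as you want. Note that '\s' requires an actual white-space character, so a product that begins with the word would not match.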
JW
library(tidyverse)
df<-data.frame(products=c('1 kg pears',
'appears to be a dog',
'a pear','apples red',
'red apple',
'1 kg anana',
'1 kg banana'))
# add \s to match any preceding white space (note: extra \ to escape...)
vector<-c('\\spear','apple','\\sanana') # 'banana' left out to show that 'anana' does not match it
df %>% mutate(match = str_detect(products , paste(vector, collapse = "|")))
#              products match
# 1          1 kg pears  TRUE
# 2 appears to be a dog FALSE
# 3              a pear  TRUE
# 4          apples red  TRUE
# 5           red apple  TRUE
# 6          1 kg anana  TRUE
# 7         1 kg banana FALSE
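The output above only says whether a row matched something, while the question asks for a class per row. A possible variation, sketched in base R: keep the original loop but prefix each word with the word-boundary anchor '\\b' instead of '\\s'. This swaps in a different technique from the answer above; '\\b' also matches at the very start of a string, so 'apples red' is caught without special-casing. The name `keywords` is just an illustrative rename of `vector`, and when several keywords match the same row the later one in `keywords` wins.

```r
df <- data.frame(products = c('1 kg pears', 'appears to be a dog', 'a pear',
                              'apples red', 'red apple', '1 kg anana',
                              '1 kg banana'))

# Hypothetical name; same contents as `vector` in the question
keywords <- c('pear', 'apple', 'banana', 'anana')

df$class <- NA_character_
for (kw in keywords) {
  # '\\b' is a word boundary: the keyword must begin a word,
  # so 'pear' skips 'appears' and 'anana' skips 'banana'
  hits <- grepl(paste0('\\b', kw), df$products, perl = TRUE)
  df$class[hits] <- kw
}

df
```

With the sample data this should leave 'appears to be a dog' as NA and classify '1 kg anana' and '1 kg banana' separately. The pattern only anchors the start of the word, so plurals such as 'pears' and 'apples' still match.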