Hello everyone, could someone help me convert this function to R? I want to match the text body against its keyword and then label it. Here is the function in Excel:
The solution below does not use str_detect or case_when, but it does categorize the expenses as in the example. It takes each expense string, splits it into one row per word, joins the categories table on the lowercased word, and keeps the rows that have a match. Because it matches whole words, it only works for single-word keywords.
library(tidyverse)
expenses = data.frame(
Expense = c('DEBIT PURCHASE AT SHELL', 'NETFLIX Payment', 'MERCHANT KROGER', 'CENTRAL WATER PAYMENT')
)
expenses
#> Expense
#> 1 DEBIT PURCHASE AT SHELL
#> 2 NETFLIX Payment
#> 3 MERCHANT KROGER
#> 4 CENTRAL WATER PAYMENT
categories = data.frame(
Keyword = c('chevron', 'costco', 'kroger', 'netflix', 'shell', 'water'),
Category = c('Auto', 'Groceries', 'Groceries', 'Entertainment', 'Auto', 'Utilities')
)
categories
#> Keyword Category
#> 1 chevron Auto
#> 2 costco Groceries
#> 3 kroger Groceries
#> 4 netflix Entertainment
#> 5 shell Auto
#> 6 water Utilities
# categorize the expenses
out = expenses %>%
  mutate(Keyword = Expense) %>%
  # split each expense into one row per word
  separate_rows(Keyword, sep = ' ') %>%
  # make lowercase to match the keywords in the categories data frame
  mutate(Keyword = tolower(Keyword)) %>%
  # attach the category for each word; words without a keyword get NA
  left_join(categories, by = "Keyword") %>%
  # keep only the words that matched a keyword
  filter(!is.na(Category)) %>%
  select(-Keyword)
out
#> # A tibble: 4 × 2
#> Expense Category
#> <chr> <chr>
#> 1 DEBIT PURCHASE AT SHELL Auto
#> 2 NETFLIX Payment Entertainment
#> 3 MERCHANT KROGER Groceries
#> 4 CENTRAL WATER PAYMENT Utilities
Created on 2022-08-31 with reprex v2.0.2.9000
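If you do want str_detect, here is a minimal sketch that tests every keyword against each expense string instead of splitting into words. The helper `categorize()` is hypothetical (not part of any package); it assumes keywords should match as whole words (hence the `\\b` word boundaries) regardless of case, and that when several keywords match, the first one in the `categories` table wins.

```r
library(dplyr)
library(stringr)

expenses <- data.frame(
  Expense = c("DEBIT PURCHASE AT SHELL", "NETFLIX Payment",
              "MERCHANT KROGER", "CENTRAL WATER PAYMENT")
)
categories <- data.frame(
  Keyword  = c("chevron", "costco", "kroger", "netflix", "shell", "water"),
  Category = c("Auto", "Groceries", "Groceries", "Entertainment", "Auto", "Utilities")
)

# Hypothetical helper: for each expense string, return the Category of the
# first keyword found as a whole word (case-insensitive), or NA if none match.
categorize <- function(text, keywords, labels) {
  # wrap each keyword in word boundaries so "shell" does not match "shellfish"
  patterns <- regex(paste0("\\b", keywords, "\\b"), ignore_case = TRUE)
  vapply(text, function(t) {
    hit <- str_detect(t, patterns)  # one TRUE/FALSE per keyword
    if (any(hit)) labels[which(hit)[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

out <- expenses %>%
  mutate(Category = categorize(Expense, categories$Keyword, categories$Category))
out
```

Unlike the word-splitting approach, this keeps one row per expense and would also work for multi-word keywords such as "central water".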
Hi,
Just to answer the title question:
#Using str_detect
#*****************
library(stringr)
#Test if the word "some" is present
text = "this is some text"
str_detect(text, "some")
#> [1] TRUE
#Use RegEx to see if the string ends with a period
text = "this is some text"
str_detect(text, "\\.$")
#> [1] FALSE
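str_detect is also vectorised over the input strings, so it can test a whole column at once. The short sketch below (the `expense` vector is just sample data from the question) shows why case matters when matching bank descriptions: a plain pattern is case-sensitive, while wrapping it in `regex(..., ignore_case = TRUE)` is not.

```r
library(stringr)

expense <- c("DEBIT PURCHASE AT SHELL", "NETFLIX Payment", "MERCHANT KROGER")

# Plain pattern: case-sensitive, so the uppercase "NETFLIX" is not found
plain <- str_detect(expense, "netflix")
plain
#> [1] FALSE FALSE FALSE

# regex() modifier with ignore_case = TRUE: matches regardless of case
ci <- str_detect(expense, regex("netflix", ignore_case = TRUE))
ci
#> [1] FALSE  TRUE FALSE
```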
#Using case_when
#*****************
library(dplyr)
x = 80
case_when(
x < 50 ~ "low",
x >= 50 & x < 100 ~ "medium",
TRUE ~ "high"
)
#> [1] "medium"
Created on 2022-08-31 by the reprex package (v2.0.1)
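Putting the two together answers the original question directly: inside `mutate()`, each `case_when()` condition can be a `str_detect()` call. This is a minimal sketch that hard-codes the keyword groups from the example categories table as regex alternations; conditions are checked top to bottom, and anything unmatched gets `NA`.

```r
library(dplyr)
library(stringr)

expenses <- data.frame(
  Expense = c("DEBIT PURCHASE AT SHELL", "NETFLIX Payment",
              "MERCHANT KROGER", "CENTRAL WATER PAYMENT")
)

out <- expenses %>%
  mutate(Category = case_when(
    # whole-word, case-insensitive match for each keyword group
    str_detect(Expense, regex("\\b(chevron|shell)\\b", ignore_case = TRUE)) ~ "Auto",
    str_detect(Expense, regex("\\b(costco|kroger)\\b", ignore_case = TRUE)) ~ "Groceries",
    str_detect(Expense, regex("\\bnetflix\\b", ignore_case = TRUE))         ~ "Entertainment",
    str_detect(Expense, regex("\\bwater\\b", ignore_case = TRUE))           ~ "Utilities",
    TRUE ~ NA_character_  # no keyword matched
  ))
out
```

Hard-coding the patterns is convenient for a handful of categories; with a longer keyword list, a lookup table like the one in the first reply is easier to maintain.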