Hi, amazing RStudio Community!
Please find below a reprex of my "issue". But first, let me explain:
- In my_tbl I have a column with text, of which the important parts are the numbers.
- I want to manipulate that column so that I get a factor with just the numbers, as shown in target_tbl.
I was able to do this, but it took a lot of steps, and my guess is that there's a much better and faster way.
An important question I have is: how to extract the nth occurrence of a match with regex and stringr?
library(tidyverse)
library(reprex)
# --- My data ---
my_tbl <- tribble(
  ~ my_text,
  "1. Up to 15 USD",
  "2. More than 15 USD and up to 50 USD",
  "3. More than 50 USD and up to 100 USD",
  "4. More than 100 USD and up to 250 USD",
  "5. More than 250 USD and up to 500 USD",
  "6. More than 500 USD"
)
# --- Desired output ---
target_tbl <- tribble(
  ~ my_text,
  "<=15",
  "+15-50",
  "+50-100",
  "+100-250",
  "+250-500",
  "> 500"
) %>% mutate(my_text = as_factor(my_text))
# --- my attempt ---
my_tbl %>%
  mutate(
    # Extract two or more digits. Concatenate a "+" at the beginning.
    first_digits = str_extract(my_text, "\\d{2,}") %>% str_c("+", .),
    # I want to extract the second group of digits, but have to remove the first one first
    second_digits = str_remove(my_text, "\\d{2,}") %>% str_extract("\\d{2,}")
  ) %>%
  # Then join the digits and separate by "-"
  unite(col = "my_factor", first_digits, second_digits, sep = "-") %>%
  # Change "NA" cases
  mutate(
    my_factor = case_when(
      my_factor == "+15-NA" ~ "<=15",
      my_factor == "+500-NA" ~ "> 500",
      TRUE ~ my_factor
    ),
    # Convert to factor
    my_factor = as_factor(my_factor)
  )
#> # A tibble: 6 x 2
#>   my_text                                my_factor
#>   <chr>                                  <fct>
#> 1 1. Up to 15 USD                        <=15
#> 2 2. More than 15 USD and up to 50 USD   +15-50
#> 3 3. More than 50 USD and up to 100 USD  +50-100
#> 4 4. More than 100 USD and up to 250 USD +100-250
#> 5 5. More than 250 USD and up to 500 USD +250-500
#> 6 6. More than 500 USD                   > 500
Created on 2020-10-18 by the reprex package (v0.3.0)
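On the nth-occurrence question specifically, here is a minimal sketch of one direction that might work: collect every match with str_extract_all() and then pick the nth one per string. The helper name str_extract_nth is made up for illustration (it is not a stringr function), and returning NA when a string has fewer than n matches is an assumption about the desired behaviour.

```r
library(stringr)

# Hypothetical helper (not part of stringr): return the nth match of `pattern`
# in each element of `string`, or NA when there are fewer than n matches.
str_extract_nth <- function(string, pattern, n) {
  # str_extract_all() returns a list with one character vector of matches per input
  matches <- str_extract_all(string, pattern)
  vapply(
    matches,
    function(m) if (length(m) >= n) m[[n]] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

x <- c(
  "1. Up to 15 USD",
  "2. More than 15 USD and up to 50 USD",
  "6. More than 500 USD"
)

str_extract_nth(x, "\\d{2,}", 1)
#> [1] "15"  "15"  "500"
str_extract_nth(x, "\\d{2,}", 2)
#> [1] NA   "50" NA
```

This would avoid the remove-then-extract step for second_digits, at the cost of finding all matches even when only one is needed.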
If anyone has any input on this, I'd appreciate it very much.
Thank you in advance!
Alexis