Hello I have a sample dataset as follows.
# Load the tidyverse package
library(tidyverse)
# Create the dataset
id <- 1:6
model <- c("0RB3211", NA, "0RB4191",
NA, "0RB4033", NA)
UPC <- c("805289119081", "DK_0RB3447CP_RBCP 50", "8053672006360",
"Green_Classic_G-15_Polar_1.67_PREM_SV", "805289044604",
"DK_0RB2132CP_RBCP 55")
df <- tibble(id, model, UPC)
For the missing values in 'model' column, if its corresponding UPC starts with DK, I need to extract 7-digit number and letter after first underscore and then put it into the column 'model'. For example, for the second row, I need to put "0RB3447" into the 'model' column, for the fourth row, I need to delete the whole row and for the last row, I need to put "0RB2132" into the 'model' column.
Here is my expected result:
# Manipulate the dataset
df_cleaned <- df %>%
rowwise() %>%
mutate(model = ifelse(is.na(model) & str_detect(UPC, "^DK"),
str_extract(UPC, "\\d{2}RB\\d{4}"),
model)) %>%
ungroup() %>%
filter(!(is.na(model) & str_detect(UPC, "[^0-9]")))
# Display the cleaned dataset
print(df_cleaned)
However, it only returns this wrong result.
How can I modify my previous code?
Really appreciate it.