I have thousands of descriptions with numbers and text combined in a column EX: AB CD PINE_1_200_CL_EB_7X240 and BLS MAN_3_200_LD_BN_10X27_KD. In one column i need just the first number underscore and second number EX: 1_200 and 3_200. In a different column i need just the third number X and fourth number EX: 7X240 and 10X27. I have tried several different methods and it either extracts all of the digits or only some of the digits. I'm trying to specify a way to indicate "digit_digit" for the first column and "digitXdigit" for the second column.
TEXT <- data.frame(Strings= c("AB CD PINE_1_200_CL_EB_7X240",
"BLS MAN_3_200_LD_BN_10X27_KD"))
library(stringr)
TEXT$First <- str_extract(TEXT$Strings,"\\d+_\\d+")
TEXT$SECOND <- str_extract(TEXT$Strings,"\\d+X\\d+")
TEXT
#> Strings First SECOND
#> 1 AB CD PINE_1_200_CL_EB_7X240 1_200 7X240
#> 2 BLS MAN_3_200_LD_BN_10X27_KD 3_200 10X27
This worked for the vast majority of my data. I am having one problem though, there are a few data points that are "BLS MAN_3x6_200_LD_BN_10X27_KD" which in the first column is returning an "NA" and in the second column is returning "3x6". Not sure if there is a fix for this by being able to add in an additional statement to the first column extract such as str_extract(TEXT$Strings,"\d+\d+" & "\d+X\d+\_").
If not, is there a way to have the second column value be filled with NA ONLY IF the first column value is NA?
Please post the result of running my code. We seem to be getting different results from the same code. Run each line of my code and when you are done, copy the resulting console output.