Hi community,
I want to extract specific strings from this data, which I obtained by web scraping.
I need only the strings that start with G followed by digits; some of them may end with a letter.
library(tidyverse)
datos_pi <- structure(list(num_pi = c("PI 093817", "PI 113367", "PI 131426",
"PI 151393", "PI 299387", "PI 416424"),
accession_name1 = c("G3 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture | | 65-033-00223 | Type: Other or unclassified name",
"G19 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture | | No. 305 | Type: Donor identifier",
"GRANDA OHNE FADEN | Type: Local name | translates: \"LARGE ONE WITHOUT STRING\" | | No. 2756 | Type: Developer identifier",
"Guarzo de Arbol | Type: Local name | translates: \"BIRD OF THE TREE\" (a climbing bean) | | No. 9 | Type: Donor identifier | | G18717 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture",
"Preta Rajada | Type: Local name | translates: \"BLACK GUST(of wind)\"",
"G14095 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture | | 65-153-01735 | Type: Donor identifier | Evans, K.H. USDA Regional Pulse Improvement Project"
), accession_name2 = c("V-2223 | Type: Other or unclassified name | | G19957 | Type: CGIAR International Center Identifier | Group: CIAT | possibly a selection from PI 93817. | International Center for Tropical Agriculture",
"ASIATIC EXPEDITION NO.305 | Type: Duplicate accession name",
"G2938 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture",
"G18717A | Type: CGIAR International Center Identifier | Group: CIAT | a CIAT selection from PI 151393 | International Center for Tropical Agriculture | | G18717B | Type: CGIAR International Center Identifier | Group: CIAT | a CIAT selection from PI 151393 | International Center for Tropical Agriculture",
"G25187 | Type: CGIAR International Center Identifier | Group: CIAT | Centro Internacional de Agricultura Tropical | International Center for Tropical Agriculture",
"G14095A | Type: CGIAR International Center Identifier | Group: CIAT | a CIAT selection from PI 416424 | International Center for Tropical Agriculture | | Turkey Adapazari 1735 | Type: CGIAR International Center Identifier | Group: CIAT | Name came from the CIAT database"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
datos_pi$name1 <- str_extract(datos_pi$accession_name1, "G[:digit:]");datos_pi
datos_pi$name2 <- str_extract(datos_pi$accession_name2, "G[:digit:]");datos_pi # only the first digit is returned, not all of them
# name1 name2
# <chr> <chr>
#1 G3 G1
#2 G1 NA
#3 NA G2
#4 G1 G1
#5 NA G2
#6 G1 G1
# Desired output
# name1 name2 name3 # one column for each matched string
# G3 G19957
# G19 NA
# NA G2938
# G18717 G18717A G18717B
# NA G25187
# G14095 G14095A
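For the one-column-per-string layout, one possible approach is `str_extract_all()` with `simplify = TRUE`, which returns a character matrix that can be bound column-wise. This is only a sketch: `acc1` and `acc2` are abbreviated stand-ins for three rows of the two accession columns, and `g_pattern` is an assumed pattern (G, digits, optional uppercase letter), not something confirmed as the right rule for every identifier.

```r
library(stringr)

# Abbreviated stand-ins for three rows of accession_name1 / accession_name2
acc1 <- c("G19 | Type: CGIAR International Center Identifier",
          "GRANDA OHNE FADEN | Type: Local name",
          "Guarzo de Arbol | Type: Local name | | G18717 | Type: CGIAR International Center Identifier")
acc2 <- c("ASIATIC EXPEDITION NO.305 | Type: Duplicate accession name",
          "G2938 | Type: CGIAR International Center Identifier",
          "G18717A | Group: CIAT | | G18717B | Group: CIAT")

# Assumed pattern: "G", one or more digits, an optional uppercase letter
g_pattern <- "\\bG[0-9]+[A-Z]?\\b"

# simplify = TRUE gives a character matrix: one row per input string,
# one column per match, padded with "" where a string has fewer matches
m <- cbind(str_extract_all(acc1, g_pattern, simplify = TRUE),
           str_extract_all(acc2, g_pattern, simplify = TRUE))
m[m == ""] <- NA                                  # turn the padding into NA
colnames(m) <- paste0("name", seq_len(ncol(m)))   # name1, name2, name3, ...

result <- as.data.frame(m)
result
```

The number of columns follows the largest number of matches found in each source column, so `name3` only appears because one string in `acc2` holds two identifiers.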
The idea is to capture any combination that starts with G.
These are the possible forms:
G1
G12
G123
G1234
G12345
G12345A # or any letter.
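As a rough check of the forms listed above, here is a minimal sketch of a pattern that might cover them. It assumes the trailing letter is a single uppercase letter and uses word boundaries (`\\b`) so that ordinary words starting with G are skipped; the name `g_pattern` and the test vectors are mine, for illustration only.

```r
library(stringr)

# Assumed pattern: "G", one or more digits, an optional single uppercase
# letter, bounded on both sides so words like "GRANDA" are not matched
g_pattern <- "\\bG[0-9]+[A-Z]?\\b"

forms   <- c("G1", "G12", "G123", "G1234", "G12345", "G12345A")
non_ids <- c("GRANDA OHNE FADEN", "Guarzo de Arbol", "BLACK GUST(of wind)")

str_extract(forms, g_pattern)     # each form should come back whole
str_extract(non_ids, g_pattern)   # no identifier here, so NA for each

# str_extract() keeps only the first match per string;
# str_extract_all() returns every match
mixed <- "G18717A | Group: CIAT | | G18717B | Group: CIAT"
str_extract_all(mixed, g_pattern)[[1]]
```

The `+` after `[0-9]` is what makes the match take all the digits rather than only the first one.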
Thanks!