I have this dataset and I want to delete the portion marked with a red square in all rows. How can I do this? Thank you.
Hi @ridwna,
I would recommend using either gsub or stringr::str_remove. If you are looking to remove that text verbatim, something like this should work:
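# fixed() matches the text literally, so the dots are not treated as regex wildcards
data$WGS.ID <- stringr::str_remove(data$WGS.ID, stringr::fixed("_WGS_processed_downsamp.bam.cip"))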
If you want to remove everything after the initial ID (i.e. PGDX...), something like this should work:
data$WGS.ID <- stringr::str_remove(data$WGS.ID, "_WGS.+")
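For anyone without stringr installed, here is a rough base R sketch of the same two options using sub. The IDs below are invented for illustration (I can't see the real data), so treat the vector `ids` as a stand-in for data$WGS.ID.

```r
# Hypothetical IDs, invented for illustration; the real column is data$WGS.ID
ids <- c("PGDX5881P_WGS_processed_downsamp.bam.cip",
         "PGDX5882P_WGS_processed_downsamp.bam.cip")

# Verbatim removal: fixed = TRUE treats the pattern as a literal string,
# so the dots are not interpreted as regex wildcards
verbatim <- sub("_WGS_processed_downsamp.bam.cip", "", ids, fixed = TRUE)

# Pattern removal: drop "_WGS" and everything that follows it
by_pattern <- sub("_WGS.+", "", ids)

print(verbatim)
print(by_pattern)
```

With these made-up IDs both approaches should leave only the part before "_WGS". The literal version is the safer choice if the suffix is always exactly the same; the pattern version is more forgiving if the suffix varies between rows.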
@FJCC But I have another problem: in this data I want to remove the portion marked with a red square from all rows. How can I do this?
If you want to remove everything starting at the first underscore, you can do this. I invented a tiny data set for the illustration.
DF <- data.frame(WGS.ID = c("PGDX5881P_WGS_blah",
                            "PGDX5882P_WGS_blah"),
                 OtherColumn = 1:2)
DF
#> WGS.ID OtherColumn
#> 1 PGDX5881P_WGS_blah 1
#> 2 PGDX5882P_WGS_blah 2
DF$WGS.ID <- gsub("([^_]+)_.+", "\\1", DF$WGS.ID)
DF
#> WGS.ID OtherColumn
#> 1 PGDX5881P 1
#> 2 PGDX5882P 2
Created on 2022-03-18 by the reprex package (v2.0.1)
The regular expression "([^_]+)_.+" means "one or more characters that are not an underscore, an underscore, one or more of any character".
([^_]+) means "one or more characters that are not an underscore", and the parentheses define that as the first group, which will be used later.
The _ is simply an underscore.
.+ means "one or more of any character".
The replacement argument of gsub is \\1, which means "use the first group defined in the regular expression".
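As a small follow-up sketch, the capture group is not strictly required for this task: deleting the first underscore and everything after it should give the same result. The third ID below has no underscore and is my own addition, included only to show that both versions leave such values untouched.

```r
# Invented IDs for illustration; the third one (no underscore) is hypothetical
ids <- c("PGDX5881P_WGS_blah", "PGDX5882P_WGS_blah", "PGDX5883P")

# Capture-group approach: keep group 1 (text before the first underscore)
with_group <- gsub("([^_]+)_.+", "\\1", ids)

# Shorter alternative: remove the first underscore and everything after it
without_group <- sub("_.*", "", ids)

print(with_group)
print(without_group)

# Both approaches should agree on these inputs
identical(with_group, without_group)
```

The capture-group form is more useful when you need to keep or rearrange several pieces of the string; for simply trimming a suffix, the shorter pattern is easier to read.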