I have this dataset and I want to delete the red squared portion in all rows. How can I do this? Thank you.
Hi @ridwna,
I would recommend using either gsub or stringr::str_remove. If you are looking to remove that text verbatim, something like this should work:
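# fixed() matches the text literally, so the dots are not treated as regex wildcards
data$WGS.ID <- stringr::str_remove(data$WGS.ID, stringr::fixed("_WGS_processed_downsamp.bam.cip"))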
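For the gsub route mentioned above, here is a minimal base R sketch of a strictly literal removal. The IDs are invented for illustration and are only assumed to resemble the real column. Since each ID contains the suffix once, sub is enough here, and fixed = TRUE makes the pattern a plain string rather than a regular expression:

```r
# Invented example IDs; the real column values are assumed to look similar.
ids <- c("PGDX5881P_WGS_processed_downsamp.bam.cip",
         "PGDX5882P_WGS_processed_downsamp.bam.cip")

# fixed = TRUE makes sub() match the pattern as a literal string,
# so the dots are not treated as regex wildcards.
cleaned <- sub("_WGS_processed_downsamp.bam.cip", "", ids, fixed = TRUE)
cleaned
```

The same call should work on a data frame column, e.g. with data$WGS.ID in place of ids.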
If you want to remove everything after the initial ID (i.e. PGDX...), something like this should work:
data$WGS.ID <- stringr::str_remove(data$WGS.ID, "_WGS.+")
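As a rough sketch of how that pattern behaves, here is the base R gsub version on a few invented IDs (the values are assumptions, not the real data). Note that .+ needs at least one character after _WGS, so an ID that ends exactly in _WGS is left untouched:

```r
# Invented IDs: two with different suffixes, one ending exactly in "_WGS".
ids <- c("PGDX5881P_WGS_processed_downsamp.bam.cip",
         "PGDX5882P_WGS_other_suffix",
         "PGDX5883P_WGS")

# Remove "_WGS" plus everything after it; the third ID has nothing
# after "_WGS", so the pattern does not match and it stays as it is.
trimmed <- gsub("_WGS.+", "", ids)
trimmed
```

If a bare _WGS ending should also be removed, "_WGS.*" would be the pattern to try instead.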
@FJCC But I have another problem: in this data I want to remove the red squared portion from all rows. How can I do this?
If you want to remove everything starting at the first underscore, you can do this. I invented a tiny data set for the illustration.
DF <- data.frame(WGS.ID = c("PGDX5881P_WGS_blah",
                            "PGDX5882P_WGS_blah"),
                 OtherColumn = 1:2)
DF
#>               WGS.ID OtherColumn
#> 1 PGDX5881P_WGS_blah           1
#> 2 PGDX5882P_WGS_blah           2
DF$WGS.ID <- gsub("([^_]+)_.+", "\\1", DF$WGS.ID)
DF
#>      WGS.ID OtherColumn
#> 1 PGDX5881P           1
#> 2 PGDX5882P           2
Created on 2022-03-18 by the reprex package (v2.0.1)
The regular expression "([^_]+)_.+" means "one or more characters that are not an underscore, an underscore, one or more of any character".
([^_]+) means "one or more characters that are not an underscore" and the parentheses define that as the first group, which will be used later.
The _ is simply an underscore.
.+ means "one or more of any character".
The replacement argument of gsub is \\1, which means "use the first group defined in the regular expression".
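To see what the first group actually captures, here is a small sketch using base R's regexec and regmatches on invented IDs (the third value, with no underscore, is a made-up edge case). It also shows a shorter pattern that should give the same result for IDs shaped like these, though the two can differ on unusual inputs such as a string that ends in an underscore:

```r
# Invented IDs; the last one has no underscore at all.
ids <- c("PGDX5881P_WGS_blah", "PGDX5882P_WGS_blah", "NOUNDERSCORE")

# regexec() + regmatches() return, per string, the full match
# followed by the text captured by each parenthesised group.
m <- regmatches(ids, regexec("([^_]+)_.+", ids))
m[[1]]  # full match first, then group 1

# The approach from the reply: keep only group 1.
by_group <- gsub("([^_]+)_.+", "\\1", ids)

# Alternative without a capture group: delete from the first underscore onward.
by_delete <- sub("_.*", "", ids)

by_group
by_delete
```

Strings that do not match the pattern, like the third one, are returned unchanged by both calls.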
@FJCC Thank you very much. It worked.
I am very happy.