isolating label

Hi,
I am currently using the following code to extract the "phyla" part of a long label into a new column of the dataset ITS_counts.

ITS_counts3 <- ITS_counts |> mutate(Phyla = str_extract(taxonomy, "(?<=;p__).+;c"))

This lets me isolate the part of the taxonomy column that I want, but it leaves the ;c on the end, which I want to get rid of. How would I do this? Thanks.

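The regular expression [^;]+ means "one or more characters that are not a semicolon", so the match stops just before the next semicolon. Your original pattern kept the ;c because .+;c makes those two characters part of the match itself.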

library(stringr)
taxonomy <- "sdfljfsldj;p__ThePhylum;c__lskdflsdjf"
str_extract(taxonomy, "(?<=;p__)[^;]+")
#> [1] "ThePhylum"

Created on 2023-04-08 with reprex v2.0.2
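
For completeness, here is a minimal sketch of the same pattern dropped into the original mutate() call. The two taxonomy strings are made-up placeholders standing in for the real ITS_counts data, which was not posted, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for ITS_counts; the real data was not shown in the thread
ITS_counts <- data.frame(
  taxonomy = c(
    "k__Fungi;p__Ascomycota;c__Sordariomycetes",
    "k__Fungi;p__Basidiomycota;c__Agaricomycetes"
  )
)

# Look behind for ";p__", then take everything up to (not including) the next ";"
ITS_counts3 <- ITS_counts |>
  mutate(Phyla = str_extract(taxonomy, "(?<=;p__)[^;]+"))

ITS_counts3$Phyla
#> [1] "Ascomycota"    "Basidiomycota"
```

Rows whose taxonomy string has no ;p__ part should come back as NA, since str_extract() returns NA when there is no match.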

That worked, thank you!
