Phyloseq: phylum name [Thermi] - how to change it?

Hello,

I hope someone can help me. I am fairly new to R and have just finished writing my first code.
Phyloseq gave me one phylum, [Thermi], and because of the square brackets I can't subsample it (or do anything else with it).
Is there any method to change [Thermi] into Thermi?

Thank you!

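This will seem like magic at first: `[:alpha:]+` matches a run of one or more letters, so `str_extract()` returns the letters and leaves the brackets behind. How you apply it to your object depends on whether the name is a freestanding character string, as shown below, or part of a data frame.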

library(stringr)
target <- "[Thermi]"
target
#> [1] "[Thermi]"
pattern <- "[:alpha:]+"
pattern
#> [1] "[:alpha:]+"
str_extract(target,pattern)
#> [1] "Thermi"

Created on 2020-11-02 by the reprex package (v0.3.0.9001)
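One caveat with `[:alpha:]+`: it keeps only the first run of letters, which is fine for `[Thermi]` but would shorten a name containing digits, such as `TM6` (which appears later in this thread), to `TM`. A more targeted option is to delete only the square brackets. This sketch uses base R so it runs without stringr; the stringr equivalent would be `str_remove_all(phyla, "\\[|\\]")`.

```r
# Phylum names of the kind phyloseq returned (examples taken from this thread)
phyla <- c("[Thermi]", "TM6", "Actinobacteria")

# First run of letters, like str_extract(phyla, "[:alpha:]+");
# base R needs the POSIX class inside a bracket expression: "[[:alpha:]]+"
first_letters <- regmatches(phyla, regexpr("[[:alpha:]]+", phyla))
print(first_letters)  # "TM6" loses its digit here

# Remove only "[" and "]"; every other character is kept
clean <- gsub("\\[|\\]", "", phyla)
print(clean)
```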

Thank you!

The magic worked with your example, but unfortunately not with my data. Usually that happens when I've done something wrong or made a typo somewhere, but I can't figure it out:

pattern <- "[:alpha:]+"
control <- phyloseq::subset_samples(gps_phylum, treatment == "Control")
control_otu <- data.frame(phyloseq::otu_table(control))
str_extract(control_otu, pattern)
#> [1] "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c"

Warning:
In stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
argument is not an atomic vector; coercing

head(control_otu)
#>[Thermi]                    0              0              0              0
#>Chloroflexi                 0              0              0              0
#>Actinobacteria              0             24              3              0

I thought that might be a problem. What does

str(control_otu)

display?

str(control_otu)
#>'data.frame':	12 obs. of  11 variables:
 $ Hauck.001.1A16: num  0 0 0 6 155 ...
 $ Hauck.002.1B16: num  0 0 24 0 44 0 0 0 2 654 ...
 $ Hauck.003.1C16: num  0 0 3 0 133 ...
 $ Hauck.004.2A16: num  0 0 0 8 88 ...
 $ Hauck.005.2B16: num  0 5 0 203 101 ...
 $ Hauck.006.2C16: num  0 0 9 129 105 ...
 $ Hauck.007.3A16: num  0 0 0 0 69 ...
 $ Hauck.008.3B16: num  0 0 25 3 60 ...
 $ Hauck.009.3C16: num  0 0 0 4 37 ...
 $ Hauck.011.4B16: num  0 0 2 374 109 ...
 $ Hauck.012.4C16: num  2 0 7 0 67 0 0 0 6 878 ...
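The `str()` output lists only the 11 sample columns and no phylum column, so the phylum names are most likely the data frame's row names (which is also how `head()` prints them). That would explain the `"c"` results: `str_extract()` coerces the whole data frame to character, turning each column into a string such as `"c(0, 0, 0, ...)"`, and the first run of letters in that string is `c`. A minimal sketch, assuming the names really are row names, on a small table whose values are copied from the posted `head()` and `str()` output:

```r
# Small stand-in for control_otu: phyla as row names, samples as columns
# (values copied from the head() and str() output posted above)
control_otu <- data.frame(
  Hauck.001.1A16 = c(0, 0, 0),
  Hauck.002.1B16 = c(0, 0, 24),
  Hauck.003.1C16 = c(0, 0, 3),
  row.names = c("[Thermi]", "Chloroflexi", "Actinobacteria")
)

# Why str_extract() returned "c": each column becomes a string like "c(0, 0, 0)"
as.character(control_otu)

# Edit the row names themselves, and assign the result back
rownames(control_otu) <- gsub("\\[|\\]", "", rownames(control_otu))
rownames(control_otu)
```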

I had hoped this would show the name of the column containing the phyla. Assuming there is a column by that name, this works (the selection logic is now reversed: instead of extracting letters, it removes punctuation):

suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
  })
target <- "[Thermi]"
pattern <- "[:punct:]"
my_data <- data.frame(phyla = c("[Thermi]", "Bryozoa"))
my_data %>% mutate(phyla = str_remove_all(phyla,pattern))
#>     phyla
#> 1  Thermi
#> 2 Bryozoa

Created on 2020-11-02 by the reprex package (v0.3.0.9001)

I am so sorry! It didn't work.

suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})
pattern <- "[:punct:]"
control_otu <- data.frame(phyla = c("[Thermi]", "Actinobacteria", "Bacteroides", "Chlamydiae",
                                "Chloroflexi", "Cyanobacteria", "Firmicutes", "Planctomyces", "Proteobacteria",
                                "Tenericutes", "TM6", "Verrucomicrobia"))
control_otu %>% mutate(phyla = str_remove_all(phyla,pattern))
#>             phyla
#> 1          Thermi
#> 2  Actinobacteria
#> 3     Bacteroides
#> 4      Chlamydiae
#> 5     Chloroflexi
#> 6   Cyanobacteria
#> 7      Firmicutes
head(control_otu)
#>            phyla
#> 1       [Thermi]
#> 2 Actinobacteria
#> 3    Bacteroides
#> 4     Chlamydiae
#> 5    Chloroflexi
#> 6  Cyanobacteria

control_otu before:

Chloroflexi                 0              0              0              0
Actinobacteria              0             24              3              0
Cyanobacteria               6              0              0              8
Proteobacteria            155             44            133             88

and after the [Thermi]-to-Thermi procedure:

           phyla
1         Thermi
2 Actinobacteria
3    Bacteroides
4     Chlamydiae
5    Chloroflexi
6  Cyanobacteria

I noticed that [Thermi] also shows up in the step before:

library(dplyr)    # for %>%
library(ggplot2)  # for ggplot(), geom_boxplot(), geom_jitter(), ...
gps_phylum <- phyloseq::tax_glom(gps, "Phylum")
phyloseq::taxa_names(gps_phylum) <- phyloseq::tax_table(gps_phylum)[, "Phylum"]
phyloseq::otu_table(gps_phylum)[1:5, 1:5]

#Melt and plot
phyloseq::psmelt(gps_phylum) %>%
  ggplot(data = ., aes(group=group, x = group, y = Abundance)) +
  geom_boxplot(outlier.shape  = NA) +
  geom_jitter(aes(color = OTU), height = 0, width = .2) +
  labs(x = "", y = "Abundance\n") +
  facet_wrap(~ OTU, scales = "free")
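Because the brackets are already there when `taxa_names()` is set from the `Phylum` column, another option is to strip them at that step, so the plot and every table derived from `gps_phylum` use `Thermi`. The phyloseq line is shown only as a comment and is untested; the runnable part uses a plain character vector as a hypothetical stand-in for `tax_table(gps_phylum)[, "Phylum"]`. Note that the `Phylum` column of the tax_table itself would still contain the brackets unless it is edited too.

```r
# Hypothetical stand-in for phyloseq::tax_table(gps_phylum)[, "Phylum"]
phylum_col <- c("[Thermi]", "Chloroflexi", "Actinobacteria", "Cyanobacteria")

# Strip the square brackets before using the values as taxa names
new_names <- gsub("\\[|\\]", "", as.character(phylum_col))

# On the real object this would become (untested sketch):
# phyloseq::taxa_names(gps_phylum) <-
#   gsub("\\[|\\]", "", as.character(phyloseq::tax_table(gps_phylum)[, "Phylum"]))
new_names
```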

This code already gives me the OTU [Thermi] in the plot.

It did. It just needs to be saved:

control_otu %>% mutate(phyla = str_remove_all(phyla,pattern)) -> control_otu

Worked!
But now I'm stuck at the next step:

control_otu <- control_otu %>%
  t(.) %>%
  as.data.frame(.) %>%
  mutate(Other = Thermi + Chlamydiae + Chloroflexi + Cyanobacteria + Verrucomicrobia) %>%
  dplyr::select(-Thermi, -Chlamydiae, -Chloroflexi, -Cyanobacteria, -Verrucomicrobia)
#Error: Problem with `mutate()` input `Other`.
#x Object 'Thermi' not found
#ℹ Input `Other` is `Thermi + Chlamydiae + Chloroflexi + Cyanobacteria + Verrucomicrobia`.
#Run `rlang::last_error()` to see where the error occurred.
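`t()` turns the row names into column names. If the counts table still has the row name `[Thermi]`, the transposed data frame has a column literally called `[Thermi]` and no column `Thermi` (the `head()` output posted further down shows exactly this). In dplyr such a name has to be written with backticks, as in ``mutate(Other = `[Thermi]` + Chlamydiae + ...)``; alternatively, clean the row names of the counts table before transposing. A base R sketch of the second route, using only the five rows posted in this thread, so only three of the five "other" phyla are present:

```r
# Counts table as posted: phyla in row names, samples in columns
control_otu <- data.frame(
  Hauck.001.1A16 = c(0, 0, 0, 6, 155),
  Hauck.002.1B16 = c(0, 0, 24, 0, 44),
  Hauck.003.1C16 = c(0, 0, 3, 0, 133),
  Hauck.004.2A16 = c(0, 0, 0, 8, 88),
  row.names = c("[Thermi]", "Chloroflexi", "Actinobacteria",
                "Cyanobacteria", "Proteobacteria")
)

# 1. Clean the row names on the counts table itself
rownames(control_otu) <- gsub("\\[|\\]", "", rownames(control_otu))

# 2. Transpose: samples become rows, phyla become columns
otu_t <- as.data.frame(t(control_otu))

# 3. Pool the rare phyla that are actually present into "Other", then drop them
others <- intersect(c("Thermi", "Chlamydiae", "Chloroflexi",
                      "Cyanobacteria", "Verrucomicrobia"), names(otu_t))
otu_t$Other <- rowSums(otu_t[, others, drop = FALSE])
otu_t <- otu_t[, setdiff(names(otu_t), others), drop = FALSE]
otu_t
```

Using `intersect()` is my own addition: it keeps the sum from failing when one of the listed phyla is absent from a given table.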

control_otu before:

[Thermi]                    0              0              0              0
Chloroflexi                 0              0              0              0
Actinobacteria              0             24              3              0
Cyanobacteria               6              0              0              8
Proteobacteria            155             44            133             88

and after the [Thermi]-to-Thermi procedure:

1         Thermi
2 Actinobacteria
3    Bacteroides
4     Chlamydiae
5    Chloroflexi
6  Cyanobacteria

I'm not sure how you intend to handle these. One way is simply to filter them out:

suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})

pattern <- "[:punct:]"
control_otu <- data.frame(phyla = c("[Thermi]", "Actinobacteria", "Bacteroides", "Chlamydiae", "Chloroflexi", "Cyanobacteria", "Firmicutes", "Planctomyces", "Proteobacteria", "Tenericutes", "TM6", "Verrucomicrobia"))
control_otu %>% mutate(phyla = str_remove_all(phyla, pattern)) -> control_otu

the_others <- c("Thermi", "Chlamydiae", "Chloroflexi", "Cyanobacteria", "Verrucomicrobia")
control_otu %>% filter(!(phyla %in% the_others))
#>            phyla
#> 1 Actinobacteria
#> 2    Bacteroides
#> 3     Firmicutes
#> 4   Planctomyces
#> 5 Proteobacteria
#> 6    Tenericutes
#> 7            TM6

Created on 2020-11-02 by the reprex package (v0.3.0.9001)

If you want to keep the records but just reclassify those taxa as "Other":

suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
})

pattern <- "[:punct:]"
control_otu <- data.frame(phyla = c("[Thermi]", "Actinobacteria", "Bacteroides", "Chlamydiae", "Chloroflexi", "Cyanobacteria", "Firmicutes", "Planctomyces", "Proteobacteria", "Tenericutes", "TM6", "Verrucomicrobia"))
control_otu %>% mutate(phyla = str_remove_all(phyla, pattern)) -> control_otu

the_others <- c("Thermi", "Chlamydiae", "Chloroflexi", "Cyanobacteria", "Verrucomicrobia")

control_otu %>% mutate(phyla = ifelse(phyla %in% the_others,"Other",phyla))
#>             phyla
#> 1           Other
#> 2  Actinobacteria
#> 3     Bacteroides
#> 4           Other
#> 5           Other
#> 6           Other
#> 7      Firmicutes
#> 8    Planctomyces
#> 9  Proteobacteria
#> 10    Tenericutes
#> 11            TM6
#> 12          Other

Created on 2020-11-02 by the reprex package (v0.3.0.9001)
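Relabelling the names on their own does not touch the counts. If the goal is a counts table in which the rare phyla are pooled, one option is base R's `rowsum()`, which adds up all rows that share a group label. A sketch on the same five rows posted in this thread (names already cleaned):

```r
# Counts with cleaned phylum names as row names (rows posted in this thread)
counts <- matrix(
  c(  0,  0,   0,  0,
      0,  0,   0,  0,
      0, 24,   3,  0,
      6,  0,   0,  8,
    155, 44, 133, 88),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Thermi", "Chloroflexi", "Actinobacteria",
                    "Cyanobacteria", "Proteobacteria"),
                  c("Hauck.001.1A16", "Hauck.002.1B16",
                    "Hauck.003.1C16", "Hauck.004.2A16"))
)

the_others <- c("Thermi", "Chlamydiae", "Chloroflexi", "Cyanobacteria", "Verrucomicrobia")

# Same relabelling as above, but used as a grouping vector for the counts
group <- ifelse(rownames(counts) %in% the_others, "Other", rownames(counts))

# rowsum() sums all rows with the same label; groups come out sorted
pooled <- rowsum(counts, group)
pooled
```

`t(pooled)` would then give the samples-in-rows layout used in the next step.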

In this step the data is not only selected but also transposed and converted to a data frame:

control_otu <- control_otu %>%
  t(.) %>%
  as.data.frame(.) %>%
  mutate(Other = Thermi + Chlamydiae + Chloroflexi + Cyanobacteria + Verrucomicrobia) %>%
  dplyr::select(-Thermi, -Chlamydiae, -Chloroflexi, -Cyanobacteria, -Verrucomicrobia)
head(control_otu)
#>   [Thermi] Chloroflexi Actinobacteria Cyanobacteria Proteobacteria Chlamydiae
#> 1        0           0              0             6            155          0
#> 2        0           0             24             0             44          0
#> 3        0           0              3             0            133          0

After mutating and selecting comes the HMP test:

# HMP test
library(HMP)  # provides Xdc.sevsample()
group_data <- list(control_otu, cocci_otu, ST_otu, both_otu)

xdc <- Xdc.sevsample(group_data)
xdc
1 - pchisq(.2769004, 5)
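`1 - pchisq(q, df)` is the upper-tail probability of a chi-square distribution. `pchisq()` can return that directly with `lower.tail = FALSE`, which avoids rounding to exactly 0 when the p-value is very small. The statistic and degrees of freedom below are simply the values from the post, taken as given. Separately, as far as I know the HMP documentation describes each element of the `Xdc.sevsample()` input list as a matrix with samples in rows and taxa in columns, which fits the `t()` step above; it is probably worth checking that all four tables share the same taxa columns in the same order.

```r
stat <- 0.2769004  # value used in the post above
dof  <- 5          # degrees of freedom used in the post above

# Two ways to get the same upper-tail chi-square probability
p_subtract <- 1 - pchisq(stat, dof)
p_upper    <- pchisq(stat, dof, lower.tail = FALSE)
all.equal(p_subtract, p_upper)

# With an extreme statistic, 1 - pchisq() rounds to exactly 0,
# while the upper-tail form still returns a small positive number
1 - pchisq(200, dof)
pchisq(200, dof, lower.tail = FALSE)
```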
