Hello,
I have a tibble with one variable, otu_id, containing a code, and two variables, genus and species, containing the corresponding names of fungi.
tibble(otu_id = c("4875_0", "4875_4", "4875_3", "4875_32", "4875_1", "4875_5", "4875_9", "4875_8", "4875_7", "4875_12"),
genus = c("Cladosporium_7681", "Vishniacozyma_813272", "Phoma_9358", "Fomitopsis_17612", "Coniochaeta_1209", "Resinicium_18453", "Alternaria_7106", "Heterobasidion_17745", "Botrytis_7435", "Sporobolomyces_10025"),
species = c("Cladosporium_cladosporioides_294915", "Vishniacozyma_victoriae_813285", "Phoma_herbarum_171008", "Fomitopsis_pinicola_101927", NA, "Resinicium_bicolor_338261", NA, "Heterobasidion_annosum_119859", "Botrytis_cinerea_217312", "Sporobolomyces_lactosus_357887"))
I would like to perform the following tasks:
1) Remove the numeric code that follows the names.
Example: Cladosporium_7681 must become just Cladosporium.
2) Prefix each name in both the genus and species columns with ";" followed by the number after the underscore in otu_id.
Example: Cladosporium_7681 must become ";0Cladosporium",
or Heterobasidion_annosum_119859 must become ";8Heterobasidion_annosum".
3) Replace NA in the species column with the name in genus.
Example: row 5 must have ";1Coniochaeta" in the species column.
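Tasks 1 and 2 both come down to a regular expression anchored at the end of the string. A minimal base R sketch, assuming every code is an underscore followed by digits at the very end (true for the sample data); strip_code() and otu_number() are hypothetical helper names, not functions from any package:

```r
# Hypothetical helpers; assume each code is "_" followed by digits at the end.

# Task 1: drop the trailing "_<digits>" code but keep inner underscores.
strip_code <- function(x) sub("_[0-9]+$", "", x)

# Task 2: drop everything up to and including the last "_" of otu_id.
otu_number <- function(otu_id) sub("^.*_", "", otu_id)

strip_code(c("Cladosporium_7681", "Heterobasidion_annosum_119859", NA))
otu_number(c("4875_0", "4875_8", "4875_32"))
```

Because the pattern is anchored with `$`, only the final code is removed, so a species name such as Heterobasidion_annosum keeps its inner underscore, and NA values pass through as NA.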
Thanks for the help.
library(tidyverse)
sample_df <- tibble(otu_id = c("4875_0", "4875_4", "4875_3", "4875_32", "4875_1", "4875_5", "4875_9", "4875_8", "4875_7", "4875_12"),
genus = c("Cladosporium_7681", "Vishniacozyma_813272", "Phoma_9358", "Fomitopsis_17612", "Coniochaeta_1209", "Resinicium_18453", "Alternaria_7106", "Heterobasidion_17745", "Botrytis_7435", "Sporobolomyces_10025"),
species = c("Cladosporium_cladosporioides_294915", "Vishniacozyma_victoriae_813285", "Phoma_herbarum_171008", "Fomitopsis_pinicola_101927", NA, "Resinicium_bicolor_338261", NA, "Heterobasidion_annosum_119859", "Botrytis_cinerea_217312", "Sporobolomyces_lactosus_357887"))
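sample_df %>%
  # Task 1: strip the trailing "_<digits>" code from both name columns
  mutate_at(vars(genus, species), ~ str_remove(., pattern = "_\\d+$")) %>%
  # Task 3: fill missing species with the cleaned genus name
  mutate(species = if_else(is.na(species), genus, species)) %>%
  # Task 2: prefix ";" plus the digits after "_" in otu_id
  mutate_at(vars(genus, species), ~ paste0(";", str_extract(otu_id, "(?<=_)\\d+$"), .))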
#> # A tibble: 10 x 3
#>    otu_id  genus             species
#>    <chr>   <chr>             <chr>
#>  1 4875_0  ;0Cladosporium    ;0Cladosporium_cladosporioides
#>  2 4875_4  ;4Vishniacozyma   ;4Vishniacozyma_victoriae
#>  3 4875_3  ;3Phoma           ;3Phoma_herbarum
#>  4 4875_32 ;32Fomitopsis     ;32Fomitopsis_pinicola
#>  5 4875_1  ;1Coniochaeta     ;1Coniochaeta
#>  6 4875_5  ;5Resinicium      ;5Resinicium_bicolor
#>  7 4875_9  ;9Alternaria      ;9Alternaria
#>  8 4875_8  ;8Heterobasidion  ;8Heterobasidion_annosum
#>  9 4875_7  ;7Botrytis        ;7Botrytis_cinerea
#> 10 4875_12 ;12Sporobolomyces ;12Sporobolomyces_lactosus
Created on 2020-11-20 by the reprex package (v0.3.0.9001)
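For comparison, the same three steps can be sketched in base R without the tidyverse, assuming a plain data.frame holding the sample data from the question. This is an alternative, not the approach used in the answer above. The order mirrors the pipeline: missing species are filled after the codes are stripped but before the prefix is added, so the genus name is prefixed only once.

```r
# Base R sketch of tasks 1-3 (no tidyverse); sample data copied from the question.
sample_df <- data.frame(
  otu_id = c("4875_0", "4875_4", "4875_3", "4875_32", "4875_1",
             "4875_5", "4875_9", "4875_8", "4875_7", "4875_12"),
  genus = c("Cladosporium_7681", "Vishniacozyma_813272", "Phoma_9358",
            "Fomitopsis_17612", "Coniochaeta_1209", "Resinicium_18453",
            "Alternaria_7106", "Heterobasidion_17745", "Botrytis_7435",
            "Sporobolomyces_10025"),
  species = c("Cladosporium_cladosporioides_294915", "Vishniacozyma_victoriae_813285",
              "Phoma_herbarum_171008", "Fomitopsis_pinicola_101927", NA,
              "Resinicium_bicolor_338261", NA, "Heterobasidion_annosum_119859",
              "Botrytis_cinerea_217312", "Sporobolomyces_lactosus_357887"),
  stringsAsFactors = FALSE  # keep character columns (factors were the default before R 4.0)
)

name_cols <- c("genus", "species")

# Task 1: strip the trailing "_<digits>" code; NA stays NA.
for (col in name_cols) {
  sample_df[[col]] <- sub("_[0-9]+$", "", sample_df[[col]])
}

# Task 3: fill missing species with the already-cleaned genus name.
no_species <- is.na(sample_df$species)
sample_df$species[no_species] <- sample_df$genus[no_species]

# Task 2: prefix ";" plus the digits after the last "_" in otu_id.
otu_num <- sub("^.*_", "", sample_df$otu_id)
for (col in name_cols) {
  sample_df[[col]] <- paste0(";", otu_num, sample_df[[col]])
}

sample_df
```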
I am still missing the double quotes at the start and end of each string. I would like them to be displayed.
Example: ;0Cladosporium_cladosporioides must be shown as ";0Cladosporium_cladosporioides"
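A tibble's print method does not show quotes around character values, but that does not mean anything is missing: the values never contained quote characters. Printing the column as a plain character vector shows the quotes as delimiters. Only if literal quote characters really must be stored in the values can `paste0()` add them. A small sketch on two hypothetical values in the format produced above:

```r
# Two hypothetical values in the format produced by the pipeline above.
species <- c(";0Cladosporium_cladosporioides", ";1Coniochaeta")

# Printing a plain character vector shows quotes as delimiters only.
print(species)
nchar(species[2])  # 13: no quote characters are stored in the value

# Only if literal quote characters must be part of the values:
quoted <- paste0('"', species, '"')
cat(quoted, sep = "\n")
```

Storing the quotes is usually unnecessary; functions that write text files, such as `write.csv()`, add their own quoting.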
And another task:
4) Add an underscore after the number.
Example: ";0Cladosporium_cladosporioides" must become ";0_Cladosporium_cladosporioides"
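Task 4 only changes the final prefixing step: in the answer's last `mutate_at()` call, `paste0()` would take an extra `"_"` between the extracted number and the name, i.e. `paste0(";", str_extract(otu_id, "(?<=_)\\d+$"), "_", .)`. The same idea as a base R sketch on two hypothetical rows:

```r
# Hypothetical rows: cleaned names and the matching otu_id values.
otu_id  <- c("4875_0", "4875_8")
species <- c("Cladosporium_cladosporioides", "Heterobasidion_annosum")

otu_num <- sub("^.*_", "", otu_id)  # digits after the last "_"

# Task 4: an extra "_" between the number and the name.
prefixed <- paste0(";", otu_num, "_", species)
prefixed
```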
I think you already have a good starting point to finish the job yourself. Good luck!
Closed by system on December 11, 2020, 11:53am.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.